As early as 2017, only two years after OpenAI was founded in California, the Chinese government invested heavily in artificial intelligence and incorporated AI into its national strategy. Nevertheless, during the ‘two sessions’ in 2023, Science and Technology Minister Wang Zhigang candidly admitted to the media that Chinese AI enterprises could not develop an AI language model that could compete with ChatGPT on an equal footing. So why has China failed to lead the next AI revolution?
Objectively speaking, Chinese AI researchers have made significant progress in various areas. The Chinese government established public-private platforms and an “AI National Team” to accelerate the implementation of general-purpose AI that can be directly applied to improve productivity in different fields.
Baidu, the industry leader of the AI National Team, launched its self-driving platform Apollo in 2017 and open-sourced its hardware and software solutions for automobile manufacturers. Other Chinese tech giants were also enlisted to advance AI technology. For example, Alibaba was tasked with developing cloud computing and smart city platforms, while Tencent decided to invest in AI applications in healthcare.
Many technical challenges and ethical issues must be solved to achieve genuinely driverless vehicles. However, large language models like ChatGPT are about to lead a more significant revolution. Unfortunately, it appears that Chinese AI experts and policymakers will miss out on the next big industry trend. Systemic deficiencies in the Chinese internet ecosystem make it difficult for high-level language-based AI models like ChatGPT to emerge in China.
- What makes the large language model special?
A large language model, commonly abbreviated as LLM, is a deep learning algorithm that uses massive datasets to perform various language-based tasks. It can recognise, summarise, translate, predict, and even generate text and other forms of content by harnessing the knowledge and patterns acquired from the vast amounts of data it has been trained on.
It can be inferred from this definition that two conditions must be met to develop a powerful large language model: an excellent algorithm and an ample supply of high-quality training data. Due to the unique ecosystem of the Chinese internet and the combined effects of various political and economic factors, AI developers in China must put in a significant amount of effort to overcome disadvantages in both algorithms and data.
- A bigger “walled garden.”
As the internet evolved, it was taken over by different companies vying for control, each building its own closed-off spaces around its digital content. These “walled gardens” gradually replaced the once-free and open online landscape, creating a new era of corporate domination. Hundreds of millions of internet users in China find it irritating to move between online services operated by different enterprises. For example, WeChat users must use an external browser to share a product they found on Alibaba. While the government has resolved to eradicate monopolistic practices and facilitate interconnectivity, tech companies are still working out how to comply with the regulations while safeguarding their interests. The data generated on each platform remains off-limits to competitors.
Ernie Bot is the chatbot introduced by the Chinese tech giant Baidu. Since it cannot use data from competitors, Baidu has no choice but to rely on data from open-source platforms and its own sources to train its algorithms.
So how is the quality of Baidu’s proprietary database?
Baidu was once the biggest traffic portal on the Chinese internet, but its AI development has been hindered by poor decisions in its search engine business. In 2019, the renowned Chinese media scholar Kecheng Fang published an article that fiercely criticised Baidu, the largest Chinese search engine, for abusing its dominant position in the black box business of “search engine optimisation”. As a result of Baidu’s misconduct, the Chinese internet is contaminated by “link farms”, “content farms”, “splogs”, misinformation, and other undesirable content.
Even today, these scandals continue to affect outsiders’ confidence in Baidu’s AI business. In March 2023, an influencer put Ernie Bot to the test by asking it to draw a picture of the nonsensical Chinese term “yikeyidouzi”, which literally translates to “one can bean” in English. Ernie Bot responded by generating an image of a can of beans, sparking a flurry of social media posts by users claiming that Ernie Bot had generated various other erroneous images.
Amid growing speculation that Baidu may be translating prompts into English and then utilising other companies’ AI tools to produce search results, the tech giant released a statement on Thursday via Weibo in response to mounting criticism. Baidu emphasised that Ernie Bot is a proprietary large language model developed in-house and trained using publicly available internet data from around the world. According to an industry insider, the shortage of Chinese language data is one of the reasons behind this phenomenon. Most open-source datasets currently available are in English, with only a limited number of Chinese language corpora of comparable size.
- Does technology know no borders?
At present, the difficulties confronting China’s AI technology development extend beyond the issue of algorithm efficiency. After all, classifying AI algorithms as Chinese or American based on their national origin is challenging. A more significant problem is that the country’s AI industry finds it difficult to separate itself from Western-dominated open-source platforms.
Global technology companies increasingly utilise open-source platforms for AI development (Parasol, 2021). Open-source AI platforms can be conceptualised as a vertical technology stack that external entities can access and use through APIs. Since 2017, Chinese programmers have enthusiastically adopted the open-source philosophy, as evidenced by the rapid growth of GitHub users in China compared to other countries. Although international civil cooperation can accelerate the progress of China’s AI technology, the Chinese government is still unwilling to see Chinese AI developers become overly dependent on Western open-source platforms.
From the perspective of the Chinese government, achieving a leading position in the global AI industry was only part of the goal. Of greater significance is enhancing the resilience and independence of the national economy and maintaining digital sovereignty (Parasol, 2021). To make this vision a reality, the Chinese government initiated its data localisation policy in the name of national data sovereignty. The first draft of the cybersecurity law was published in 2015, and the law officially took effect in June 2017.
At this stage, Chinese AI practitioners started to see systemic contradictions emerging. Research has shown that implementing the data localisation policy could be incompatible with open-source platforms (Parasol, 2021). Training an AI chatbot in Chinese is also challenging because the country’s open-source ecosystem is not as advanced or comprehensive as in Western countries (Jiang & Feng, 2023). Similar worries were raised in earlier discussions about the Ernie Bot scandal.
The Chinese government had already recognised the necessity of having foreign open-source platforms available to Chinese AI developers. According to external speculation, the government purposely left loopholes in the legal provisions to create a buffer zone for China’s AI industry to disengage from Western open-source platforms (Parasol, 2021). Allowing Chinese AI developers to use Western open-source platforms could lead to the risk of data leakage, while prohibiting them from using such platforms would significantly impede the progress of China’s AI industry. Until China significantly improves its domestic AI-related infrastructure, it may be difficult for the Chinese government to solve this systemic problem quickly.
- When AI becomes political
One effect of the digitalised society is the blurring of the line between the government and the market (Pasquale, 2015). As a result, a clandestine group of individuals who use financial resources and control over media for their own benefit has emerged.
Among many types of AI, the development process of large language models like ChatGPT reflects the social nature of AI development. As Kate Crawford (2021) noted, AI systems are not self-sufficient, logical machines that could exist independently. Owing to the high cost of scaling up AI implementation and how it is optimised to function, AI systems are ultimately created to serve the interests of those who already hold power (Crawford, 2021). Therefore, the progress of AI is wholly contingent upon an extensive range of political and social structures.
As the internet and digital platforms have grown increasingly popular in China, different technologies have come to prominence at different points (Flew, 2021). However, amidst rapid economic, technological, and social transformations, one aspect that has remained constant is the pivotal role of the CCP in regulating the flow of information and shaping public opinion to align with the state’s official ideology (Flew, 2021).
The ruling Communist Party has long maintained tight control over political and social discussions within China and has recently taken decisive measures against online content that is considered inappropriate (Jiang & Feng, 2023). The limitations on online discussions in China restrict the datasets available to scientists to train AI chat models (Jiang & Feng, 2023). Ernie Bot faced challenges when discussing politics due to China’s strict censorship laws. When asked whether China is a democratic country, Baidu’s bot evaded the question by stating that it had not yet acquired the knowledge to answer it (Feng, 2023). OpenAI’s creation of ChatGPT also involved a kind of censorship. Governments might be worried about AI systems producing content that could be seen as sensitive or politically inappropriate (Jiang & Feng, 2023).
- “One-way-mirror” or “black box.”
In recent years, the progress in deep learning for computer vision has fostered a common perception that the most precise model for a given data science problem must necessarily be intricate and lack interpretability (Rudin & Radin, 2019).
Why would AI algorithms have to be black boxes?
Even though “open-source” platforms often emphasise their commitment to openness and transparency, enterprises still conceal critical information to safeguard their intellectual property and business secrets. This turns the black box into an essential component of the open-source platform. For example, even the industry leader OpenAI chose not to disclose its algorithm, source code and related datasets because they could be misused for malicious purposes (Pasquale, 2015).
The topic of the black box is significant because authority is now being increasingly expressed through algorithms. These automated processes have been used for a long time to operate planes and run the physical infrastructure of the Internet (Pasquale, 2015). Software can encode thousands of rules and instructions and process them within a fraction of a second. Exclusive access to the code and data undermines transparency and reproducibility.
The type of AI that the Chinese government favours is purely instrumental: a “one-way mirror” that lets administrators monitor and manage from behind the scenes. A Reuters examination of government documents revealed that numerous Chinese companies had developed software employing artificial intelligence to organise information gathered on citizens in response to strong demand from authorities eager to modernise their surveillance technology (Baptista, 2022).
Secrecy can be achieved through different means. Real secrecy creates a physical or virtual barrier to prevent unauthorised access to hidden content. Legal secrecy, on the other hand, imposes an obligation on those who have access to certain information to keep it confidential (Pasquale, 2015).
One could argue that Chinese society can be perceived as a massive black box and the Chinese Communist Party as a deep learning programmer. Speeches, party documents, and regulations train the black box to respond to specific behaviours in particular ways. However, in the end, no individual fully understands the system’s operation. Inside this vast black box, an investment such as a large language model could be the best deal of the century or suddenly be deemed a state crime. Expressing an idea or opinion can influence the debate, but it could also mean unknowingly entering forbidden territory. Given that half of China’s population works in the service sector, any alteration to this industry could significantly impact the country’s social stability. This is particularly true if the change is disruptive and brought about by generative AI.
Even for the creators of this model, controlling and interpreting the calculation process can be a daunting task, especially when the algorithms are partly provided by foreign open-source platforms. Chinese investors and government officials find this AI investment too risky to accept.
- The road ahead is arduous and lengthy.
Integrating AI into society can vastly enhance productivity, and failing to adopt AI could impede China’s progress when competing with other nations worldwide. For systemic reasons, China has missed the opportunity to be the frontrunner and is now forced to make hard decisions. Against the backdrop of the US-China competition, it is extremely challenging for Chinese companies to seize a leading position in AI technology.
Fang, K. (2019). The Baidu phenomenon reflects the state of the Chinese internet: both a cage and a jungle. BBC. https://www.bbc.com/zhongwen/simp/chinese-news-46997294
Baptista, E. (2022). China uses AI software to improve its surveillance capabilities. Reuters. https://www.reuters.com/world/china/china-uses-ai-software-improve-its-surveillance-capabilities-2022-04-08/
Baptista, E. & Ye, J. (2023). China’s answer to ChatGPT? Baidu shares tumble as Ernie Bot disappoints. Reuters. https://www.reuters.com/technology/chinese-search-giant-baidu-introduces-ernie-bot-2023-03-16/
Crawford, K. (2021). Atlas of AI. Yale University Press.
Dai, S. & Jing, M. (2017). China recruits Baidu, Alibaba and Tencent to AI ‘National team’. SCMP. https://www.scmp.com/tech/china-tech/article/2120913/china-recruits-baidu-alibaba-and-tencent-ai-national-team
Deng, I., Qu, T. & Zhang, J. (2021). China’s internet-walled gardens may be cracking, but users face a long wait for the bricks to come tumbling down. SCMP. https://www.scmp.com/tech/tech-trends/article/3155846/chinas-internet-walled-gardens-may-be-cracking-users-face-long
Feng, C. (2023). ChatGPT vs Ernie Bot: Baidu’s AI product has an issue with politics but is adept at grabbing up-to-date information. SCMP. https://www.scmp.com/tech/article/3214782/chatgpt-vs-ernie-bot-baidus-ai-product-has-issue-politics-adept-grabbing-date-information
Flew, T. (2021). Regulating Platforms. Polity Press.
Lee, A. (2023). What are large language models used for? Nvidia. https://blogs.nvidia.com/blog/2023/01/26/what-are-large-language-models-used-for/
Maçães, B. (2018). China’s black box superiority. Politico. https://www.politico.eu/blogs/the-coming-wars/2018/11/china-black-box-superiority-cybersecurity-artificial-intelligence-ai/
Milmo, D. (2023). ChatGPT reaches 100 million users two months after launch. The Guardian. https://www.theguardian.com/technology/2023/feb/02/chatgpt-100-million-users-open-ai-fastest-growing-app
Kwan, R. (2023). Chinese ChatGPT rival from search engine firm Baidu fails to impress. The Guardian. https://www.theguardian.com/world/2023/mar/16/chinese-chatgpt-rival-search-engine-baidu-fails-impress-ernie-bot
Parasol, M. (2021). AI development and the fuzzy logic of Chinese cyber security and data laws. Cambridge University Press.
Pasquale, F. (2015). The Black Box Society. Harvard University Press.
Liu, S. (2023). Artificial Intelligence will bring Social Changes in China. The Diplomat. https://thediplomat.com/2023/03/artificial-intelligence-will-bring-social-changes-in-china/
Narayan, J., Hu, K., Coulter, M. & Mukherjee, S. (2023). Elon Musk and others urge AI pause, citing ‘risks to society.’ Reuters. https://www.reuters.com/technology/musk-experts-urge-pause-training-ai-systems-that-can-outperform-gpt-4-2023-03-29/
Rudin, C. & Radin, J. (2019). Why are we using black box models in AI when we don’t need to? A lesson from an explainable AI competition. Harvard Data Science Review, 1(2).
Reuters. (2017). China’s tough cybersecurity law to come into force this week. SCMP. https://www.scmp.com/news/china/policies-politics/article/2096094/chinas-tough-cybersecurity-law-come-force-week
Reuters. (2023). Baidu sues Apple, app developers over fake Ernie bot apps. The Economic Times. https://economictimes.indiatimes.com/tech/technology/baidu-sues-apple-app-developers-over-fake-ernie-bot-apps/articleshow/99335296.cms?from=mdr
Shen, X. (2023). China’s ‘two sessions’ 2023: ChatGPT-like artificial intelligence is ‘difficult to achieve’, China’s tech minister says. SCMP. https://www.scmp.com/tech/policy/article/3212434/chinas-two-sessions-2023-chatgpt-artificial-intelligence-difficult-achieve-chinas-tech-minister-says?module=inline&pgtype=article
Soo, Z. (2018). China shows off autonomous driving technology in annual spring festival gala. SCMP. https://www.scmp.com/tech/innovation/article/2133796/china-shows-autonomous-driving-technology-annual-spring-festival
Xin, L. (2023). China is playing catch-up in the ChatGPT world, a Chinese lawmaker says. SCMP. https://www.scmp.com/news/china/science/article/3213197/china-playing-catch-chatgpt-world-chinese-lawmaker-says
Jiang, B. & Feng, C. (2023). ChatGPT has grabbed headlines but developing a Chinese competitor will face censorship, cost and data challenges. SCMP. https://www.scmp.com/tech/article/3210754/chatgpt-has-grabbed-headlines-developing-chinese-competitor-will-face-censorship-cost-and-data?module=inline&pgtype=article