Yuxin Guo is a master’s student at a Beijing university. For a few months, she had been following online discussions about ChatGPT, the generative AI tool that produces almost natural-sounding language in response to text prompts. One video she found on social media platform Weibo showed how college students in the US were using the technology to write research papers. In February, she finally decided to try it out for herself.
“I got curious because so many people are talking about it,” Guo says, “although not a lot of people seem to clearly know how to access it.”
ChatGPT isn’t available in China—it’s not blocked, but OpenAI, which built the tool, hasn’t made it available there—so Guo went onto Taobao, China’s biggest ecommerce site, where hundreds of thousands of merchants offer everything from iPhone cases to foreign driver’s licenses.
ChatGPT logins have become a hot commodity on Taobao, as have foreign phone numbers, particularly virtual ones that can receive verification codes. A simple search on the platform in early February returned more than 600 stores selling logins, with prices ranging from 1 to 30 RMB (roughly $0.14 to $4.28). Some stores have made thousands of sales. On Tencent’s WeChat, a thriving market for ChatGPT knockoffs has sprung up, mainly via mini programs (sub-applications on the platform) like “ChatGPT Online.” These offer users a handful of free questions before charging for time spent using a chatbot. Most of these are intermediaries: they ask ChatGPT questions for users and then send the answers back. On Baidu, China’s biggest search engine, “How to use ChatGPT within China” has been consistently trending for weeks.
The scale of the black market for access to ChatGPT, along with the proliferation of copycats, shows how much latent demand there is for generative AI products in China, but also the challenges facing companies that want to develop them. The “black box” nature of generative AI makes it hard to predict a chatbot’s output, which could be perilous on the heavily controlled Chinese internet.
“Big Chinese companies developing a ChatGPT-like product puts into tension two of the Chinese government’s biggest priorities: leadership in AI and control over information,” says Matt Sheehan, a fellow at the Carnegie Endowment for International Peace who studies China’s AI ecosystem.
China’s tech giants have scrambled to catch up with OpenAI and get their own products to market—although several of them had been working on large language models for years.
On February 7, Baidu announced it would launch Ernie Bot (“Wen Xin Yi Yan” in Chinese) for internal testing in March. The bot will be based on Ernie 3.0 Titan, a large language model that Baidu has been developing since 2019.
Baidu says the chatbot will be able to give conversational responses to prompts in English but will primarily focus on trying to understand the nuances of Chinese. Ultimately, it will be integrated into the company’s search engine and Xiaodu voice assistant and used in its AI Cloud and Apollo autonomous driving businesses, Baidu CEO Robin Li said on the company’s 2022 Q4 earnings call.
The day Baidu made its announcement, its shares surged 15 percent on the Hong Kong stock exchange.
A week after Baidu’s news, iFlytek, an AI company known for voice recognition systems, announced its own AI bot. iFlytek said it will launch the bot in May and is “very confident of achieving a similar technological leap forward as ChatGPT.” On February 27, Tencent announced that it had formed a new internal team to develop its ChatGPT alternative, HunyuanAide. Meanwhile, ecommerce companies Alibaba and JD.com and gaming giant NetEase have all said they’re working on AI chatbots.
Wang Huiwen, cofounder of the food delivery giant Meituan, came out of retirement in February, posting on the social media platform Jike that he was recruiting staff to build an OpenAI competitor. He said he had secured $230 million in venture capital funding, on top of $50 million of his own money, to fund the project.
The Chinese government has also recognized the importance of development in generative AI. A white paper released on February 13 by Beijing’s Municipal Bureau of Economy and Information, which hosts and regulates a large number of Chinese AI startups, promised to assist “top domestic firms in creating competing models to ChatGPT.”
“The frontrunner of the race to build a homegrown ChatGPT in China will be companies that already laid the foundation of building GPT-3-like large models,” says Jeffrey Ding, assistant professor of political science at George Washington University, referring to the GPT-3 family of large language models underlying ChatGPT. Baidu, Huawei, Inspur, and Tencent have all been building these models, Ding says, and may not be far behind US companies.
Liu Jun, senior vice president of Inspur Information and general manager of AI, told WIRED that Inspur’s Yuan 1.0 model has 245.7 billion parameters and a 5 TB data set, and now boasts an open source developer community with more than 3,000 members. According to a paper published in 2021 by Baidu, Ernie 3.0 Titan has 260 billion parameters and a 4 TB data set. By comparison, OpenAI’s GPT-3 has around 175 billion parameters.
Huawei, Baidu, and Tencent did not respond to WIRED’s request for comment.
Despite being almost entirely trained in English, ChatGPT has demonstrated the ability to produce reasonably fluent Chinese text, but it does so slowly, with a five-second lag compared to English, according to WIRED’s testing on the free version. Users have pointed out on social media that the text still occasionally sounds like it’s been translated.
This could be because there is still a lot less material for the models to scrape for data, despite the enormous scale of the Chinese internet. “The lack of good quality Chinese text could be a problem,” Ding says, pointing out that there are twice as many Wikipedia entries in English as in Chinese.
The linguistic traits of Chinese have historically made natural language processing challenging. Chinese is often more contextual than English and uses more idioms and complex metaphors. However, the development since 2017 of “transformer” neural networks, which are able to learn context from data sets, has helped researchers overcome the problem.
“The high-context nature of Chinese language used to create hurdles in natural language processing,” says Thomas Qitong Cao, a PhD candidate at Stanford University who studies political behavior and the internet. “But the gap between languages has significantly closed in the era of pretrained large language models.”
Cao says the challenges of training Chinese-language AI models come down to the size and quality of data sets, as well as computing power.
Companies will also have to contend with the government’s censorship of subjects it considers sensitive. Social media platforms in China already employ a combination of algorithms and human moderators to monitor content and remove anything that breaches the government’s constantly moving rules for what is and isn’t allowed.
Tech companies will need to closely monitor the output of chatbots, a task that will probably involve employing human moderators. “It is likely that we will see this type of human-reliant censorship, in combination with other tactics like keywords blocking, being used in public-facing chatbots,” Cao says.
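As a rough illustration of the layered approach Cao describes, the Python sketch below pairs automatic keyword blocking with escalation to a human review queue. It is a minimal sketch under stated assumptions: the term lists, the `ModerationQueue` class, and the `moderate` function are hypothetical placeholders, not any platform's actual rules or code.

```python
from dataclasses import dataclass, field

# Hypothetical placeholder terms; real blocklists are not public.
BLOCKED_KEYWORDS = {"blocked-term-a", "blocked-term-b"}
REVIEW_KEYWORDS = {"borderline-term"}


@dataclass
class ModerationQueue:
    """Holds chatbot replies that a human moderator still has to check."""
    pending: list = field(default_factory=list)


def moderate(reply: str, queue: ModerationQueue) -> str:
    """Return 'blocked', 'held', or 'released' for a chatbot reply."""
    text = reply.lower()
    # Step 1: automatic keyword blocking stops the reply outright.
    if any(term in text for term in BLOCKED_KEYWORDS):
        return "blocked"
    # Step 2: borderline replies are held back for a human moderator.
    if any(term in text for term in REVIEW_KEYWORDS):
        queue.pending.append(reply)
        return "held"
    # Step 3: everything else goes straight to the user.
    return "released"
```

In this sketch the automated filter handles only exact term matches, and anything it cannot confidently classify lands in `queue.pending` for a person to review, which is where the labor cost of this kind of moderation would accumulate.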
An investigation by Time found that OpenAI paid Kenyan workers less than $2 an hour to make ChatGPT less toxic.
However, the nature of chatbots, whose output cannot always be anticipated or controlled by their creators, means it’s inevitable that companies will run into trouble, according to the Carnegie Endowment’s Sheehan.
“[There are] two public AI laws focusing on recommendation algorithms and deepfakes, respectively, which demonstrates that the Chinese government has a top priority monitoring the content people consume online,” Sheehan says. “AI-generated content falls into this category, and it would be expected that the companies who try to create their own ChatGPTs will run into problems with the Cyberspace Administration of China.”
Chinese tech platforms have begun to crack down on black market ChatGPT access. By late February, WIRED found that the keywords “ChatGPT” and “OpenAI” had been banned on Taobao. On WeChat, “ChatGPT Online” and similar services had rebranded to neutral-sounding names like “AI Smart Chat.”
The intermediaries depend on APIs (which offer programmers access to the backend of the ChatGPT system) and on bulk-registered accounts. “These intermediaries profit by relaying ChatGPT’s service to users who do not have direct access. Just in this process alone, the parties involved would have violated ChatGPT’s terms and conditions, and other related trademarks and applicable patents,” says Ivan Wang, a New York-based IP attorney.
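The relay model used by these intermediaries can be sketched in a few lines of Python. This is an illustrative sketch only: `ask_upstream` is a stub standing in for whatever upstream call an intermediary would make (no real API is contacted), and the `Relay` class and the free quota of three questions are assumptions, since the services described offer only "a handful" of free questions.

```python
from collections import defaultdict

FREE_QUESTIONS = 3  # hypothetical quota; the actual number is not reported


def ask_upstream(question: str) -> str:
    """Stub standing in for the upstream chatbot call (not a real API)."""
    return f"[upstream answer to: {question}]"


class Relay:
    """Forwards user questions upstream and meters free usage per user."""

    def __init__(self, upstream=ask_upstream):
        self.upstream = upstream
        self.asked = defaultdict(int)  # user id -> questions asked so far
        self.paid = set()              # user ids that have paid for access

    def mark_paid(self, user_id: str) -> None:
        self.paid.add(user_id)

    def ask(self, user_id: str, question: str) -> str:
        # Unpaid users are cut off once the free quota is used up.
        if user_id not in self.paid and self.asked[user_id] >= FREE_QUESTIONS:
            return "Free questions used up; payment required."
        self.asked[user_id] += 1
        # The relay simply passes the question on and returns the answer.
        return self.upstream(question)
```

The user never holds an account with the upstream service; every question passes through the intermediary's own credentials, which is the arrangement Wang describes.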
Data showing the number of ChatGPT users in China who managed to find workarounds to the restrictions is not available, but the proliferation of under-the-table access points has at least provided some use cases for generative AI.
Echo Liu, a tech product manager, paid 189 RMB ($27.50) for an OpenAI account with ChatGPT Plus, a pilot subscription service that gives users prioritized access. “I am particularly astounded by the ability of ChatGPT to explain complex language in plain language,” she says. Liu upgraded to ChatGPT Plus after experiencing lags in response while talking to ChatGPT in Chinese, and she is now trying to learn coding through it.
A number of small entrepreneurs selling overseas have already integrated ChatGPT into their day-to-day work.
Tao Ye, owner of a global logistics service called OL Warehouse, tells WIRED that his company has already started using ChatGPT for customer queries on a small scale. “We are experimenting with letting ChatGPT write customer service messages, and it has been producing good results,” he says.
Rachel, who runs a small ecommerce site aimed at English-speaking audiences and asked to be identified by her first name only to avoid official scrutiny, says she has used the system to help draft copy. On Chinese lifestyle social media platform RED, Rachel’s post sharing how to integrate ChatGPT in cross-border ecommerce has been liked over 2,000 times. She used to hire a freelance writer based in India on the microtasking site Fiverr to write her blog posts for $20 apiece, but she has now decided to switch to using ChatGPT completely.
“Writing product descriptions and blog posts in proper English used to be a pain for me,” she says. “ChatGPT has now drastically sped up our listing process and communication.”
Source: Wired