
Beware of Chinese Open-Source LLMs 

Not really.


Illustration by Nikhil Kumar


If you’ve seen or even heard of the popular American comedy series Silicon Valley, you may be familiar with the shady Chinese app developer Jian-Yang, who goes back to China to build a knock-off version of Pied Piper, the show’s fictional cloud-based compression platform that lets users compress and share files across devices. 

A similar drama is unfolding at OpenAI, which has filed trademark applications for “GPT-6” and “GPT-7” in China, not in the US, obviously to avoid a Pied Piper situation of its own. 

This new development also highlights the advancements in open source AI research in China, which even OpenAI is concerned about.

When it comes to open source AI research, a common refrain is that releasing powerful AI models is risky because Chinese competitors would get all the model weights and eventually come out on top. Meanwhile, large language models (LLMs) from China are increasingly topping the leaderboards. 

Most recently, DeepSeek LLM, a 67-billion-parameter model, outperformed Llama 2, Claude 2, and Grok-1 on various benchmarks. The best part is that the Chinese model is open sourced and uses the same architecture as LLaMA, and companies like Abacus AI are ready to host it on their platforms.
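Because the published checkpoints reuse the LLaMA architecture, they can be loaded through the standard Llama code path in Hugging Face transformers. The sketch below is a hypothetical illustration, not something from DeepSeek’s release notes: it only inspects the model config (actually loading a 67-billion-parameter checkpoint needs multiple high-memory GPUs), and the repository ID shown is the chat variant listed on DeepSeek’s Hugging Face page.

```python
# Minimal sketch (assumes the `transformers` library is installed and the
# machine has network access to Hugging Face). It checks that DeepSeek's
# 67B checkpoint declares the standard Llama architecture without
# downloading the full weights.
from transformers import AutoConfig

# Repository ID as listed on DeepSeek's Hugging Face page (assumption).
config = AutoConfig.from_pretrained("deepseek-ai/deepseek-llm-67b-chat")

print(config.model_type)        # expected: "llama"
print(config.architectures)     # expected: ["LlamaForCausalLM"]
print(config.num_hidden_layers, config.hidden_size)
```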

Topping the leaderboards

Given the geopolitical conflict between the US and China, restrictions on chip exports to the country keep tightening, making it harder for Chinese companies to build AI models and grow their business. But the increasing number of open source models indicates that China does not really rely on US technology to further its AI field. 

For instance, the Open LLM Leaderboard on Hugging Face, which has itself been criticised several times over its benchmarks and evaluations, currently features AI models from China, and they are topping the list. 

Tigerbot-70b-chat-v2 is currently leading the pack. The model, available on GitHub and Hugging Face, is built on top of the Llama 2 70B architecture and weights. It seems open source models such as Llama 2 are actually helping the AI community in China build models that, at the moment, beat their US counterparts. 

Tiger Research, a company that “believes in open innovations”, is a research lab in China under Tigerobo, dedicated to building AI models to make the world a better place for humankind. Similarly, DeepSeek is a research lab with the mission of “unravelling the mystery of AGI with curiosity”.

Another hint of China’s open source AI dominance is the Yi-34B model released by the startup 01.AI, which reached unicorn status after the release. The AI startup, founded by Kai-Fu Lee, is developing AI systems for the Chinese market. The interesting part is that the second and third models on the Open LLM Leaderboard are also based on Yi-34B, combined with Llama 2 and Mistral-7B.

Not just this, Chinese tech giant Alibaba also released Qwen-72B, trained on 3 trillion tokens and supporting a 32K context length. It is available on GitHub and Hugging Face alongside the smaller Qwen-1.8B, which requires just 3GB of GPU memory to run, making it a great fit for the research community. Notably, Qwen is also an organisation building LLMs, large multimodal models (LMMs), and other AGI-related projects.
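As a rough illustration of that small memory footprint, the hypothetical sketch below loads the 1.8B chat checkpoint with 4-bit quantisation, which is how a figure of around 3GB is typically reached. The repository ID, the quantisation choice, and the exact memory use are assumptions for illustration, not details confirmed in Qwen’s announcement.

```python
# Minimal sketch (assumes `transformers`, `accelerate` and `bitsandbytes`
# are installed and a CUDA GPU is available). Loads Qwen-1.8B-Chat with
# 4-bit weights so it fits in a few gigabytes of VRAM.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository ID as listed under the Qwen organisation on Hugging Face (assumption).
model_id = "Qwen/Qwen-1_8B-Chat"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",       # place layers on the available GPU
    trust_remote_code=True,  # Qwen ships custom modelling code
    load_in_4bit=True,       # quantise weights to shrink the memory footprint
)

prompt = "Summarise the Qwen model family in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```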

Is Chinese open source a threat?

Clearly, the fear of China rising up against US AI models is becoming a reality. Models from the country increasingly dominate the open source landscape, and will continue to do so in the coming year. And regulations are clearly not making it any better for the US. But how big a threat is it really?

The recent slew of open source model releases from China highlights that the country does not need US assistance for its AI development. Moreover, if the US continues to crush its own open source ecosystem with regulation, China will rise even further in this respect. Then again, as long as China continues to open source its powerful AI models, there is no real threat at the moment. 

Furthermore, China going head-to-head with US AI models is not a new phenomenon. When OpenAI announced GPT-3.5, Baidu released its Ernie 3.0 model, which was almost double the size of the former. It has been the same story ever since Ernie 2.0 was released in 2019 to compete with the early versions of OpenAI’s GPT.

Even though these models sit at the top of the Open LLM Leaderboard, many researchers have pointed out that this is largely down to the evaluation metrics used for benchmarking. A lot of the researchers in China are also hired from the US. 

Moreover, many of these models are extremely restrictive. Given the information controls in the country, they might be fast, but they are extremely poor when it comes to implementation in real use cases. “Don’t use Chinese models. They are going to nerf them the Chinese way, which is to alter behaviour even worse than current US censored models,” said a user on X.


Mohit Pandey

Mohit dives deep into the AI world to bring out information in simple, explainable, and sometimes funny words. He also holds a keen interest in photography, filmmaking, and the gaming industry.