The Mother of LLMs

Alibaba's Qwen is topping all the charts and giving tough competition to other open source models.

Illustration by Nikhil Kumar

Published on March 11, 2024

by Mohit Pandey

Like the legendary Mother of Dragons from Game of Thrones, Daenerys Targaryen, Qwen has risen from among the much-ignored Chinese AI models to become a force to be reckoned with. Born from the visionary minds at the tech giant Alibaba, Qwen embodies the spirit of liberation and empowerment.

As if to back the claim, Abacus AI recently open-sourced what it calls the best uncensored model available. Liberated-Qwen1.5-72B, as touted by Bindu Reddy, the company’s founder, is the most performant uncensored model in the world. As the name suggests, the model is built on top of Qwen1.5, an open-source model from Alibaba.

The Qwen1.5-72B model has surpassed Claude 2.1 and GPT-3.5-Turbo-0613 on several benchmarks since its launch. Notably, Qwen (short for Tongyi Qianwen) is also the name of the organisation building LLMs, large multimodal models (LMMs), and other AGI-related projects. It also develops vision and audio language models based on the Qwen language model.

Reddy said, “Qwen-liberated inches out of the best open-source model on the HumanEval leaderboard, which is Qwen1.5 chat.” An even more fascinating part about Qwen1.5 is that it is just a beta version of the much-awaited Qwen2, which Alibaba said is expected to arrive soon.

Topping the charts

If there is a true testament to an LLM’s capability, it is the Hugging Face Open LLM Leaderboard. Currently, the top model on the charts is also from Abacus AI. Smaug-72B, developed by fine-tuning MoMo, an LLM from the Korean AI company Moreh, outperforms every other model on the leaderboard, including those built on top of Mistral or Meta’s Llama models.

MoMo 72B is itself built on top of another open-source model from the Qwen family, Qwen-72B. Developed by Alibaba, Qwen-72B was trained on 3 trillion tokens and has a 32K context length, and the family comes in different sizes. Ever since it was released in December last year, it has consistently ranked among the top open-source models.

Moreover, Qwen has also released Qwen-Agent, a framework for developing LLM applications using the capabilities of Qwen1.5. Its features include Function Calling, Code Interpreter, RAG, and a Chrome Extension.

Junghwan Lim, head of the AI group at Moreh, told AIM that Chinese open-source models have been performing well on several benchmarks, and the companies are continually improving them with each release. “I admire their efforts in providing various artefacts based on their new model, including Int4 and Int8 quantisation,” said Lim.

Alongside Qwen-72B, the company also released a smaller language model, Qwen-1.8B, touting it as a gift to the research community. It has a 2K context length and requires only 3GB of GPU memory. Both models are available on Alibaba Cloud for its customers and as open source.

“The amusing part for me is their effort on multilingual understanding and human preference alignment,” Lim added. “These days, when demand for local LLM is growing rapidly, opening the base model and all the smaller models has made great contributions to the community, and the countries just started the ‘AI race’.”

The ‘Qwenching’ effect

Qwen models were launched to save Alibaba’s cloud business, which had been crumbling since October last year. Though the models are currently available on Alibaba Cloud, their capabilities have also won appreciation from the open-source community.

“Instead, we will focus on developing a sustainable growth model based on emerging AI-driven demand for networked and highly scaled cloud computing services,” Joe Tsai, chairman of Alibaba, said in the company’s investor call in November.

Meta, for its part, plans to release Llama 3 as soon as possible. Even so, it is the right time to adopt the impressive open-source LLMs from China, and Qwen has proved its prowess. Competition within China’s open-source community is also increasing, with models such as DeepSeek and Yi-34B ranking at the top of the charts.

But for now, Qwen has secured its spot at the top.

The Winter is Coming

Google has Gemma. Microsoft has Phi-2 and Orca. Meanwhile, Amazon remains tight-lipped about making smaller models. Though it uses LLMs from other companies such as Meta and Anthropic, relying on them in the long run would not be a sound strategy for competing against the tech giants and, most importantly, Qwen1.5.

The ominous warning from Game of Thrones, “Winter is Coming”, aptly describes the ascent of the Chinese Qwen model in its challenge against the established titans Microsoft, Meta, Google, and Amazon. When Emad Mostaque, the founder of Stability AI, said in November that Chinese open models will overtake GPT-4 shortly, he was talking about Qwen.
