Last week, the focus was on Gemini. This week, however, everyone is talking about Mistral AI, a Paris-based AI startup that raised over $113 million in June without even a tangible product. The buzz around Gemini lasted barely a week before Mistral AI captured the spotlight with the release of its latest model, Mixtral 8x7B, a sparse mixture-of-experts (SMoE) model with open weights, shared via a magnet link on X.
Cannot Ignore Mistral AI
Mistral AI’s latest model, Mixtral 8x7B, based on the mixture-of-experts (MoE) architecture, is comparable to popular models such as GPT-3.5 and Llama 2 70B. Licensed under Apache 2.0, Mixtral surpasses Llama 2 70B on most benchmarks with 6x faster inference.
Mistral AI brands the model as a ‘Mixtral of Experts’ — clever marketing, considering that OpenAI has reportedly been using a mixture-of-experts approach to train GPT-4 since last year. Yet it is Mistral AI’s latest model that has suddenly made the technique popular.
The mixture-of-experts approach enables models to be pre-trained with far less compute, which means the model or dataset size can be scaled up dramatically within the same compute budget as a dense model.
Mixtral is a decoder-only model in which the feedforward block picks from a set of eight distinct groups of parameters. At every layer, for every token, a router network chooses two of these groups (the ‘experts’) to process the token and combines their outputs additively.
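The routing step described above can be sketched in a few lines, assuming a single token and toy experts (the function and variable names here are illustrative, not Mistral’s actual code):

```python
import numpy as np

def top2_moe_layer(x, experts, router_w):
    """Sparse MoE feedforward step for one token: route to the two
    highest-scoring experts and sum their outputs, weighted by a
    softmax over the two selected router logits."""
    logits = router_w @ x                  # one router score per expert
    top2 = np.argsort(logits)[-2:]         # indices of the two best experts
    gates = np.exp(logits[top2])
    gates /= gates.sum()                   # softmax over the chosen pair only
    # only the two selected experts actually run for this token
    return sum(g * experts[i](x) for g, i in zip(gates, top2))
```

In Mixtral each layer has eight such experts, each a full feedforward block; since only two run per token, the active compute stays far below what the total parameter count would suggest.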
This method increases the model’s parameter count while keeping compute cost and latency in check. Specifically, Mixtral has a total of 46.7 billion parameters but uses only 12.9 billion of them for each token. As a result, it processes input and produces output at roughly the speed and cost of a 12.9-billion-parameter model.
However, OpenAI scientist Andrej Karpathy said the “8x7B” name is a bit misleading, “because it is not all 7B params that are being 8x’d, only the FeedForward blocks in the Transformer are 8x’d, everything else stays the same. Hence also why the total number of params is not 56B but only 46.7B.”
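Karpathy’s point can be checked with quick arithmetic. Taking the publicly reported Mixtral configuration (hidden size 4096, expert FFN size 14336, 32 layers, grouped-query attention, 32k vocabulary — treat these dimensions as assumptions), a back-of-the-envelope count lands close to the stated figures:

```python
# Rough parameter count for Mixtral 8x7B from its reported config
# (dimensions are assumptions; norms and biases are ignored as negligible).
d_model, d_ff = 4096, 14336          # hidden size, expert FFN hidden size
n_layers, n_experts, top_k = 32, 8, 2
d_kv = 1024                          # K/V projection width (grouped-query attn)
vocab = 32000

expert = 3 * d_model * d_ff          # SwiGLU FFN: gate, up and down projections
attn = n_layers * (2 * d_model * d_model + 2 * d_model * d_kv)  # Q,O + K,V
embed = 2 * vocab * d_model          # input embedding + output head
router = n_layers * n_experts * d_model

# Only the FFN experts are multiplied by 8; attention etc. are shared.
total = n_layers * n_experts * expert + attn + embed + router
active = n_layers * top_k * expert + attn + embed + router

print(f"total  = {total / 1e9:.1f}B")   # ~46.7B, not 8 x 7B = 56B
print(f"active = {active / 1e9:.1f}B")  # ~12.9B used per token
```

The shared attention and embedding weights are why the total is 46.7B rather than 56B, exactly as Karpathy notes.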
Mistral AI Masters Business
The Paris-based startup is on a roll, and has also announced that it secured $415 million in funding at a valuation of $2 billion. Andreessen Horowitz (a16z) spearheaded the latest funding round, accompanied by a renewed investment from Lightspeed Venture Partners.
Open-source LLM firms often find it difficult to sustain their businesses. To address this, Mistral AI recently introduced ‘La Plateforme’, which provides API endpoints for its available models.
The company has created three tiers for its models: Mistral Tiny, Mistral Small and Mistral Medium. Mistral 7B Instruct v0.2 and Mixtral 8x7B fall under Mistral Tiny and Mistral Small respectively. Interestingly, the Medium model is yet to be released.
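A minimal call to such an endpoint might look like the following sketch, which assumes La Plateforme mirrors the familiar OpenAI-style chat completions shape; the endpoint path and field names are assumptions, and a real API key is required to actually run it:

```python
import json
import os
import urllib.request

API_URL = "https://api.mistral.ai/v1/chat/completions"  # assumed endpoint path

def build_payload(prompt: str, model: str = "mistral-tiny") -> dict:
    """Chat-completions-style request body (field names assumed)."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def mistral_chat(prompt: str, model: str = "mistral-tiny") -> str:
    """Send one prompt to La Plateforme and return the reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt, model)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Switching tiers is then just a matter of passing `model="mistral-small"` or, once released, `"mistral-medium"`.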
Mistral AI has stated that it is currently developing Mistral Medium, positioned among its top-performing models on standard benchmarks. Proficient in English, French, Italian, German, Spanish, and code, it achieves a score of 8.6 on MT-Bench. On paper, it even beats GPT-3.5.
Interestingly, Mistral opted to launch a paid endpoint for its Medium model, which shows superior metrics, rather than open-sourcing it. Hosted API endpoints are the fastest way to gather customer feedback, iterate on real-world use cases and, crucially, monetise open-source models.
By contrast, Stability AI is currently struggling to generate sufficient revenue for survival. In response, the company has introduced Stability AI Memberships, charging developers a fee to use its LLMs for commercial purposes.
Meta has always been a torchbearer for the open-source community, consistently publishing research papers and releasing models. However, one thing Meta doesn’t necessarily need to prioritise is generating revenue, as it already earns significantly from advertising through its family of social media apps.
Startups foraying into open-source models simply cannot keep building them without monetisation. As Mistral AI has raised a substantial amount, its investors will be hoping for a return on that investment.
Mistral AI, the Next OpenAI?
Europe recently reached a preliminary agreement on important rules for using AI in the European Union. Surprisingly, Mistral AI was not in favour of endorsing the EU AI Act; the company may have felt it would hinder its progress in the near future, potentially requiring the disclosure of trade secrets. In the end, Mistral, along with other open-source companies, secured exemptions from it.
There is speculation that Mistral AI may not continue to release its upcoming models as open source, especially considering that OpenAI, too, started out as an open-source company. Interestingly, a few months ago, OpenAI lobbied the EU to weaken the much-discussed AI Act to reduce the regulatory burden on the company.
Karpathy pointed out much the same thing: “Glad they refer to it as ‘open weights’ release instead of ‘open source’, which would imo (in my opinion) require the training code, dataset and docs.”
Currently, not many AI startups from Europe have seriously challenged OpenAI and Google. With top-notch marketing and good products, Mistral AI is making generative AI fun — and it has made clear that it is here to stay.