AMD and Lamini Come to the Rescue of AI Startups

“If you are an AI startup blocked on GPUs, send me a note,” said Gregory Diamos, co-founder of Lamini.


Believe it or not, there is still a shortage of NVIDIA GPUs in the market. AI startups, along with Fortune 500 companies, have been struggling to procure GPUs for training AI models, even ones as small as Llama 7B. This is where Lamini, through its partnership with AMD, has found its moat: fine-tuning LLMs on AMD GPUs using ROCm.

“If you are an AI startup blocked on GPUs, send me a note,” posted Gregory Diamos, co-founder of Lamini, on LinkedIn. “At Lamini, we have figured out how to use AMD GPUs, which gives us a relatively large supply compared to the rest of the market.” 

Similarly, co-founder Sharon Zhou posted on X, “We just brought online a relatively enormous supply of GPUs at Lamini. We’re allocating a significant subset to promising startups & open-source initiatives, building the future of LLMs.”

The Lamini co-founders claimed they have the most competitive performance per dollar on the market right now, “because we figured out how to use AMD GPUs to get software parity with CUDA, trending beyond CUDA.”

At AMD’s Advancing AI event, Zhou also discussed how Lamini has been leveraging AMD hardware and software all along, arguing that the open nature of the stack has helped the company fully own its technology. “We have reached beyond CUDA,” said Zhou. This came after the big reveal two months earlier that Lamini had been running exclusively on AMD GPUs for the past year.

What exactly has Lamini figured out?

In September, Lamini opened up its LLM Superstation to customers, both in the cloud and on-premise. The LLM Superstation is a finely tuned supercomputer that incorporates 128 AMD Instinct GPUs, running Lamini on the AMD ROCm open software ecosystem.
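
The “software parity with CUDA” claim is less surprising than it sounds: ROCm builds of PyTorch expose AMD GPUs through the same torch.cuda API surface that CUDA code targets. Here is a minimal sketch (our illustration, not Lamini’s code) of how a training script can confirm it is running on Instinct GPUs:

```python
import torch

# On a ROCm build of PyTorch, AMD GPUs are exposed through the familiar
# torch.cuda API, so CUDA-targeted training code largely runs unchanged;
# torch.version.hip is set instead of torch.version.cuda.
if torch.cuda.is_available():
    backend = "ROCm/HIP" if torch.version.hip else "CUDA"
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}, "
              f"{props.total_memory / 2**30:.0f} GiB ({backend})")
else:
    print("No GPU visible to this PyTorch build")
```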

Read: AMD’s ROCm is Ready To Challenge NVIDIA’s CUDA

Lamini and ROCm have matured to the point where they can effectively fine-tune large LLMs, including Meta AI’s Llama 2. Lamini customers can now book the Superstation to train LLMs and keep the resulting models proprietary.


To ensure this, Lamini incorporates sophisticated optimisations tailored for enterprise LLMs, leveraging and extending PEFT (LoRA), RLHF, and Toolformer. These optimisations enable data isolation across 4,266 models on a single server, accelerate model switching by 1.09 billion times, compress models by a factor of 32x, and seamlessly integrate LLMs with enterprise APIs without the need for hyperparameter search.
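
Lamini’s implementation is proprietary, but the PEFT (LoRA) technique it extends is public. As a rough sketch using Hugging Face’s peft library (our example; the model name and hyperparameters are illustrative), LoRA freezes the base weights and trains only small low-rank adapters, which is what makes storing and switching between thousands of fine-tuned variants on one server tractable:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative base model; Llama 2 weights are gated on Hugging Face.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# LoRA: freeze the base weights, inject trainable rank-r adapter matrices
# into the attention projections, and save only the adapters.
config = LoraConfig(
    r=16,                                 # adapter rank
    lora_alpha=32,                        # adapter scaling factor
    target_modules=["q_proj", "v_proj"],  # projections that get adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```

Because each adapter weighs in at megabytes rather than gigabytes, “switching models” reduces to swapping adapters over one shared frozen base, which is plausibly the mechanism behind the model-switching and compression figures above.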

Furthermore, Lamini has integrated a range of novel optimisations designed to speed up LLMs while harnessing the distinctive capabilities of AMD’s Instinct MI platform. These enhancements enable hosting 200-billion-parameter models on a single server, accommodating 10,000 fine-tuned language models on one server, handling 12,800 concurrent requests to a single server, and processing more than 3.5 million queries per day on a single node.

“What’s more, with Lamini, you can stop worrying about the 52-week lead time for NVIDIA H100s,” reads the Lamini blog. “Using Lamini exclusively, you can build your own enterprise LLMs and ship them into production on AMD Instinct GPUs.” The company claims that running Lamini on AMD Instinct GPUs costs 10 times less than AWS, without the wait time.

“We’ve deployed Lamini in our internal Kubernetes cluster with AMD Instinct GPUs, and are using fine tuning to create models that are trained on AMD code base across multiple components for specific developer tasks,” said Vamsi Boppana, SVP of AI at AMD. 

Moreover, Diamos claims that the MI250 runs bigger models than NVIDIA’s A100. “We chose the Instinct MI250 as the foundation for Lamini because it runs the biggest models that our customers demand and integrates fine tuning optimisations. We use the large HBM capacity (128GB) on MI250 to run bigger models with lower software complexity than clusters of A100s.”
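
The arithmetic behind that claim is straightforward: in fp16, weights take 2 bytes per parameter, so weight storage alone dictates a minimum GPU count. A back-of-the-envelope sketch (weights only, ignoring KV cache, activations, and optimiser state):

```python
import math

def min_gpus(params_billion: float, hbm_gb: float) -> int:
    """Minimum GPUs needed to hold fp16 weights (2 bytes/parameter)."""
    weights_gb = 2 * params_billion
    return math.ceil(weights_gb / hbm_gb)

for p in (70, 180):  # e.g. Llama 2 70B, or a ~180B-class model
    print(f"{p}B params: {min_gpus(p, 128)}x MI250 (128 GB) vs "
          f"{min_gpus(p, 80)}x A100 (80 GB), weights only")
```

For a ~180B-parameter model this works out to three MI250s versus five 80GB A100s. Fewer devices per model means less tensor parallelism to orchestrate, which is presumably the “lower software complexity” Diamos refers to.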

Companies love AMD GPUs

“Building LLMs should be easy. Every enterprise should be able to own LLM IP, just like they do for all their other software. We’re excited to partner with AMD because their GPUs unlock a huge opportunity for enterprises to get started with little to no lead time,” said Zhou. “The main reason for this is compute strategy.” 

Read: AMD Loves Llama So Much

Many companies, apart from Lamini and often through it, are increasingly testing out AMD’s products. Now that the MI300X is out, companies such as Microsoft, Meta, and Oracle have already announced that they are integrating the GPUs into their data centres and offering AMD compute to their customers. OpenAI is also adding ROCm support to its Triton compiler.

Databricks and Essential AI have also announced the use of the MI250X for their services, and said at the Advancing AI event that they are ready for the MI300X.

All this is down to AMD’s performance and its open-source approach. AMD claims the MI300X matches the NVIDIA H100 in training performance, and beats it on inference: 1.6x faster on BLOOM 176B and 1.4x faster on Llama 2 70B.

Zoho’s Sridhar Vembu also revealed at the Global AI Conclave that, given the shortage of GPU supplies, the company has been running its models on AMD. “AMD has very competitive silicon now, so with that I think the supply situation should resolve; then it’s a matter of getting the talent capital.”
