In June this year, AMD CEO Lisa Su announced the company’s plans to launch the Instinct MI300X, an alternative to NVIDIA’s H100. Given the global GPU shortage, the timing couldn’t have been better. Amid the tussle between NVIDIA and Intel, AMD has been working quietly and pulling ahead in the AI race. With its hardware plans already on track, the company’s focus is now on software.
More recently, AMD and Korean telecom operator KT decided to back AI software developer Moreh. In a Series B round, the Santa Clara-based startup raised $22 million, bringing its total funding to $30 million. AMD, for its part, is currently focused on developing ROCm, its software alternative to CUDA, which is NVIDIA’s moat in AI.
Vamsi Boppana, senior VP of AI at AMD, recently said that the Radeon Open Compute platform (ROCm) is the company’s top priority at the moment. “We have much larger resources actually working on software, and Su has been very clear that she wants to see significant and continued investments on the software side,” he said.
Moreh’s flagship product, MoAI, is compatible with PyTorch, TensorFlow, and other applications that previously ran exclusively on NVIDIA hardware. AMD’s investment should help Moreh further enhance the platform and, in turn, accelerate AMD’s push into AI software.
KT has been working with Moreh since 2021, running its scalable AI infrastructure on AMD GPUs paired with MoAI software. Currently, the MoAI platform primarily supports AMD’s ROCm. KT uses the AMD Instinct MI250 accelerator with MoAI, a combination it claims is 116% faster than NVIDIA’s A100.
Software is king
Brad McCredie, corporate vice president of data center GPU and accelerated processing at AMD, said in a statement: “The AI software ecosystem supporting AMD AI hardware continues to grow, providing choice for data scientists and other users of AI as they build the AI models and solutions that will drive the continued growth of this industry.”
Companies such as Lamini have been eagerly waiting for the launch of the MI300X with 192GB of HBM, which should allow their models to run even better. Lamini says AMD’s ROCm is already production-ready and claims that it “has enormous potential to accelerate AI advancement to a similar or even greater degree than CUDA for LLM finetuning and beyond.”
Beyond Moreh, AMD has made a major move with its recent acquisition of Nod.ai, an open-source AI software firm. “The acquisition is expected to significantly enhance our ability to provide AI customers with open software that allows them to easily deploy highly performant AI models tuned for AMD hardware,” said Boppana.
AMD’s software bet has been under way for some time. In August, the company announced the acquisition of Mipsology, a French AI startup and long-standing AMD partner that, similar to Nod.ai, had been developing AI software for the chipmaker.
At the time, Boppana wrote, “The team will help develop our full AI software stack, expanding our open ecosystem of software tools, libraries, and models to pave the way for streamlined deployment of AI models running on AMD hardware.”
Clearly, AMD has chosen the software route.
ROCm is bracing for MI300X
Speaking at the AI Hardware Summit in September, Boppana said AMD is seeing enormous customer pull, which is currently dictating much of the company’s tactics. “The plane is flying right now, so we cannot disassemble the engine. However, we are absolutely doing things at the foundational level to make more unification happen in our stack,” he said in an interview.
He also said MI300 samples are already with many customers, who are testing their capabilities with ROCm. Separately, MosaicML, the startup acquired by Databricks, has been experimenting with AMD hardware since the beginning of this year, though on the MI250 rather than the MI300, running it alongside its NVIDIA systems.
El Capitan, the upcoming exascale supercomputer at Lawrence Livermore National Laboratory, already hosts an undisclosed number of MI300s, adding to the anticipation around the release.
Boppana also stressed that ROCm’s performance will be crucial to the success of AMD’s upcoming hardware. Although some companies and developers have embraced it, ROCm is still in the early stages of development, and, as he put it, “being candid, we have a few places to grow”.
So while CUDA may be king for now, AMD is clearly no longer sitting idle. With ROCm, it is determined to overtake NVIDIA in the race.