Thinking Beyond Generative AI, One Token At A Time

VCs have been banking on generative AI companies, but what is their real moat?

Every company today claims to be using GPT-3-like models for its generative AI play, including Jasper, Notion, Regie, Frase and others. This raises the question of what differentiates these ‘all-in-one’ companies beyond design, marketing, and use cases. As Chris Frantz, co-founder of Loops, puts it, “there is almost no moat in generative AI”.

To understand this better, let’s look at a recent update from GPT-3 creator OpenAI, which launched the third version of its Davinci model. The new model calls for more computing resources, leading to higher costs per API call and lower speed than other models.

This means that companies need to look beyond simply using generative AI tools and focus on their computing capabilities, particularly cost and optimisation. This explains why Jasper, the AI content platform, recently announced a partnership with the American AI startup Cerebras Systems. The company will use Cerebras’ Andromeda AI supercomputer to train GPT networks, creating outputs of varying levels of end-user complexity. The AI supercomputer is also said to improve the contextual accuracy of the generative model while providing personalised content across different users.


Regarding the partnership, venture capitalist Nathan Benaich says it looks like Jasper may move forward to decrease its reliance on OpenAI’s API by building its own models and training them on Cerebras, going beyond training GPT-3 on Cerebras systems. 

The two AI platforms, Jasper and Notion, have taken different approaches to AI integration. While Jasper is using the AI-accelerating computing power of Cerebras, Notion is supported by Google Cloud, which will use Cloud TPUs for training. Although Notion has not confirmed it, the kind of output it generates has led many to believe that it is using GPT-3 through OpenAI’s API.

Therefore, in the era of GPT-3 companies, Jasper will look to set a new benchmark for what the moat in generative AI companies can be. The API used and the means of training the model will be the defining factors separating these companies. This supports the view that the present and future of software lie in cloud and supercomputing services. It also emphasises the effective use of hardware in the generative AI play.

The following are some of the approaches that can help you understand the hardware side of things when leveraging generative AI tools:

CS-2 versus Cloud versus GPU

The Andromeda AI supercomputer is built by linking 16 Cerebras CS-2 systems powered by the largest AI chip, the Wafer Scale Engine (WSE) 2. Cerebras’ ‘weight streaming’ technology provides immense flexibility, allowing for independent scaling of the model size and training speed. In addition, the cluster of CS-2 machines has training and inference acceleration that can support even trillion parameter models. Cerebras also claims that their CS-2 machines can form a cluster of up to 192 systems with near-linear performance scaling to speed up training. 
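To make the ‘near-linear performance scaling’ claim concrete, here is a minimal back-of-the-envelope sketch. The function name, the `scaling_efficiency` parameter and the 160-hour baseline are illustrative assumptions, not figures from Cerebras. Only the 16-system Andromeda size and the 192-system limit come from the text above.

```python
MAX_CLUSTER_SIZE = 192  # upper cluster size Cerebras cites for CS-2 systems


def estimated_training_hours(single_system_hours, n_systems, scaling_efficiency=1.0):
    """Estimate training time on a cluster under a simple scaling model.

    scaling_efficiency is a hypothetical knob: 1.0 means perfectly linear
    scaling, while values just below 1.0 model 'near-linear' scaling.
    """
    if not 1 <= n_systems <= MAX_CLUSTER_SIZE:
        raise ValueError("n_systems must be between 1 and 192")
    if not 0 < scaling_efficiency <= 1:
        raise ValueError("scaling_efficiency must be in (0, 1]")
    # Each system beyond the first contributes a fraction of one system's throughput.
    speedup = 1 + (n_systems - 1) * scaling_efficiency
    return single_system_hours / speedup


# Hypothetical job that would take 160 hours on a single CS-2.
print(estimated_training_hours(160, 16))        # Andromeda-sized cluster, ideal scaling
print(estimated_training_hours(160, 16, 0.9))   # the same cluster at 90% efficiency
```

The closer `scaling_efficiency` stays to 1.0 as systems are added, the more directly extra hardware translates into shorter training runs, which is the property the ‘weight streaming’ design is meant to preserve.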

Further, a single CS-2 system can deliver the compute performance of tens to hundreds of graphics processing units (GPUs) and produce, in a fraction of the time, output that would normally take days or weeks to generate on general-purpose processors.

In contrast, the Cloud uses custom silicon chips to accelerate AI workloads. For example, Google Cloud employs its in-house chip, the Tensor Processing Unit (TPU), to train large, complex neural networks using Google’s own TensorFlow software. 

Cloud TPUs are ‘virtual machines’ that offload networking processors onto the hardware. The model parameters are kept in on-chip, high-bandwidth memory. The TensorFlow server fetches input training data and pre-processes it before streaming it into an ‘infeed’ queue on the Cloud TPU hardware. 
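The ‘infeed’ pattern described above is essentially a producer-consumer pipeline: the host pre-processes data and pushes it into a bounded queue while the accelerator drains it. The sketch below illustrates the idea using only the Python standard library. It is a conceptual analogy, not the TensorFlow or Cloud TPU API, and every name in it (`preprocess`, `host_feeder`, `accelerator_loop`, `run_pipeline`) is hypothetical.

```python
import queue
import threading


def preprocess(record):
    # Hypothetical stand-in for real pre-processing (decoding, normalising, batching).
    return record * 2


def host_feeder(raw_data, infeed, sentinel):
    # Plays the role of the host server: pre-process, then stream into the queue.
    for record in raw_data:
        infeed.put(preprocess(record))  # blocks while the bounded queue is full
    infeed.put(sentinel)  # signal that no more data is coming


def accelerator_loop(infeed, sentinel):
    # Plays the role of the accelerator: consume items as soon as they arrive.
    consumed = []
    while True:
        item = infeed.get()
        if item is sentinel:
            break
        consumed.append(item)
    return consumed


def run_pipeline(raw_data, queue_size=4):
    infeed = queue.Queue(maxsize=queue_size)
    sentinel = object()
    feeder = threading.Thread(target=host_feeder, args=(raw_data, infeed, sentinel))
    feeder.start()
    consumed = accelerator_loop(infeed, sentinel)
    feeder.join()
    return consumed


print(run_pipeline([1, 2, 3]))
```

The bounded queue is the key design choice: it lets pre-processing overlap with computation while preventing the host from running arbitrarily far ahead of the hardware.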

Additionally, cloud providers have been increasing their GPU offerings. For instance, AWS P4d instances are powered by NVIDIA A100 Tensor Core GPUs, while its G4dn instances use NVIDIA T4 GPUs. Earlier this year, Microsoft Azure also announced the adoption of NVIDIA’s Quantum-2 InfiniBand networking platform to power next-generation HPC needs. Cloud instances are widely used because they come fully configured for deep learning, with accelerated libraries such as CUDA and cuDNN and well-known frameworks such as TensorFlow pre-installed.

Andrew Feldman, CEO and co-founder of Cerebras Systems, explained that the variable latency between large numbers of GPUs in traditional cloud providers creates difficult, time-consuming problems when distributing a large AI model among GPUs, and there are “large swings in time to train.”

According to ZDNET, Cerebras’ ‘pay-per-model’ AI cloud service is priced from $2,500 for training a GPT-3 model with 1.3 billion parameters in 10 hours to $2.5 million for training one with 70 billion parameters in 85 days, on average half of what customers would pay to rent cloud capacity or lease machines for years to do the task.
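To put the two quoted price points on a common footing, the short sketch below derives an implied cost per hour and per billion parameters. Only the prices, parameter counts and durations come from the figures quoted above. The helper name `unit_costs` and the dictionary layout are illustrative choices.

```python
# Price points quoted above (via ZDNET); everything derived from them is plain arithmetic.
QUOTES = [
    {"params_billion": 1.3, "price_usd": 2_500, "hours": 10},
    {"params_billion": 70, "price_usd": 2_500_000, "hours": 85 * 24},
]


def unit_costs(quote):
    """Return the implied cost per training hour and per billion parameters."""
    return {
        "usd_per_hour": quote["price_usd"] / quote["hours"],
        "usd_per_billion_params": quote["price_usd"] / quote["params_billion"],
    }


for quote in QUOTES:
    costs = unit_costs(quote)
    print(
        f"{quote['params_billion']}B params: "
        f"${costs['usd_per_hour']:,.2f}/hour, "
        f"${costs['usd_per_billion_params']:,.2f} per billion parameters"
    )
```

On these two data points, the price per billion parameters is far higher for the 70-billion-parameter model, so cost grows much faster than model size. Larger models typically need both more compute per step and longer training runs, which would account for this.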

The same CS-2 clusters are also eight times faster to train than training clusters of NVIDIA A100 machines in the Cloud. By contrast, according to MLPerf, when similar batches are run on TPUs and GPUs with the same number of chips, they exhibit almost the same training performance on the SSD and Transformer benchmarks.

But, as Mahmoud Khairy points out in his blog, the performance depends on various metrics beyond the cost and training speed, and, hence, the answer to which approach is best also depends on the kind of computation that needs to be done. At the same time, the Cerebras CS-2 system is emerging as one of the most powerful tools in training vast neural networks. 

The AI supercomputing service provider is also extending into the Cloud by partnering with Cirrascale Cloud Services, aiming to democratise access and let users train GPT models at much lower cost than on existing cloud providers, with only a few lines of code.

Ayush Jain
Ayush is interested in knowing how technology shapes and defines our culture, and our understanding of the world. He believes in exploring reality at the intersections of technology and art, science, and politics.
