
ChatGPT is Ruining Our Environment, But There’s a Way to Tackle It

Staying true to its name, FlexGen can be flexibly configured under a range of hardware resource constraints by aggregating memory and compute from the GPU, CPU and disk



As the ChatGPT hype peaked, so did the sense of reckoning around the carbon footprint the tool may leave behind. Media reports quoted wild estimates (24.86 metric tonnes per capita of CO₂ emissions per day) of how much energy LLMs like GPT-3.5 (on which ChatGPT is built) have drained.

This is not to say that these worries are fanciful or shouldn’t be discussed. But research into optimising GPU compute, along with other efforts to cut the compute behind LLMs, has come a long way since headlines described training GPT-3 as consuming as much energy as a trip to the moon.

Running a 175-billion-parameter model on a single GPU

A couple of days ago, Ying Sheng, a research student at the Stanford AI Lab; Lianmin Zheng from UC Berkeley; and several other AI researchers released FlexGen, a generation engine for running huge LLMs like GPT-3 with very limited GPU memory. But how far can the compute requirements be shrunk? The researchers managed to run Meta AI’s freely available OPT-175B model on a single 16GB GPU, as demonstrated in a paper titled ‘High-Throughput Generative Inference of Large Language Models with a Single GPU’.

Staying true to its name, FlexGen can be flexibly configured under a range of hardware resource constraints by aggregating memory and compute from the GPU, CPU and disk. Its main contribution is a more efficient offloading system, achieving up to 100 times higher throughput than other state-of-the-art offloading systems such as Hugging Face Accelerate. The researchers achieved this with a new algorithm designed for efficient batch-wise offloaded inference.
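The core idea behind batch-wise offloaded inference can be illustrated with a toy sketch (hypothetical names and numbers; this is not the real FlexGen API). All layer weights live in a slow CPU/disk store and the "GPU" holds only one layer at a time; iterating over layers in the outer loop and batches in the inner loop amortises each slow weight fetch over every batch, which is where the throughput win comes from:

```python
NUM_LAYERS = 4
# Per-layer weights kept in slow memory (CPU/disk), not on the GPU.
SLOW_STORE = {i: 0.5 * (i + 1) for i in range(NUM_LAYERS)}

def run_batches(batches):
    acts = [list(b) for b in batches]  # activations for each batch
    fetches = 0
    for layer in range(NUM_LAYERS):    # outer loop: one slow fetch per layer
        w = SLOW_STORE[layer]
        fetches += 1
        for acts_b in acts:            # inner loop: reuse the fetched weight
            for j in range(len(acts_b)):
                acts_b[j] += w         # stand-in for the layer's real compute
    return acts, fetches

outputs, num_fetches = run_batches([[0.0], [1.0], [2.0]])
# one weight fetch per layer, regardless of how many batches are processed
```

A naive schedule (batches outer, layers inner) would re-fetch every layer for every batch; swapping the loop order is what keeps the slow memory traffic constant as the batch count grows.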

Lighter, distilled and pretrained models

One only needs to look hard enough to find similar efforts by experts to go easier on GPUs. ML researcher Sebastian Raschka discussed a paper released earlier this month titled ‘MarioGPT: Open-Ended Text2Level Generation through Large Language Models’, authored by Shyam Sudhakaran and Miguel González-Duque, among others. The model generates tile-based game levels (the paper works with Super Mario Bros levels) from text. Falling into the space of light, fun and creative generative AI, the tool was built on a distilled GPT-2 model that can be trained on a single GPU.

The idea of distillation implemented here has gained popularity over the past couple of years. It essentially means training a smaller ‘student’ model to reproduce the predictions of a large, well-known ‘teacher’ model. For instance, DistilBERT, the smaller version of Google’s BERT, has 40% fewer parameters and runs 60% faster while retaining about 97% of BERT’s performance.
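A minimal numerical sketch of the distillation signal, with made-up logits: the teacher's outputs are softened with a temperature, and the student is trained to minimise the divergence between its own softened outputs and the teacher's, rather than fitting hard labels:

```python
import math

def softmax(logits, temperature=1.0):
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

teacher_logits = [4.0, 1.0, 0.5]

# A higher temperature softens the distribution, exposing how the teacher
# ranks the *wrong* classes too -- the signal the student learns from.
hard = softmax(teacher_logits, temperature=1.0)
soft = softmax(teacher_logits, temperature=4.0)

def kl_divergence(p, q):
    # Distillation minimises the divergence between the teacher's and the
    # student's softened output distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

student_logits = [3.0, 1.5, 1.0]
loss = kl_divergence(soft, softmax(student_logits, temperature=4.0))
```

In real training this loss (often mixed with a standard label loss) is backpropagated through the student; the sketch only shows how the softened targets and the loss are formed.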

Most models released by companies now also come pretrained, which cuts out a major chunk of the compute that repeated pretraining would otherwise consume.

Some AI platforms are constantly working on methods to solve the scaling problems of these models. Cohere, a Canadian startup founded by former Google Brain employees Aidan Gomez and Nick Frosst, partnered with Google last year to build a platform for powerful LLMs that doesn’t require the infrastructure or specialist expertise such undertakings usually demand. In a paper titled ‘Scalable Training of Language Models using JAX pjit and TPUv4’, its engineers described how their new FAX framework, deployed on Google Cloud’s TPU v4 Pods, could train larger models quicker and deliver prototypes to customers faster.

Data parallelism solutions

In December last year, Amazon SageMaker launched a new training technique called sharded data parallelism, which introduces a number of optimisations, including training speed-ups of up to 39.7%. Techniques like data parallelism are a knight in shining armour for AI practitioners because they kill several birds with one stone: they cut training time and cost, consume less energy, and shorten the time to market.
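The basic mechanic of data parallelism can be sketched in a few lines (names and numbers are illustrative; SageMaker's sharded variant additionally partitions the model state itself across workers). The global batch is split into shards, each worker computes a gradient on its shard, and the gradients are averaged, an all-reduce, so every worker applies the identical update:

```python
def gradient(weight, shard):
    # Gradient of mean squared error for a one-parameter model y = w * x
    # fit against the target y = 2 * x, on this worker's data shard.
    return sum(2 * (weight * x - 2 * x) * x for x in shard) / len(shard)

def all_reduce_mean(values):
    # Stand-in for the collective that averages gradients across workers.
    return sum(values) / len(values)

def train_step(weight, global_batch, num_workers, lr=0.1):
    size = len(global_batch) // num_workers
    shards = [global_batch[i * size:(i + 1) * size] for i in range(num_workers)]
    grads = [gradient(weight, s) for s in shards]   # computed in parallel
    return weight - lr * all_reduce_mean(grads)     # identical update everywhere

w = 0.0
data = [1.0, 2.0, 3.0, 4.0]
for _ in range(50):
    w = train_step(w, data, num_workers=2)
# w converges toward the true slope of 2.0
```

Because each worker touches only its shard, wall-clock time per step shrinks roughly with the number of workers, which is where the reported training speed-ups come from.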

Stability AI, the startup founded by Emad Mostaque that shot to fame with Stable Diffusion, its family of text-to-image models, has partnered with SageMaker. A blog post announcing the partnership stated that Stability AI had used the technique for its foundation models.

All this is to say that a tonne of work is happening under the radar at the systems and hardware level that isn’t necessarily visible, or hasn’t reached a tool like ChatGPT yet. But it’s safe to say that even as generative tools reach levels of accessibility previously unheard of in AI, research will keep finding ways to taper their power costs and, consequently, their environmental toll.

Poulomi Chatterjee

Poulomi is a Technology Journalist with Analytics India Magazine. Her fascination with tech and eagerness to dive into new areas led her to the dynamic world of AI and data analytics.