
A Beginner’s Guide to GPT Neo (With Python Codes)

Ever thought about writing code that can code for you? Or generating contextualised text on a subject of your choice? A solution to both of these use cases came from OpenAI, a large organisation considered by many to be leading the world in artificial intelligence, when it introduced the first GPT (Generative Pre-trained Transformer) paper, ‘Improving Language Understanding by Generative Pre-Training’, in June 2018.

In the following years, OpenAI introduced GPT-2 and GPT-3 as well; the GPT-3 paper is the one titled ‘Language Models are Few-Shot Learners’.

Generative Pre-trained Transformer, or GPT for short, is a transformer-based model architecture built from a stack of decoder blocks placed one after the other. It has been pre-trained on large text corpora, including Wikipedia (wow, seriously? Like everything on Wikipedia?!) and Common Crawl (fun fact: this archive has over 12 petabytes of data, covering 12 years of content uploaded to the internet), so that it performs extremely well on language-based use cases. Generative, as the word suggests, means the model generates text. That text can be poems, articles, essays or even code!

According to VentureBeat, “a private corpus of 500 billion tokens was used for training the model and a computational cost of a staggering 50 million USD”.

The latest GPT-3 has 175 BILLION parameters! As Hugo Cen from Entrepreneur.com put it, and I am quoting, “This is the Most Powerful Artificial Intelligence Tool in the World”, and I am confident most of us believe that too! However, there is one problem: GPT-3 is only accessible via a beta API, which is presently on hold, and to get access you have to submit an application to OpenAI. Crazy, right?

What if you want to leverage the power of GPT-3 but don’t want the hassle of going through the application process? Introducing GPT-Neo, an open-source Transformer model that resembles GPT-3 in both design and performance. Its largest version has only 2.7 billion parameters, which makes the largest GPT-Neo roughly equivalent to the smallest GPT-3.

When comparing GPT-Neo with GPT-3 Ada (a smaller version of GPT-3), the former did better than the latter on HellaSwag and PIQA. HellaSwag is a multiple-choice sentence-completion benchmark that gives a context paragraph and four possible endings. PIQA measures common-sense reasoning: the machine has to pick, out of two sentences, the one that makes the most sense. However, GPT-3 Ada is not the biggest, as mentioned earlier; its big brother, GPT-3 Davinci, has about 65 times as many parameters as GPT-Neo, and Davinci beat Neo comfortably. Yep, nothing unexpected.
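To make the benchmark format concrete, here is a minimal sketch of how a multiple-choice task such as HellaSwag or PIQA can be scored: every candidate ending gets a score, and the highest-scoring one is picked. The `toy_score` function is a hypothetical stand-in (it just counts shared words) so that the example runs without downloading a model; a real evaluation would score each ending with the language model's log-likelihood instead.

```python
from typing import Callable, Sequence


def pick_ending(context: str, endings: Sequence[str],
                score_fn: Callable[[str, str], float]) -> int:
    """Return the index of the ending that score_fn rates highest."""
    scores = [score_fn(context, ending) for ending in endings]
    return max(range(len(endings)), key=lambda i: scores[i])


def toy_score(context: str, ending: str) -> float:
    """Hypothetical scorer: counts words the ending shares with the context.

    A real evaluation would use the model's log-likelihood of the ending
    given the context instead.
    """
    context_words = set(context.lower().split())
    return float(sum(word in context_words for word in ending.lower().split()))


context = "She filled the kettle with water and switched the kettle on"
endings = [
    "then waited for the kettle to boil",
    "then painted a fence blue",
]
best = pick_ending(context, endings, toy_score)
print(endings[best])
```

Accuracy on such a benchmark is then simply the fraction of questions for which the picked index matches the labelled answer.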

You can train this model from scratch using the mesh-tensorflow library, a superb library for easy and efficient data and model parallelism in distributed training. These models have tons of data to train on and lots of parameters; hence parallelism is vital here. It means that different segments of your training run simultaneously rather than one after another, and this goes beyond simply splitting batches across devices: the model itself can be split as well. A simple template, along with the implementation, is provided in this notebook. Be sure to go through the README file for instructions on how to proceed; the code for this notebook is provided below, step by step.

1. Clone the GPT-Neo GitHub repository by running the Setup cell. Make sure you are on a TPU runtime; if not, go to Runtime -> Change Runtime -> TPU.
2. Set up Google Cloud, since TPUs cannot read from local file systems; the cell below will ask for your authentication credentials. If you don’t have a Google Cloud Platform account, no worries! You can create one for free and get 300 USD worth of free credits for 90 days. Otherwise, just follow the notebook as it goes.

The command below will take you through the configuration of gcloud.

 from google.colab import auth
 auth.authenticate_user()
 !gcloud init 

Set up a new configuration with any name you like and proceed with the Google account you use to log in to GCP. Create a project name, making sure you follow the naming guidelines; otherwise this will cause errors, and you’ll have to run the whole cell again.

3. You are ready to go once you get confirmation that the Google Cloud SDK is configured and ready to use.

4. Now we have to set up the datasets (the list is present in the notebook), tokenize them, and copy them to the bucket (the storage for a particular project) that will be created in your GCP project.

 # Tokenize Data
 !python data/create_tfrecords.py --input_dir /content/GPTNeo/$dataset_path --name $dataset_name --files_per 1000 --output_dir $out_name --write_dataset_config --processes 1
 # copy the data to your bucket
 if not path_to_cloud_bucket.endswith('/'):
     path_to_cloud_bucket += '/'
 copy_loc = path_to_cloud_bucket + "datasets/" + dataset
 !gsutil -m cp -r /content/GPTNeo/$out_name $copy_loc
 !gsutil ls $path_to_cloud_bucket

5. Before starting the training, you need to edit the dataset and model configurations so that they point to the bucket you created in GCP. For this, change the ‘path’ field and replace the given dataset’s name with that of your chosen dataset.

 %%writefile configs/dataset_configs/Sampling_Only.json
 {
   "path":   "gs://eleutherai/datasets/Sampling_Only/Sampling_Only*.tfrecords",
   "eval_path": "",
   "n_vocab": 50256,
   "tokenizer_is_pretrained": true,
   "tokenizer_path": "gpt2",
   "eos_id": 50256,
   "padding_id": 50257
 } 
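The edit described in this step can also be scripted. The sketch below builds the same dataset config in Python with the `path` field pointing at your own bucket. The names `my-bucket` and `my_dataset` are placeholders, and the path layout simply mirrors the example config above, so treat it as an assumption about where your tfrecords ended up; the remaining fields are copied from that config unchanged.

```python
import json


def make_dataset_config(bucket: str, dataset_name: str) -> dict:
    """Build a dataset config whose 'path' points at tfrecords in your bucket."""
    bucket = bucket.rstrip("/")  # avoid a double slash when joining
    return {
        "path": f"{bucket}/datasets/{dataset_name}/{dataset_name}*.tfrecords",
        "eval_path": "",
        "n_vocab": 50256,
        "tokenizer_is_pretrained": True,
        "tokenizer_path": "gpt2",
        "eos_id": 50256,
        "padding_id": 50257,
    }


# Placeholder bucket and dataset names; replace them with your own.
config = make_dataset_config("gs://my-bucket/", "my_dataset")
print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into the `%%writefile` cell in place of the example values.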
6. Set up the model configuration. For a detailed breakdown of each field, follow the GitHub README provided by EleutherAI, which built GPT-Neo and open-sourced it.

 %%writefile configs/GPT3_XL.json
 {
     "n_head": 16,
     "n_vocab": 50257,
     "embed_dropout": 0,
     "lr": 0.0002,
     "lr_decay": "cosine",
     "warmup_steps": 3000,
     "beta1": 0.9,
     "beta2": 0.95,
     "epsilon": 1e-8,
     "opt_name": "adam",
     "weight_decay": 0,
     "train_batch_size": 256,
     "attn_dropout": 0,
     "train_steps": 600000,
     "eval_steps": 0,
     "predict_steps": 1,
     "res_dropout": 0,
     "eval_batch_size": 4,
     "predict_batch_size": 1,
     "iterations": 100,
     "n_embd": 2048,
     "datasets": [["pile", null, null, null]],
     "model": "GPT",
     "model_path": "gs://eleutherai/GPT3_XL",
     "n_ctx": 2048,
     "n_layer": 24,
     "scale_by_depth": true,
     "scale_by_in": false,
     "attention_types" :  [[["global", "local"],12]],
     "mesh_shape": "x:4,y:2",
     "layout": "intermediate_expanded:x,heads:x,vocab:n_vocab,memory_length:y,embd:y",
     "activation_function": "gelu",
     "recompute_grad": true,
     "gradient_clipping": 1.0,
     "tokens_per_mb_per_replica": 2048,
     "precision": "bfloat16"
 } 
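A quick sanity check on the config above: the sketch below estimates the parameter count from `n_layer`, `n_embd`, `n_vocab` and `n_ctx`, and multiplies out the `mesh_shape` string. The formula is a common rule of thumb that assumes a feed-forward layer four times as wide as the embedding and ignores biases and layer norms, so it is an approximation and not the exact count the training code reports.

```python
def approx_param_count(n_layer: int, n_embd: int, n_vocab: int, n_ctx: int) -> int:
    """Rough GPT-style parameter count; ignores biases and layer norms."""
    per_layer = 12 * n_embd ** 2  # attention (4*d^2) + feed-forward (8*d^2, assumed 4x width)
    token_embeddings = n_vocab * n_embd
    position_embeddings = n_ctx * n_embd
    return n_layer * per_layer + token_embeddings + position_embeddings


def mesh_size(mesh_shape: str) -> int:
    """Multiply the dimensions in a mesh string such as 'x:4,y:2'."""
    size = 1
    for dim in mesh_shape.split(","):
        size *= int(dim.split(":")[1])
    return size


total = approx_param_count(n_layer=24, n_embd=2048, n_vocab=50257, n_ctx=2048)
print(f"approx. parameters: {total / 1e9:.2f}B")
print(f"devices in mesh: {mesh_size('x:4,y:2')}")
```

With these values the estimate comes out at roughly 1.3 billion parameters, and the mesh covers 8 devices, which matches the 8 cores of a Colab TPU.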

7. Finally, we can train the model from scratch using the following command.

!python3 main.py --model colab_XL --steps_per_checkpoint 500 --tpu colab

8. Upload the model to your bucket as shown below

 # upload to your bucket
 bucket_base = "gs://" + path_to_cloud_bucket.replace('gs://', '').split('/')[0]
 !gsutil -m cp -r $path_to_local_weights $bucket_base 
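The `bucket_base` expression above strips everything except the bucket name from a `gs://` path. A small, runnable sketch of the same string handling, wrapped in a function for illustration (`my-bucket` is a placeholder name):

```python
def bucket_base(path_to_cloud_bucket: str) -> str:
    """Reduce any gs:// path to just its bucket, e.g. gs://name."""
    return "gs://" + path_to_cloud_bucket.replace("gs://", "").split("/")[0]


# 'my-bucket' is a placeholder name.
print(bucket_base("gs://my-bucket/datasets/"))
print(bucket_base("gs://my-bucket"))
```

Both calls give the same bucket root, so the upload lands at the top level of the bucket regardless of which sub-path was configured earlier.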

9. If everything worked out, you should see your model listed by the command below.

!gsutil ls $bucket_base

10. For evaluation, the notebook uses the WikiText-103 dataset; download and unzip it as follows.

 wikitext103_src = "https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-raw-v1.zip"
 !wget $wikitext103_src
 !unzip wikitext-103-raw-v1.zip 

11. This step will make a directory, tokenize the text as required and copy it to the bucket.

 !mkdir wikitext
 !mv /content/GPTNeo/wikitext-103-raw/wiki.test.raw wikitext/wikitext_test.txt
 # Tokenize Data
 !python data/create_tfrecords.py --input_dir wikitext --name wikitext --files_per 1000 --output_dir wikitext_tokenized --write_dataset_config --processes 1 --wikitext-detokenize
 # copy the data to your bucket
 if not path_to_cloud_bucket.endswith('/'):
   path_to_cloud_bucket += '/'
 copy_loc = path_to_cloud_bucket 
 !gsutil -m cp -r wikitext_tokenized $copy_loc
 !gsutil ls $path_to_cloud_bucket 

12. Repeat the dataset configuration step, this time for the evaluation data.

 %%writefile configs/dataset_configs/wikitext.json
 {
   "path": "",
   "eval_path": "gs://test-bucket-neo/wikitext_tokenized/*.tfrecords",
   "n_vocab": 50256,
   "tokenizer_is_pretrained": true,
   "tokenizer_path": "gpt2",
   "eos_id": 50256,
   "padding_id": 50257
 } 

13. Running the model for evaluation over the tokenized text.

!python3 main.py --eval --tpu colab --model $pretrained_model
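Language-model evaluation on WikiText is commonly summarised as perplexity, which is the exponential of the average per-token cross-entropy loss. Below is a minimal sketch of that conversion, assuming the losses are measured in nats. The sample values are made up for illustration and are not results from GPT-Neo.

```python
import math
from typing import Sequence


def perplexity(token_losses: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token), losses in nats."""
    if not token_losses:
        raise ValueError("need at least one token loss")
    return math.exp(sum(token_losses) / len(token_losses))


# Made-up per-token losses, for illustration only.
losses = [2.9, 3.1, 3.0, 3.0]
print(round(perplexity(losses), 2))
```

Lower perplexity means the model was, on average, less surprised by the evaluation text.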

This was a complete breakdown of all the steps required to train the GPT-Neo model from scratch; you need to follow them in order. Training needs high computational power (thanks to the TPU, it doesn’t take forever!) and time to run, but it is an amazing run-through for GPT-Neo.

GPT Neo is the name of the codebase for transformer-based language models loosely styled around the GPT architecture. Two model sizes are provided, with 1.3B and 2.7B parameters, so you can pick whichever suits your hardware. In this post, we’ll discuss how to use the 2.7B-parameter GPT Neo provided through HuggingFace with just a few lines of code.

Let’s dig into the code!

Code Implementation of GPT-Neo

Importing the Dependencies 

Installing PyTorch: the easiest way to do this is to head over to PyTorch.org, select your system requirements, and copy the generated install command. I am using a Windows machine with a Google Colab notebook. Select the stable build, which is 1.8.1 at the time of writing, then select your operating system. I prefer pip in Google Colab, but you may prefer conda in Jupyter. Having a GPU helps a lot; if you have one, select a CUDA build (the command below uses the CUDA 11.1 wheels), otherwise select the CPU build.

You’ll see the command, and it’s ready to use!

Make sure you have the latest version of PyTorch. This may take a while the first time you install it, as pip may have to uninstall older versions before installing the newer ones. It also depends heavily on your internet connection.

!pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio===0.8.1 -f https://download.pytorch.org/whl/torch_stable.html

Installing transformers: we will leverage HuggingFace, and the amazing thing about it is that it offers a wide variety of pipelines for different tasks. Amazing, right?!

I highly recommend exploring the transformers library on HuggingFace further.

!pip install transformers

Import pipeline from transformers, as we are going to use the text-generation pipeline.

from transformers import pipeline

Setting up the Generator

Download the GPT Neo model with 2.7 billion parameters, which is quite huge. Again, this will take time, as the download is around 10 gigabytes, so make sure you have a good internet connection. Alternatively, you can download the smaller GPT Neo version with only 1.3 billion parameters.

Instantiate the pipeline and assign it to a variable; text-generation is the name of our pipeline, as mentioned earlier.

generator = pipeline('text-generation', model='EleutherAI/gpt-neo-2.7B')
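If you are unsure whether your machine can hold the 2.7B model, one option is to choose the checkpoint based on available memory. The sketch below is only an illustration: `choose_model` and `build_generator` are hypothetical helpers, and the 16 GB cut-off is an assumed rule of thumb, not a documented requirement. The `pipeline` call itself is the same one used above.

```python
def choose_model(free_ram_gb: float) -> str:
    """Pick a GPT-Neo checkpoint; the 16 GB cut-off is an assumed rule of thumb."""
    if free_ram_gb >= 16:
        return "EleutherAI/gpt-neo-2.7B"
    return "EleutherAI/gpt-neo-1.3B"


def build_generator(free_ram_gb: float):
    """Create a text-generation pipeline for the chosen checkpoint."""
    from transformers import pipeline  # imported lazily; the model download is large
    return pipeline("text-generation", model=choose_model(free_ram_gb))


print(choose_model(12))
```

Calling `build_generator(12)` would then download and load the 1.3B checkpoint instead of the 2.7B one.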

Generating Text using Prompt

We have to provide a prompt or topic on which we want the text to be generated.

prompt = "The current stock market"

Output Text

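Save the output to a variable named ‘res’. The arguments passed to the generator created earlier are: the prompt; max_length, the maximum length of the generated text in tokens (the prompt included); do_sample=True, which enables sampling instead of always picking the most likely token; and temperature, which rescales the next-token probabilities (lower values give more predictable text).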

 res = generator(prompt, max_length=50, do_sample=True, temperature=0.9)
 # Print the output, which is stored under the key 'generated_text'
 print(res[0]['generated_text'])
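To see what the temperature argument does, here is a small self-contained sketch of temperature-scaled softmax, the standard way raw model scores (logits) are turned into sampling probabilities. The logits are made-up numbers for three candidate tokens; this illustrates the idea and is not the library's internal code.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Convert raw scores to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.5, 0.9, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring token, while higher temperatures flatten the distribution and make the generated text more varied.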

The Output will look like this.

Let’s try a different prompt, say something like this.

prompt = "import pandas as pd"

Running this will give us something like this. 

As you can see, it has gone on to import the other commonly used libraries on its own; you can imagine the level of contextual understanding this model has reached. Amazing, right?!

Saving to a File

Open a new text file named gpttext.txt and save our output to it using the write method.

 # 'w' creates the file, or overwrites it if it already exists
 with open('gpttext.txt', 'w', encoding='utf-8') as f:
     f.write(res[0]['generated_text'])

So this was all about how to try one of the best openly available text models out there and leverage it for different tasks. Try this notebook with different prompts and different arguments. Links will be present here as well as in the notebook.

NOTE: Make sure you have enough RAM in Google Colab; otherwise, the runtime will crash after downloading the model. If that happens, try the smaller version of GPT Neo.

The notebook is provided here with all the code you need for reference.


Mudit Rustagi

Mudit is experienced in machine learning and deep learning. He is an undergraduate in Mechatronics and worked as a team lead (ML team) for several Projects. He has a strong interest in doing SOTA ML projects and writing blogs on data science and machine learning.
