Guide to Salesforce’s CTRL: Conditional Transformer Language Model

Researchers from Salesforce have released a powerful new generative language model that marks a milestone in the history of text generation. CTRL, short for Conditional Transformer Language Model, introduces a feature called control codes that generalises the model to a very wide range of text-generation applications. Control codes govern the nature of the generated text through attributes such as style, content, task-specific behaviour, topic, domain, dates, entities and the relationships between entities.

Present text-generation language models are usually trained in a task-oriented fashion, which limits their ability to generalise. Given an initial text prompt, these models generate text only in the field they were trained on, capturing patterns as word vectors or contextualised word vectors. A model trained on a specific task can be adapted to a new task through transfer learning followed by fine-tuning. However, the need for a generalised model that can be employed on any task has grown in recent times. Salesforce's CTRL addresses that need with control codes, which are supplied as part of the text prompt before generation begins. Control codes are learnt from the structure of the raw training texts, and they provide a means of steering the domain or area of interest of the generated text. With CTRL, a human user can generate text in a field of interest simply by prepending a few control codes that represent it.

CTRL is a large-scale model with 1.63 billion parameters, making it the largest publicly released language model at the time of its release. It has been trained on 140 GB of text from resources such as Wikipedia, Project Gutenberg, Amazon reviews and Reddit. Sources also include a large collection of news data, Europarl and UN data from WMT, question-answer pairs from ELI5, and the MRQA shared task datasets NewsQA, TriviaQA, SearchQA and HotpotQA. CTRL was originally implemented in TensorFlow on top of the original Transformer architecture with a vocabulary of 250,000 tokens. Training was distributed across 256 cores of a Cloud TPU v3 Pod and ran for 800,000 iterations over a period of about two weeks.

How do control codes work in CTRL?

According to CTRL’s developers, the idea of control codes was inspired by generative models in computer vision, which offer fine-grained control over what is generated.

In the text-generation process of present state-of-the-art models, the next token is the one with the highest value in the probability distribution over all possible tokens. This distribution is computed with the chain rule of probability, accumulating the contribution of every previously seen token:

p(x) = p(x_1) · p(x_2 | x_1) · … · p(x_n | x_1, …, x_(n-1)) = ∏ p(x_i | x_1, …, x_(i-1))

Here p(x) denotes the probability of generating the sequence x, i denotes the index of the token currently being predicted, and n is the length of the sequence.

It can be seen that the probability of each new token depends purely on the previously prompted or predicted tokens.
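
To make the chain rule concrete, below is a minimal sketch of greedy autoregressive generation in Python. The tiny hard-coded table stands in for the trained model and is purely illustrative; a real model such as CTRL would return a distribution over its full vocabulary at every step.

# Minimal sketch of greedy autoregressive generation (illustrative only).
# The toy table below stands in for the trained model; a real model would
# return p(x_i | x_1, ..., x_(i-1)) over the full vocabulary at each step.
TOY_MODEL = {
    "the": {"cat": 0.6, "dog": 0.3, "<eos>": 0.1},
    "cat": {"sat": 0.7, "<eos>": 0.3},
    "dog": {"ran": 0.7, "<eos>": 0.3},
    "sat": {"<eos>": 1.0},
    "ran": {"<eos>": 1.0},
}

def next_token_distribution(tokens):
    # For simplicity the toy model conditions only on the last token.
    return TOY_MODEL.get(tokens[-1], {"<eos>": 1.0})

def generate_greedy(prompt_tokens, max_length=10):
    tokens = list(prompt_tokens)
    for _ in range(max_length):
        probs = next_token_distribution(tokens)
        next_token = max(probs, key=probs.get)  # pick the highest-probability token
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return tokens

print(generate_greedy(["the"]))  # ['the', 'cat', 'sat']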

In the conditional CTRL model, the control code is added to the conditioning context, giving it firm control over the prediction of the next token:

p(x | c) = ∏ p(x_i | x_1, …, x_(i-1), c)

Here, c denotes the control code, with the other notation the same as above.

Whether the input is a human-written prompt or model-generated tokens, CTRL requires a control code in the prescribed format to perform inference. Even for identical prompts, different control codes make the model generate different texts. Even with no prompt at all, the model produces output based solely on the control codes provided to it. Control codes can also be combined to attain finer-grained control over generation.
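
As a rough illustration, the same prompt can be steered toward different domains just by swapping the leading control code. The codes below (Books, Reviews, Wikipedia) correspond to training-data domains described in the CTRL paper; consult the official repository for the authoritative list and exact spelling.

# Same prompt, three different control codes (illustrative prompt strings only).
prompt = "A knight rode into the silent village"

prompts = [
    "Books " + prompt,      # Project Gutenberg style
    "Reviews " + prompt,    # Amazon product-review style
    "Wikipedia " + prompt,  # encyclopaedic style
]

for p in prompts:
    print(p)  # each string would be supplied to generation.py as a separate prompt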

Most control codes specify the overall style of the generated text by naming a specific domain of the training data. Finer control is obtained by appending further codes to the domain code. A URL can also serve as an additional control code when ‘Links’ is used as the domain code; the URL can then encode features such as domain, subdomain, entities, entity relations and dates.
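
As a sketch of the idea, a ‘Links’-style prompt simply prefixes a URL with the Links code. The URL below is hypothetical and serves only to illustrate how its path segments can hint at the topic and date.

# Hypothetical 'Links' prompt (illustrative only; the URL is made up).
link_prompt = "Links https://www.example.com/politics/2019/09/25/new-climate-bill-announced"
print(link_prompt)  # generation.py would take this string as the prompt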

Complex tasks such as translation and question answering can be triggered simply by providing combinations of control codes, for example source and target languages, or a question, together with the domain and other task-specific attributes.
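
The exact prompt formats for these task codes are defined in the official repository; the strings below are only a hypothetical sketch of how a translation request or a question might be combined with a task code.

# Hypothetical task-style prompts (illustrative only; the authoritative formats
# are documented in the official CTRL repository).
translation_prompt = "Translation English : The weather is lovely today. ; French :"
question_prompt = "Questions Q: Who developed the CTRL language model? A:"

for p in (translation_prompt, question_prompt):
    print(p)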

Control codes also enable the model to generate text in a specific domain in a specific language even when no training data for that domain exists in that language. This zero-shot code-mixing gives CTRL versatility and robustness on demanding text tasks.

Python Implementation of CTRL

Step-1: Enable GPU 

Inference with the pre-trained CTRL model requires a GPU. Verify that a GPU is available using the following command.

!nvidia-smi

Output:

Step-2: Clone pre-trained CTRL Model

Clone the official repository to obtain the source code, dependencies and necessary files in the local machine or cloud environment.

!git clone https://github.com/salesforce/ctrl

Output:

Change the directory to proceed with the downloaded contents.

%cd ctrl/

Step-3: Enable low-memory inference

Text generation can consume a lot of GPU memory. The official source code repository has a separate branch named lower_memory that reduces memory consumption during inference. Check out the lower_memory branch before performing inference.

!git checkout lower_memory

Output:

Step-4: Install TensorFlow-based dependencies

CTRL needs tensorflow-gpu, fastBPE for byte-pair-encoding tokenisation and gsutil for downloading the pre-trained checkpoint. The following commands install those packages.

%%bash
pip2 install tensorflow-gpu==1.14
patch -b /usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py estimator.patch
pip2 install fastBPE
pip2 install gsutil
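
As an optional sanity check (not part of the official instructions), the installed TensorFlow version and GPU visibility can be verified before proceeding.

!python2 -c "import tensorflow as tf; print(tf.__version__); print(tf.test.is_gpu_available())"

The first value should report 1.14.0 and the second should print True if TensorFlow can see the GPU.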

Step-5: Restore specific checkpoint of CTRL for inference

A few trained checkpoints of the model are available for download and inference. The following command copies one of the officially released checkpoints from cloud storage to the local environment.

!gsutil -m cp -r gs://sf-ctrl/seqlen256_v1.ckpt .

Output:
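
Optionally (this step is not in the official instructions), list the downloaded directory to confirm that the checkpoint files referenced in the next step, model.ckpt-413000.*, are present.

!ls -lh seqlen256_v1.ckpt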

Step-6: Sample generation of text

By running the generation.py file on the CTRL model, we can sample the text-generation process. Note that inference requires a CUDA-enabled GPU with sufficient memory. The following command generates text for a control-code prompt based on a link; the model extracts the domain, subdomain and other attributes from the keywords available in the URL itself. As the model generates text, it prints progressively until the end.

!python2 generation.py --model seqlen256_v1.ckpt/model.ckpt-413000.data-00000-of-00001

Output:

Step-7: Sample generation of text with print_once flag

CTRL also supports inference with the print_once flag, which prints the whole text only after generation ends. In this example, the control code is ‘Books’ and the prompt is ‘Books Weary with toil, I haste me to my bed’.

!python2 generation.py --model seqlen256_v1.ckpt/model.ckpt-413000.data-00000-of-00001 --print_once

Output:

Wrapping up

CTRL, the Conditional Transformer Language Model, is trained with control codes so that human users can easily perform text generation, machine translation and other related natural language tasks. With 1.63 billion parameters, it was the largest publicly available generative language model at the time of its release. Control codes let users specify domains, subdomains and other applicable attributes to obtain fine-grained, higher-quality generations, and refining the control codes is a natural direction for future improvement.
