Hands-On to CompilerGym: A Reinforcement Learning Toolkit for Compiler Optimizations


CompilerGym is a Python toolkit from Facebook that exposes compiler optimization problems as reinforcement learning environments. Its motivation is that the decisions a compiler makes strongly affect the performance of the software it produces, so those decisions need to be made well. Applying AI to compiler optimization is a growing research area, but because compilers are complex and constantly changing, setting up experiments with them is difficult. The key idea is to let AI researchers experiment with compiler optimization methods without having to learn the details of compilers, and to help compiler developers explore new optimization problems for AI.

Vision of CompilerGym

The vision is to make program optimization easier, without the user writing a single line of C/C++. The goals of CompilerGym are:

  1. Create an “ImageNet for Compilers”: an open-source OpenAI Gym environment for experimenting with compiler optimizations using real-world software and datasets.
  2. Provide a common platform for comparing different compiler optimization techniques.
  3. Give users control over all the decisions that the compiler makes (see the Roadmap for details).
  4. Make CompilerGym easy to deploy, so that production environments become more efficient.

Key Concepts in CompilerGym

CompilerGym treats compiler optimization problems as environments for reinforcement learning. It exposes the agent-environment loop through the OpenAI Gym interface, using the concepts below:

(Figure: the agent-environment loop. Source: https://facebookresearch.github.io/CompilerGym/)


  • Environment: defines a compiler optimization task. The environment contains an instance of the compiler and the program being compiled. Each agent-environment interaction can change the compiler’s state.
  • Action Space: defines the set of actions that can be taken from the current environment state.
  • Observation: defines a view of the current environment state.
  • Reward: defines a metric that indicates the quality of the previous action.

A single instance of the agent-environment loop represents the compilation of a particular program. The goal is to develop an agent that maximizes the cumulative reward from these environments.
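To make the four concepts concrete, here is a minimal sketch of the agent-environment loop in plain Python. It does not use CompilerGym: `ToyCompilerEnv`, its pass names, and the instruction counts are hypothetical stand-ins chosen for illustration, and only the shape of `reset()` and `step()` follows the Gym convention described in this article.

```python
import random


class ToyCompilerEnv:
    """Hypothetical stand-in for a compiler environment (not part of CompilerGym)."""

    # Made-up passes: each removes a fixed number of instructions the first time it runs.
    PASS_SAVINGS = {"dce": 30, "inline": 10, "simplify-cfg": 20, "noop": 0}

    def __init__(self, initial_size=100):
        self.initial_size = initial_size
        self.actions = list(self.PASS_SAVINGS)  # the action space
        self.size = initial_size
        self.applied = set()

    def reset(self):
        """Start a new episode and return the first observation."""
        self.size = self.initial_size
        self.applied = set()
        return self.size

    def step(self, action):
        """Apply one pass and return (observation, reward, done, info)."""
        name = self.actions[action]
        saving = 0 if name in self.applied else self.PASS_SAVINGS[name]
        self.applied.add(name)
        self.size -= saving
        reward = saving / self.initial_size  # fraction of the program removed
        done = len(self.applied) == len(self.actions)
        return self.size, reward, done, {"pass": name}


def run_episode(env, policy, max_steps=20):
    """Run one agent-environment loop and return (cumulative reward, final observation)."""
    observation = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = policy(observation, env)
        observation, reward, done, info = env.step(action)
        total += reward
        if done:
            break
    return total, observation


# A random agent: picks any action, ignoring the observation.
rng = random.Random(0)
total, final_size = run_episode(
    ToyCompilerEnv(), lambda obs, env: rng.randrange(len(env.actions))
)
print(f"cumulative reward={total:.2f}, final size={final_size}")
```

Here the observation is simply the current program size and the reward is the fraction of the original program removed by the last action, so the cumulative reward tracks the total size reduction over the episode.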


CompilerGym can be installed via PyPI.

!pip install compiler_gym

For installation from other sources, check the official tutorial.

Structure of CompilerGym Code

 import gym
 import compiler_gym                     # imports the CompilerGym environments
 env = gym.make("llvm-autophase-ic-v0")  # starts a new environment
 env.require_dataset("npb-v0")           # downloads a set of programs
 env.reset()                             # starts a new compilation session with a random program
 env.render()                            # prints the IR of the program
 env.step(env.action_space.sample())     # applies a random optimization, updates state/reward/actions 

Demo – CompilerGym Basics

  1. Import the compiler_gym library. Importing compiler_gym automatically registers the compiler environments.
 import gym
 import compiler_gym 

    The installed version can be checked with compiler_gym.__version__.


All the environments available in CompilerGym are listed in compiler_gym.COMPILER_GYM_ENVS.


  2. Select an environment.

CompilerGym environments are named in one of the following formats:

  • <compiler>-<observation>-<reward>-<version>
  • <compiler>-<reward>-<version>
  • <compiler>-<version>


  • <compiler> is the compiler optimization task,
  • <observation> is the default observation provided, and
  • <reward> is the reward signal.
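The naming scheme above can be illustrated with a small parser. This is only a sketch: `parse_env_name` and `EnvName` are hypothetical helpers, not part of CompilerGym, and the code assumes that no component of the name contains a hyphen.

```python
from typing import NamedTuple, Optional


class EnvName(NamedTuple):
    """Components of an environment name (hypothetical helper type)."""

    compiler: str
    observation: Optional[str]
    reward: Optional[str]
    version: str


def parse_env_name(name: str) -> EnvName:
    """Split a name such as 'llvm-autophase-ic-v0' into its components.

    Assumes components never contain '-', which the naming formats imply
    but which is not checked against any real environment registry.
    """
    parts = name.split("-")
    if len(parts) == 4:  # <compiler>-<observation>-<reward>-<version>
        compiler, observation, reward, version = parts
    elif len(parts) == 3:  # <compiler>-<reward>-<version>
        compiler, reward, version = parts
        observation = None
    elif len(parts) == 2:  # <compiler>-<version>
        compiler, version = parts
        observation = reward = None
    else:
        raise ValueError(f"unrecognized environment name: {name!r}")
    return EnvName(compiler, observation, reward, version)


print(parse_env_name("llvm-autophase-ic-v0"))
```

For the environment used in this article, the parser reports llvm as the compiler, autophase as the default observation, ic as the reward signal, and v0 as the version.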

See compiler_gym.views for more details. This example uses the llvm-autophase-ic-v0 environment.

You can create an instance of this environment by using the code below:

env = gym.make("llvm-autophase-ic-v0")

  3. Installing Benchmarks

In CompilerGym, the input programs are known as benchmarks, and collections of benchmarks are grouped into datasets. You can use a predefined benchmark or create your own.

The benchmarks available to the current environment, if any, can be queried using env.benchmarks.


This returns an empty list if no benchmarks are available. You can also use predefined programs. This example uses the NAS Parallel Benchmarks dataset, downloaded with env.require_dataset("npb-v0").

  4. The compiler environment
  • A CompilerGym environment is very similar to an OpenAI Gym environment. You can check the documentation of any method with the built-in help() function, for example help(env.step).


  • Action Space: the action space is described by env.action_space. The list of actions can be found in the CompilerGym source code.
  • Observation Space: the observation space is described by env.observation_space.
  • Reward Range: the upper and lower bounds of the reward signal are described by env.reward_range.
  • Before using any other method of a CompilerGym environment, call env.reset() to initialize the environment state.
  5. Interaction with the environment: this works the same way as interaction with an OpenAI Gym environment.

To print the Intermediate Representation (IR) of the program in its current state, use env.render().


env.step() runs an action. It returns four values: a new observation, a reward, a boolean indicating whether the episode has ended, and additional information.

observation, reward, done, info = env.step(0)

In the example below, each reward indicates the reduction in code size relative to the previous action. A cumulative reward greater than one means that the sequence of optimizations performed yields better results than LLVM’s default optimizations. Let’s run 100 random actions and see how close we can get:

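 episode_reward = 0                      # cumulative reward for this episode
 for i in range(1, 101):
     # apply a random optimization and collect its reward
     observation, reward, done, info = env.step(env.action_space.sample())
     if done:                            # stop as soon as the episode ends
         break
     episode_reward += reward
     print(f"Step {i}, quality={episode_reward:.2%}")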
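To show how the running total and the `:.2%` formatting behave, here is a small worked sketch that replays a fixed list of step results instead of calling a real environment. The reward values are made up for illustration, and `run_steps` is a hypothetical helper that mirrors the loop above.

```python
def run_steps(step_results):
    """Accumulate rewards from (reward, done) pairs, stopping when done is True."""
    episode_reward = 0.0
    log = []
    for i, (reward, done) in enumerate(step_results, start=1):
        if done:
            break  # the episode has ended; later steps are never applied
        episode_reward += reward
        log.append(f"Step {i}, quality={episode_reward:.2%}")
    return episode_reward, log


# Made-up step results: the fifth step ends the episode, so the sixth is ignored.
steps = [
    (0.25, False),
    (0.0, False),
    (0.5, False),
    (-0.125, False),
    (0.0, True),
    (0.5, False),
]
total, log = run_steps(steps)
print("\n".join(log))
```

A negative reward, as in the fourth step, lowers the cumulative quality, which is why a sequence of random actions does not improve monotonically.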

The same experiment can also be run with a single command from the command line.


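You can also save the program for later use. In the LLVM environments, env.write_bitcode("/tmp/program.bc") writes the current program to a bitcode file, which can then be inspected: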

 !file /tmp/program.bc 

Finally, always close the environment with env.close() to end that particular compiler instance.
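One way to make sure the environment is always closed, even if a step raises an exception, is the standard library's `contextlib.closing`. The sketch below uses a hypothetical `ToySession` class in place of a real environment; the only assumption is that the object exposes a `close()` method, as described above.

```python
from contextlib import closing


class ToySession:
    """Hypothetical stand-in for an environment that holds a compiler instance."""

    def __init__(self):
        self.closed = False

    def step(self, action):
        if self.closed:
            raise RuntimeError("session is closed")
        if action < 0:
            raise ValueError("invalid action")
        return action

    def close(self):
        self.closed = True


def run_actions(session, actions):
    """Run all actions, closing the session afterwards even if one of them fails."""
    with closing(session):
        return [session.step(a) for a in actions]


session = ToySession()
print(run_actions(session, [1, 2, 3]), session.closed)
```

The same pattern can be written with try/finally; `closing` simply keeps the cleanup next to the code that acquires the resource.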



In this article, we covered CompilerGym, a reinforcement learning toolkit for compiler optimization. The sections above describe its basic code structure and usage. Colab Notebook for Demo is available at:

Official code and documentation are available at: https://facebookresearch.github.io/CompilerGym/

Aishwarya Verma
A data science enthusiast and a post-graduate in Big Data Analytics. Creative and organized with an analytical bent of mind.
