How fast a piece of code runs depends heavily on the chip that executes it. Traditionally, performance models estimate that speed by running the code through a simulation of the chip's architecture.
Both compilers and developers rely on these performance models.
The compiler uses the information from the simulation to optimise the code automatically, while developers use it to track down and work around performance bottlenecks on the microprocessor that will run the code.
However, these models are handwritten by a small group of experts and are not properly validated. As a result, the simulated performance measurements they produce often deviate drastically from real-life results.
To address this, MIT researchers have built a machine learning tool that predicts how fast a computer chip will execute code from various applications. Let's take a brief look at how the tool, a neural network model called Ithemal, works.
Throughput is the number of clock cycles a processor takes to execute a block of assembly instructions in steady state, and the task here is to predict it.
On modern machines, building a performance model to estimate this throughput is challenging. Analytical models for such sophisticated processors are tedious to build, error-prone, and must be rebuilt from scratch for each processor generation.
MIT's Ithemal (Instruction Throughput Estimator using Machine Learning) is the first machine learning tool developed to learn to predict throughput, so that one need not go through the tedious process of building a performance model by hand to estimate the throughput of a chip.
Throughput indicates how fast a processor gets through a sequence of instructions. Accurate throughput prediction for basic blocks (sequences of instructions with no branches or jumps) is vital in many systems, such as register allocation and instruction scheduling.
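To make the idea of a basic block concrete, here is a minimal sketch that cuts a list of assembly lines into branch-free runs. The `BRANCH_OPCODES` set is an illustrative subset chosen for this example, and the function name is hypothetical. A real disassembler-based tool would handle many more cases, so treat this as an approximation rather than how Ithemal or BHive extract blocks.

```python
# Illustrative subset of x86 control-flow opcodes (an assumption, not a full list).
BRANCH_OPCODES = {"jmp", "je", "jne", "jg", "jl", "call", "ret"}

def split_basic_blocks(lines):
    """Split assembly lines into branch-free runs of instructions.

    Branch instructions and labels act as boundaries and are not
    included in the returned blocks.
    """
    blocks, current = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        is_label = line.endswith(":")
        is_branch = (not is_label) and line.split()[0] in BRANCH_OPCODES
        if is_label or is_branch:
            # A jump target or a jump ends the current block.
            if current:
                blocks.append(current)
                current = []
            continue
        current.append(line)
    if current:
        blocks.append(current)
    return blocks

program = [
    "mov rax, rbx",
    "add rax, rcx",
    "jne loop",
    "loop:",
    "sub rax, 1",
    "ret",
]
blocks = split_basic_blocks(program)
for block in blocks:
    print(block)
```

Each printed list is one straight-line sequence whose throughput could be predicted on its own.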
Ithemal treats throughput prediction as a regression task and uses a deep neural network (DNN). It learns to map assembly sequences to their real-world throughput from a large corpus of labelled data. To be more precise, Ithemal uses a hierarchical multiscale RNN, which generates an independent embedding for each instruction and then combines the instruction embeddings to predict throughput.
The end-to-end model is divided into three stages: canonicalisation, embedding and prediction.
The canonicalisation stage converts the assembly input, whose form is dictated by the syntax of the assembly language, into a more structured form. Ithemal first disassembles the compiled block and then maps it to a list of instructions.
Each instruction is represented as a list of tokens. These tokens represent the operation code (opcode), the source operands and the destination operands, separated by unique delimiter tokens.
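The token layout described above can be sketched as follows. The delimiter spellings `<S>`, `<D>` and `<E>`, the `Instruction` class and the function names are assumptions made for this illustration, not Ithemal's confirmed interface. The example also assumes the block has already been disassembled into opcodes and operands.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical delimiter tokens marking the start of the source operands,
# the start of the destination operands, and the end of the instruction.
SRC, DST, END = "<S>", "<D>", "<E>"

@dataclass
class Instruction:
    opcode: str
    sources: List[str]
    destinations: List[str]

def canonicalise(instr: Instruction) -> List[str]:
    """Flatten one instruction into: opcode, sources, destinations."""
    return [instr.opcode, SRC, *instr.sources, DST, *instr.destinations, END]

def canonicalise_block(block: List[Instruction]) -> List[List[str]]:
    """Canonicalise every instruction of a basic block."""
    return [canonicalise(instr) for instr in block]

# "mov rax, rbx" reads rbx and writes rax;
# "add rax, rcx" reads rax and rcx and writes rax.
block = [
    Instruction("mov", ["rbx"], ["rax"]),
    Instruction("add", ["rax", "rcx"], ["rax"]),
]
tokens = canonicalise_block(block)
for token_list in tokens:
    print(token_list)
```

Note that the two instructions produce token lists of different lengths, which is why the later stages must cope with variable-length input.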
An embedding is a representation of an instruction as a real-valued vector in a high-dimensional space. The embedding stage takes the canonicalised token stream and produces one embedding for each instruction.
This happens in two steps. First, the token layer maps each token to an embedding. Then the instruction layer maps the sequence of token embeddings of each instruction in the basic block to a single instruction embedding. The instruction layer is implemented as a sequential Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) cells because the input to the embedding stage varies in size: an instruction can have a variable number of tokens, depending on how many source and destination operands it has.
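The token layer amounts to a lookup table from tokens to vectors. The sketch below illustrates that step only, under several assumptions: the delimiter tokens (`<S>`, `<D>`, `<E>`), the embedding size and all function names are hypothetical. The table is filled with random numbers, whereas a trained model would learn these values.

```python
import random

EMBED_DIM = 4  # hypothetical embedding size, kept tiny for readability

# Two canonicalised instructions with different token counts.
canonical_block = [
    ["mov", "<S>", "rbx", "<D>", "rax", "<E>"],
    ["add", "<S>", "rax", "rcx", "<D>", "rax", "<E>"],
]

def build_vocabulary(block):
    """Assign each distinct token an integer index in order of appearance."""
    vocab = {}
    for instruction in block:
        for token in instruction:
            vocab.setdefault(token, len(vocab))
    return vocab

def make_embedding_table(vocab_size, dim, seed=0):
    """Create one random vector per token (stand-in for learned weights)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]

def embed_tokens(instruction, vocab, table):
    """Token layer: replace every token by its embedding vector."""
    return [table[vocab[token]] for token in instruction]

vocab = build_vocabulary(canonical_block)
table = make_embedding_table(len(vocab), EMBED_DIM)
embedded = [embed_tokens(instr, vocab, table) for instr in canonical_block]

# The sequences differ in length, which motivates a recurrent instruction layer.
print([len(sequence) for sequence in embedded])
```

Every occurrence of the same token, such as `rax`, maps to the same vector. The instruction layer then has to reduce each variable-length sequence to one fixed-size instruction embedding.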
The estimate comes from the prediction layer, where the sequence of instruction embeddings (the basic block) is mapped to a throughput value, again using an RNN with LSTM cells (the second level of the hierarchy). From the final LSTM output, Ithemal produces a single real-valued number: the network's throughput prediction for the basic block.
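The two-level structure can be sketched in plain Python: one LSTM turns each instruction's token embeddings into an instruction embedding, and a second LSTM reads those embeddings to produce a single number. This is an untrained toy with random weights. The dimensions, the class and function names, and the final linear read-out are assumptions made for illustration, not the researchers' implementation.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class LSTM:
    """Single-layer LSTM over plain Python lists (illustrative, untrained)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        cols = input_dim + hidden_dim
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.w = {g: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                      for _ in range(hidden_dim)] for g in "ifog"}
        self.b = {g: [0.0] * hidden_dim for g in "ifog"}

    def _gate(self, name, xh, activation):
        return [activation(sum(w * v for w, v in zip(row, xh)) + bias)
                for row, bias in zip(self.w[name], self.b[name])]

    def run(self, sequence):
        """Consume a variable-length sequence and return the final hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in sequence:
            xh = list(x) + h
            i = self._gate("i", xh, sigmoid)
            f = self._gate("f", xh, sigmoid)
            o = self._gate("o", xh, sigmoid)
            g = self._gate("g", xh, math.tanh)
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h

def predict_throughput(block, instr_lstm, block_lstm, out_w, out_b):
    # Instruction layer: token embeddings -> one embedding per instruction.
    instr_embeddings = [instr_lstm.run(tokens) for tokens in block]
    # Prediction layer: instruction embeddings -> final block-level state.
    block_state = block_lstm.run(instr_embeddings)
    # Assumed linear read-out from the final state to one real number.
    return sum(w * h for w, h in zip(out_w, block_state)) + out_b

rng = random.Random(0)
EMBED_DIM, HIDDEN_DIM = 4, 6  # hypothetical sizes
instr_lstm = LSTM(EMBED_DIM, HIDDEN_DIM, rng)
block_lstm = LSTM(HIDDEN_DIM, HIDDEN_DIM, rng)
out_w = [rng.uniform(-0.1, 0.1) for _ in range(HIDDEN_DIM)]
out_b = 0.0

# A basic block of two instructions with 6 and 7 (random) token embeddings.
block = [
    [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(6)],
    [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(7)],
]
prediction = predict_throughput(block, instr_lstm, block_lstm, out_w, out_b)
print(prediction)
```

Because the weights are random, the printed number is meaningless as a throughput. Training would adjust the weights so that the output matches measured throughput for labelled basic blocks.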
The researchers evaluated Ithemal against two state-of-the-art handwritten models: IACA (Intel 2017) and llvm-mca (LLVM). They show that Ithemal is more accurate than these sophisticated handwritten models, which were designed to capture the complexities of modern processors. Ithemal also does not trade speed for accuracy: it beats these models while running as fast as they do.
According to the researchers, Ithemal has less than half the error of the two state-of-the-art analytical models and can be easily ported across a variety of processor microarchitectures, which solves the problem of building a model from scratch for each processor generation.
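One common way to compare such models is the average relative error between predicted and measured throughput. The sketch below assumes that metric for illustration. The numbers are invented and the model names are placeholders, so the output says nothing about the actual tools.

```python
def mean_relative_error(predicted, measured):
    """Average of |predicted - measured| / measured over all basic blocks."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - m) / m for p, m in zip(predicted, measured))
    return total / len(measured)

# Invented cycle counts for three basic blocks (not real measurements).
measured = [100.0, 200.0, 400.0]
model_a = [110.0, 180.0, 400.0]  # placeholder "learned" model
model_b = [130.0, 150.0, 360.0]  # placeholder "analytical" model

error_a = mean_relative_error(model_a, measured)
error_b = mean_relative_error(model_b, measured)
print(f"model_a: {error_a:.3f}, model_b: {error_b:.3f}")
```

A model with "less than half the error" of another would show a value under half of the other's on the same set of blocks.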
BHive is an open-source dataset created to validate Ithemal. The researchers assembled a suite of around 300,000 basic blocks and put them in BHive. These basic blocks come from different domains, such as cryptography, compilers, machine learning and graphics.
BHive also serves as a benchmark for researchers, giving a detailed analysis of a model's strengths and weaknesses on different workloads.
BHive was used to evaluate Ithemal, IACA, llvm-mca and OSACA. In these tests, Ithemal predicted how fast processors could run code more accurately than the other models under consideration.
The researchers said the tool makes it easier to quickly learn the performance characteristics of any newly introduced chip architecture.
They have also simplified the process of building a model that predicts performance.
“If you want to train a model on some new architecture, you collect more data from that architecture, run it through our profiler, use that information to train Ithemal. And, now you have a model that predicts performance,” said Charith Mendis, one of the authors of the paper.
The next step for the researchers is to study ways of making the model more interpretable, because much of machine learning is a black box and it is often unclear how a model arrives at the output it gives.