CarperAI, the democratised AI research team within the EleutherAI research collective, has introduced version 0.2 of OpenELM. OpenELM is an open-source library that combines large language models with evolutionary algorithms for code synthesis.
CarperAI has also unveiled a set of diff models, trained on millions of GitHub commits, that predict changes to code. The three models, diff-codegen-350m, diff-codegen-2b, and diff-codegen-6b, were fine-tuned from Salesforce's CodeGen code synthesis models.
Rather than writing code from scratch, the models take existing code and a description of the desired change (a commit message) and generate a diff that edits the code. This can make them better at fixing bugs, particularly when the commit message describes the change accurately.
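To make the idea of a diff concrete, the sketch below uses Python's standard `difflib` module to produce a unified diff between a buggy function and its fixed version, and pairs it with a commit message. The `build_example` helper and the `<CODE>`/`<MSG>`/`<DIFF>` markers are hypothetical illustrations only. They are not the actual prompt format used by the diff-codegen models.

```python
import difflib


def make_diff(before: str, after: str, filename: str = "example.py") -> str:
    """Return a unified diff that turns `before` into `after`."""
    return "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=f"a/{filename}",
            tofile=f"b/{filename}",
        )
    )


def build_example(before: str, message: str, after: str) -> dict:
    """Hypothetical training record: given code plus a commit message,
    the model should predict the diff. The marker layout is an assumption."""
    prompt = f"<CODE>\n{before}<MSG>\n{message}\n<DIFF>\n"
    return {"prompt": prompt, "target": make_diff(before, after)}


before = "def mean(xs):\n    return sum(xs) / len(xs) + 1\n"
after = "def mean(xs):\n    return sum(xs) / len(xs)\n"
example = build_example(before, "Fix off-by-one in mean", after)
print(example["target"])
```

A diff like this is much shorter than the full file when the edit is small, which is one reason predicting edits can be cheaper than regenerating whole programmes.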
OpenELM is based on OpenAI's research paper 'Evolution through Large Models (ELM)', which shows how large language models can act as intelligent mutation operators in an evolutionary algorithm. This makes it possible to produce diverse, high-quality code in domains that are not represented in the language model's training set.
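The core ELM idea, an evolutionary loop whose mutation operator can be swapped out, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not OpenELM's API. `stub_mutator` is a hypothetical stand-in that nudges integer literals, where ELM would instead ask a code LLM for an edit. The target function and the (mu + lambda)-style selection are also illustrative choices. Candidates are run with `exec`, which is not a sandbox.

```python
import random
import re
from typing import Callable

Program = str
Mutator = Callable[[Program, random.Random], Program]

# Illustrative task: evolve f(x) so that it matches 3 * x + 2 on these points.
TARGET_POINTS = [(x, 3 * x + 2) for x in range(-5, 6)]


def fitness(program: Program) -> float:
    """Higher is better: negative squared error against the target points.
    NOTE: exec() runs arbitrary code; real systems evaluate in a sandbox."""
    namespace: dict = {}
    try:
        exec(program, namespace)
        f = namespace["f"]
        return -sum((f(x) - y) ** 2 for x, y in TARGET_POINTS)
    except Exception:
        return float("-inf")  # broken programmes lose in selection


def stub_mutator(program: Program, rng: random.Random) -> Program:
    """Hypothetical stand-in for an LLM mutation: nudge one integer literal by +/-1."""
    literals = list(re.finditer(r"\d+", program))
    if not literals:
        return program
    m = rng.choice(literals)
    new_value = max(0, int(m.group()) + rng.choice([-1, 1]))
    return program[: m.start()] + str(new_value) + program[m.end():]


def evolve(seed: Program, mutate: Mutator, generations: int = 200,
           pop_size: int = 8, rng_seed: int = 0) -> Program:
    rng = random.Random(rng_seed)
    population = [seed]
    for _ in range(generations):
        children = [mutate(rng.choice(population), rng) for _ in range(pop_size)]
        # (mu + lambda) selection: dedupe in order, keep the best parents and children
        candidates = list(dict.fromkeys(population + children))
        population = sorted(candidates, key=fitness, reverse=True)[:pop_size]
    return population[0]


seed = "def f(x):\n    return 1 * x + 0\n"
best = evolve(seed, stub_mutator)
print(best, fitness(best))
```

Because the mutation operator is just a callable, replacing `stub_mutator` with a function that prompts a language model leaves the rest of the loop unchanged.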
Beyond the features of the initial release, version 0.2 integrates with NVIDIA's Triton Inference Server, which can speed up inference for CodeGen models by a factor of ten. It also adds support for diff models, which allow code to be mutated within a loop: the model is given a code segment and a commit message describing the change.
The initial release of OpenELM included MAP-Elites for generated code (produced either by a diff model or by prompt engineering an existing language model), the Sodarace 2D environment, and a number of other baseline environments. It also included benchmarking of mutation LLMs in a play environment, and a sandbox that uses gVisor, a Docker container, and Flask to run code generated by language models securely.
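MAP-Elites keeps an archive with one elite per cell of a grid of behaviour descriptors, so the search keeps solutions that are good *and* different from one another. The sketch below is a minimal, assumption-laden version: genomes are small lists of floats, and the descriptor, fitness, and Gaussian mutation are hypothetical toy choices. In OpenELM's setting the genomes would be programmes and the mutation would come from a language model or diff model.

```python
import random
from typing import Dict, List, Tuple

Genome = List[float]
Cell = Tuple[int, int]


def descriptor(g: Genome, bins: int = 10) -> Cell:
    """Toy behaviour descriptor: bin the first two genes, each clipped to [0, 1]."""
    def to_bin(v: float) -> int:
        v = min(max(v, 0.0), 1.0)
        return min(int(v * bins), bins - 1)
    return (to_bin(g[0]), to_bin(g[1]))


def fitness(g: Genome) -> float:
    """Toy objective: prefer genomes with small gene magnitudes."""
    return -sum(x * x for x in g)


def mutate(g: Genome, rng: random.Random) -> Genome:
    """Gaussian perturbation; an LLM-proposed edit would replace this in ELM."""
    return [x + rng.gauss(0.0, 0.1) for x in g]


def map_elites(iterations: int = 5000, init: int = 50, dim: int = 3,
               seed: int = 0) -> Dict[Cell, Tuple[float, Genome]]:
    rng = random.Random(seed)
    archive: Dict[Cell, Tuple[float, Genome]] = {}

    def try_insert(g: Genome) -> None:
        cell, fit = descriptor(g), fitness(g)
        # Keep the candidate if its cell is empty or it beats the current elite
        if cell not in archive or fit > archive[cell][0]:
            archive[cell] = (fit, g)

    for _ in range(init):
        try_insert([rng.random() for _ in range(dim)])
    for _ in range(iterations):
        parent = rng.choice(list(archive.values()))[1]
        try_insert(mutate(parent, rng))
    return archive


archive = map_elites()
print(f"{len(archive)} of 100 cells filled")
```

Unlike a plain genetic algorithm, a worse candidate still survives here if it lands in an empty cell, which is what preserves diversity.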
According to the OpenAI paper, LLMs trained on code, such as OpenAI's Codex, have performed well at automated code generation. Evolutionary algorithms, on the other hand, generate code by applying mutations to well-known, or "seed", programmes, which is useful when we are interested in a class of programmes that rarely appears in the training distribution. However, genetic algorithms often need substantial customisation with domain knowledge before they can make desirable changes. As the ELM method demonstrates, an LLM trained on code can instead propose intelligent mutations for genetic programming (GP) algorithms: the LLM encodes this domain knowledge and steers the genetic algorithm towards intelligent exploration of the search space. The fundamental process is generate, evaluate, and fine-tune. So far, OpenELM implements every part of this process except the conditional reinforcement learning stage.
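For contrast with LLM-proposed mutations, a conventional hand-written GP mutation operator for Python programmes might look like the hypothetical sketch below. It parses the code with the standard `ast` module and shifts one randomly chosen numeric constant by one. The operator is blind to what the programme is meant to do, which is the kind of limitation ELM addresses by letting a code model suggest the edit. `ast.unparse` requires Python 3.9 or later.

```python
import ast
import random


def mutate_constant(source: str, rng: random.Random) -> str:
    """Hypothetical syntax-level GP mutation: shift one numeric literal by +/-1."""
    tree = ast.parse(source)
    constants = [
        node for node in ast.walk(tree)
        if isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)  # bool is a subclass of int
    ]
    if not constants:
        return source
    node = rng.choice(constants)
    # Clamp at zero so unparse never has to emit a negative literal
    node.value = max(0, node.value + rng.choice([-1, 1]))
    return ast.unparse(tree)


rng = random.Random(1)
seed = "def f(x):\n    return 2 * x + 5\n"
child = mutate_constant(seed, rng)
print(child)
```

Making such an operator produce *useful* changes (for example, adding a loop or fixing a boundary condition) would require much more domain-specific engineering, which is the gap the LLM fills.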