“In my youth, I would’ve argued that life is just a series of random events, devoid of any meaning. But as a data scientist, I must recognise that patterns sometimes emerge.” When Gilfoyle, one of the main characters in the popular sitcom Silicon Valley, said this, he could just as well have been talking about the patterns that emerge in the AI innovation space.
Whenever a new, popular and eye-catching tool comes to market, tech companies rush to replicate it and build their own versions. This gives rise to a trend: a pattern. In recent years, three domains of AI innovation have attracted heightened interest: language models, code generation tools, and art generation systems.
Large language models
With a user base of over 3.5 billion, social media giant Meta (formerly Facebook) relies heavily on NLP technology. Its engineering teams develop and deploy advanced NLP systems to understand and communicate with users and to offer, in the company’s own words, “a safe experience—no matter what language they speak”.
Meta has introduced several NLP-related initiatives.
In May, Meta introduced the Open Pretrained Transformer (OPT-175B), a 175-billion-parameter language model trained on publicly available datasets. What set it apart from other large language models was that it was released together with pretrained models and the code needed to train and use them. The Zuckerberg-owned company followed up with the release of a 66-billion-parameter model.
OPT-175B joins a list of other large language models from Meta. Last year, Meta introduced the Generative Spoken Language Model (GSLM). Unlike most language models, GSLM is a textless NLP model that takes raw audio signals as input. According to the company, this lets GSLM sidestep a key limitation of text-based language models: their dependence on large text datasets.
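GSLM’s textless approach rests on turning continuous audio into a sequence of discrete units that a language model can then treat like text. The sketch below shows only that quantization step, under loudly stated assumptions: the two-dimensional “frames” and the three centroids are made-up toy values (in GSLM, frame features come from a self-supervised speech encoder and the centroids from k-means clustering), and collapsing consecutive repeated units is included as an assumed preprocessing step.

```python
import math

# Toy stand-ins (assumptions): real frames would be speech-encoder features,
# and real centroids would come from k-means clustering over many such frames.
Frame = list[float]

def nearest_unit(frame: Frame, centroids: list[Frame]) -> int:
    """Return the index of the centroid closest to the frame (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(frame, centroids[i]))

def frames_to_units(frames: list[Frame], centroids: list[Frame],
                    collapse_repeats: bool = True) -> list[int]:
    """Map each audio frame to a discrete unit ID, yielding a 'pseudo-text' sequence."""
    units = [nearest_unit(f, centroids) for f in frames]
    if collapse_repeats:
        # Assumed preprocessing: merge consecutive duplicates, e.g. [3, 3, 1] -> [3, 1].
        units = [u for i, u in enumerate(units) if i == 0 or u != units[i - 1]]
    return units

centroids = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
frames = [[0.1, 0.0], [0.0, 0.2], [0.9, 1.1], [1.0, 0.9], [0.1, 0.9]]
print(frames_to_units(frames, centroids))                          # [0, 1, 2]
print(frames_to_units(frames, centroids, collapse_repeats=False))  # [0, 0, 1, 1, 2]
```

The resulting unit sequence plays the role that word or subword tokens play in a text model, which is why no transcripts are needed.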
One of Meta’s most important contributions to NLP research came in the form of RoBERTa (Robustly Optimized BERT Pretraining Approach), an optimised method for pretraining NLP systems. When it was released, RoBERTa achieved state-of-the-art results on General Language Understanding Evaluation (GLUE), a widely used NLP benchmark.
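RoBERTa keeps BERT’s masked-language-modelling objective but, among other changes, replaces static masking (fixing each sequence’s masks once during preprocessing) with dynamic masking, which draws a fresh mask every time a sequence is fed to the model. A minimal Python sketch of both ideas follows; the roughly 15% selection rate and the 80/10/10 replacement split are the values reported for BERT, while the function names, toy sentence and vocabulary are hypothetical.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, rng, mask_prob=0.15):
    """BERT-style masking: each position becomes a prediction target with probability
    mask_prob; a target is replaced by [MASK] 80% of the time, by a random vocabulary
    token 10% of the time, and left unchanged the remaining 10% of the time."""
    masked = list(tokens)
    targets = {}  # position -> original token the model must recover
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok
            r = rng.random()
            if r < 0.8:
                masked[i] = MASK
            elif r < 0.9:
                masked[i] = rng.choice(vocab)
            # otherwise the original token stays in place
    return masked, targets

def dynamic_masks(tokens, vocab, epochs, seed=0):
    """Dynamic masking: draw a new mask for every pass over the sequence."""
    rng = random.Random(seed)
    return [mask_tokens(tokens, vocab, rng) for _ in range(epochs)]

sentence = "the model learns to predict the missing words".split()
vocab = sorted(set(sentence))
for epoch, (masked, targets) in enumerate(dynamic_masks(sentence, vocab, epochs=3)):
    print(epoch, " ".join(masked), targets)
```

Dynamic masking is only one of RoBERTa’s changes; its authors also trained longer, on more data and with larger batches, and dropped BERT’s next-sentence prediction objective.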
RoBERTa is based on Google’s BERT model. When it was introduced in 2018, BERT truly revolutionised the large language model space, achieving state-of-the-art results on a wide range of NLP tasks. Its significance lay not only in its size (340 million parameters) but also in applying bidirectional training of the Transformer, a popular attention-based architecture, to language modelling.
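“Bidirectional” here means that when the Transformer’s attention computes a representation for a token, it can weigh every token in the sequence, both to its left and to its right, whereas a left-to-right model such as GPT masks out future positions. The sketch below implements scaled dot-product attention, softmax(QK^T / sqrt(d)) V, with an optional causal mask to show the difference. For brevity it assumes queries, keys and values are the raw token vectors (a real layer applies learned projections and uses multiple heads), and the three token vectors are toy values.

```python
import math

def softmax(xs):
    """Numerically stable softmax; entries of -inf receive zero weight."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v, causal=False):
    """Scaled dot-product attention over lists of vectors.
    causal=False (BERT-style): every position attends to all positions.
    causal=True (GPT-style): position i attends only to positions 0..i."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        scores = []
        for j, kj in enumerate(k):
            if causal and j > i:
                scores.append(float("-inf"))  # hide future tokens
            else:
                scores.append(sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d))
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token vectors
bidirectional = attention(x, x, x)        # first token mixes in later tokens
causal = attention(x, x, x, causal=True)  # first token can only see itself
print(bidirectional[0])
print(causal[0])  # [1.0, 0.0]
```

Only the masking differs between the two calls: the last token sees the whole sequence either way, but earlier tokens gain access to their right-hand context only in the bidirectional version.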
One of the watershed moments in language modelling came with OpenAI’s GPT-3. When it was introduced in 2020, a 175-billion-parameter model was unheard of. Several larger models have appeared since then. In 2021, Google introduced the Switch Transformer, a model with more than a trillion parameters. Other notable large models include DeepMind’s Gopher (280 billion parameters) and Chinchilla (70 billion), Microsoft and NVIDIA’s Megatron-Turing NLG (530 billion), and Google’s GLaM (1.2 trillion) and LaMDA (137 billion).
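To put these parameter counts in perspective, a rough back-of-envelope calculation helps: storing a model’s weights takes about (number of parameters) × (bytes per parameter). The sketch assumes 16-bit floats (2 bytes per parameter) or 32-bit floats (4 bytes), counts weights only (no activations, gradients or optimizer state), and uses 1 GB = 10^9 bytes; the helper name is hypothetical.

```python
# Back-of-envelope memory needed just to store a model's weights.
# Assumptions: weights only, 2 bytes/parameter in fp16 and 4 in fp32, 1 GB = 10**9 bytes.
BYTES_PER_PARAM = {"fp16": 2, "fp32": 4}

def weight_memory_gb(num_params: float, dtype: str = "fp16") -> float:
    """Return the storage needed for num_params weights of the given dtype, in GB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

# Parameter counts as quoted in the article.
models = {
    "BERT-Large": 340e6,
    "Chinchilla": 70e9,
    "GPT-3": 175e9,
    "Gopher": 280e9,
    "Megatron-Turing NLG": 530e9,
}
for name, params in models.items():
    print(f"{name}: ~{weight_memory_gb(params):,.1f} GB of weights in fp16")
```

On these assumptions, GPT-3’s weights alone come to about 350 GB. Raw parameter counts are not perfectly comparable, though: the Switch Transformer and GLaM are sparse mixture-of-experts models, so only a fraction of their parameters is active for any given token.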
Recently, Google’s LaMDA model made headlines when Blake Lemoine, a (now former) Google engineer, claimed that the AI had become sentient. Lemoine was placed on paid administrative leave and later fired.
Code generation
OpenAI developed Codex, an AI system that translates natural language into code; it can interpret simple commands in natural language and execute them on the user’s behalf. Building on Codex, OpenAI, in collaboration with Microsoft and GitHub, introduced Copilot in 2021. Copilot is billed as an “AI pair programmer” that helps developers write better code: it draws context from the code being worked on and suggests whole lines or entire functions.
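In practice, a developer typically gives a tool like Codex or Copilot a natural-language description, such as a function signature plus a docstring, and the tool proposes the code that follows. The example below is a hand-written illustration of that prompt-and-completion shape, not actual output from either tool; the function name and task are hypothetical.

```python
# The "prompt": a signature and a natural-language docstring written by the developer.
def is_palindrome(text: str) -> bool:
    """Return True if text reads the same forwards and backwards,
    ignoring case, spaces and punctuation."""
    # The "completion": a hand-written example of the kind of body such a tool
    # might suggest (hypothetical, not real Codex or Copilot output).
    cleaned = [ch.lower() for ch in text if ch.isalnum()]
    return cleaned == cleaned[::-1]

print(is_palindrome("A man, a plan, a canal: Panama"))  # True
print(is_palindrome("Silicon Valley"))                  # False
```

Whatever a tool suggests, the developer still has to read and test it; a suggestion that merely looks plausible can be wrong.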
Soon after, Salesforce open-sourced CodeT5, a machine learning model that can understand and generate code in real time. According to the team, CodeT5 achieved state-of-the-art performance on tasks such as code defect detection (predicting whether code is vulnerable to exploits) and clone detection (identifying snippets of code that have the same functionality).
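To make clone detection concrete, here is a classic token-based baseline. It is not CodeT5’s method, which learns representations of code with a pretrained encoder-decoder model. Identifiers and literals are replaced with placeholders, and the resulting token sequences are compared with Python’s `difflib.SequenceMatcher`. The helper names and the three snippets are hypothetical.

```python
import io
import keyword
import tokenize
from difflib import SequenceMatcher

def normalized_tokens(source: str) -> list[str]:
    """Tokenize Python source, replacing identifiers with ID and literals with LIT,
    so that renaming variables does not hide structural similarity."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            tokens.append("ID")
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            tokens.append("LIT")
        elif tok.type in (tokenize.NAME, tokenize.OP):
            tokens.append(tok.string)  # keywords and operators are kept verbatim
        # comments, newlines, indentation and the end marker are ignored
    return tokens

def clone_score(a: str, b: str) -> float:
    """Similarity in [0, 1] between the normalized token sequences of two snippets."""
    return SequenceMatcher(None, normalized_tokens(a), normalized_tokens(b)).ratio()

snippet_a = """
def total(xs):
    s = 0
    for x in xs:
        s += x
    return s
"""
snippet_b = """
def add_all(values):
    acc = 0
    for v in values:
        acc += v
    return acc
"""
snippet_c = """
def greet(name):
    return "Hello, " + name
"""
print(clone_score(snippet_a, snippet_b))  # 1.0: same structure, different names
print(clone_score(snippet_a, snippet_c))  # lower: different structure
```

A lexical baseline like this catches copies with renamed variables but misses clones that implement the same behaviour with a different structure, which is where learned models such as CodeT5 are meant to help.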
Earlier this year, DeepMind introduced AlphaCode, a code generator that uses transformer-based language models to generate code at an ‘unprecedented scale’. It demonstrates skills such as language understanding and problem solving. When evaluated against human programmers in ten contests on the popular competitive programming platform Codeforces, AlphaCode achieved an average ranking within the top 54.3% of participants.
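AlphaCode’s “unprecedented scale” refers to sampling a very large number of candidate programs per problem and then narrowing them down to a small set of submissions (at most ten per problem in DeepMind’s evaluation). Candidates that fail the example tests in the problem statement are discarded, and the survivors are clustered by how they behave on additional inputs. The following is a heavily simplified sketch of that filter-and-cluster step. The “sampled” programs are hypothetical hand-written lambdas for a toy task (squaring a number), and the probe inputs are hand-picked, whereas AlphaCode generates them with a separate model.

```python
from collections import defaultdict
from typing import Callable

Program = Callable[[int], int]

def select_submissions(candidates: list[Program], examples: list[tuple[int, int]],
                       probe_inputs: list[int], max_submissions: int = 10) -> list[Program]:
    """Filter sampled programs on the example tests, cluster survivors by behaviour
    on extra inputs, and return one program per cluster, largest cluster first."""
    # 1. Filter: discard programs that fail (or crash on) the examples.
    passing = []
    for prog in candidates:
        try:
            if all(prog(x) == y for x, y in examples):
                passing.append(prog)
        except Exception:
            continue
    # 2. Cluster: identical outputs on the probe inputs => behaviourally equivalent.
    clusters: dict[tuple[int, ...], list[Program]] = defaultdict(list)
    for prog in passing:
        try:
            clusters[tuple(prog(x) for x in probe_inputs)].append(prog)
        except Exception:
            continue
    # 3. Submit one representative from each of the largest clusters.
    ranked = sorted(clusters.values(), key=len, reverse=True)
    return [group[0] for group in ranked[:max_submissions]]

# Hypothetical "sampled" programs for the toy task: return n squared.
candidates = [
    lambda n: n * n,
    lambda n: n ** 2,
    lambda n: n + n,                   # fails the example 3 -> 9
    lambda n: n * n if n < 10 else 0,  # passes the examples but is wrong for n >= 10
    lambda n: 1 // (n - 2),            # crashes on the example input 2
]
examples = [(2, 4), (3, 9)]
probe_inputs = [0, 5, 10, -3]
submissions = select_submissions(candidates, examples, probe_inputs)
print(len(submissions), submissions[0](12))  # 2 144
```

Ranking clusters by size reflects a simple heuristic: many independently sampled programs that agree on unseen inputs are more likely to be correct than a lone outlier.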
Another notable code generation tool came from Carnegie Mellon University researchers Frank Xu, Uri Alon, Graham Neubig, and Vincent Hellendoorn. Called PolyCoder, it is based on the GPT-2 architecture and was trained on a 249 GB database of code spanning 12 programming languages.
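Other AI coding tools from major tech companies include Facebook’s TransCoder, which translates code between programming languages; Intel’s ControlFlag, which detects bugs in code; and a new GPT-3-powered feature in Microsoft’s Power Apps that turns natural language into Power Fx formulas.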
AI art generation
AI-based art generation tools dominated the AI scene in the first half of the year, starting with OpenAI’s launch of DALL.E 2. The tool creates realistic images from a natural language description provided by the user. It can combine concepts, styles, and attributes, and it can add and remove elements in an image while taking shadows, reflections, and textures into account. OpenAI recently made a beta version of the tool available to the general public.
DALL.E 2 is the successor to DALL.E, which OpenAI introduced at the beginning of 2021. The name DALL.E is a portmanteau of the artist Salvador Dalí and WALL-E, the Pixar robot. DALL.E is a neural network trained on 250 million image-text pairs collected from the internet. Alongside DALL.E, OpenAI also launched Contrastive Language–Image Pretraining (CLIP), a model that builds on work in zero-shot transfer, natural language supervision, and multimodal learning. CLIP learns visual concepts from natural language supervision and can be applied to any visual classification benchmark simply by providing the names of the categories to be recognised.
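CLIP’s zero-shot classification works by embedding the image and a text prompt for each candidate class (for example “a photo of a dog”) into a shared space, then picking the class whose text embedding is most similar to the image embedding. The sketch below shows only that final scoring step. The 3-dimensional embeddings are made-up stand-ins for CLIP’s encoder outputs, and the logit scale of 100 is an assumed value in the range CLIP uses.

```python
import math

def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def zero_shot_probs(image_emb: list[float], prompt_embs: dict[str, list[float]],
                    logit_scale: float = 100.0) -> dict[str, float]:
    """CLIP-style zero-shot classification: score the image embedding against one
    text-prompt embedding per class, then softmax the scaled cosine similarities."""
    img = normalize(image_emb)
    logits = {label: logit_scale * sum(a * b for a, b in zip(img, normalize(emb)))
              for label, emb in prompt_embs.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(z - top) for label, z in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Hypothetical embeddings standing in for CLIP's image and text encoder outputs.
prompt_embs = {
    "a photo of a dog": [1.0, 0.0, 0.0],
    "a photo of a cat": [0.0, 1.0, 0.0],
    "a photo of a car": [0.0, 0.0, 1.0],
}
image_emb = [0.9, 0.2, 0.1]
probs = zero_shot_probs(image_emb, prompt_embs)
print(max(probs, key=probs.get))  # prints: a photo of a dog
```

Because the classes are just text prompts, swapping in a new benchmark means writing new prompts rather than retraining a classifier.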
Circling back to DALL.E 2, the frenzy it created, not only in the AI research community but also among the general public, was unprecedented. Soon after, Google introduced Imagen, a text-to-image diffusion model that, according to Google, offers superior photorealism and language understanding.
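A diffusion model learns to reverse a gradual noising process. The forward process has a closed form: given a clean image x_0 and a noise schedule beta_1 to beta_T, the noisy image at step t is x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, where alpha_bar_t is the running product of (1 - beta_s) and eps is standard Gaussian noise. The sketch below uses the linear schedule from the original DDPM paper (beta from 1e-4 to 0.02 over 1,000 steps) on a toy three-value “image”. It is a generic illustration of diffusion, not Imagen’s specific architecture, and the helper names are hypothetical.

```python
import math
import random

def linear_alpha_bars(num_steps: int = 1000, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> list[float]:
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0: list[float], t: int, alpha_bars: list[float],
              rng: random.Random) -> list[float]:
    """Forward (noising) process in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, with eps ~ N(0, 1)."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]

alpha_bars = linear_alpha_bars()
rng = random.Random(0)
x0 = [1.0, -0.5, 0.25]  # a toy three-"pixel" image
print(add_noise(x0, 0, alpha_bars, rng))    # almost identical to x0
print(add_noise(x0, 999, alpha_bars, rng))  # almost pure Gaussian noise
```

Training teaches a network to predict eps from x_t and t. For text-to-image generation, the network is also conditioned on an embedding of the prompt, and sampling runs the process in reverse, starting from pure noise.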
Recently, Meta too introduced an AI art generation tool, Make-A-Scene. It is a multimodal generative AI method that generates images from a text prompt and, optionally, a freeform sketch provided by the user.
Other popular AI art generation tools include Craiyon (formerly DALL.E Mini, whose demo was hosted on Hugging Face) and Midjourney, from the independent research lab of the same name.
With several art generation tools launched in just the last few months, it is easy to see why art generation is the flavour of the AI season. But anyone who has followed the field closely will tell you that this may not last long: the AI community will move on to newer and shinier pastures. As long as the field keeps advancing, though, no one is really complaining!