Not even a decade old, OpenAI has made a name for itself as one of the world's leading AI research labs. In 2020, it released GPT-3, a path-breaking language model that uses deep learning to produce human-like text. GPT-3 has since inspired other tech giants to bring out their own large language models.
This year, too, OpenAI tested its limits and continued its streak of releasing algorithms and models with the potential for massive impact. Let us look at a few of them as the year comes to an end.
Codex
OpenAI released Codex through an API in private beta. It translates natural language into code and is the model that powers GitHub Copilot. Codex can interpret simple commands in natural language and execute them on the user’s behalf, making it possible to build a natural language interface to existing applications. OpenAI said, “OpenAI Codex has much of the natural language understanding of GPT-3, but it produces working code.” In practice, this means one can issue commands in English to any piece of software with an API. OpenAI describes Codex as a general-purpose programming model that can be applied to essentially any programming task.
DALL·E
At the beginning of the year, OpenAI came out with DALL·E, a 12-billion parameter version of GPT-3, trained to generate images from text descriptions by using a dataset of text-image pairs. DALL·E is a Transformer language model that receives the text and the image as a single stream of data, containing up to 1280 tokens. It is trained using maximum likelihood to generate all of the tokens, one after another. DALL·E can render an image from scratch and also alter aspects of an image using text prompts. OpenAI said that DALL·E can create plausible images for a range of sentences that explore the compositional structure of language.
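The single-stream idea can be sketched in a few lines of Python. This is only an illustration, not OpenAI's code: the split of the 1280 tokens into 256 text tokens and 1024 image tokens and the text vocabulary size follow OpenAI's public description of DALL·E, while `PAD_ID`, the vocabulary offset and the function names are hypothetical choices made for this example.

```python
# Sketch only: PAD_ID, the vocabulary offset and all function names are
# hypothetical; the sizes follow OpenAI's public description of DALL·E.
TEXT_LEN = 256      # maximum number of text tokens
IMAGE_LEN = 1024    # 32 x 32 grid of image tokens
TEXT_VOCAB = 16384  # size of the text vocabulary
PAD_ID = 0          # hypothetical padding token id


def build_stream(text_tokens, image_tokens):
    """Concatenate caption and image tokens into one sequence of 1280 ids."""
    if len(image_tokens) != IMAGE_LEN:
        raise ValueError("expected exactly 1024 image tokens")
    text = list(text_tokens)[:TEXT_LEN]
    text += [PAD_ID] * (TEXT_LEN - len(text))
    # Offset image ids so text and image tokens never share an id.
    return text + [TEXT_VOCAB + t for t in image_tokens]


def next_token_examples(stream):
    """Yield (context, target) pairs: each token is predicted from all earlier ones."""
    for i in range(1, len(stream)):
        yield stream[:i], stream[i]


stream = build_stream([5, 17, 42], [7] * IMAGE_LEN)
print(len(stream))  # 1280
```

Maximum-likelihood training then amounts to raising the probability the model assigns to each target token given its context, summed over all positions in the stream.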
GLIDE
GLIDE (Guided Language to Image Diffusion for Generation and Editing) is a 3.5-billion parameter text-to-image diffusion model that, according to OpenAI, outperforms DALL·E in human evaluations. In the accompanying paper, the researchers report that samples generated with classifier-free guidance are both photorealistic and reflect a diverse range of world knowledge. Human judges preferred GLIDE's samples to those from DALL·E 87% of the time when evaluating photorealism and 69% of the time when evaluating caption similarity.
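Classifier-free guidance can be illustrated with a toy example. At each sampling step, the model predicts the noise twice, once with the caption and once without, and then pushes the result further in the direction of the caption-conditioned prediction. The sketch below shows only that combination step on plain Python lists; the function name and the numbers are made up for illustration, and a real model would operate on image-sized tensors.

```python
def classifier_free_guidance(eps_cond, eps_uncond, scale):
    """Combine conditional and unconditional noise predictions.

    Computes eps_uncond + scale * (eps_cond - eps_uncond) element-wise.
    scale = 1 recovers the conditional prediction; larger values push the
    sample further towards the caption.
    """
    return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


# Toy noise predictions (hypothetical values, not model outputs).
eps_with_caption = [1.0, -0.5]
eps_without_caption = [0.5, 0.25]

print(classifier_free_guidance(eps_with_caption, eps_without_caption, 3.0))
# [2.0, -2.0]
```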
Triton 1.0
OpenAI released Triton 1.0, an open-source, Python-like programming language that helps researchers with no CUDA (Compute Unified Device Architecture) experience write highly efficient GPU code. OpenAI claimed, “Triton makes it possible to reach peak hardware performance with relatively little effort.” It said that Triton has already been used to produce kernels that are up to 2x more efficient than equivalent Torch implementations. In terms of architecture, modern GPUs have three crucial components: DRAM, SRAM and ALUs. Each of them has to be taken into account when optimising CUDA code by hand, for example when managing memory transfers and scheduling computation. OpenAI said that Triton aims to automate these optimisations fully, so that developers can focus on the high-level logic of their parallel code.
CLIP
Alongside DALL·E, OpenAI also released CLIP (Contrastive Language–Image Pre-training), which builds on a large body of work on zero-shot transfer, natural language supervision, and multimodal learning. OpenAI showed that scaling a simple pre-training task is sufficient to achieve competitive zero-shot performance on a wide range of image classification datasets. The method relies on a readily available source of supervision: the text paired with images found on the internet.
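Zero-shot classification with a CLIP-style model can be sketched as follows: embed the image, embed one caption per candidate class, and pick the caption whose embedding is most similar to the image's. The example below is a minimal illustration and does not call the CLIP model itself. The three-dimensional embeddings are invented, the function names are hypothetical, and the `logit_scale` value is an assumption standing in for the model's learned temperature.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_classify(image_emb, text_embs, labels, logit_scale=100.0):
    """Return the best-matching label and a softmax distribution over all labels."""
    img = normalize(image_emb)
    logits = [
        logit_scale * sum(a * b for a, b in zip(img, normalize(txt)))
        for txt in text_embs
    ]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs


# Hypothetical embeddings; a real system would get these from the encoders.
labels = ["a photo of a dog", "a photo of a cat", "a photo of a car"]
text_embs = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.1], [0.1, 0.1, 1.0]]
image_emb = [0.9, 0.1, 0.2]

label, probs = zero_shot_classify(image_emb, text_embs, labels)
print(label)  # a photo of a dog
```

No classifier is trained for the three classes here. The class names enter only as text, which is what makes the approach zero-shot.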