The oft-quoted Hemingway line "gradually, then suddenly" fits most progress in machine learning. Significant breakthroughs in AI research usually look important only in hindsight. Reinforcement learning and convolutional neural networks, for instance, were developed conceptually in the 1960s and 1980s respectively but entered the mainstream much later, once modern hardware supplied the compute and data that made the ideas practical. There is one notable exception to this rule, an idea that stood out the minute it was introduced: the concept of attention in neural networks. A group of Google researchers presented it in a 2017 paper with the cryptic title 'Attention Is All You Need'.
An exception to a standard
The paper demonstrated that a transformer neural network, using a technique called "self-attention", could translate between English and French more accurately than other neural networks while needing only about a quarter of their training time. A transformer can look at every element of a sequence, typically words, at once and weigh how much attention to pay to each. Soon enough, transformer architectures found their way into most language tasks in AI/ML, belying their newness: from question answering to grammar correction, transformers came to dominate the benchmark tasks of natural language processing. The rise of the transformer followed the same trajectory as that of convolutional neural networks, whose adoption surged after the 2012 ImageNet competition.
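The core mechanism the paper introduced, scaled dot-product attention, can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's full multi-head implementation; the toy token embeddings below are made up for the example.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention from 'Attention Is All You Need'.

    Each output row is a weighted average of the value vectors in V,
    with weights given by how strongly each query matches each key.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise query-key similarity, scaled
    # Numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # mix the values according to the attention weights

# Self-attention: queries, keys and values all come from the same sequence,
# so every token attends to every other token at once.
# Hypothetical toy sequence: 3 tokens with 4-dimensional embeddings.
x = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 0.0]])
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # one updated 4-d representation per token
```

Because each token's new representation is computed from the whole sequence in parallel, there is no step-by-step recurrence to wait on, which is what allowed the large cut in training time the paper reported.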
The lasting impact of transformers
Gauging the impact of transformer architectures in brief is difficult, given how ubiquitous their applications are. The most influential large language models today, including BERT and the GPT family (GPT, GPT-2 and GPT-3), are all transformer models. Transformers are not limited to working with words; they can be used to analyse and predict sequential data of virtually any kind. The DeepMind researchers behind AlphaFold, for example, used a novel transformer technique to predict how amino acid sequences fold into the 3D shapes of proteins. Owing to their accuracy, transformers are also readily applied to anomaly detection in industries such as healthcare and finance.
There is another peculiarity that sets the paper and its authors apart. Of the eight authors, six have gone on to found AI- and crypto-related startups, which have collectively raised more than USD 1 billion in venture capital.
- Illia Polosukhin co-founded NEAR Protocol, a layer-1 blockchain that provides secure infrastructure for building scalable decentralised applications, or dApps, on Web3. The NEAR network, which runs on a Proof-of-Stake consensus mechanism, scales through a technique called 'sharding': the network is broken into fragments called 'shards', and each shard processes its own subset of transactions in parallel. This reduces the load on the network and makes it more user friendly.
- Aidan Gomez went on to co-found Cohere, a startup that helps companies build and deploy large language models. Gomez started Cohere in 2019 to challenge the hold that big tech companies have had over AI research. In June 2022, the startup announced the launch of 'Cohere For AI', a non-profit research lab led by Google alum Sara Hooker.
- Jakob Uszkoreit, who started at Google Research as an intern, went on to co-found the biotechnology startup 'Inceptive' in July 2021. Inceptive intends to design RNA molecules using deep learning and make breakthrough mRNA medicines more accessible. Messenger RNA drugs, an entirely new class of medicines based on ribonucleic acid (RNA), are still quite a novelty.
- Niki Parmar and Ashish Vaswani went on to co-found Adept AI Labs, an AI research and product startup where Vaswani serves as Chief Scientist and Parmar as CTO. In April this year, Adept announced that it had launched from stealth with USD 65 million in Series A funding led by VC firms Greylock Partners and Addition. Adept is steeped in prestigious names: David Luan, the former director of Google Research, serves as CEO, and the startup is also backed by Root Ventures and angel investors including Behance founder Scott Belsky, Airtable founder Howie Liu, Stanford data-science specialist Chris Ré and former head of Tesla Autopilot Andrej Karpathy. Reid Hoffman of Greylock stated in a LinkedIn post that Adept was taking the road not usually taken towards AGI. "We're training a neural network to use every software tool in the world, building on the vast amount of existing capabilities that people have already created," he stated.
- Noam Shazeer went on to co-found and head the AI startup 'Character.AI' in November 2021. Until then, Shazeer had worked on high-profile projects at Google, where he helped build the dialogue system for LaMDA. The LaMDA project was led by Daniel De Freitas, who also went on to become a co-founder of Character.AI.