AI is a complex and fast-evolving field in which organisations and individuals are constantly looking for novel solutions to pressing challenges. This year has already seen several path-breaking innovations that have pushed the field’s boundaries and made way for better outcomes. In this article, we list the top ten AI innovations of 2021 so far.
Facebook’s UniT
Researchers from Facebook AI Research introduced a new Transformer model, the Unified Transformer (UniT). UniT has an encoder-decoder architecture that handles multiple tasks across different domains in a single model, with fewer parameters than a set of separate task-specific models would need. According to Facebook’s team, UniT is a step towards general intelligence.
OpenAI’s DALL·E & CLIP
DALL·E is OpenAI’s 12-billion-parameter version of GPT-3, trained to generate images from text prompts. The model can work with multiple objects in an image, either rendering a new image or altering an existing one based on the prompt.
The OpenAI research team has also demonstrated a neural network called Contrastive Language-Image Pre-training, or CLIP. This network has been trained on 400 million pairs of images and text. Like the GPT family, CLIP learns to perform a range of tasks during pre-training, such as optical character recognition (OCR), geo-localisation, and action recognition.
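To make the "contrastive" part of the name concrete, here is a minimal sketch of a symmetric contrastive loss over a batch of matched image and text embeddings. It is not OpenAI’s code: the function names, the toy two-dimensional embeddings, and the temperature value are assumptions chosen for illustration, and the real model learns its embeddings with large neural encoders.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cross_entropy(logits, target):
    """Negative log-softmax of the target entry (numerically stable)."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[target]

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss; row k of each list is assumed to be a matching pair.

    The temperature is an illustrative assumption, not a value taken from the article.
    """
    images = [normalize(v) for v in image_embs]
    texts = [normalize(v) for v in text_embs]
    n = len(images)
    # logits[i][j] = scaled similarity between image i and text j
    logits = [[sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
              for img in images]
    # Each image should pick out its own text, and each text its own image.
    image_to_text = sum(cross_entropy(logits[k], k) for k in range(n)) / n
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    text_to_image = sum(cross_entropy(columns[k], k) for k in range(n)) / n
    return (image_to_text + text_to_image) / 2

# Toy batch: matched pairs point the same way, swapped pairs do not.
images = [[1.0, 0.0], [0.0, 1.0]]
matched_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
loss_matched = contrastive_loss(images, matched_texts)
loss_swapped = contrastive_loss(images, swapped_texts)
```

The loss is low when each image is most similar to its own caption and high when the pairs are mismatched, which is the signal that pushes matching images and texts together during training.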
Facebook’s BlenderBot 2
Facebook’s BlenderBot 2 is a first-of-its-kind open-source chatbot with long-term memory. Facebook has been working to make the AI more empathetic, knowledgeable and capable. BlenderBot 2.0 can build a long-term memory that it accesses continuously, while simultaneously searching the internet for information and holding conversations on nearly any topic.
Google’s Translatotron 2
In 2019, Google released Translatotron, an end-to-end speech-to-speech translation model. It was then the first end-to-end framework that could directly translate speech in one language into speech in another.
The system could also retain the original speaker’s voice in the synthesised translated speech. However, this feature could be misused to generate speech in a different voice and create deepfake audio.
This year, Google released Translatotron 2, an updated version where the trained model is restricted to retain the source speaker’s voice. Unlike the previous version, it cannot generate speech in different voices, thereby mitigating potential misuse for creating spoofing audio artefacts.
Google’s Vertex AI
At this year’s Google I/O conference, Google introduced Vertex AI, a managed machine learning platform for deploying and maintaining AI models. The new platform brings AutoML and AI Platform together into a unified API, client library and user interface.
Earlier, researchers had to run millions of test images to train their algorithms; now they can rely on the Vertex technology stack to do the heavy lifting.
Microsoft’s FLAML
Microsoft’s FLAML is a Python package that finds a well-fitting ML model at low computational cost. It removes the manual work of choosing the best model and the best hyperparameters.
This AutoML system is mainly focused on model selection, hyperparameter tuning, feature engineering, neural architecture search, and model compression.
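As a rough picture of what automated model selection and hyperparameter tuning involve, the sketch below searches over two toy model families under a fixed trial budget and keeps the configuration with the best validation accuracy. This is not FLAML’s API or its search strategy: the models, the search space, the data, and the plain random search are all assumptions made up for illustration.

```python
import random

def threshold_model(train, cutoff):
    """Toy model family 1: predict 1 when x exceeds a tunable cutoff."""
    return lambda x: int(x > cutoff)

def knn_model(train, k):
    """Toy model family 2: majority vote among the k nearest training points."""
    def predict(x):
        nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
        votes = sum(label for _, label in nearest)
        return int(votes * 2 > len(nearest))
    return predict

# Hypothetical search space: model name -> (builder, hyperparameter sampler).
SEARCH_SPACE = {
    "threshold": (threshold_model, lambda rng: rng.uniform(0, 10)),
    "knn": (knn_model, lambda rng: rng.choice([1, 3, 5])),
}

def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)

def search(train, valid, n_trials=20, seed=0):
    """Random search: sample a model and a hyperparameter, keep the best scorer."""
    rng = random.Random(seed)
    best = (None, None, -1.0)
    for _ in range(n_trials):
        name = rng.choice(sorted(SEARCH_SPACE))
        build, sample = SEARCH_SPACE[name]
        param = sample(rng)
        score = accuracy(build(train, param), valid)
        if score > best[2]:
            best = (name, param, score)
    return best

# Toy task: the label is 1 when x > 5.
train = [(x, int(x > 5)) for x in [0, 1, 2, 3, 4, 6, 7, 8, 9, 10]]
valid = [(x, int(x > 5)) for x in [0.5, 2.5, 4.5, 5.5, 7.5, 9.5]]
best_name, best_param, best_score = search(train, valid)
```

The trial budget stands in for the computational cost a real AutoML system has to respect; a production tool would replace the random sampling with a more economical search strategy.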
Microsoft’s MusicBERT
MusicBERT is Microsoft’s large-scale pre-trained model for symbolic music understanding. It covers applications such as emotion classification, genre classification, and music piece matching. Microsoft built the model using the OctupleMIDI encoding method and a bar-level masking strategy, along with a large-scale symbolic music corpus containing more than 1 million music tracks.
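The two techniques named above can be sketched roughly as follows, assuming each note is encoded as one eight-field token and that masking hides a given field for every note in a chosen bar at once. The field names, the mask symbol, and the masking probability are assumptions for illustration, not values taken from the article or from Microsoft’s code.

```python
import random

# Assumed layout of one octuple token: eight attributes per note.
FIELDS = ("time_sig", "tempo", "bar", "position",
          "instrument", "pitch", "duration", "velocity")
MASK = "[MASK]"

def bar_level_mask(notes, mask_prob=0.15, seed=0):
    """Mask whole (bar, field) groups instead of single tokens.

    If (bar, field) is selected, that field is hidden for every note in the bar,
    so the model cannot copy the answer from a neighbouring note in the same bar.
    """
    rng = random.Random(seed)
    bars = sorted({note["bar"] for note in notes})
    chosen = {(bar, field) for bar in bars for field in FIELDS
              if rng.random() < mask_prob}
    masked = [
        {field: (MASK if (note["bar"], field) in chosen else note[field])
         for field in FIELDS}
        for note in notes
    ]
    return masked, chosen

# Two bars with two notes each (values are made up).
notes = [
    {"time_sig": "4/4", "tempo": 120, "bar": 0, "position": 0,
     "instrument": "piano", "pitch": 60, "duration": 4, "velocity": 80},
    {"time_sig": "4/4", "tempo": 120, "bar": 0, "position": 8,
     "instrument": "piano", "pitch": 64, "duration": 4, "velocity": 80},
    {"time_sig": "4/4", "tempo": 120, "bar": 1, "position": 0,
     "instrument": "piano", "pitch": 67, "duration": 8, "velocity": 72},
    {"time_sig": "4/4", "tempo": 120, "bar": 1, "position": 8,
     "instrument": "piano", "pitch": 72, "duration": 8, "velocity": 72},
]
masked_notes, masked_groups = bar_level_mask(notes, mask_prob=0.3, seed=1)
```

Masking at the bar level matters because attributes such as tempo or instrument usually repeat within a bar, so hiding a single token would make the prediction task trivially easy.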
Microsoft’s neural TTS
Microsoft’s neural text-to-speech (TTS) software enables developers to create custom synthetic voices. The system is structured in three components: a text analyser, a neural acoustic model, and a neural vocoder.
The text analyser converts plain text to pronunciations, the acoustic model converts pronunciations to acoustic features and finally, the vocoder generates waveforms.
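The three-stage flow described above can be pictured with a toy pipeline. Everything below is a stand-in: the tiny lexicon, the rule that maps a pronunciation symbol to a pitch, and the sine-wave "vocoder" are hypothetical placeholders for what are neural models in the real system.

```python
import math

# Hypothetical pronunciation lexicon; a real text analyser is far richer.
LEXICON = {"hi": ["HH", "AY"], "there": ["DH", "EH", "R"]}

def text_analyser(text):
    """Stage 1: plain text -> pronunciation symbols."""
    phonemes = []
    for word in text.lower().split():
        # Unknown words fall back to spelling out their letters.
        phonemes.extend(LEXICON.get(word, list(word.upper())))
    return phonemes

def acoustic_model(phonemes):
    """Stage 2: pronunciations -> acoustic features, here (frequency in Hz, seconds)."""
    return [(200.0 + 10.0 * (sum(map(ord, p)) % 20), 0.05) for p in phonemes]

def vocoder(features, sample_rate=16000):
    """Stage 3: acoustic features -> waveform samples, here one sine tone per feature."""
    waveform = []
    for frequency, duration in features:
        n_samples = round(duration * sample_rate)
        waveform.extend(math.sin(2 * math.pi * frequency * i / sample_rate)
                        for i in range(n_samples))
    return waveform

phonemes = text_analyser("hi there")
features = acoustic_model(phonemes)
waveform = vocoder(features)
```

The point of the sketch is the hand-off between stages: each stage consumes only the output of the previous one, which is what lets the acoustic model and the vocoder be trained and swapped independently.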
Google’s TensorFlow 3D
Google’s TensorFlow 3D is a highly modular library that brings 3D deep learning capabilities to TensorFlow. TensorFlow previously lacked dedicated tooling for understanding 3D environments; TensorFlow 3D provides a set of operations, loss functions, data processing tools, models, and metrics for developing, training, and deploying state-of-the-art 3D scene understanding models.