Google announced a series of exciting innovations from its research teams last week. During the same period, Microsoft and Facebook shared some interesting updates from their AI teams. Here are the top releases from last week:
Microsoft Open-Sources State-Of-The-Art Object Tracking AI
Researchers from Microsoft, along with a team from Huazhong University of Science and Technology, have open-sourced FairMOT, an AI model for multi-object tracking (MOT). The model is reported to outperform previous state-of-the-art models on public data sets. Object detection and tracking have many direct applications today. Even during the current COVID-19 crisis, facial recognition technology was used to identify patients. Apart from this, the researchers believe that there are plenty of other uses for their work, including security and elderly care.
You can feed in a raw video and get a demo video in mp4 format by running demo.py from the repository's src directory:
python demo.py mot --load_model ../models/all_dla34.pth
TensorFlow Releases API To Train Smaller, Faster AI Models
TensorFlow has released the Quantization Aware Training (QAT) API, which allows developers to train and deploy models with the benefits of quantization. Quantization is the process of mapping input values from a large set to output values in a smaller set, ideally with little loss of accuracy.
This new API is designed to support the development of smaller and faster machine learning models that are well suited to run on off-the-shelf machines where computational resources are limited.
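To make the idea of quantization concrete, here is a minimal sketch in plain Python of mapping floating-point values to 8-bit integer codes and back. It is not the TensorFlow QAT API. The function names (quantize_params, quantize, dequantize) and the sample values are assumptions made up for illustration, and the affine scale-and-zero-point scheme shown is only one common way to do the mapping.

```python
def quantize_params(x_min, x_max, num_bits=8):
    """Pick a scale and zero point so [x_min, x_max] maps onto unsigned integer codes."""
    q_max = 2 ** num_bits - 1
    # Make sure 0.0 falls inside the range so it gets its own integer code.
    x_min = min(x_min, 0.0)
    x_max = max(x_max, 0.0)
    scale = (x_max - x_min) / q_max
    if scale == 0.0:
        scale = 1.0  # Degenerate range: every input is 0.0.
    zero_point = round(-x_min / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, num_bits=8):
    """Map floats to integer codes, clipping to the representable range."""
    q_max = 2 ** num_bits - 1
    return [max(0, min(q_max, round(v / scale) + zero_point)) for v in values]


def dequantize(codes, scale, zero_point):
    """Map integer codes back to approximate float values."""
    return [(q - zero_point) * scale for q in codes]


# Hypothetical weights, chosen only for this example.
values = [-1.0, -0.5, 0.0, 0.25, 1.0]
scale, zero_point = quantize_params(min(values), max(values))
codes = quantize(values, scale, zero_point)
restored = dequantize(codes, scale, zero_point)

for original, code, approx in zip(values, codes, restored):
    print(f"{original:+.4f} -> code {code:3d} -> {approx:+.4f}")
```

Each restored value differs from the original by a small rounding error. Quantization-aware training simulates this kind of rounding during training so that the model can learn to tolerate it.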
Google’s SimCLR Framework For Self Supervision
Turing Award winner Yann LeCun firmly believes that the future belongs to self-supervised machine learning models. However, current self-supervised techniques are not yet widely adopted for image data because of their complexity.
So the team at Google AI has proposed a framework called SimCLR that outperforms previous state-of-the-art models on self-supervised and semi-supervised learning on the ImageNet dataset, and it does so with a limited amount of class-labelled data. The researchers believe that SimCLR can be incorporated into existing supervised learning pipelines because of its simplicity.
Facebook Introduces RegNet
Researchers from Facebook AI recently introduced a new network design paradigm known as RegNet. RegNet, or Regular Networks, is a low-dimensional design space that consists of simple, regular networks.
The intuition behind the RegNet parametrization is that a quantized linear function can explain the widths and depths of good networks. The researchers analyzed the RegNet design space and concluded that it yields simple and fast networks that work well across a wide range of FLOP regimes. According to the experimental results, RegNet models are reported to outperform the popular EfficientNet models while being up to five times faster on GPUs.
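The "quantized linear function" can be sketched in a few lines of Python. The sketch below assumes a parametrization with an initial width w_0, a slope w_a, a width multiplier w_m, and a total depth; these parameter names, the rounding of widths to a multiple of 8, and the numeric values in the usage lines are assumptions for illustration rather than details given in this article, and the code is not Facebook's implementation.

```python
import math


def regnet_widths(w_0, w_a, w_m, depth, divisor=8):
    """Per-block widths from a linear ramp snapped to powers of w_m."""
    widths = []
    for j in range(depth):
        u_j = w_0 + w_a * j  # Linear function of the block index.
        # Quantize: snap u_j to the nearest w_0 * w_m**s with integer s.
        s_j = round(math.log(u_j / w_0) / math.log(w_m))
        w_j = w_0 * (w_m ** s_j)
        # Round to a hardware-friendly multiple (assumed to be 8 here).
        widths.append(int(round(w_j / divisor) * divisor))
    return widths


def group_into_stages(widths):
    """Collapse runs of equal widths into (width, number_of_blocks) stages."""
    stages = []
    for w in widths:
        if stages and stages[-1][0] == w:
            stages[-1][1] += 1
        else:
            stages.append([w, 1])
    return [(w, d) for w, d in stages]


# Illustrative parameter values, not taken from the article.
widths = regnet_widths(w_0=24, w_a=36.44, w_m=2.49, depth=13)
stages = group_into_stages(widths)
print(widths)
print(stages)
```

Because neighbouring blocks snap to the same quantized width, the linear ramp collapses into a handful of stages, each with its own width and depth. Under these assumptions, that is how a single function ends up describing both the widths and the depths of a network.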
Tool To Sense 3D On Pixel 4
Depth sensing is an integral part of many recent innovations, ranging from augmented reality to fundamental sensing innovations such as transparent object detection.
For instance, Google uses it on its flagship phones. The front of the Pixel 4 contains a real-time infrared (IR) active stereo depth sensor called uDepth. The technology behind uDepth helps secure face unlock against spoofing attacks.
Google AI has now provided access to uDepth as an API on the Pixel 4. It uses the Pixel Neural Core, two IR cameras, and an infrared pattern projector to provide time-synchronized depth frames at 30 Hz. Users can leverage the Google Camera app and experience depth in 2D photos.
Google Open Sources Universal Sound Separation Dataset
Have you ever tried to use Google applications using speech? There might have been instances where your ‘Ok Google’ did not go OK. One primary reason behind this can be the device’s inability to distinguish between sounds. The models on phones do separate sounds, but only coarsely. For example, sounds are usually classified simply as “speech” versus “non-speech”.
That said, the biggest challenge in training these models is annotation. Even with high-quality audio recordings, it is not straightforward to label them with ground truth. To achieve good results, you need a diverse set of sounds, a realistic room simulator, and code to mix these elements. To address this, Google open-sourced FUSS (Free Universal Sound Separation), a universal sound separation dataset that contains realistic, multi-source, multi-class audio with ground truth.
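The mixing step is what makes ground truth cheap to obtain: if you build the mixture yourself from known sources, the labels come for free. Here is a minimal sketch of that idea in plain Python. It is not the FUSS code and it leaves out the room simulator entirely. The helper names (make_tone, mix_sources), the sine-tone "sources", and the gain range are all assumptions for illustration.

```python
import math
import random


def make_tone(freq_hz, num_samples, sample_rate=16000):
    """Stand-in for a real recording: a pure sine tone."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(num_samples)]


def mix_sources(sources, gains):
    """Scale each source by its gain and sum them sample by sample.

    Returns the mixture plus the scaled sources, which serve as the
    ground-truth separation targets for that mixture.
    """
    if len(sources) != len(gains):
        raise ValueError("need exactly one gain per source")
    references = [[g * s for s in src] for src, g in zip(sources, gains)]
    mixture = [sum(samples) for samples in zip(*references)]
    return mixture, references


rng = random.Random(0)  # Fixed seed so the example is repeatable.
sources = [make_tone(f, num_samples=1600) for f in (220.0, 440.0, 880.0)]
gains = [rng.uniform(0.2, 1.0) for _ in sources]
mixture, references = mix_sources(sources, gains)
print(len(mixture), "mixture samples built from", len(references), "known sources")
```

A separation model would be given only the mixture and trained to recover the references.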