Top Machine Learning Research Papers Released In 2021

Advances in machine learning and deep learning in 2021 could lead to new technologies used by billions of people worldwide.

Advances in machine learning and deep learning research are reshaping our technology. Both fields produced remarkable results in 2021, and key research papers have led to technical advances used by billions of people. Research in this area is moving at a breakneck pace, which makes it hard to keep up. To help, here is a collection of some of the most important papers of the year.

Rebooting ACGAN: Auxiliary Classifier GANs with Stable Training

The authors of this work examined why ACGAN training becomes unstable as the number of classes in the dataset grows. They found that the instability stems from a gradient explosion problem caused by the unboundedness of the input feature vectors and by the classifier’s poor classification ability in the early stage of training. To alleviate the instability and strengthen ACGAN, they proposed the Data-to-Data Cross-Entropy loss (D2D-CE) and the Rebooted Auxiliary Classifier Generative Adversarial Network (ReACGAN). Extensive experiments show that ReACGAN is robust to hyperparameter selection and is compatible with a variety of architectures and differentiable augmentations.
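
To make the link between unbounded feature vectors and large gradients concrete, here is a minimal sketch. It is not the D2D-CE loss itself. It assumes a plain linear softmax classifier and uses L2 normalisation as one illustrative way to bound the features; the weights, feature values, and function names are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def grad_norm_wrt_weights(weights, feature, label):
    """Frobenius norm of d(cross-entropy)/dW for logits = W @ feature."""
    logits = [sum(w * x for w, x in zip(row, feature)) for row in weights]
    probs = softmax(logits)
    # For softmax cross-entropy, dL/dW[c][j] = (p_c - y_c) * feature[j],
    # so the gradient norm scales with the norm of the feature vector.
    err = [p - (1.0 if c == label else 0.0) for c, p in enumerate(probs)]
    return math.sqrt(sum((e * x) ** 2 for e in err for x in feature))

def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# Hypothetical 3-class classifier and a misclassified, large-norm feature.
weights = [[0.1, -0.2], [0.0, 0.3], [-0.1, 0.1]]
feature = [30.0, 40.0]  # norm 50
wrong_label = 0

raw = grad_norm_wrt_weights(weights, feature, wrong_label)
bounded = grad_norm_wrt_weights(weights, l2_normalize(feature), wrong_label)
print(f"unbounded feature: {raw:.2f}, normalised feature: {bounded:.2f}")
```

With a unit-norm feature the gradient norm in this toy setup cannot exceed the square root of 2, whereas with the raw feature it grows in proportion to the feature norm. That is the kind of behaviour that becomes a problem when the classifier is still poor early in training.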

This article is ranked #1 on CIFAR-10 for Conditional Image Generation.

For the research paper, read here.

For code, see here.

Dense Unsupervised Learning for Video Segmentation

In this study, the authors presented a simple and computationally fast unsupervised strategy for learning dense spacetime representations from unlabelled videos. The approach converges quickly during training and is highly data-efficient. The researchers obtain video object segmentation (VOS) accuracy superior to previous results while using only a fraction of the training data previously required. They acknowledge that the findings could be misused, for example for unlawful surveillance. Looking ahead, they are interested in learning a broader spectrum of invariances by exploiting larger temporal windows in videos with complex (ego-)motion, which are more prone to disocclusions.

This study is ranked #1 on DAVIS 2017 for Unsupervised Video Object Segmentation (val).

For the research paper, read here.

For code, see here.

Temporally-Consistent Surface Reconstruction using Metrically-Consistent Atlases

The authors offer an atlas-based technique for producing temporally consistent surface reconstructions without supervision. It requires each point on the canonical shape representation to map to metrically consistent 3D locations on the reconstructed surfaces. The researchers envisage many potential applications for the method. For example, by substituting an image-based loss for the Chamfer distance, one could apply the method to RGB video sequences, which the researchers believe will spur progress in video-based 3D reconstruction.
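
For readers unfamiliar with the Chamfer distance mentioned above, the following is a minimal sketch of one common variant: the mean squared nearest-neighbour distance, summed over both directions. Other variants use unsquared distances or sums instead of means, and the exact form used in the paper is not stated here. The point sets are made up for illustration.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two point sets.

    For every point in one set, take the squared distance to its nearest
    neighbour in the other set, average, and add both directions.
    """
    a_to_b = sum(
        min(squared_distance(p, q) for q in points_b) for p in points_a
    ) / len(points_a)
    b_to_a = sum(
        min(squared_distance(q, p) for p in points_a) for q in points_b
    ) / len(points_b)
    return a_to_b + b_to_a

# Hypothetical reconstructed surface samples and target point cloud.
reconstructed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]

print(chamfer_distance(reconstructed, target))
```

This brute-force version compares every pair of points, which is fine for a toy example but too slow for large point clouds, where a spatial index or a GPU implementation would normally be used.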

This article is ranked #1 on ANIM in the category of Surface Reconstruction. 

For the research paper, read here.

For code, see here.

EdgeFlow: Achieving Practical Interactive Segmentation with Edge-Guided Flow

The researchers propose a novel interactive segmentation architecture called EdgeFlow that exploits user interaction information without resorting to post-processing or iterative optimisation. Thanks to its coarse-to-fine network design, the technique achieves state-of-the-art performance on common benchmarks. The researchers also built an efficient interactive segmentation tool that lets the user improve the segmentation result incrementally through flexible options.

This paper is ranked #1 on Interactive Segmentation on PASCAL VOC.

For the research paper, read here.

For code, see here.

Learning Transferable Visual Models From Natural Language Supervision

The authors of this work examined whether the success of task-agnostic, web-scale pre-training in natural language processing can be transferred to another domain. They find that adopting this formula leads to similar behaviours emerging in computer vision, and they discuss the social implications of this line of research. To optimise their training objective, CLIP models learn to perform a range of tasks during pre-training. Through natural language prompting, this task learning can then be used for zero-shot transfer to many existing datasets. At sufficient scale, the technique can compete with task-specific supervised models, although there is still much room for improvement.

This research is ranked #1 on Zero-Shot Transfer Image Classification on SUN.
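
The zero-shot transfer step can be sketched roughly as follows: embed one text prompt per class, embed the image, and pick the prompt whose embedding is most similar to the image embedding. This sketch does not call the real CLIP model. The three-dimensional embeddings, the prompts, and the temperature value are hypothetical stand-ins for what trained image and text encoders would produce.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_classify(image_embedding, prompt_embeddings, temperature=0.01):
    """Return (best_prompt, probabilities) from image/prompt similarities.

    prompt_embeddings maps each text prompt to its embedding. The
    temperature is an assumed constant that sharpens the softmax.
    """
    prompts = list(prompt_embeddings)
    scores = [
        cosine_similarity(image_embedding, prompt_embeddings[p]) / temperature
        for p in prompts
    ]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = {p: e / total for p, e in zip(prompts, exps)}
    return max(probs, key=probs.get), probs

# Hypothetical embeddings; a real system would get these from the encoders.
prompt_embeddings = {
    "a photo of a dog": [0.9, 0.1, 0.0],
    "a photo of a cat": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
image_embedding = [0.8, 0.3, 0.1]

best_prompt, probabilities = zero_shot_classify(image_embedding, prompt_embeddings)
print(best_prompt)
```

Because the classes are defined only by their prompts, moving to a new dataset means writing new prompts rather than training a new classifier head.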

For the research paper, read here.

For code, see here.

CoAtNet: Marrying Convolution and Attention for All Data Sizes

In this paper, the researchers conduct a thorough study of the properties of convolutions and transformers, which leads to a principled approach for combining them into a new family of models named CoAtNet. Extensive experiments show that CoAtNet combines the advantages of ConvNets and Transformers, achieving state-of-the-art performance across a range of data sizes and compute budgets. Note that the paper currently focuses on ImageNet classification for model development. However, the researchers believe their approach applies to a broader range of tasks, such as object detection and semantic segmentation.

This paper is ranked #1 on Image Classification on ImageNet (using extra training data).

For the research paper, read here.

For code, see here.

SwinIR: Image Restoration Using Swin Transformer

The authors of this paper propose SwinIR, an image restoration model based on the Swin Transformer. The model comprises three modules: shallow feature extraction, deep feature extraction, and high-quality image reconstruction. For deep feature extraction, the researchers use a stack of residual Swin Transformer blocks (RSTB), each composed of Swin Transformer layers, a convolution layer, and a residual connection.
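
The wiring of a residual Swin Transformer block can be sketched structurally as below. This only illustrates the order of operations described above: several layers, then a convolution, then a residual connection back to the block input. The toy_layer and toy_conv functions are hypothetical stand-ins that operate on flat lists of numbers, not real Swin Transformer layers or convolutions.

```python
def residual_swin_transformer_block(features, swin_layers, conv):
    """Apply the layers in sequence, then the conv, then add the block input."""
    out = features
    for layer in swin_layers:
        out = layer(out)
    out = conv(out)
    # Residual connection: the block input is added to the block output.
    return [x + y for x, y in zip(features, out)]

def deep_feature_extraction(features, blocks):
    """Run a stack of RSTBs; each block is a (swin_layers, conv) pair."""
    for swin_layers, conv in blocks:
        features = residual_swin_transformer_block(features, swin_layers, conv)
    return features

# Hypothetical stand-ins so the sketch runs without a deep learning library.
def toy_layer(values):
    return [0.5 * v for v in values]

def toy_conv(values):
    return [v + 1.0 for v in values]

blocks = [([toy_layer, toy_layer], toy_conv)] * 2  # two RSTBs, two layers each
shallow_features = [4.0, 8.0]
deep_features = deep_feature_extraction(shallow_features, blocks)
print(deep_features)
```

In a real implementation each stand-in would be a trainable module operating on image feature maps, but the control flow would follow the same pattern.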

This research article is ranked #1 on Image Super-Resolution on Manga109 – 4x upscaling.

For the research paper, read here.

For code, see here.


Dr. Nivash Jeevanandam
Nivash holds a doctorate in information technology and has been a research associate at a university and a development engineer in the IT industry. Data science and machine learning excite him.
