Advances in machine learning and deep learning research are reshaping our technology. Machine learning and deep learning accomplished various astounding feats in 2021, and key research papers resulted in technical advances used by billions of people. Research in this field is advancing at a breakneck pace, so to help you keep up, here is a collection of the most important recent research papers.
Rebooting ACGAN: Auxiliary Classifier GANs with Stable Training
The authors of this work examined why ACGAN training becomes unstable as the number of classes in the dataset grows. They revealed that the instability arises from a gradient explosion problem caused by the unboundedness of the input feature vectors and the classifier's poor classification capability during the early training stage. To alleviate the instability and reinforce ACGAN, the researchers presented the Data-to-Data Cross-Entropy loss (D2D-CE) and the Rebooted Auxiliary Classifier Generative Adversarial Network (ReACGAN). Additionally, extensive experiments demonstrate that ReACGAN is robust to hyperparameter selection and is compatible with a variety of architectures and differentiable augmentations.
This article is ranked #1 on CIFAR-10 for Conditional Image Generation.
For the research paper, read here.
For code, see here.
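To make the D2D-CE idea concrete, here is a minimal PyTorch sketch: each sample's similarity to its own class proxy serves as the positive logit, same-batch samples of other classes serve as negatives, and margins clip away already-easy positives and negatives. The margin and temperature values, and the clipping details, are illustrative assumptions; the authors' exact formulation is in the paper and linked code.

```python
import torch
import torch.nn.functional as F

def d2d_ce_loss(feats, proxies, labels, temperature=0.25,
                pos_margin=0.90, neg_margin=0.05):
    """Illustrative sketch of a Data-to-Data Cross-Entropy (D2D-CE) loss.

    feats:   (N, D) image embeddings from the discriminator backbone.
    proxies: (C, D) learnable class embeddings.
    labels:  (N,)   integer class labels in [0, C).
    Margin and temperature values are placeholder assumptions.
    """
    f = F.normalize(feats, dim=1)    # unit-norm sample embeddings
    w = F.normalize(proxies, dim=1)  # unit-norm class proxies

    # Positive logit: data-to-proxy similarity; clipping at the margin
    # stops already-confident positives from contributing gradient.
    pos = torch.clamp_max((f * w[labels]).sum(dim=1) - pos_margin, 0.0)

    # Negative logits: data-to-data similarities to same-batch samples
    # with different labels; clipping ignores already-easy negatives.
    neg = torch.clamp_min(f @ f.t() + neg_margin, 0.0)
    diff_class = (labels.unsqueeze(0) != labels.unsqueeze(1)).float()

    # Softmax cross-entropy over [positive | negatives].
    num = torch.exp(pos / temperature)
    den = num + (torch.exp(neg / temperature) * diff_class).sum(dim=1)
    return -torch.log(num / den).mean()

# Usage with random stand-in data: 8 samples, 128-dim features, 10 classes.
loss = d2d_ce_loss(torch.randn(8, 128), torch.randn(10, 128),
                   torch.randint(0, 10, (8,)))
```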
Dense Unsupervised Learning for Video Segmentation
The authors presented a straightforward and computationally fast unsupervised strategy for learning dense space-time representations from unlabeled videos. The approach demonstrates rapid training convergence and a high degree of data efficiency; the researchers obtain VOS accuracy superior to previous results despite employing only a fraction of the previously required training data. The researchers acknowledge that the findings could be misused, for example for unlawful surveillance. They are also excited to investigate how this capability might be used to learn a broader spectrum of invariances by exploiting larger temporal windows in videos with complex (ego-)motion, which is more prone to disocclusions.
This study is ranked #1 on DAVIS 2017 for Unsupervised Video Object Segmentation (val).
For the research paper, read here.
For code, see here.
Temporally-Consistent Surface Reconstruction using Metrically-Consistent Atlases
The authors offer an atlas-based technique for producing unsupervised, temporally consistent surface reconstructions by requiring a point on the canonical shape representation to map to metrically consistent 3D locations on the reconstructed surfaces. The researchers envisage a plethora of potential applications for the method: for example, by substituting an image-based loss for the Chamfer distance, the method could be applied to RGB video sequences, which the researchers believe will spur development in video-based 3D reconstruction.
This article is ranked #1 on ANIM in the category of Surface Reconstruction.
For the research paper, read here.
For code, see here.
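The Chamfer distance mentioned above is the standard point-set loss for this kind of reconstruction. A generic PyTorch version (a textbook formulation, not the authors' implementation) looks like this:

```python
import torch

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a: (N, 3), b: (M, 3).

    For each point in one set, take the squared distance to its nearest
    neighbour in the other set, then average both directions.
    """
    d = torch.cdist(a, b, p=2.0) ** 2  # pairwise squared distances (N, M)
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

# Usage: compare points sampled from the reconstructed surface against
# points from the ground-truth scan (random stand-in data here).
pred = torch.rand(2048, 3)
target = torch.rand(2048, 3)
loss = chamfer_distance(pred, target)
```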
EdgeFlow: Achieving Practical Interactive Segmentation with Edge-Guided Flow
The researchers propose a novel interactive architecture, EdgeFlow, that exploits user interaction information without resorting to post-processing or iterative optimisation. Thanks to its coarse-to-fine network design, the proposed method achieves state-of-the-art performance on common benchmarks. Additionally, the researchers built an efficient interactive segmentation tool that lets the user incrementally improve the segmentation result through flexible options.
This paper is ranked #1 on Interactive Segmentation on PASCAL VOC.
For the research paper, read here.
For code, see here.
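Interactive segmentation networks generally ingest user clicks as extra input channels alongside the image. The snippet below sketches one common baseline encoding, binary disk maps for positive and negative clicks; it illustrates the general technique, not EdgeFlow's own interaction encoding, and the click radius is an arbitrary assumption:

```python
import numpy as np

def encode_clicks(h, w, pos_clicks, neg_clicks, radius=5):
    """Rasterise user clicks into two binary disk maps, one channel for
    foreground (positive) clicks and one for background (negative) clicks."""
    maps = np.zeros((2, h, w), dtype=np.float32)
    ys, xs = np.mgrid[0:h, 0:w]
    for channel, clicks in enumerate((pos_clicks, neg_clicks)):
        for cy, cx in clicks:
            maps[channel][(ys - cy) ** 2 + (xs - cx) ** 2 <= radius ** 2] = 1.0
    return maps

# Usage: stack the click maps with the image as a 5-channel network input.
image = np.random.rand(3, 480, 640).astype(np.float32)     # RGB, CHW layout
clicks = encode_clicks(480, 640, pos_clicks=[(200, 320)],  # foreground click
                       neg_clicks=[(50, 600)])             # background click
net_input = np.concatenate([image, clicks], axis=0)        # (5, H, W)
```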
Learning Transferable Visual Models From Natural Language Supervision
The authors of this work examined whether the success of task-agnostic, web-scale pre-training in natural language processing can be transferred to another domain. The findings indicate that adopting this formula leads to similar behaviours emerging in computer vision, and the authors examine the social ramifications of this line of research. CLIP models learn to perform a wide range of tasks during pre-training in order to optimise their training objective; using natural-language prompting, CLIP can then leverage this task learning for zero-shot transfer to many existing datasets. At sufficient scale, this technique can compete with task-specific supervised models, although there is still much room for improvement.
This research is ranked #1 on Zero-Shot Transfer Image Classification on SUN.
For the research paper, read here.
For code, see here.
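Zero-shot classification with the released openai/CLIP package follows the pattern below, adapted from the repository's README; the image path and candidate prompts are placeholders:

```python
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Candidate classes are expressed as natural-language prompts.
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text = clip.tokenize(labels).to(device)
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    # Each image-text similarity becomes a logit; softmax over the prompts
    # gives the zero-shot probability of each label describing the image.
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print(dict(zip(labels, probs[0])))
```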
CoAtNet: Marrying Convolution and Attention for All Data Sizes
The researchers in this article conduct a thorough examination of the properties of convolutions and transformers, resulting in a principled approach for combining them into a new family of models dubbed CoAtNet. Extensive experiments demonstrate that CoAtNet combines the advantages of ConvNets and Transformers, achieving state-of-the-art performance across a range of data sizes and compute budgets. Note that this paper concentrates on ImageNet classification for model development; however, the researchers believe their approach is applicable to a broader range of tasks, such as object detection and semantic segmentation.
This paper is ranked #1 on Image Classification on ImageNet (using extra training data).
For the research paper, read here.
For code, see here.
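CoAtNet's central design choice is to run convolution stages on the high-resolution early features and attention stages on the downsampled later ones. The toy sketch below reproduces only that stage layout, with plain convolutions and standard multi-head attention standing in for the paper's MBConv and relative-attention blocks; every channel and layer count is an illustrative assumption:

```python
import torch
import torch.nn as nn

class ToyCoAtNet(nn.Module):
    """Toy convolution-then-attention stage layout (not the real CoAtNet)."""

    def __init__(self, num_classes=1000):
        super().__init__()
        # Early stages: convolutions on high-resolution feature maps.
        self.conv_stages = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.GELU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.GELU(),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.GELU(),
        )
        # Late stages: self-attention on the downsampled token grid.
        self.attn = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=256, nhead=8,
                                       batch_first=True),
            num_layers=2,
        )
        self.head = nn.Linear(256, num_classes)

    def forward(self, x):
        x = self.conv_stages(x)           # (B, 256, H/8, W/8)
        x = x.flatten(2).transpose(1, 2)  # (B, H*W/64, 256) tokens
        x = self.attn(x)
        return self.head(x.mean(dim=1))   # global-pool + classify

logits = ToyCoAtNet()(torch.rand(2, 3, 224, 224))  # -> (2, 1000)
```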
SwinIR: Image Restoration Using Swin Transformer
The authors of this article propose SwinIR, an image restoration model based on the Swin Transformer. The model comprises three modules: shallow feature extraction, deep feature extraction, and high-quality image reconstruction. For deep feature extraction, the researchers employ a stack of residual Swin Transformer blocks (RSTB), each composed of several Swin Transformer layers together with a convolution layer and a residual connection.
This research article is ranked #1 on Image Super-Resolution on Manga109 – 4x upscaling.
For the research paper, read here.
For code, see here.
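The three-module layout is easy to sketch. In the simplified stand-in below, one convolution plays the shallow feature extractor, plain residual conv blocks stand in for the residual Swin Transformer blocks, and a sub-pixel (PixelShuffle) convolution performs the 4x reconstruction; it mirrors SwinIR's structure, not its actual layers:

```python
import torch
import torch.nn as nn

class TinySwinIRLayout(nn.Module):
    """Shallow feature extraction -> deep feature extraction ->
    high-quality reconstruction (simplified stand-in for SwinIR)."""

    def __init__(self, channels=64, num_blocks=4, scale=4):
        super().__init__()
        # 1) Shallow feature extraction: one conv on the low-res image.
        self.shallow = nn.Conv2d(3, channels, 3, padding=1)
        # 2) Deep feature extraction: residual conv blocks stand in for
        #    the residual Swin Transformer blocks (RSTBs) of the paper.
        self.deep = nn.ModuleList([
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                          nn.GELU(),
                          nn.Conv2d(channels, channels, 3, padding=1))
            for _ in range(num_blocks)
        ])
        # 3) Reconstruction: upsample via sub-pixel convolution.
        self.upsample = nn.Sequential(
            nn.Conv2d(channels, channels * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, lr):
        shallow = self.shallow(lr)
        x = shallow
        for block in self.deep:
            x = x + block(x)               # residual connection per block
        return self.upsample(x + shallow)  # long skip, then reconstruct

sr = TinySwinIRLayout()(torch.rand(1, 3, 48, 48))  # -> (1, 3, 192, 192)
```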