
Top 10 Papers Presented At CVPR 2021


At the annual virtual computer vision event CVPR 2021, students, academics, and researchers from across the globe came together to celebrate advancements in artificial intelligence, machine learning, and computer vision.

At this year’s CVPR, a total of 7,093 papers were submitted. Of these, 7,039 were assigned to reviewers; 4,312 were rejected, 1,047 were withdrawn, and 19 were desk rejected. In total, about 1,660 papers made it to the poster and oral presentations, an acceptance rate of roughly 23.6 percent.

Further, most authors of submitted papers came from China (8,203), followed by the US (4,628), Korea (1,062), the UK (655), Germany (574), Canada (517), Australia (462), and India (429).

Michael Niemeyer’s work ‘GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields’ won the best paper award at CVPR 2021. ‘Task Programming: Learning Data-Efficient Behavior Representations,’ co-authored by researchers at Caltech and Northwestern University, won the best student paper award. 

Honourable mentions were also announced in both the best paper and best student paper categories.

We have curated the top papers presented at CVPR 2021. Here’s the list: 

Meta Pseudo Labels

Meta Pseudo Labels is a semi-supervised learning technique developed by researchers at Google Brain. The model achieved a new state-of-the-art top-1 accuracy of 90.2 percent on ImageNet, 1.6 percent better than the previous best. The source code is available on GitHub.
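At its core, the method trains a teacher and a student in tandem: the teacher generates pseudo labels on unlabelled data for the student, and the teacher itself is updated based on performance on labelled data. Below is a minimal sketch of that loop with toy logistic-regression models; the 1-D data and the simplified teacher update are assumptions (the paper backpropagates the student’s labelled-data loss through the student’s update step rather than training the teacher directly).

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy 1-D binary task (an assumption): the true label is 1 when x > 0.
x_lab = rng.normal(size=(32, 1))
y_lab = (x_lab[:, 0] > 0).astype(float)
x_unl = rng.normal(size=(256, 1))

w_teacher = np.zeros(1)
w_student = np.zeros(1)
lr = 0.5

for _ in range(200):
    # 1. Teacher produces (soft) pseudo labels on unlabelled data.
    pseudo = sigmoid(x_unl @ w_teacher)
    # 2. Student takes a gradient step towards the pseudo labels.
    w_student -= lr * x_unl.T @ (sigmoid(x_unl @ w_student) - pseudo) / len(x_unl)
    # 3. Teacher is updated using labelled data. (Simplified: the real
    #    method uses the student's labelled-data loss as the teacher's
    #    training signal, differentiating through the student update.)
    w_teacher -= lr * x_lab.T @ (sigmoid(x_lab @ w_teacher) - y_lab) / len(x_lab)

# The student ends up accurate on labelled data it never trained on directly.
student_acc = np.mean((sigmoid(x_lab @ w_student) > 0.5) == (y_lab > 0.5))
print(student_acc)
```

The point of the feedback loop is that the teacher keeps adjusting its pseudo labels to whatever most helps the student, rather than emitting fixed labels as in classic self-training.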

Animating Pictures With Eulerian Motion Fields

The paper, presented by researchers at the University of Washington, demonstrated a fully automatic method for converting a still image into a realistic animated looping video. The researchers have used an image-to-image translation network to encode motion priors of natural scenes collated from online videos. In this paper, they have demonstrated the effectiveness and robustness of the method by applying it to an extensive collection of examples, including waterfalls, beaches, flowing rivers, etc. 

Taming Transformers for High-Resolution Image Synthesis

Researchers from the Heidelberg Collaboratory for Image Processing, IWR, Heidelberg University, Germany, combined the effectiveness of the inductive bias of CNNs with the expressivity of transformers to model and synthesise high-resolution images. The paper shows how to use CNNs to learn a context-rich vocabulary of image constituents. The source code is available on GitHub.
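The key building block behind that vocabulary is a quantisation step: each CNN feature vector is snapped to its nearest entry in a learned codebook, turning the image into a grid of discrete tokens that the transformer can model. A toy sketch of that lookup follows; the codebook size, feature shapes, and random values are assumptions, not the paper’s trained components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: a learned codebook of 16 entries, 4-dim features,
# and an 8x8 CNN feature map (all values random here, not trained).
codebook = rng.normal(size=(16, 4))
features = rng.normal(size=(8, 8, 4))

# Snap each spatial feature vector to its nearest codebook entry.
flat = features.reshape(-1, 4)
dists = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
indices = dists.argmin(axis=1).reshape(8, 8)   # discrete "token" grid
quantised = codebook[indices]                  # quantised feature map

print(indices.shape, quantised.shape)
```

The transformer then only ever sees the 8×8 grid of integer indices, which is far shorter than a sequence of raw pixels, and the decoder reconstructs the image from the quantised features.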

Real-Time High-Resolution Background Matting 

In this paper, the researchers from the University of Washington have shown a real-time, high-resolution background replacement technique that operates at 30fps in 4K resolution and 60fps for HD on a modern GPU. 

The researchers used two neural networks: a base network computes a low-resolution result, which is then refined by a second network operating at high resolution on selective patches. They also introduced two large-scale video and image matting datasets, VideoMatte240K and PhotoMatte13K/85. The approach yielded higher-quality results than the previous SOTA in background matting while providing a dramatic boost in speed and resolution.
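Once the networks have predicted an alpha matte and a foreground image, background replacement itself is plain alpha compositing. A minimal illustration with made-up values:

```python
import numpy as np

# Made-up example: a 4x4 image with a 2x2 "subject" in the middle.
h, w = 4, 4
fg = np.full((h, w, 3), 0.9)            # predicted foreground colour
alpha = np.zeros((h, w, 1))             # predicted alpha matte
alpha[1:3, 1:3] = 1.0                   # fully opaque over the subject
new_bg = np.zeros((h, w, 3))            # replacement background (black)

# Standard alpha compositing: output = alpha*fg + (1 - alpha)*bg.
composite = alpha * fg + (1.0 - alpha) * new_bg

print(composite[2, 2], composite[0, 0])  # subject pixel vs background pixel
```

The hard part the paper solves is estimating `alpha` and `fg` accurately and fast; the compositing step above is standard.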


RepVGG: Making VGG-Style ConvNets Great Again

In this paper, the researchers presented a simple yet powerful CNN architecture with a VGG-like inference-time body composed of a stack of 3×3 convolutions and ReLU, while the training-time model has a multi-branch topology.

On ImageNet, RepVGG achieved over 80 percent top-1 accuracy, a first for a plain model. On an NVIDIA 1080 Ti GPU, RepVGG models ran 83 percent faster than ResNet-50 and 101 percent faster than ResNet-101 with higher accuracy, showing a more favourable accuracy-speed trade-off than state-of-the-art models such as RegNet and EfficientNet. The trained models and source code are available on GitHub.
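The trick that bridges the two topologies is structural re-parameterisation: after training, the parallel 3×3, 1×1, and identity branches are folded into a single equivalent 3×3 kernel, so inference runs a plain VGG-style stack. A single-channel sketch of that fusion (ignoring batch norm, which the paper also folds into the kernels):

```python
import numpy as np

def conv2d(x, k):
    # 'Same' single-channel 2-D cross-correlation with zero padding.
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = (xp[i:i + kh, j:j + kw] * k).sum()
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 6))
k3 = rng.normal(size=(3, 3))   # 3x3 branch kernel
k1 = rng.normal(size=(1, 1))   # 1x1 branch kernel

# Training-time multi-branch output: 3x3 conv + 1x1 conv + identity.
multi = conv2d(x, k3) + conv2d(x, k1) + x

# Inference-time re-parameterisation: pad the 1x1 kernel and the identity
# into 3x3 form, sum the kernels, and run a single 3x3 convolution.
k1_pad = np.zeros((3, 3)); k1_pad[1, 1] = k1[0, 0]
k_id = np.zeros((3, 3)); k_id[1, 1] = 1.0
fused = conv2d(x, k3 + k1_pad + k_id)

print(np.allclose(multi, fused))  # True: the two forms are equivalent
```

Because convolution is linear, the fused kernel is exactly equivalent, so the multi-branch benefits at training time cost nothing at inference time.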


Natural Adversarial Examples

The researchers introduced two challenging datasets (ImageNet-A and ImageNet-O) that reliably cause machine learning model performance to degrade substantially. The datasets were collected with a simple adversarial filtration technique to limit spurious cues.
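Adversarial filtration itself is a simple idea: run a fixed classifier over a large candidate pool and keep only the examples it misclassifies, so any model that relies on the same shortcut cues fails on the filtered set. A toy illustration, where the 1-D data and stand-in classifier are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in fixed classifier (an assumption): predicts class 1 when x > 0.
def fixed_classifier(x):
    return (x > 0).astype(int)

# Candidate pool with noisy true labels, so some examples are "hard".
x = rng.normal(size=500)
y = ((x + rng.normal(size=500)) > 0).astype(int)

# Adversarial filtration: keep only the examples the classifier gets wrong.
hard = fixed_classifier(x) != y
x_hard, y_hard = x[hard], y[hard]

acc_on_hard = np.mean(fixed_classifier(x_hard) == y_hard)
print(len(x_hard), acc_on_hard)  # by construction, 0.0 accuracy here
```

The paper applies this idea at scale with real classifiers over web-scale image pools, which is what makes the resulting sets reliably hard for unseen models too.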

The researchers found that existing data augmentation techniques hardly improved performance, and that using other public training datasets provided limited improvements. However, upon further analysis, they found that modifications to computer vision architectures offered a promising path towards robust models. The two datasets are available on GitHub.

VirTex: Learning Visual Representations From Textual Annotations

In this paper, researchers from the University of Michigan showed how to learn high-quality visual representations from fewer images. They revisited supervised pre-training and sought data-efficient alternatives to classification-based pre-training, developing VirTex, a pre-training approach that uses semantically dense captions to learn visual representations.

The researchers trained convolutional networks from scratch on COCO captions and transferred them to downstream recognition tasks, including object detection, image classification, and instance segmentation. VirTex provided features that match or exceed those learned on ImageNet, whether supervised or unsupervised, despite using up to ten times fewer images. The code and pretrained models are available on GitHub.

One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing

In this paper, NVIDIA researchers proposed a neural talking-head video synthesis model and demonstrated its application in video conferencing. The model learns to synthesise a talking-head video from a source image containing the target person’s appearance and a driving video that dictates the motion in the output. Video versions of the paper’s figures and additional results are available on GitHub.

Learning Continuous Image Representation With Local Implicit Image Function

Researchers from NVIDIA and UC San Diego showcased a continuous representation for images using the Local Implicit Image Function (LIIF), which takes an image coordinate and the 2D deep features around that coordinate as inputs, and predicts the RGB value at the coordinate as output.

The researchers trained an encoder with the LIIF representation via a self-supervised super-resolution task to generate continuous representations for images. The source code is available on GitHub.
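Conceptually, LIIF is a shared function that maps a latent code plus a relative coordinate to a colour, so the image can be queried at arbitrary continuous positions and hence rendered at any resolution. The sketch below shows one such query with random, untrained weights and simple nearest-cell lookup; the grid and MLP sizes are assumptions, and the paper additionally uses trained weights, feature unfolding, local ensembling, and cell decoding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shapes: an 8x8 grid of 16-dim latent codes and a tiny 2-layer
# MLP with random (untrained) weights standing in for the decoder.
feat = rng.normal(size=(8, 8, 16))
W1 = rng.normal(size=(18, 32))
W2 = rng.normal(size=(32, 3))

def liif_query(xy):
    """Predict an RGB value at a continuous coordinate xy in [0, 1)^2."""
    gx, gy = int(xy[0] * 8), int(xy[1] * 8)        # nearest grid cell
    center = (np.array([gx, gy]) + 0.5) / 8.0      # that cell's centre
    rel = xy - center                              # coordinate relative to it
    z = np.concatenate([feat[gy, gx], rel])        # latent code + rel. coord
    return np.tanh(np.maximum(z @ W1, 0.0) @ W2)   # MLP -> RGB in (-1, 1)

rgb = liif_query(np.array([0.37, 0.81]))
print(rgb.shape)
```

Because the decoder is shared across all cells and takes a continuous relative coordinate, sampling a denser grid of query points yields a higher-resolution image from the same latent features.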

Im2Vec: Synthesizing Vector Graphics Without Vector Supervision

Researchers from University College London and Adobe Research proposed a new neural network that can generate complex vector graphics with varying topologies, requiring only indirect supervision from readily available raster training images. They used a differentiable rasterisation pipeline that renders the generated vector shapes and composites them onto a raster canvas. Experiments were conducted on a range of datasets and compared with the state-of-the-art SVG-VAE and DeepSVG, both of which require explicit vector-graphic supervision. The researchers also demonstrated their approach on the MNIST dataset. The source code is available on GitHub.

The open-access versions of all papers presented at CVPR 2021 are available here.


Amit Raja Naik

Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.