The Computer Vision and Pattern Recognition (CVPR) conference is one of the most popular events around the globe, where computer vision experts and researchers gather to share their work and views on trending techniques across various computer vision topics, including object detection, video understanding and visual recognition, among others.
In this article, we have listed the important topics and tutorials discussed on the first and second days of the conference.
1| RANSAC in 2020
In this tutorial, the researchers presented the latest developments in robust model fitting: recent advances in sampling and local optimisation methods, novel branch-and-bound and mathematical programming algorithms among the global methods, as well as the latest differentiable alternatives to the Random Sample Consensus (RANSAC) algorithm.
To learn what RANSAC is and how it works, click here.
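For a flavour of the classic algorithm the tutorial builds on, here is a minimal RANSAC sketch for robust line fitting in NumPy. This is an illustrative toy, not code from the tutorial; the function and parameter names are our own:

```python
import numpy as np

def ransac_line(points, n_iters=200, inlier_thresh=0.1, seed=0):
    """Fit y = a*x + b to 2D points with a basic RANSAC loop.

    Repeatedly samples a minimal set (2 points), fits a candidate line,
    and keeps the model with the largest consensus set of inliers.
    """
    rng = np.random.default_rng(seed)
    best_model, best_inliers = None, 0
    for _ in range(n_iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:
            continue  # vertical minimal sample; skip for this simple model
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        residuals = np.abs(points[:, 1] - (a * points[:, 0] + b))
        inliers = int((residuals < inlier_thresh).sum())
        if inliers > best_inliers:
            best_model, best_inliers = (a, b), inliers
    return best_model, best_inliers

# Points on y = 2x + 1, plus two gross outliers.
pts = np.array([[x, 2 * x + 1] for x in range(10)] + [[0, 50], [5, -40]],
               dtype=float)
(a, b), n_in = ransac_line(pts)
```

Because the consensus step counts inliers rather than averaging residuals, the two outliers never influence the winning model, which is the key property the differentiable RANSAC variants discussed in the tutorial try to preserve.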
2| Visual Recognition for Images, Video, and 3D
This tutorial was presented by the researchers of Facebook AI Research (FAIR). Here they discussed the popular approaches and recent advancements in the family of visual recognition tasks for different input modalities. They discussed the connections between the techniques specialised for different input modalities, providing some insights about diverse challenges that each modality presents. They also talked about Detectron2, PyTorch3D and PySlowFast.
3| Neural Rendering
This was a full-day tutorial where a team of researchers discussed the fundamentals of neural rendering and summarized recent trends and applications. Starting with an overview of the underlying graphics, vision and machine learning concepts, the researchers discussed the critical aspects of neural rendering approaches.
They also discussed various important use cases for the described algorithms, such as novel view synthesis, facial and body reenactment, free-viewpoint video, and the creation of photorealistic avatars for virtual and augmented reality telepresence, among others.
Click here to learn more about neural rendering.
4| Neuro-Symbolic Visual Reasoning and Program Synthesis
In this tutorial, a team of researchers from Google Brain, Stanford University and others discussed the relatively new field of neuro-symbolic computation, which proposes to combine the strengths of deep models with symbolic approaches. They used the former to learn disentangled, interpretable, and low-dimensional representations, which significantly reduce the search space for symbolic approaches such as program synthesis. The topics of this tutorial included neural-symbolic concept learning, Neuro-Symbolic Commonsense Intelligence, and From Neural to Neurosymbolic 3D Modeling, among others.
Click here to watch the tutorial.
5| Cycle Consistency and Synchronization in Computer Vision
In this tutorial, the researchers introduced the fundamentals of cycle consistency and reviewed the broad range of studies that make use of it. They covered different techniques for solving multi-view synchronisation problems in computer vision and discussed how to achieve cycle consistency in such problems. Lastly, they talked about recent techniques that jointly optimise neural networks across multiple domains, and other related topics.
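The core idea can be shown in miniature: composing a mapping with its (approximate) inverse around a cycle should return the starting point, and the deviation becomes a training loss. The sketch below uses simple stand-in functions of our own invention in place of learned networks:

```python
import numpy as np

# Two hypothetical mappings between domains A and B (stand-ins for learned
# networks): F maps A -> B, and G is its inverse mapping B -> A.
F = lambda x: 2.0 * x + 1.0
G = lambda y: (y - 1.0) / 2.0

def cycle_consistency_loss(x, F, G):
    """Penalise the round trip G(F(x)) for drifting away from x.

    This is the cycle-consistency idea in miniature: following a cycle of
    mappings should bring each point back (close) to where it started.
    """
    return float(np.mean((G(F(x)) - x) ** 2))

x = np.array([0.0, 1.0, 2.0])
loss = cycle_consistency_loss(x, F, G)  # exact inverse -> zero loss
```

In multi-view synchronisation the same principle applies to chains of pairwise transformations (e.g. relative camera poses), where the composition around any closed loop should be the identity.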
6| From NAS to HPO: Automated Deep Learning
Researchers from MIT, Amazon and UC Davis discussed how to design hyper-parameter ranges and possible network architecture combinations, and how to pass that workload to machines. They covered the important concepts in automated machine learning and its applications in computer vision.
Here is the tutorial on automated hyper-parameter and architecture tuning:
7| Local Features: From SIFT to Differentiable Methods
In this tutorial, you will learn about the baseline stereo pipeline, matching with OpenCV SIFT, using these algorithms on the Phototourism dataset, geometric verification with the RANSAC algorithm, and more.
Watch the tutorial below.
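A central step in any SIFT matching pipeline is filtering nearest-neighbour matches with Lowe's ratio test. The sketch below implements that test on toy descriptor vectors in plain NumPy (an illustration of the idea, not the tutorial's code; in practice the descriptors would come from a detector such as OpenCV's SIFT):

```python
import numpy as np

def ratio_test_match(desc1, desc2, ratio=0.8):
    """Match descriptor sets with nearest-neighbour search + Lowe's ratio test.

    For each descriptor in desc1, find its two nearest neighbours in desc2
    and accept the match only if the best distance is clearly smaller than
    the second best -- ambiguous matches are discarded.
    """
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches

# Toy 2-D "descriptors": the first two rows of desc1 have clear matches in
# desc2; the third is equidistant from both and should be rejected.
desc1 = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
desc2 = np.array([[0.9, 0.1], [0.1, 0.9]])
matches = ratio_test_match(desc1, desc2)
```

The surviving matches would then be passed to geometric verification with RANSAC, as covered in the tutorial.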
8| Interpretable Machine Learning for Computer Vision
In this tutorial, a team of researchers from OpenAI, Oxford and others discussed the recent progress that has been made on visualisation, interpretation, and explanation methodologies for analysing both the data and the models in computer vision. This tutorial is a continuation of earlier interpretable ML tutorials, and its main theme is to build consensus on the emerging topic of machine learning interpretability by clarifying the motivation, the typical methodologies, the prospective trends, and the potential industrial applications of the resulting interpretability.
9| Zeroth Order Optimisation: Theory and Applications to Deep Learning
In this tutorial, researchers from IBM Research provided a comprehensive introduction to recent advances in zeroth-order (ZO) optimisation methods, in both theory and applications. On the theory side, they discussed the convergence rates and iteration complexity analysis of ZO algorithms and compared them to their first-order counterparts.
On the application side, they highlighted appealing applications of ZO optimisation: studying the robustness of deep neural networks on computer vision tasks (e.g. image classification, object detection, and image captioning), practical and efficient adversarial attacks that generate adversarial examples from a black-box ML model, and the design of robust ML systems.
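The trick that makes the black-box setting workable is estimating gradients from function values alone. Below is a sketch of a standard two-point ZO gradient estimator in NumPy (an illustrative example consistent with the general technique, not code from the tutorial; names and constants are our own):

```python
import numpy as np

def zo_gradient(f, x, mu=1e-4, n_samples=2000, seed=0):
    """Two-point zeroth-order gradient estimate of f at x.

    Averages directional finite differences along random Gaussian directions:
    g ~ mean_u[ (f(x + mu*u) - f(x)) / mu * u ], using only function
    evaluations -- the black-box access assumed by ZO attacks and defenses.
    """
    rng = np.random.default_rng(seed)
    g = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - f(x)) / mu * u
    return g / n_samples

# Sanity check on a quadratic, where the true gradient is 2x.
f = lambda x: float(np.sum(x ** 2))
x = np.array([1.0, -2.0])
g = zo_gradient(f, x)
```

In a black-box adversarial attack, `f` would be the victim model's loss queried through its prediction API, and the estimated gradient would drive the perturbation update, at the cost of many more queries than a first-order method needs.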
10| Disentangled 3D Representations for Relightable Performance Capture of Humans
This was a full-day tutorial where researchers from Google discussed how to combine geometric pipelines with recent advances in neural rendering to construct disentangled 3D representations for photo-realistic renderings of humans in novel viewpoints and desired lighting conditions.
They talked about the current state-of-the-art for 3D performance capture, highlighting the pros and cons of various techniques. The topics of their discussions include learned disentangled representations for perception tasks, high-quality depth sensors for volumetric capture, reflectance estimation in images, videos and 3D content, among others.
11| Recent Advances in Vision-and-Language Research
A team of researchers from Microsoft, Facebook and JD.com talked about some of the recently popular tasks in the domain of Vision-and-Language such as visual captioning, visual grounding, visual question answering and reasoning, text-to-image generation, and self-supervised learning for universal image-text representations. They covered state-of-the-art approaches in each of these areas and discussed key principles that epitomise the core challenges and opportunities in multimodal understanding, reasoning, and generation.
12| Learning and Understanding Single Image Depth Estimation in the Wild
In this tutorial, the researchers discussed various topics in learning and understanding single image depth estimation in the wild. The topics included stereo supervision, monocular supervision, auxiliary supervision, how neural networks learn to estimate depth from a single image, how reliable the estimated depth maps are, and mobile depth estimation, among others.
13| 3D Face Modeling and Reconstruction
This tutorial focused on the problem of reconstructing a 3D model of a human face from a single image, possibly captured in unconstrained conditions, i.e., in the wild. The researchers discussed the basic concepts and definitions of the 3D Morphable Model, the problem of 3D dense registration of point clouds, optimisation techniques used to estimate the 3DMM parameters (fitting) from a single image and much more.
14| Efficient Data Annotation for Self-Driving Cars via Crowdsourcing on a Large-Scale
In this tutorial, leading researchers and engineers from Yandex shared unique industry experience in efficient data annotation (labelling) for self-driving cars. They discussed the data processing pipeline required for the cars to learn how to behave autonomously on the roads, gave an introduction to data annotation via public crowdsourcing marketplaces, presented the key components of efficient annotation, and explained how data annotation constitutes a crucial part of making the learning process effective, among other related topics.
15| Learning Representations via Graph-Structured Networks
In this tutorial, researchers from various tech giants introduced a series of effective graph-structured networks, including non-local neural networks, spatial generalised propagation networks, relation networks for objects and multi-agent behaviour modelling, and graph networks for videos and 3D data. They discussed how to utilise graph-structured neural architectures to study network connectivity patterns, including the open challenges that still exist in many vision problems.
16| A Comprehensive Tutorial on Video Modeling
This tutorial on video modelling was organised by Amazon AWS, where the researchers discussed the problem of human activity understanding in videos, including its input data, common tasks, popular models, and the open challenges. They introduced the GluonCV video model zoo, which covers popular video models and datasets with extensive tutorials, as well as an efficient video reader, Decord.
17| Novel View Synthesis: From Depth-Based Warping to Multi-Plane Images and Beyond
Novel view synthesis is a long-standing problem at the intersection of computer graphics and computer vision. In this tutorial, the researchers introduced the problem while offering context and taxonomy of the different methods, including its most recent approaches in the field.
18| All You Need to Know About Self-Driving
This was a full-day tutorial by Uber ATG covering all aspects of self-driving. The tutorial provided the necessary background for understanding the different tasks and associated challenges, the different sensors and data sources one can use and how to exploit them, as well as how to formulate the relevant algorithmic problems such that efficient learning and inference is possible.