Last updated January 25, 2021

Top Computer Vision Datasets Open-Sourced At CVPR 2020


Published on June 19, 2020

by Ambika Choudhury

A good dataset serves as the backbone of an Artificial Intelligence system. Data helps in several ways: it shows how the system is performing and makes it possible to extract meaningful insights. At the premier annual Computer Vision and Pattern Recognition conference (CVPR 2020), several datasets were open-sourced to help the community achieve higher accuracy and better insights.

Below, we list 10 top Computer Vision datasets that were open-sourced at the CVPR 2020 conference.

(The list is in no particular order)

1| FaceScape: A Large-Scale High-Quality 3D Face Dataset And Detailed Riggable 3D Face Prediction

About: FaceScape is a large-scale detailed 3D face dataset that includes 18,760 textured 3D face models, captured from 938 subjects, each with 20 specific expressions. Using the FaceScape dataset, the researchers studied how to predict a detailed face model from a single image. The dataset is released free for non-commercial research.

Know more here.

2| OASIS: A Large-Scale Dataset For Single Image 3D In The Wild

About: Open Annotations of Single Image Surfaces (OASIS) is a large-scale dataset for single-image 3D in the wild. The dataset consists of human annotations that enable pixel-wise reconstruction of 3D surfaces for 140,000 randomly sampled Internet images. A key feature of OASIS is its rich annotations of human 3D perception.

According to the researchers, OASIS opens up new research opportunities on a wide range of single-image 3D tasks, such as depth estimation, surface normal estimation, boundary detection, and instance segmentation of planes. It does so by providing in-the-wild ground truths either for the first time or at a much larger scale than prior work.

Know more here.

3| Scalability In Perception For Autonomous Driving: Waymo Open Dataset

About: Researchers from Google and Waymo introduced a new large-scale, high-quality, diverse dataset. It contains a large number of high-quality, manually annotated 3D ground truth bounding boxes for the LiDAR data, and tightly fitting 2D bounding boxes for the camera images.

It consists of 1150 scenes, each spanning 20 seconds, made up of well-synchronised and calibrated high-quality LiDAR and camera data captured across a range of urban and suburban geographies. The dataset contains around 12 million LiDAR box annotations and around 12 million camera box annotations, giving rise to around 113k LiDAR object tracks and around 250k camera image tracks.
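As a rough back-of-the-envelope check on scale, the figures quoted above can be combined to estimate total recording time and annotation density. This is only a sketch: the constant and function names are illustrative, the inputs are the approximate numbers from the text, and the derived values are estimates rather than official statistics.

```python
# Approximate figures quoted in the text above (not official statistics).
NUM_SCENES = 1150
SCENE_SECONDS = 20
LIDAR_BOXES = 12_000_000   # "around 12 million"
LIDAR_TRACKS = 113_000     # "around 113k"


def waymo_scale_estimates() -> dict:
    """Derive rough scale estimates from the quoted figures."""
    total_seconds = NUM_SCENES * SCENE_SECONDS
    return {
        "total_hours": total_seconds / 3600,
        "lidar_boxes_per_scene": LIDAR_BOXES / NUM_SCENES,
        "lidar_boxes_per_track": LIDAR_BOXES / LIDAR_TRACKS,
    }


if __name__ == "__main__":
    for name, value in waymo_scale_estimates().items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions, the scenes add up to a little over six hours of recorded driving.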

Know more here.

4| Google Landmarks Dataset v2 – A Large-Scale Benchmark For Instance-Level Recognition And Retrieval

About: Researchers at Google Research presented the Google Landmarks Dataset v2 (GLDv2), a new large-scale benchmark for image retrieval and instance recognition. The GLDv2 dataset consists of more than 5M images of over 200k human-made and natural landmarks, contributed to Wikimedia Commons by local experts. According to the researchers, it is the largest dataset of its kind to date and poses several real-world challenges that were absent from previous datasets, including extreme class imbalance and out-of-domain test images.

Know more here.

5| FineGym: A Hierarchical Video Dataset For Fine-grained Action Understanding

About: In order to take action recognition to a new level, researchers at the Chinese University of Hong Kong developed Fine-grained Gymnastic or FineGym, which is a large-scale high-quality action dataset that provides fine-grained annotations.

The dataset provides temporal annotations at both action and sub-action levels with a three-level semantic hierarchy. Its key features are a multi-level semantic hierarchy, temporal structure and high quality.

Know more here.

6| DeeperForensics-1.0: A Large-Scale Dataset for Real-World Face Forgery Detection

About: DeeperForensics-1.0 is a large-scale dataset for face forgery detection. This dataset is the first version of this benchmark and represents the largest face forgery detection dataset to date, with 60,000 videos comprising a total of 17.6 million frames for real-world face forgery detection. It is 10 times larger than the existing datasets of the same kind.

Know more here.

7| HUMBI: A Large Multiview Dataset Of Human Body Expressions

About: HUman Multiview Behavioral Imaging or HUMBI is a new large multiview dataset of human body expressions with natural clothing. The aim of this dataset is to help in modelling the view-specific appearance and geometry of gaze, face, hand, body, and garment across a diverse set of people.

The HUMBI dataset is effective in learning and reconstructing a complete human model and is said to be complementary to the existing datasets of human body expressions with limited views and subjects including MPII-Gaze, Multi-PIE, Human3.6M, and Panoptic Studio datasets.

Know more here.

8| COCAS: A Large-Scale Clothes Changing Person Dataset For Re-Identification

About: ClOthes ChAnging Person Set (COCAS) is a large-scale re-id benchmark introduced to address the clothes-changing person re-id problem. The dataset provides multiple images of the same identity in different clothes and contains 62,382 body images from 5,266 persons, where each person has 5 to 25 images in 2 to 3 different outfits.
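The COCAS figures can be sanity-checked against each other with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the function names are hypothetical, and the mean it computes is a derived estimate, not a statistic reported by the dataset authors.

```python
# Figures quoted in the text above.
TOTAL_IMAGES = 62_382
NUM_PERSONS = 5_266
MIN_IMAGES_PER_PERSON = 5
MAX_IMAGES_PER_PERSON = 25


def mean_images_per_person(total_images: int, num_persons: int) -> float:
    """Average number of body images per identity."""
    return total_images / num_persons


def totals_are_consistent(total_images: int, num_persons: int,
                          lo: int, hi: int) -> bool:
    """Check that the total fits within the stated per-person range."""
    return lo * num_persons <= total_images <= hi * num_persons


if __name__ == "__main__":
    mean = mean_images_per_person(TOTAL_IMAGES, NUM_PERSONS)
    ok = totals_are_consistent(TOTAL_IMAGES, NUM_PERSONS,
                               MIN_IMAGES_PER_PERSON, MAX_IMAGES_PER_PERSON)
    print(f"mean images per person: {mean:.2f}, consistent: {ok}")
```

The derived mean of roughly 12 images per person sits inside the stated range of 5 to 25.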

Know more here.

9| VIOLIN: A Large-Scale Dataset For Video-and-Language Inference

About: VIdeO-and-Language INference or VIOLIN is a new large-scale dataset consisting of 95,322 video-hypothesis pairs from 15,887 video clips, spanning over 582 hours of video. The clips contain rich content with diverse temporal dynamics, event shifts, and interactions between people, and are collected from two sources: popular TV shows and movie clips from YouTube channels.
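The VIOLIN figures imply how many hypotheses accompany each clip and roughly how long a clip is. The following sketch works only from the numbers quoted above; the names are illustrative, and because the text says "over 582 hours", the average clip length it derives should be read as a lower-bound estimate.

```python
# Figures quoted in the text above.
NUM_PAIRS = 95_322
NUM_CLIPS = 15_887
TOTAL_HOURS = 582  # stated as "over 582 hours", so treat as a lower bound


def hypotheses_per_clip(num_pairs: int, num_clips: int) -> float:
    """Average number of video-hypothesis pairs per clip."""
    return num_pairs / num_clips


def mean_clip_seconds(total_hours: float, num_clips: int) -> float:
    """Average clip length in seconds implied by the total duration."""
    return total_hours * 3600 / num_clips


if __name__ == "__main__":
    print(f"hypotheses per clip: {hypotheses_per_clip(NUM_PAIRS, NUM_CLIPS):.1f}")
    print(f"mean clip length (s): {mean_clip_seconds(TOTAL_HOURS, NUM_CLIPS):.1f}")
```

The pair count divides evenly by the clip count, which gives six hypotheses per clip, and the implied average clip length is a little over two minutes.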

Know more here.

10| nuScenes: A Multimodal Dataset For Autonomous Driving

About: nuTonomy scenes (nuScenes) is the first dataset to carry the full autonomous vehicle sensor suite, namely 6 cameras, 5 radars and 1 lidar, all with a full 360-degree field of view. It comprises 1000 scenes, each 20s long and fully annotated with 3D bounding boxes for 23 classes and 8 attributes. It has 7x as many annotations and 100x as many images as the pioneering KITTI dataset. According to the researchers, nuScenes has the largest collection of 3D box annotations of any previously released dataset.

Know more here.
