Top 10 Popular Datasets For Autonomous Driving Projects

In recent years, organisations have invested heavily in the autonomous driving space, and this spending is expected to reshape transport networks for the better. According to reports, the global autonomous vehicle market is expected to grow at a CAGR of 62.86% to reach $41.24 billion by 2024.
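A compound annual growth rate (CAGR) is the constant yearly rate that takes a starting value to an ending value over a number of years, i.e. CAGR = (end / start)^(1 / years) - 1. The report's base year and starting market size are not given here, so the sketch below only shows the standard formula and its inverse; the function names are illustrative, not from any particular library.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate as a fraction (0.6286 means 62.86%)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value: float, rate: float, years: float) -> float:
    """Value reached by compounding start_value at `rate` per year for `years` years."""
    return start_value * (1.0 + rate) ** years


def implied_start(end_value: float, rate: float, years: float) -> float:
    """Starting value implied by an end value, a growth rate and a horizon."""
    return end_value / (1.0 + rate) ** years


# Hypothetical usage: assume a five-year horizon ending in 2024 (the article
# does not state the base year) and back out the implied starting market size.
start_estimate = implied_start(41.24, 0.6286, 5)
print(f"Implied starting size: ${start_estimate:.2f} billion")
```

Because the horizon is an assumption, the implied starting value changes with it; only the 62.86% rate and the $41.24 billion end figure come from the report.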

In this article, we list ten popular datasets for autonomous driving projects.

The list is in alphabetical order.

1| Astyx Dataset HiRes2019 

The Astyx Dataset HiRes2019 is a popular automotive radar dataset for deep learning-based 3D object detection. It was open-sourced to provide high-resolution radar data to the research community and to stimulate research on algorithms that use radar sensor data. Although the dataset is radar-centric, it combines radar, lidar and camera data for 3D object detection. It is more than 350 MB in size and consists of 546 frames.

Download here.

2| Berkeley DeepDrive

The Berkeley DeepDrive dataset from UC Berkeley comprises over 100K video sequences with diverse annotations, including image-level tagging, object bounding boxes, drivable areas, lane markings and full-frame instance segmentation. The dataset spans diverse geographies, environments and weather conditions, which helps trained models generalise to conditions they have not seen before.

Download here.

3| Landmarks

Google open-sourced this dataset for recognising human-made and natural landmarks. It was released as part of the 2018 Landmark Recognition and Landmark Retrieval Kaggle challenges. It contains more than 2 million images depicting 30 thousand unique landmarks from across the world, a number of classes roughly 30x larger than in commonly used datasets.

Download here.

4| Landmarks-v2

After releasing the Landmarks dataset in 2018, Google released the Google Landmarks-v2 dataset in 2019. This landmark recognition dataset is larger and more diverse than the original. It includes over 5 million images (2x that of the first release) of more than 200 thousand different landmarks (an increase of 7x).

Download here.

5| Level 5

Ride-sharing company Lyft open-sourced the Level 5 dataset. Level 5 is a comprehensive, large-scale dataset featuring the raw camera and LiDAR sensor inputs perceived by a fleet of multiple high-end autonomous vehicles in a restricted geographic area. The dataset also includes high-quality, human-labelled 3D bounding boxes of traffic agents and an underlying HD spatial semantic map.

Download here.

6| nuScenes Dataset

nuScenes is a large-scale public dataset for autonomous driving. It enables researchers to study urban driving situations using the full sensor suite of a real self-driving car: 1x LIDAR, 5x RADAR, 6x camera, IMU and GPS. The dataset features 1,400,000 camera images, 390,000 lidar sweeps, detailed map information and manual annotations for 23 object classes.

Download here.

7| Open Images V5

Open Images V5 is a dataset of more than nine million images annotated with labels spanning thousands of object categories. It features segmentation masks for 2.8 million object instances in 350 categories: 2.68M masks on the training set and 99k masks on the validation and test sets. The dataset also includes 36.5M image-level labels spanning over 20k categories.

Download here.

8| Oxford Radar RobotCar Dataset

The Oxford RobotCar dataset comprises over 100 repetitions of a consistent route through Oxford, UK, captured over a period of more than a year. It covers many different combinations of weather, traffic and pedestrians, along with longer-term changes such as construction and roadworks.

Download here.

9| Pandaset

Pandaset is a popular large-scale dataset for autonomous driving that aims to promote advanced research and development in self-driving and machine learning. The dataset features 60k camera images, 20k LiDAR sweeps, 28 annotation classes, 37 segmentation labels and more.

Download here.

10| Waymo Open Dataset

The Waymo Open Dataset is an open-source, high-quality multimodal sensor dataset for autonomous driving. The data was collected by Waymo self-driving vehicles and covers a wide variety of environments, from dense urban centres to suburban landscapes, under conditions including sun, rain, day, night, dawn and dusk. It contains 1,000 segments, each capturing 20 seconds of continuous driving, corresponding to 200,000 frames at 10 Hz per sensor.
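The frame count follows directly from the segment figures: 1,000 segments x 20 seconds x 10 frames per second. A minimal sketch of that arithmetic (the constant and function names are illustrative only):

```python
# Figures quoted in the Waymo Open Dataset description above.
SEGMENTS = 1_000
SECONDS_PER_SEGMENT = 20
SENSOR_RATE_HZ = 10


def frames_per_sensor(segments: int, seconds_per_segment: int, rate_hz: int) -> int:
    """Total frames a single sensor records across all segments."""
    return segments * seconds_per_segment * rate_hz


total_frames = frames_per_sensor(SEGMENTS, SECONDS_PER_SEGMENT, SENSOR_RATE_HZ)
print(total_frames)  # 200000
```

The same helper gives 200 frames per sensor for a single 20-second segment.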

Download here.


Ambika Choudhury
A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.
