Top 10 Popular Datasets For Autonomous Driving Projects

For the past few years, organisations have been investing heavily in the autonomous driving space, on the expectation that the technology will reshape transport networks for the better. According to reports, the global autonomous vehicle market is expected to grow at a CAGR of 62.86% to reach $41.24 billion by 2024.

In this article, we list ten popular datasets for autonomous driving projects.

The list is in alphabetical order.


1| Astyx Dataset HiRes2019 

The Astyx Dataset HiRes2019 is a popular radar-centric automotive dataset for deep learning-based 3D object detection, built on radar, lidar and camera data. It was open-sourced to provide high-resolution radar data to the research community and to stimulate research on algorithms that use radar sensor data. The dataset is more than 350 MB in size and consists of 546 frames.

Download here.





2| Berkeley DeepDrive

The Berkeley DeepDrive dataset by UC Berkeley comprises over 100K video sequences with diverse kinds of annotations, including image-level tagging, object bounding boxes, drivable areas, lane markings, and full-frame instance segmentation. The dataset offers geographic, environmental, and weather diversity, which helps train models that are less likely to be surprised by new conditions.

Download here.

3| Landmarks

Google open-sourced this dataset for recognising human-made and natural landmarks. The dataset was released as part of the Landmark Recognition and Landmark Retrieval Kaggle challenges in 2018. It contains more than 2 million images depicting 30 thousand unique landmarks from across the world, a number of classes that is ~30x larger than what is available in commonly used datasets.

Download here.

4| Landmarks-v2

After releasing the Landmarks dataset in 2018, Google released the Google Landmarks-v2 dataset in 2019. This landmark recognition dataset is larger and much more diverse than the original Landmarks dataset. It includes over 5 million images (2x that of the first release) of more than 200 thousand different landmarks (an increase of 7x).

Download here.

5| Level 5

The ride-sharing company Lyft open-sourced the Level 5 dataset, a comprehensive, large-scale dataset featuring the raw camera and LiDAR sensor inputs as perceived by a fleet of multiple high-end autonomous vehicles in a restricted geographic area. The dataset also includes high-quality, human-labelled 3D bounding boxes of traffic agents and an underlying HD spatial semantic map.

Download here.

6| nuScenes Dataset

nuScenes is a large-scale public dataset for autonomous driving. It enables researchers to study urban driving situations using the full sensor suite of a real self-driving car. The dataset features 1,400,000 camera images, 390,000 lidar sweeps, detailed map information, a full sensor suite (1x LIDAR, 5x RADAR, 6x camera, IMU, GPS) and manual annotations for 23 object classes.

Download here.

7| Open Images V5

Open Images V5 is a dataset of more than nine million images annotated with labels spanning thousands of object categories. It features segmentation masks for 2.8 million object instances in 350 categories: 2.68M masks on the training set and 99k masks on the validation and test sets. It also includes 36.5M image-level labels across more than 20k categories.
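As a quick sanity check on the figures quoted above, the short Python sketch below adds the training-set masks to the validation and test masks and compares the sum with the headline count of 2.8 million. The constant names are illustrative only and are not part of any Open Images tooling; the numbers are the rounded ones given in the text.

```python
# Rounded mask counts as quoted in the text (not exact dataset statistics).
TRAIN_MASKS = 2_680_000      # segmentation masks on the training set
VAL_TEST_MASKS = 99_000      # segmentation masks on the validation and test sets

# Sum the splits and express the total in millions, to one decimal place.
total_masks = TRAIN_MASKS + VAL_TEST_MASKS
total_in_millions = round(total_masks / 1_000_000, 1)

print(total_masks)        # 2779000
print(total_in_millions)  # 2.8
```

The splits add up to roughly 2.78 million masks, which matches the quoted 2.8 million once rounded.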

Download here.

8| Oxford Radar RobotCar Dataset

The Oxford RobotCar dataset comprises over 100 repetitions of a consistent route through Oxford, UK, captured over a period of more than a year. It covers many different combinations of weather, traffic, and pedestrians, along with longer-term changes such as construction and roadworks.

Download here.

9| Pandaset

Pandaset is a popular large-scale dataset for autonomous driving. It enables researchers to study self-driving and aims to promote advanced research and development in autonomous driving and machine learning. The dataset features 60k camera images, 20k LiDAR sweeps, 28 annotation classes and 37 segmentation labels.

Download here.

10| Waymo Open Dataset

The Waymo Open Dataset is an open-sourced, high-quality multimodal sensor dataset for autonomous driving. The data is extracted from Waymo self-driving vehicles and covers a wide variety of environments, from dense urban centres to suburban landscapes. It was collected across different weather conditions and times of day, including sunshine, rain, day, night, dawn and dusk. It contains 1,000 segments, each capturing 20 seconds of continuous driving, corresponding to 200,000 frames at 10 Hz per sensor.
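The frame count follows directly from the segment count, segment length and sampling rate. The minimal Python sketch below reproduces that arithmetic; the function and constant names are hypothetical and chosen only for illustration, and the inputs are the figures quoted in the text.

```python
# Figures quoted in the text above.
NUM_SEGMENTS = 1_000     # number of driving segments
SEGMENT_SECONDS = 20     # seconds of continuous driving per segment
FRAME_RATE_HZ = 10       # frames per second, per sensor


def frames_per_sensor(num_segments: int, seconds: int, rate_hz: int) -> int:
    """Total frames one sensor records across all segments."""
    return num_segments * seconds * rate_hz


total_frames = frames_per_sensor(NUM_SEGMENTS, SEGMENT_SECONDS, FRAME_RATE_HZ)
print(total_frames)  # 200000
```

Each segment therefore contributes 200 frames per sensor, which is a useful number to keep in mind when estimating storage or batching for a data loader.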

Download here.


Ambika Choudhury
A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.
