After Lyft, Waymo Open Sources Self-Driving Dataset To The Public

Alphabet’s autonomous driving subsidiary Waymo is one of the most promising players in the self-driving market. From partnering with car manufacturers to offering rides to the general public through its early rider program, Waymo is pushing hard to make autonomous driving a public reality.

Yesterday, Waymo open-sourced a high-quality multimodal sensor dataset for autonomous driving. The dataset is extracted from Waymo self-driving vehicles and covers a wide variety of environments, from dense urban centres to suburban landscapes. The data was collected under a range of conditions, including sunshine, rain, day, night, dawn and dusk.

According to the researchers, this is believed to be the largest, richest and most diverse self-driving dataset ever released to the research community. The main purpose of open-sourcing it is to drive advances in autonomous driving technology. Some of the important features of the dataset are listed below:

Diverse Driving Environments: The dataset covers a large area of dense driving environments, including San Francisco, Phoenix and many other places, captured at different times of day and on both sunny and rainy days.

Size And Coverage: The dataset contains 1,000 segments, each capturing 20 seconds of continuous driving, which corresponds to 200,000 frames at 10 Hz per sensor.

High-Resolution (360° View): Each segment in the dataset contains sensor data from five high-resolution Waymo LiDARs and five front-and-side-facing cameras.

Dense Labelling: The dataset includes LiDAR frames and images in which objects such as vehicles, pedestrians, cyclists and signage are carefully labelled, for a total of 12 million 3D labels and 1.2 million 2D labels.

Camera-LiDAR Synchronisation: The researchers at Waymo use 3D perception models that fuse data from multiple cameras and LiDAR, so that the hardware and software work together seamlessly.
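The size figures quoted in the feature list can be sanity-checked with a little arithmetic. The sketch below is only a back-of-the-envelope check using the numbers from this article; the function and variable names are illustrative, and the per-frame label average assumes the 3D labels are spread evenly over all LiDAR frames, which the article does not state.

```python
# Figures quoted in the article.
NUM_SEGMENTS = 1000        # driving segments in the dataset
SEGMENT_SECONDS = 20       # seconds of continuous driving per segment
FRAME_RATE_HZ = 10         # frames captured per second, per sensor
LABELS_3D = 12_000_000     # total 3D labels reported


def frames_per_sensor(num_segments: int, seconds: int, rate_hz: int) -> int:
    """Total frames one sensor records across all segments."""
    return num_segments * seconds * rate_hz


total_frames = frames_per_sensor(NUM_SEGMENTS, SEGMENT_SECONDS, FRAME_RATE_HZ)

# Total driving time covered by the segments, in hours.
total_hours = NUM_SEGMENTS * SEGMENT_SECONDS / 3600

# Rough average only: assumes labels are spread evenly over every frame.
avg_3d_labels_per_frame = LABELS_3D / total_frames

print(f"Frames per sensor: {total_frames}")
print(f"Driving time: {total_hours:.2f} hours")
print(f"Average 3D labels per frame (assumed even spread): {avg_3d_labels_per_frame:.0f}")
```

The frame count reproduces the 200,000 figure quoted above, and the same numbers imply a little over five and a half hours of driving in total.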

Advantages of This Dataset

The dataset offers several advantages that will help the autonomous driving research community build on existing self-driving research, and it may also benefit related domains such as computer vision and robotics. Some of them are mentioned below:

  1. With this dataset, researchers get an opportunity to develop intelligent models that track and predict the behaviour of other road users.
  2. The data has the potential to help the research community make advances in 2D and 3D perception.
  3. Using this dataset, autonomous vehicle manufacturers can make progress in areas such as domain adaptation, scene understanding and behaviour prediction.

Other Autonomous Driving Datasets

Argo AI

Argo AI, in collaboration with faculty and students from CMU and Georgia Institute of Technology, open-sourced a curated dataset called Argoverse this June. The dataset is designed to support autonomous vehicle perception tasks including 3D tracking and motion forecasting.

The dataset includes 327,793 interesting vehicle trajectories extracted from over 1,000 driving hours, rich semantic maps, 3D tracking annotations for 113 scenes, and an API that connects the map data with sensor information. It also comes with two high-definition (HD) maps containing lane centrelines, traffic direction, ground height and more. The sensor data consists of 360-degree images from seven cameras with overlapping fields of view, forward-facing stereo imagery, 3D point clouds from long-range LiDAR, and 6-DOF pose.


Lyft

Last month, Lyft open-sourced an autonomous driving dataset known as the Level 5 Dataset. The researchers at Lyft claimed it to be the largest public dataset of its kind. The dataset includes 55,000 human-labelled 3D annotated frames, a drivable surface map and an underlying HD spatial semantic map (including lanes, crosswalks, etc.) for data contextualisation. The data was collected using seven cameras and three LiDAR sensors.


Ambika Choudhury
A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.
