Scale AI Launches PandaSet To Advance Research On Urban Driving

Scale AI, a data platform for AI, recently launched PandaSet, a large-scale dataset for autonomous driving. According to the Scale AI team, it is the first open-source autonomous driving dataset made available for both academic and commercial use.

Amid the pandemic, collaboration across the AI and research communities has spiked as teams work on pressing problems. However, because of the lockdown, some industries, such as autonomous vehicles (AV), are finding it difficult to develop new technologies at scale, since road testing is suspended for the time being to ensure the safety of those involved.

According to the Scale team, various AV organisations have turned to complementary techniques and simulated data to continue their work, but there is often no substitute for high-quality data that captures the complex and often messy reality of driving in the real world. This gap prompted the Scale AI team to release PandaSet during the crisis for training machine learning models for autonomous driving.

Labelled data is an essential ingredient of any machine learning or deep learning model. Arguably, a good, clean dataset matters more than the choice of algorithm when building robust AI models. Scale AI has been accelerating the development of AI applications by helping machine learning teams generate high-quality data.


PandaSet is a large-scale dataset that can be used for training machine learning models for autonomous driving. The dataset is provided by the Scale AI team in collaboration with Hesai, a manufacturer of LIDAR (3D sensors).

PandaSet combines sophisticated LIDAR technology with high-quality data annotation, with the aim of advancing research and development in autonomous driving and machine learning.

According to the Scale AI team, the dataset features data collected using PandarGT, a forward-facing LIDAR with image-like resolution, as well as Pandar64, a mechanical spinning LIDAR. The collected data was annotated with a combination of cuboid and segmentation annotation called Scale 3D Sensor Fusion Segmentation.

Behind PandaSet

The dataset contains more than 48,000 camera images and over 16,000 LIDAR sweeps, drawn from more than 100 scenes of 8 seconds each. By combining the strengths of both mechanical spinning and forward-facing LIDARs, PandaSet captures the complex variables of urban driving in rich detail. It also includes 28 different annotation classes for each scene as well as 37 semantic segmentation labels for the majority of scenes.
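As a rough sanity check on these figures, the short Python sketch below derives per-scene averages from the numbers quoted above. It treats the "more than" and "over" counts as round lower bounds, which is an assumption, so the results are approximate and aggregated across all sensors. The function and constant names are illustrative and are not part of any PandaSet tooling.

```python
# Back-of-the-envelope averages from the figures quoted in the article.
# Assumption: the "more than"/"over" counts are treated as round lower bounds.
CAMERA_IMAGES = 48_000
LIDAR_SWEEPS = 16_000
SCENES = 100
SCENE_SECONDS = 8


def per_scene_summary(images, sweeps, scenes, seconds_per_scene):
    """Return average counts per scene and per second, across all sensors."""
    images_per_scene = images / scenes
    sweeps_per_scene = sweeps / scenes
    return {
        "images_per_scene": images_per_scene,
        "sweeps_per_scene": sweeps_per_scene,
        "images_per_second": images_per_scene / seconds_per_scene,
        "sweeps_per_second": sweeps_per_scene / seconds_per_scene,
        "total_minutes": scenes * seconds_per_scene / 60,
    }


if __name__ == "__main__":
    summary = per_scene_summary(
        CAMERA_IMAGES, LIDAR_SWEEPS, SCENES, SCENE_SECONDS
    )
    for name, value in summary.items():
        print(f"{name}: {value:.1f}")
```

On these lower bounds, each scene averages about 480 camera images and 160 LIDAR sweeps, and the whole dataset covers a little over 13 minutes of driving.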

PandaSet covers some of the most challenging driving conditions for level 5 autonomy, including complex urban environments with dense traffic and pedestrians, steep hills, construction, and a variety of lighting conditions during the day, at dusk and in the evening.

The dataset features Scale’s Point Cloud Segmentation, which, according to the company, enables the highest precision and quality in annotating complex objects such as smoke or rain. It also features Scale’s Sensor Fusion technology, which lets ML teams blend multiple LIDAR, RADAR and camera inputs into a single point cloud so that different objects in the LIDAR data can be semantically segmented.

Benefits Of PandaSet

  • According to the team, high-quality data annotations, rich content and a no-cost commercial license make PandaSet a valuable resource for AV organisations
  • This is an open-source dataset and can be used for both commercial and academic purposes
  • PandaSet lets ML teams exploit their LIDAR data much more systematically, which makes it well suited to building high-performance autonomous systems
  • The dataset enables researchers to study challenging urban driving situations using the full sensor suite of a real self-driving car
  • This high-quality dataset will help in building safe and effective AV systems


The pandaset-devkit can be installed in a Python environment with the following steps:

  • Create a Python >= 3.6 environment with the package manager pip
  • Clone the repository with git clone
  • cd into pandaset-devkit/python
  • Execute pip install
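
Once the steps above are done, a quick check like the following Python sketch can confirm that the interpreter meets the stated minimum version and that the devkit can be found. The import name "pandaset" is an assumption here, as the article does not state it, and the helper names are purely illustrative.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 6)        # minimum version stated in the install steps
PACKAGE_NAME = "pandaset"  # assumption: import name of the installed devkit


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def package_is_installed(name=PACKAGE_NAME):
    """Return True if a top-level package with this name can be located."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("Python >= 3.6:", python_is_supported())
    print("devkit importable:", package_is_installed())
```

The check uses importlib.util.find_spec rather than a plain import, so it reports a missing package without raising an exception.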

Wrapping Up

Along with PandaSet, the Scale AI team has also provided three more open-source, large-scale Level 5 datasets for cutting-edge vehicle research: nuScenes, CADC and Lyft. The company’s LIDAR, image, video and NLP annotation APIs allow machine learning teams at organisations such as OpenAI, Lyft, Pinterest and Airbnb to focus on building differentiated models rather than labelling data.


Ambika Choudhury
A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.
