Scale AI Launches PandaSet To Promote Urban Driving Situations

Scale AI, a data platform for AI, recently launched PandaSet, a large-scale dataset for autonomous driving. According to the Scale AI team, it is the first open-source autonomous driving dataset made available for both academic and commercial use.

Amid the pandemic, collaboration across the AI and research communities has spiked as they work on pressing issues. However, because of the lockdown, some industries, such as autonomous vehicles (AV), are finding it difficult to develop new technologies at scale, as road testing is suspended for the time being to ensure the safety of those involved.

According to the Scale team, various AV organisations have turned to complementary techniques and simulated data to continue their work, but there is often no substitute for high-quality data that captures the complex and often messy reality of driving in the real world. This gap inspired the Scale AI team to release PandaSet during the crisis for training machine learning models for autonomous driving.

Labelled data is an important element of any machine learning or deep learning project. It can be argued that a good, clean dataset matters more than the choice of algorithm when building robust AI models. Scale AI has been accelerating the development of AI applications by helping machine learning teams generate high-quality data.

PandaSet

PandaSet is a large-scale dataset that can be used for training machine learning models for autonomous driving. The dataset is provided by the Scale AI team in collaboration with Hesai, a manufacturer of LIDAR (3D) sensors.

PandaSet combines sophisticated LIDAR technology with high-quality data annotation, and aims to promote and advance research and development in autonomous driving and machine learning.

According to the Scale AI team, this dataset features data collected using a forward-facing LIDAR with image-like resolution called PandarGT as well as a mechanical spinning LIDAR called Pandar64. The collected data was annotated with a combination of cuboid and segmentation annotation that is called Scale 3D Sensor Fusion Segmentation.

Behind PandaSet

The dataset contains more than 48,000 camera images and over 16,000 LIDAR sweeps, spread across more than 100 scenes of 8 seconds each. By combining the strengths of both mechanical spinning and forward-facing LIDARs, PandaSet captures the complex variables of urban driving in rich detail. It also includes 28 different annotation classes for each scene as well as 37 semantic segmentation labels for the majority of scenes.
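
As a rough sanity check, the figures quoted above can be turned into per-scene and per-second averages. The sketch below treats the "more than" figures as round lower bounds, which is an assumption, so the results are only approximate. The constant and function names are made up for illustration.

```python
# Figures quoted in the article, taken as round lower bounds (assumption).
CAMERA_IMAGES = 48_000
LIDAR_SWEEPS = 16_000
SCENES = 100
SCENE_SECONDS = 8


def per_scene(total, scenes=SCENES):
    """Average number of items per scene."""
    return total / scenes


def per_second(total, scenes=SCENES, seconds=SCENE_SECONDS):
    """Average number of items per second of recorded driving."""
    return total / (scenes * seconds)


if __name__ == "__main__":
    print("total driving time (s):", SCENES * SCENE_SECONDS)
    print("images per scene:", per_scene(CAMERA_IMAGES))
    print("sweeps per scene:", per_scene(LIDAR_SWEEPS))
    print("images per second:", per_second(CAMERA_IMAGES))
    print("sweeps per second:", per_second(LIDAR_SWEEPS))
```

With these rounded inputs, the dataset covers about 800 seconds of driving, with roughly 480 images and 160 LIDAR sweeps per scene, or about 60 images and 20 sweeps per second. That fits several sensors recording in parallel.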

PandaSet covers some of the most challenging driving conditions for level 5 autonomy, including complex urban environments with dense traffic and pedestrians, steep hills, construction, and a variety of lighting conditions during the day, at dusk and in the evening.

This dataset features Scale’s Point Cloud Segmentation, which the company says enables high-precision, high-quality annotation of complex objects such as smoke or rain. It also features Scale’s Sensor Fusion technology, which lets ML teams blend multiple LIDAR, RADAR and camera inputs into a single point cloud so that different objects in the LIDAR data can be semantically segmented.
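
To make the idea of a "single point cloud" concrete, here is a minimal sketch of merging points from two sensors into one shared vehicle frame. It is not Scale's implementation. The function names, sensor labels and mounting poses are hypothetical, and the rotation is simplified to yaw only, whereas a real calibration would use a full 3D rotation.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def to_vehicle_frame(points: List[Point], yaw_rad: float,
                     translation: Point) -> List[Point]:
    """Rotate points about the z axis by yaw_rad, then translate them."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    tx, ty, tz = translation
    return [(c * x - s * y + tx, s * x + c * y + ty, z + tz)
            for x, y, z in points]


def fuse(clouds):
    """Merge several sensor clouds into one list of (x, y, z, sensor) rows.

    `clouds` is a list of (sensor_name, points, yaw_rad, translation) tuples,
    where yaw_rad and translation describe the (hypothetical) mounting pose.
    """
    fused = []
    for name, points, yaw_rad, translation in clouds:
        for x, y, z in to_vehicle_frame(points, yaw_rad, translation):
            fused.append((x, y, z, name))
    return fused


if __name__ == "__main__":
    merged = fuse([
        ("spinning", [(1.0, 0.0, 0.0)], math.pi / 2, (0.0, 0.0, 1.0)),
        ("forward", [(2.0, 0.0, 0.0)], 0.0, (0.0, 0.0, 0.0)),
    ])
    for row in merged:
        print(row)
```

Keeping a sensor label on every point lets later steps, such as segmentation, still tell which device produced which return.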

Benefits Of PandaSet

  • According to the team, its high-quality data annotations, its content and its no-cost commercial license make PandaSet a valuable resource for AV organisations
  • This is an open-source dataset and can be used for both commercial and academic purposes
  • It lets ML teams exploit their LIDAR data much more systematically, which makes PandaSet well suited to building high-performance autonomous systems
  • The dataset enables researchers to study challenging urban driving situations using the full sensor suite of a real self-driving car
  • This high-quality dataset is intended to help in building safe and effective AV systems

Installation

The pandaset-devkit can be installed in a Python environment as follows:

  • Create a Python >= 3.6 environment with the pip package manager
  • Clone the repository: git clone git@github.com:scaleapi/pandaset-devkit.git
  • cd into pandaset-devkit/python
  • Execute pip install . (note the trailing dot, which installs the package from the current directory)
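
After following the steps above, a small check like the sketch below can confirm the environment is ready. It assumes the devkit is importable under the module name pandaset, which the article does not state, so adjust the name if your installation differs. The helper names are made up for illustration.

```python
import importlib.util
import sys

# Minimum interpreter version named in the installation steps.
MIN_PYTHON = (3, 6)


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the interpreter meets the minimum (major, minor)."""
    return tuple(version_info[:2]) >= tuple(minimum)


def devkit_installed(module_name="pandaset"):
    """Return True if the module can be found without importing it.

    The default name "pandaset" is an assumption about how the devkit
    is exposed after installation.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("devkit importable:", devkit_installed())
```

If the second line prints False, re-run the install step from inside pandaset-devkit/python in the same environment.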

Wrapping Up

Along with PandaSet, the Scale AI team has also contributed to three more open-source, large-scale datasets for cutting-edge autonomous vehicle research: nuScenes, CADC and Lyft Level 5. The company’s LIDAR, image, video and NLP annotation APIs allow machine learning teams at organisations such as OpenAI, Lyft, Pinterest and Airbnb to focus on building differentiated models rather than labelling data.

Ambika Choudhury

A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.