Scale AI, a data platform for AI, recently launched PandaSet, a large-scale dataset for autonomous driving. According to the Scale AI team, it is the first open-source dataset made available for both academic and commercial use.
Amid the pandemic, collaboration across the AI and research communities has spiked as teams work on pressing problems. However, because of the lockdown, industries such as autonomous vehicles (AV) are finding it difficult to develop new technologies at scale: road testing is suspended for the time being to ensure the safety of those involved.
According to the Scale team, various AV organisations have turned to complementary techniques and simulated data to continue their work, but there is often no substitute for high-quality data that captures the complex and often messy reality of driving in the real world. This gap prompted the Scale AI team to release PandaSet during the crisis for training machine learning models for autonomous driving.
Labelled data is an essential ingredient of any machine learning or deep learning model. Arguably, a good, clean dataset matters more than the choice of algorithm when building robust AI models. Scale AI has been accelerating the development of AI applications by helping machine learning teams generate high-quality data.
PandaSet
PandaSet is a large-scale dataset for training machine learning models for autonomous driving. It is provided by the Scale AI team in collaboration with Hesai, a manufacturer of LIDAR (3D) sensors.
PandaSet combines sophisticated LIDAR technology with high-quality data annotation, with the aim of advancing research and development in autonomous driving and machine learning.
According to the Scale AI team, the dataset features data collected with two sensors: PandarGT, a forward-facing LIDAR with image-like resolution, and Pandar64, a mechanical spinning LIDAR. The collected data was annotated with a combination of cuboid and segmentation annotation, which Scale calls Scale 3D Sensor Fusion Segmentation.
Behind PandaSet
The dataset contains more than 48,000 camera images and over 16,000 LIDAR sweeps, spread across more than 100 scenes of 8 seconds each. By combining the strengths of both mechanical spinning and forward-facing LIDARs, PandaSet captures the complex variables of urban driving in rich detail. It also includes 28 different annotation classes for each scene as well as 37 semantic segmentation labels for the majority of scenes.
PandaSet covers some of the most challenging driving conditions for level 5 autonomy, including complex urban environments with dense traffic and pedestrians, steep hills, construction, and a variety of lighting conditions during the day, at dusk and in the evening.
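The scene and frame counts above imply some rough per-scene averages. The short sketch below works them out. It treats the quoted figures as round numbers even though the article gives them as lower bounds ("more than", "over"), so the results are approximations and not official dataset statistics. The function and variable names are illustrative only.

```python
# Round lower bounds quoted in the article ("more than" / "over"), so the
# results are rough averages, not exact dataset statistics.
CAMERA_IMAGES = 48_000
LIDAR_SWEEPS = 16_000
SCENES = 100
SCENE_SECONDS = 8


def per_scene_averages(images, sweeps, scenes, seconds):
    """Return average counts per scene and per second of recording."""
    images_per_scene = images / scenes
    sweeps_per_scene = sweeps / scenes
    return {
        "images_per_scene": images_per_scene,
        "sweeps_per_scene": sweeps_per_scene,
        "images_per_second": images_per_scene / seconds,
        "sweeps_per_second": sweeps_per_scene / seconds,
    }


averages = per_scene_averages(CAMERA_IMAGES, LIDAR_SWEEPS, SCENES, SCENE_SECONDS)
for name, value in averages.items():
    print(f"{name}: {value:g}")
```

These are totals across all cameras and both LIDARs combined. The article does not state how many cameras were used, so the sketch does not attempt a per-sensor breakdown.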
The dataset uses Scale’s Point Cloud Segmentation, which the company says enables high-precision annotation of complex objects such as smoke or rain. It also uses Scale’s Sensor Fusion technology, which lets ML teams blend multiple LIDAR, RADAR and camera inputs into a single point cloud so that different objects in the LIDAR data can be semantically segmented.
Benefits Of PandaSet
- According to the team, high-quality data annotations, rich content and a no-cost commercial license make PandaSet a valuable resource for AV organisations
- This is an open-source dataset and can be used for both commercial and academic purposes
- PandaSet lets ML teams exploit their LIDAR data much more systematically, which makes it well suited to building high-performance autonomous systems
- The dataset enables researchers to study challenging urban driving situations using the full sensor suite of a real self-driving car
- This high-quality dataset will help in building safe and effective AV systems
Installation
The pandaset-devkit can be installed in a Python environment as follows:
- Create a Python >= 3.6 environment with pip available
- Clone the repository: git clone git@github.com:scaleapi/pandaset-devkit.git
- cd into pandaset-devkit/python
- Execute pip install . (the trailing dot tells pip to install the package from the current directory)
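After following the steps above, a quick sanity check can confirm that the interpreter meets the stated version requirement and that the devkit can be found. The sketch below is a minimal example using only the standard library. It assumes the devkit's import name is pandaset, which the article does not state, so adjust PACKAGE_NAME if your installation differs. The check_environment helper is hypothetical and not part of the devkit.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 6)        # minimum version stated in the installation steps
PACKAGE_NAME = "pandaset"  # ASSUMED import name of the devkit; adjust if it differs


def check_environment(package=PACKAGE_NAME, min_python=MIN_PYTHON):
    """Return (python_ok, package_found) for the current interpreter."""
    python_ok = sys.version_info[:2] >= min_python
    # find_spec returns None when a top-level package cannot be located.
    package_found = importlib.util.find_spec(package) is not None
    return python_ok, package_found


python_ok, package_found = check_environment()
print("Python version OK:", python_ok)
print("Devkit importable:", package_found)
```

If the second line prints False, the most likely causes are running pip install from a different directory or installing into a different Python environment from the one running the check.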
Wrapping Up
Along with PandaSet, the Scale AI team has helped provide three more open-source, large-scale datasets for cutting-edge autonomous vehicle research: nuScenes, CADC and Lyft Level 5. The company’s LIDAR, image, video and NLP annotation APIs allow machine learning teams at organisations such as OpenAI, Lyft, Pinterest and Airbnb to focus on building differentiated models rather than labelling data.