India’s First, Open Source Traffic Dataset Is Paving The Road For Autonomous Vehicles

Photo by Robin Baumgarten/Flickr

With road accidents and related causes claiming roughly 400 lives a day in India, finding ways to reduce road fatalities has become imperative. Developed countries have already begun turning to autonomous technology to compensate for fatal human error; developing economies like India still have to catch up.

For example, with the Cityscapes dataset, researchers were able to identify the crucial challenges of driving scenes in Germany. This dataset can also work for other developed countries, but in India, where traffic violations are rampant, such datasets cannot simply be adopted to make road travel safer.

Collecting raw, unstructured data in India is very different from doing so in any other populous country. It is not just lane direction and crossroads that must be taken into account, but also the erratic and inconsistent driving habits of a typical Indian road user. This infamous driving scene, which is unique to India, became the subject of a thorough survey for a larger dataset of its own. A project funded by Intel, in partnership with the governments of Telangana and Karnataka and a team from IIIT Hyderabad, is now addressing this issue.



Conducting The Study In Congested Cities

The project, funded by Intel, began in November 2017 in partnership with the governments of Telangana and Karnataka. The team at IIIT Hyderabad drove around Hyderabad and Bengaluru, cities known for their traffic congestion, to collect and label distinctive elements such as signposts, pedestrians, vehicle types and streetlights.

The objective was to improve road safety by creating a dataset that suits Indian conditions. The team created about 10,000 pixel-level annotated images and 50,000 object-level annotated images. The pixel-level set is twice the size of Germany’s Cityscapes, which contained 5,000 frames.

For fine annotation, images were taken from the forward-facing cameras of a stereo pair. They were sampled from the video feed, with more attention paid to traffic junctions and other crowded portions of the feed.

A total of 34 labels were used for annotation. The labels are well defined with text and example images.  

To address ambiguity between synonymous labels in diverse scenes, a four-level label hierarchy was used, in which the ambiguity is deliberately increased with the level.
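A label hierarchy of this kind can be pictured as a mapping from each fine label to progressively coarser groups. The sketch below is only an illustration: the label names, the groupings and the helper functions `label_at_depth` and `agree_at_depth` are hypothetical, and they are not the dataset's actual 34 labels or its real four levels.

```python
# Hypothetical hierarchy for illustration only; these are NOT the dataset's
# actual labels or levels. Each fine label lists its ancestors, coarsest first.
HIERARCHY = {
    "car": ("vehicle", "three-or-four-wheeler"),
    "auto-rickshaw": ("vehicle", "three-or-four-wheeler"),
    "motorcycle": ("vehicle", "two-wheeler"),
    "bicycle": ("vehicle", "two-wheeler"),
    "pedestrian": ("living thing", "person"),
    "rider": ("living thing", "person"),
}


def label_at_depth(fine_label, depth):
    """Return the label's name at the given depth (0 = coarsest group)."""
    path = HIERARCHY[fine_label] + (fine_label,)
    # Clamp so that asking for a deeper level than exists returns the fine label.
    return path[min(depth, len(path) - 1)]


def agree_at_depth(predicted, actual, depth):
    """True if two fine labels fall into the same group at the given depth."""
    return label_at_depth(predicted, depth) == label_at_depth(actual, depth)


if __name__ == "__main__":
    # A car mistaken for an auto-rickshaw is wrong at the finest depth,
    # but counts as correct one step up the hierarchy.
    print(agree_at_depth("car", "auto-rickshaw", 2))
    print(agree_at_depth("car", "auto-rickshaw", 1))
```

With a structure like this, a confusion between two visually similar classes can be penalised at the finest depth yet accepted at a coarser one, which is one way a hierarchy can absorb labelling ambiguity.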

The setup used by Professor Jawahar and the team at IIIT Hyderabad

At the pixel-level, each pixel in the image is associated with an object class such as an auto-rickshaw, a car, a cycle, and so on.
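To make the idea of pixel-level annotation concrete, here is a minimal sketch in which an annotated image is represented as a grid holding one class ID per pixel. The class IDs, the tiny 4x6 mask and the helper `pixels_per_class` are assumptions made up for this example; the dataset defines its own labels and file format.

```python
from collections import Counter

# Hypothetical class IDs for illustration; the dataset defines its own labels.
CLASS_NAMES = {0: "road", 1: "car", 2: "auto-rickshaw", 3: "cycle"}

# A tiny 4x6 annotation mask: one class ID per pixel, row by row.
MASK = [
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 2, 2],
    [0, 3, 0, 0, 2, 2],
    [0, 3, 0, 0, 0, 0],
]


def pixels_per_class(mask):
    """Count how many pixels carry each class name."""
    counts = Counter(class_id for row in mask for class_id in row)
    return {CLASS_NAMES[class_id]: n for class_id, n in counts.items()}


if __name__ == "__main__":
    for name, count in sorted(pixels_per_class(MASK).items()):
        print(f"{name}: {count} pixels")
```

A real annotation works the same way at full image resolution, which is why pixel-level labelling is far more laborious than drawing a box around each object.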


Via Insaan IIIT-H

“This is the holy grail of data sets and will test the best in class algorithms,” says Dheemanth Nagaraj, an Intel fellow and architect, server CPU development and new products innovations at chipmaker Intel.

Input example images baseline trained on the dataset via Intel

Challenges Of Unstructured Data

Seeing is believing, and autonomous vehicles are no different, except that here action has to follow almost instantly. Whatever images the camera captures are fed into onboard electronics running well-trained algorithms, which dissect the images for colour and curves. Image processing is a key function of any driverless vehicle. Apart from driving habits, other issues make image processing tricky. For instance, there are billboards with images high in colour intensity. Imagine a billboard displaying a brand new Audi: the algorithm learns from curves, edges and colour intensities, and in 2D a curve is just a curve, so the pictured car can look much like a real one. Add to this the lighting and weather conditions. These variables interact and create confounding scenarios for the algorithm. High-quality annotation at scale is required to address these complex issues, and the India Driving Dataset (IDD) manages that quite well.

Other complexities that make the data unstructured include ambiguous road boundaries, the diversity of vehicles and pedestrians, extensive use of information boards, diverse ambient conditions and a high density of motorbikes.

Key Takeaways Of The Study

  • Identifies drawbacks prevalent in existing datasets and the need for additional labels with a hierarchy to reduce confusion
  • Examines the domain discrepancy properties with respect to other semantic segmentation datasets
  • The unconstrained nature of the dataset provides a novel setting for more situational awareness and optimum path planning
  • Forms a platform and sets a benchmark to solve advanced computer vision problems
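The article does not say how entries on such a benchmark are scored. As an assumption, the sketch below uses mean intersection-over-union (mIoU), a metric commonly used for semantic segmentation, to compare a predicted mask with the annotated one. The function name `mean_iou` and the toy masks are hypothetical.

```python
def mean_iou(predicted, actual, num_classes):
    """Mean intersection-over-union over classes present in either mask.

    Both masks are lists of rows holding one integer class ID per pixel.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, true_row in zip(predicted, actual):
        for p, t in zip(pred_row, true_row):
            if p == t:
                # The pixel is in both masks for this class.
                intersection[p] += 1
                union[p] += 1
            else:
                # The pixel belongs to the union of two different classes.
                union[p] += 1
                union[t] += 1
    # Skip classes that appear in neither mask.
    scores = [i / u for i, u in zip(intersection, union) if u > 0]
    if not scores:
        raise ValueError("masks contain no pixels")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    annotated = [[0, 0], [1, 1]]
    predicted = [[0, 1], [1, 1]]  # one pixel of class 0 mislabelled as class 1
    print(round(mean_iou(predicted, annotated, num_classes=2), 3))
```

Averaging per class rather than per pixel keeps small but important classes, such as pedestrians, from being drowned out by large regions like road.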


Though driverless AI is advancing rapidly with the support of American giants like Google and Tesla, the technology is not yet mainstream. In the Indian scenario, technical problems persist as well. This India-specific data provides a much-needed impetus to the autonomous vehicle industry, not only in India but across the world. With India so densely populated, leading automotive companies are looking to capture a major share of its market and would certainly make use of resources such as IDD.

“Autonomy will come step by step. We’ll see semi-automatic systems, driving assistants, interactive systems and safety features that are AI enabled before that,” observed Jawahar, who led the team behind this year-long prestigious project.

But in India, things don’t look so smooth. Earlier this year, Nitin Gadkari, Minister of Road Transport and Highways, openly opposed the arrival of driverless cars. By contrast, the Traffic Amendment Bill encourages exploring new technologies to improve road transport in India. India is on track to become the world’s third-largest car manufacturer. With such high numbers, policymakers need to be flexible in their stance and embrace solutions that may seem technologically advanced today but will soon become the norm.

Experts predict an autonomous intervention in this sector by 2025. For India to keep pace, it needs to prepare the roads, fill the potholes, intensify research and be ready to deploy once the technology has matured.


Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
