India’s Open Government Data Platform Is Helping Data Scientists Kick-start Their ML Journey

The NDA government has begun its new term with renewed gusto for analytics in the public sector. Recognising the disruptive effect that the upcoming AI wave will have on citizens’ day-to-day activities, the government has put it in the spotlight.

One of the biggest needs for a healthy analytics ecosystem in any given environment is data. Identifying the data-hungry nature of the new data science and analytics startups in India, the government initiated the Open Government Data Platform at


This move allows data scientists and machine learning engineers alike to harness one of the biggest collections of datasets available to the public.

Origin Of OGD Platform

The OGD platform is not new, but it has only recently come to prominence, owing to the rise of data science in India. First conceived as a joint effort between India and the United States in 2010, the platform was launched in 2012, along with the adoption of the National Data Sharing and Accessibility Policy (NDSAP).

The combined adoption of the platform and the policy enabled India to build up its portfolio of open government datasets. This was meant to give individuals access to data about the inner workings of the government, providing transparency and ease of access for citizens.

The website also aimed to reduce the time and effort companies spend cleaning and harvesting datasets available in the public sector. It also reduces the time citizens spend on Right to Information (RTI) requests, while increasing accessibility. The initiative was not heavily used in its first few years, both because governments did not update information on the portal and because the information that was updated was not relevant to citizens.

This has changed over the last two years, as the number of datasets uploaded on the site has increased from around 73,000 in 2017 to over 294,000 today.
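As a rough back-of-the-envelope check on that growth, the two figures quoted above can be turned into a growth factor. This is only a sketch: both counts are approximate ("around" and "over"), and the two-year span is taken from the sentence above rather than from exact dates.

```python
import math

# Figures quoted in the article; both are approximate.
datasets_2017 = 73_000
datasets_now = 294_000
years = 2  # "over the last two years"

# Overall growth factor across the whole period.
growth_factor = datasets_now / datasets_2017

# Equivalent constant year-on-year factor (geometric mean over the period).
annual_factor = math.pow(growth_factor, 1 / years)

print(f"Overall growth: {growth_factor:.2f}x")
print(f"Average yearly growth: {annual_factor:.2f}x")
```

In other words, the catalogue roughly quadrupled over the period, which corresponds to about a doubling each year.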

The Scope Of NDSAP And OGD

In addition to the datasets, there are over 4,500 catalogs, 11,000 APIs and 1,600 visualizations. These datasets have been viewed over 22.4 million times, and have been downloaded over 7 million times.

The site has resources from 148 departments, uploaded by 221 chief data officers. This makes it one of the most comprehensive undertakings of the Indian government.

This is largely owing to the enforcement of the National Data Sharing and Accessibility Policy (NDSAP). According to the site,

“NDSAP aims to provide an enabling provision and platform for proactive and open access to the data generated by various Government of India entities.”

The initiative focuses on making the data available in machine-readable format, allowing data scientists and analysts to make the most of it. Under the NDSAP, the government has sanctioned the collection of data that is created, generated, collected and archived by the use of public funds.

This means that almost all of the data collected by the government will be subject to sharing under the NDSAP. The sentiment here is also that citizens can see exactly how their money is being spent, allowing for transparency in a market filled with illicit transactions.

To prevent sensitive information from falling into the hands of the public, the datasets are segmented into negative lists and open lists. Negative lists contain personal information such as names, addresses and more.
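The segmentation described above can be pictured with a small sketch. Everything in it is hypothetical: the dataset titles, the column names and the rule itself (a dataset goes on the negative list if any column holds personal information) are illustrative assumptions, not the criteria the departments actually apply.

```python
# Hypothetical set of column names treated as personal information.
PERSONAL_FIELDS = {"name", "address"}

def classify(datasets):
    """Split dataset titles into (open_list, negative_list).

    `datasets` maps a dataset title to its list of column names.
    A dataset is placed on the negative list if any column is personal.
    """
    open_list, negative_list = [], []
    for title, columns in datasets.items():
        lowered = {column.lower() for column in columns}
        if PERSONAL_FIELDS & lowered:
            negative_list.append(title)
        else:
            open_list.append(title)
    return open_list, negative_list

# Hypothetical catalogue entries, for illustration only.
catalogue = {
    "District-wise literacy rate": ["District", "Year", "Rate"],
    "Scheme beneficiaries": ["Name", "Address", "Scheme"],
}
open_list, negative_list = classify(catalogue)
print("Open:", open_list)
print("Negative:", negative_list)
```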

In keeping with the times, the OGD has transitioned from using PDFs to represent datasets to using more modern formats such as CSV, XLS, ODS, XML, RDF, KML, and RSS/ATOM feeds for fast-changing data.
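To illustrate why machine-readable formats matter, here is a minimal sketch that loads the same records from CSV and from XML using only the Python standard library. The sample data, the field names and the `<records>`/`<record>` layout are invented for the example and are not taken from the platform; real files will have their own schemas.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; not taken from the OGD platform.
CSV_TEXT = "state,year,schools\nKerala,2017,120\nGoa,2017,45\n"
XML_TEXT = """<records>
  <record><state>Kerala</state><year>2017</year><schools>120</schools></record>
  <record><state>Goa</state><year>2017</year><schools>45</schools></record>
</records>"""

def rows_from_csv(text):
    """Read CSV text into a list of dicts keyed by the header row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

def rows_from_xml(text):
    """Read each <record> element into a dict keyed by child tag name."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in record}
            for record in root.findall("record")]

csv_rows = rows_from_csv(CSV_TEXT)
xml_rows = rows_from_xml(XML_TEXT)

# Both formats yield the same structured records, ready for analysis.
total_schools = sum(int(row["schools"]) for row in csv_rows)
print(csv_rows == xml_rows, total_schools)
```

A PDF table would need manual extraction before any of this is possible, which is the practical difference the move to these formats makes.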

The NDSAP also created the position of Chief Data Officer, an individual appointed to manage the data of a particular department or organization.

The Effects Of The OGD

The site comprises four modules made to capitalize on the effect the datasets will have on the population. They are a data management system for managing the datasets, a content management system for managing the content types of the platform, a feedback module known as visitor relationship management, and a set of community forums.

The platform now has data ranging across verticals, including healthcare, education, demographics, economy, industries, transport and labour. Not only datasets but also visualizations, blogs and forums populate the site. These are heavily frequented by data scientists discussing the datasets on the site.

The site is a godsend for data scientists today and offers datasets for the exploration of the Indian government through data science and analytics. The site also offers a method for companies operating in the public sector to practice on datasets collected from a large number of individuals.

Anirudh VK
I am an AI enthusiast and love keeping up with the latest events in the space. I love video games and pizza.
