Open datasets have only recently started becoming available to researchers, analysts, professionals and students for their projects and research. In newer tech fields like analytics, machine learning and artificial intelligence, there is a constant need for datasets for tasks such as planning projects, building models or teaching.
While there are several readily available datasets provided by companies, communities, government bodies and others, the data often needs to be cleaned before it is usable. These datasets can be accessed either freely (as is the case on most government websites) or by paying nominal charges. We are listing here 10 such India-based datasets from the public domain and government bodies which can come in handy for your next project.
(The list is in no particular order.)
The RBI database is a website launched by the Reserve Bank of India and has data on the macroeconomic indicators of the Indian economy. It is loaded with relevant information and data for researchers, analysts and general users alike. It has datasets across money and banking, financial markets, national income, saving and employment, and others. The idea is to support contemporary styles of data analysis that can provide important, up-to-date numbers on economic activity, prices and more.
This is the dataset provided by MOSPI (Ministry of Statistics and Programme Implementation), a Union Ministry concerned with the coverage and quality of the statistics released. The data is collected by conducting large-scale sample surveys across India for various parameters, which eventually feed into the database. The ministry applies standard statistical techniques along with extensive scrutiny and supervision to enable this.
An initiative by ISRO, the open data archive provides free satellite data, a download facility for products, and thematic datasets. It uses a crowdsourcing approach to collect enrichment and point-of-interest data. It also acts as a platform to host government data, such as that of forest departments. Apart from being a repository of data, it allows users to explore 2D and 3D representations of the surface of the earth, pest surveillance, disaster services and high-resolution imagery of cities, among others.
A web portal for Indian citizens, it was developed by the Indian government with the objective of providing single-window access to information and services of all government entities. It is designed and developed jointly by the National Informatics Centre (NIC), Ministry of Electronics & Information Technology. A single point of access to a lot of information, it has a searchable contact directory, a database of government websites, and others.
The Survey of India, India’s central engineering agency in charge of mapping and surveying, functions under the Department of Science and Technology and is one of the oldest scientific departments. With data centres spread across India, it offers user-focused, cost-effective, reliable and quality geospatial data from across the country.
With datasets for various meteorological indicators, water resource planning, rainfall, and others from various parts of India, these datasets are available to users in simple formats. It also contains databases for several other parameters such as temperature, pressure, relative humidity, precipitation amount, wind speed and solar radiation, among others.
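As noted earlier, public datasets usually need some cleaning before they are usable. The following is a minimal sketch in Python of what that can look like for a weather-style table. The sample rows, the column names (`station`, `date`, `rainfall_mm`, `temperature_c`) and the missing-value markers are all made up for illustration and do not come from any of the portals listed here.

```python
import csv
import io

# Hypothetical sample data: column names and values are illustrative only,
# not taken from any real portal export.
RAW = """station,date,rainfall_mm,temperature_c
Pune,2019-07-01, 12.5 ,27.1
Pune,2019-07-02,NA,26.4
Nagpur,2019-07-01,-,31.0
Nagpur,2019-07-02,3.0,
"""

# Assumed markers for missing values; real files may use different ones.
MISSING = {"", "NA", "-"}


def to_float(value):
    """Return a float, or None when the cell is blank or a missing-value marker."""
    text = value.strip()
    return None if text in MISSING else float(text)


def clean_rows(text):
    """Parse CSV text and convert the numeric columns, keeping gaps as None."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "station": row["station"].strip(),
            "date": row["date"].strip(),
            "rainfall_mm": to_float(row["rainfall_mm"]),
            "temperature_c": to_float(row["temperature_c"]),
        })
    return rows


cleaned = clean_rows(RAW)
# Keep only the rows that have a usable rainfall reading.
complete = [r for r in cleaned if r["rainfall_mm"] is not None]
print(len(cleaned), "rows read;", len(complete), "with a usable rainfall value")
```

Keeping gaps as `None` rather than silently replacing them with zero leaves the decision on how to treat missing readings (drop, interpolate, flag) to the analysis that follows.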
This provides a huge database built from the daily count of total registrations and of enrolment applications accepted and rejected, broken down by state and district. It also contains other details such as Aadhaar numbers generated by age, gender, etc.
ICEGATE, or the Indian Customs Electronic Commerce/Electronic Data Interchange (EC/EDI) Gateway, is a portal with e-filing services for trade and cargo carriers. It also hosts the exhaustive National Import Database (NIDB) and Export Commodity Database (ECDB) for the Directorate of Valuation, both of which are handled by ICEGATE. It has information such as documents, messages and others processed at the customs’ end by the Indian Customs EDI System (ICES).
Set up by the National Informatics Centre (NIC) in compliance with the National Data Sharing and Accessibility Policy (NDSAP) of India, the OGD platform gives access to government-owned shareable data, along with information about its usage, in an open and machine-readable format through a wide area network across the country. A part of the Digital India initiative, it has been developed using an open source stack. It publishes datasets, documents, tools and applications collected by the government for public use, and supports community participation through visualisation, APIs, alerts, etc. It also brings together the government datasets discussed above.
An autonomous institution under the Ministry of Environment, Forest and Climate Change, Government of India, it has datasets on different wildlife species in India. A total of 4591 specimens are housed at the WII herbarium, of which 4322 are digitised and published through the GBIF network. The data is mainly used by researchers and field managers from the respective protected areas of the country to prepare management plans and for other research.
Srishti currently works as Associate Editor for Analytics India Magazine. When not covering analytics news, editing and writing articles, she can be found reading or capturing thoughts in pictures.