Microsoft’s Move To Launch ‘Research Open Data’ Is A Revolutionary Way To Compete With Google And AWS

As companies rush to embrace the open source ecosystem, mainstream enterprises like Google and Microsoft are joining the wave. Microsoft, one of the leading cloud providers, is striving to gain developer and community trust by embracing the open data movement. Last week, Vani Mandava, Microsoft’s director of Data Science Outreach, wrote about the launch of Microsoft Research Open Data, a cloud-hosted data repository. The company plans for it to be a collection of free datasets that push state-of-the-art research in areas such as natural language processing, computer vision, and domain-specific sciences. The datasets are grouped into categories such as:

  1. Biology
  2. Computer science
  3. Engineering
  4. Information science
  5. Mathematics
  6. Physics
  7. Social sciences




Why Did Microsoft Decide To Release Their High-quality Data In Public Domain?

Publicly available datasets can be used to tackle some of the most pressing big data problems. With this open model for sharing datasets, Microsoft joins other big tech firms that already offer freely available collections, such as Public Datasets on AWS, Google Public Datasets, Google Custom Datasets and Twitter Datasets. According to Mandava, Microsoft Research Open Data is designed to simplify access to these datasets, facilitate collaboration between researchers using cloud-based resources and enable reproducibility of research.

However, for public datasets to be useful for research, they have to be continuously updated. Mandava indicates that the company will continue adding to its repository and will include features based on feedback from the community. Sam Madden, Professor at Massachusetts Institute of Technology, was quoted in the post: “This is a game-changer for the big data community. Initiatives like Microsoft Research Open Data reduce barriers to data sharing and encourage reproducibility by leveraging the power of cloud computing.”


Among the key features of Microsoft’s repository: the data meets the highest standards for public sharing, is easily accessible, interoperable and reusable, and does not contain any personally identifiable information.

Economic Value In Open Sourcing Datasets

  • Open sourcing datasets will ease the burden on those looking for specific types of datasets.
  • One underlying idea is that increased transparency will help build trust among users and developers, and will offer a way to create new services based on the collected data.
  • Open sourcing datasets can also be an effective tool for enabling greater transparency and exposing gaps in datasets.
  • Open sourcing data can fuel economic growth and innovation, and is also useful for building data-driven products.

What Does Microsoft Stand To Gain From This?

A Deloitte UK report emphasises that open sourcing data is a revolutionary way to compete and has massive potential to generate a strong ROI. By open sourcing its datasets, Microsoft will benefit the academic research community and enable the developer community. Open datasets will mobilise and strengthen academic exchange and cooperation. But doesn’t open sourcing endanger the company’s competitive advantage? It may seem so, but open sourcing datasets is a great way to loosen the data monopoly held by tech conglomerates and establish more transparency. It also reinforces Microsoft’s tech-for-good mantra, which the company has been working on ever since Satya Nadella took the reins. By positioning itself as an enabler of an open source ecosystem, Microsoft is also driving cloud adoption: open datasets and related software are an easy route to push the developer base towards its Azure-based Data Science Virtual Machine. Interestingly, the Data Science Virtual Machine comes preloaded with a variety of development tools popular with researchers and practitioners, notes the blog. It is also an excellent way to foster AI talent and collaborate with the wider community.


Open data drives growth and innovation at a time when businesses and startups are at a tipping point and governments are making a serious attempt at building critical mass for AI-led transformation. Open data repositories can foster transparency, cement the position of tech giants as contributors to the open source ecosystem and help startups and businesses use the data to build ground-breaking applications. According to a Deloitte report, tech giants can work with governments to establish new paradigms in data governance. Another upside for leading IT bellwethers is that they can reap considerable economic value from open sourcing proprietary datasets: once a dataset is publicly available, it can be combined with data from other sources, and at the same time it drives cloud adoption. Besides fostering the academic research community, it will also help leading businesses like Microsoft collaborate effectively with their partners.


Richa Bhatia
Richa Bhatia is a seasoned journalist with six years’ experience in reportage and news coverage and has had stints at Times of India and The Indian Express. She is an avid reader, mum to a feisty two-year-old and loves writing about the next-gen technology that is shaping our world.
