Everything You Should Know About The New Google Dataset Search: A Review

Building on its dominance in search, Google recently introduced another engine: Google Dataset Search. Billed as a one-of-its-kind search engine for datasets, it aims to be a one-stop destination for researchers in artificial intelligence, data science, machine learning, data mining and many other fields.

This search engine surfaces publicly available data to aid researchers, and can be used by anyone from scientists and data journalists to data geeks who are keen on exploring the field.

This Is The Right Time For Google To Have Launched This Search Engine

We all know that for AI and ML algorithms to work and predict accurate results, we need to feed them data. In general, the more relevant data a model has to learn from, the better its results tend to be. But the biggest question is: where do we get the data from?

Before Google Dataset Search, there were other resources where data scientists or AI researchers could hunt for datasets. But these were usually scattered and difficult to find, and the hunt was often tedious. The popular sources were repositories such as the UCI Machine Learning Repository, where many datasets were uploaded, GitHub and Kaggle, or simply scrolling through various links on Reddit. Other popular routes were domain-wise searches, such as data related to climate or banking, or looking into specific websites such as those of NASA, the World Bank or universities.

It was a huge challenge to find all the data curated in one place. Even if companies or universities were willing to provide their datasets, there was no proper indexing and no common source from which to access all of them in one go.

With the growing popularity of AI and ML, Google’s new search engine aims to give researchers quick and easy access to data and so expedite the research process, especially for early-career researchers who are not already plugged into a network of professional connections.

How Does It Work?

Individuals, institutions or organisations who wish to publish their data online need to include metadata tags in their web pages. These could cover a description of the data, the name(s) of the creator(s), the date of publishing, the terms of usage, the source of the data collected, and so on. This information is then indexed by Dataset Search and combined with inputs from Google’s Knowledge Graph.
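
To make the metadata idea concrete, here is a minimal sketch in Python that builds such a description and wraps it in a tag a web page could embed. It assumes the schema.org `Dataset` vocabulary expressed as JSON-LD; the choice of properties, the helper names `build_dataset_jsonld` and `to_script_tag`, and all the example values are our own illustration, not something taken from Google's guidelines.

```python
import json


def build_dataset_jsonld(name, description, creators, date_published,
                         license_url, source_url):
    """Return a schema.org Dataset description as a plain dict (JSON-LD).

    The property selection mirrors the fields mentioned above and is
    an assumption, not an official list of required fields.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "creator": [{"@type": "Person", "name": c} for c in creators],
        "datePublished": date_published,
        "license": license_url,      # terms of usage
        "isBasedOn": source_url,     # where the data was collected from
    }


def to_script_tag(metadata):
    """Wrap the metadata in a JSON-LD script tag for embedding in HTML."""
    payload = json.dumps(metadata, indent=2)
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    # Every value below is hypothetical.
    meta = build_dataset_jsonld(
        name="City Rainfall Readings",
        description="Daily rainfall measurements from a hypothetical "
                    "network of rooftop gauges.",
        creators=["A. Researcher"],
        date_published="2018-09-05",
        license_url="https://example.org/licence",
        source_url="https://example.org/raw-readings",
    )
    print(to_script_tag(meta))
```

Running the script prints a block that a publisher could paste into the page hosting the dataset, so the data itself stays where it is while its description becomes machine-readable.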

The aim is to make data easily discoverable while keeping it where it is. Natasha Noy, Research Scientist at Google AI, explains in a blog post that Dataset Search lets a user find datasets wherever they’re hosted, whether it’s a publisher’s site, a digital library, or an author’s personal web page.

Google has published a specific set of guidelines for dataset providers to follow, which make it easier for the search engine to understand the content of their pages. Google encourages dataset providers, large and small, to adopt this common standard so that all datasets are a part of this robust ecosystem.

Our Review

Pros:

  • One-stop Solution For All Data Needs: Searching for datasets has never been this easy. Google Dataset Search locates freely available data whose publishers have described it with the right metadata, which changes the way researchers can access data. It is the only dataset search engine of its kind so far.
  • Promoting Open Data Culture And Boosting Innovation: Since it brings free data together, it will contribute significantly towards the open-data movement, making data available for use and re-use to drive innovation in the field. It will prove extremely helpful both for beginners trying to learn algorithms and for advanced researchers working on a new AI.
  • Credibility: Given that Google has been a pioneer and one of the best search engines in the market so far, the credibility of the newly launched dataset search engine is quite high. With this move, Google has hit the spotlight again and is clearly here to win.
  • Ease Of Visibility: Those willing to share their datasets can easily do so just by marking them up correctly with specific metadata. All they have to do is use the right description to become visible to the large AI and data science community.

Cons:

  • Google Dataset Search is still in beta and will evolve over time. Users can already search with a tag or a keyword and get a list of available datasets, but while it is in the beta phase it is too early to say how complete or reliable the search engine is.
  • The main drawback is that a dataset uploaded with a sparse or outdated description will be very hard to locate. Publishers must therefore give the right information about their datasets. If the description is not clear, a dataset might get lost in the pool with no visibility at all in the search engine. The better the information, the higher the chances of it popping up in a search.
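
Publishers who want to guard against this could run a quick sanity check on their metadata before publishing. The sketch below is only an illustration: the list of fields in `CHECKED_FIELDS`, the length threshold and the helper name `find_metadata_gaps` are our own assumptions, not rules stated by Google.

```python
# Fields we choose to check; an illustrative assumption, not an official list.
CHECKED_FIELDS = ("name", "description", "creator", "datePublished", "license")

# Arbitrary threshold below which a description is flagged as too thin.
MIN_DESCRIPTION_CHARS = 50


def find_metadata_gaps(metadata):
    """Return a list of human-readable problems found in a metadata dict."""
    problems = []
    for field in CHECKED_FIELDS:
        value = metadata.get(field)
        if value is None or value == "" or value == []:
            problems.append(f"missing {field}")
    description = (metadata.get("description") or "").strip()
    if description and len(description) < MIN_DESCRIPTION_CHARS:
        problems.append("description is very short")
    return problems


if __name__ == "__main__":
    # Hypothetical, deliberately incomplete metadata.
    draft = {"name": "City Rainfall Readings", "description": "Rain data"}
    for problem in find_metadata_gaps(draft):
        print(problem)
```

An empty list means none of the checked fields is missing and the description clears the threshold; it says nothing about whether the description is actually accurate or up to date.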

In a nutshell, Google Dataset Search is an amazing creation by Google and would definitely help the research community in a big way by promoting innovations like never before.


Srishti Deoras

Srishti currently works as Associate Editor at Analytics India Magazine. When not covering the analytics news, editing and writing articles, she could be found reading or capturing thoughts into pictures.