Is Facebook’s Latest NeuralDB an Advancement Over Conventional Database Models?

The query processing technique is built on state-of-the-art NLP methods and promises several benefits over conventional databases.

Published on August 31, 2021

by Kumar Gandharv

Traditional database systems require a predefined schema and answer queries with well-defined semantics written in Structured Query Language (SQL). In other words, structured data is a prerequisite. As a result, the vast body of available unstructured data sits idle, and putting it to use remains a challenge. To address this, Facebook AI has unveiled NeuralDB, a system that lets machines query unstructured data. Its query processing technique is built on state-of-the-art NLP methods and promises several benefits over conventional databases, including:

First, the system has no predefined schema, so the scope of the database need not be defined in advance, and any relevant data can be stored and queried.
Second, it simplifies things for users by letting them pose updates and queries in a variety of natural language forms.
Third, NeuralDB is based on a pre-trained language model that already encodes a large amount of knowledge.

Presented Model

Facebook AI researchers including James Thorne, Majid Yazdani, Marzieh Saeidi, Fabrizio Silvestri, Sebastian Riedel, and Alon Halevy have presented the model.

The architecture proposed in the paper adapts transformer models to answer queries posed in simple natural language. On their own, however, these models perform poorly on aggregation queries and have scalability issues. To overcome these limitations, the researchers run multiple instances of a Neural SPJ (select-project-join) operator in parallel. Underlying this architecture is a novel algorithm for generating the small sets of database sentences fed to each Neural SPJ operator.

[Figure: overview of the NeuralDB architecture. Image credit: the paper]

Sentences come in two types: atomic and composite. An atomic sentence corresponds to a single fact, for example, ‘Richard loves to fly’. A composite sentence corresponds to multiple facts, for example, ‘Radhika is married to Krish, and they have three kids’. The paper, as the researchers describe it, focuses mainly on atomic sentences. “The design of the NeuralDB architecture was based on a careful examination of the strengths and weaknesses of current NLP transformer models. Our experimental results suggest that it is possible to attain very high accuracy for a class of queries that involve select, project, join possibly followed by aggregation,” concluded the researchers.
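To make the data flow concrete, here is a minimal Python sketch of the pipeline the article describes: small sets of sentences are generated for a query, one operator instance runs per set in parallel, and aggregation is applied to the operators' outputs. This is not the researchers' implementation. The keyword-matching `support_sets` function and the regex-based `neural_spj` function are hypothetical stand-ins for the learned components, and the facts and function names are invented for illustration.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Toy database: atomic natural-language facts with no schema (hypothetical data).
FACTS = [
    "Richard loves to fly",
    "Radhika is married to Krish",
    "Asha works at Acme",
    "Ravi works at Acme",
    "Meera works at Globex",
]

def support_sets(query_terms, facts):
    """Stand-in for support-set generation: one single-sentence set for
    each fact that mentions every query term (assumption: the real
    system learns this step rather than matching keywords)."""
    terms = [t.lower() for t in query_terms]
    return [[f] for f in facts if all(t in f.lower() for t in terms)]

def neural_spj(support_set):
    """Stand-in for a Neural SPJ operator: selects a matching sentence
    and projects out the employee name. A regex replaces the transformer."""
    for sentence in support_set:
        match = re.match(r"(\w+) works at \w+", sentence)
        if match:
            return match.group(1)
    return None

def count_employees(company, facts):
    sets = support_sets(["works at", company], facts)
    # One operator instance per support set, run in parallel.
    with ThreadPoolExecutor() as pool:
        answers = list(pool.map(neural_spj, sets))
    # Aggregation is applied to the operators' outputs, not inside them.
    return len({a for a in answers if a is not None})

print(count_employees("Acme", FACTS))
```

Keeping each set small means each operator only ever reads a handful of sentences, and counting happens in ordinary code, which mirrors the article's point that transformers alone handle aggregation and scale poorly.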

Read the paper here.

Why Unstructured Data Matters

Unstructured data such as photos, video files, audio files, social media content, presentations, call recordings, and transcripts remains difficult to store in a traditional row-column database or in a spreadsheet such as Microsoft Excel, which makes it a headache for firms. Moreover, global data generation is expected to exceed 180 zettabytes by 2025, up from 64.2 zettabytes in 2020, a growth of over 180 per cent, according to an industry report. Unstructured data accounts for 80 to 90 per cent of the data generated and gathered by businesses, and its volume is growing several times faster than that of structured databases. It has many use cases, including:

Consumer analytics: Companies employ artificial intelligence to discover trends in unstructured data from various sources, such as call centre transcripts, online product evaluations, chatbot dialogues, and social media mentions, to make quick decisions that can improve customer relationships.
Marketing intelligence: Decision-makers can understand what products or services are most enticing for their target market by swiftly scanning large datasets and finding patterns in customer behaviour. This is useful for product development as well as determining which marketing campaigns are the most effective.

Scarcity of unstructured data is not the problem; the problem is the lack of tools and technologies to generate valuable business insights from this massive pool of digital resources. New tools for analysing these and other unstructured sources are now becoming available. Powered by AI and machine learning, these platforms operate in near real time and rely on techniques including:

Natural language processing to extract meaning from business documents, mobile and communication data, journal articles, emails, and social media posts.
Pattern recognition algorithms to identify people and other objects in catalogues of digital images.
Speech-to-text conversion to turn audio speech into searchable text.

To wrap up, organisations would be well advised to move away from data silos, and even from the data lake storage paradigm, to reap the full benefits of the marriage between unstructured data and AI.

PS: The story was written using a keyboard.

Kumar Gandharv, PGD in English Journalism (IIMC, Delhi), is setting out on a journey as a tech journalist at AIM. He is a keen observer of national and IR-related news.
