Last updated October 12, 2018

In-database analytics: The next “in” thing for Big Data

Published on April 15, 2016

by Partha Sen

For those who hit the “snooze” button on earlier Big Data wake-up calls, consider this your espresso shot of information: Our society creates as much data in two days—approximately five exabytes of data—as all of civilization did prior to 2003. In light of that statistic, maybe they should change Moore’s Law to A Little Moore’s Law out of respect.

The Current Challenge: Too Much, Too Late

A few examples illustrate the magnitude of today’s data challenge. If an average-sized healthcare insurer (with approximately 30 million customers) wishes to improve the outcomes for diabetic patients, they may need to analyze more than 60,000 medical codes across 10 billion claims and factor a separate silo of pharmacy data into the equation. The challenge is no less daunting in other industries. A national retail chain that wants to improve its product replenishment could be looking at sales data from thousands of separate stock keeping units (SKUs) across hundreds or thousands of stores over the last several years—that’s more than 100 billion rows of data.
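A rough count shows how a retailer reaches that scale. The sketch below assumes one row per SKU per store per day; the specific figures (10,000 SKUs, 4,000 stores, seven years of history) are illustrative assumptions, not numbers from any real chain.

```python
# Illustrative sizing only: every figure below is an assumption.
skus = 10_000       # assumed number of stock keeping units
stores = 4_000      # assumed number of stores
days = 7 * 365      # assumed seven years of daily history

# Assumes one sales row per SKU, per store, per day.
rows = skus * stores * days
print(f"{rows:,} rows")  # 102,200,000,000 rows
```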

For years, data analysis has been, in a sense, a “moving” experience. Enterprises moved the data that they wanted to analyze from their database onto analytic servers in order to break the analytic work into smaller pieces. Many enterprises, in fact, still do this today. The problem with this approach is several-fold. As data gets bigger—a foregone conclusion today—it can take hours (or even longer) to transfer and stage the data on multiple servers, then return it to the database and re-assemble it. For time-sensitive analyses, that’s a deal breaker. As a workaround, enterprises often choose to analyze only a subset of their data, but this sort of data sampling leads to less than ideal analytic models and can ultimately create more data confusion by generating multiple versions of the same information.
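A back-of-the-envelope estimate suggests why moving the data takes hours. The sketch assumes 100 billion rows at roughly 100 bytes each and a sustained 10 gigabit-per-second link. All three figures are assumptions chosen for illustration, and a real pipeline would add staging and re-assembly time on top of the raw transfer.

```python
def transfer_hours(rows: int, bytes_per_row: int, gigabits_per_second: float) -> float:
    """Hours needed to move a table one way at a sustained network rate."""
    total_bits = rows * bytes_per_row * 8
    seconds = total_bits / (gigabits_per_second * 1e9)
    return seconds / 3600

# Assumed: 100 billion rows, about 100 bytes per row, 10 Gbit/s sustained.
hours = transfer_hours(100_000_000_000, 100, 10.0)
print(f"about {hours:.1f} hours one way")
```

Sending the results back to the database adds a second trip, and slower or shared links stretch the estimate further.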

Move the Analytics, Not the Data

To solve the current challenges of Big Data, enterprises are turning to a new strategy: in-database analytics. The idea behind this approach can be summed up in one simple concept: Move the analytics, not the data. By bringing the analytics engine into the database and leveraging massively parallel map-reduce technology (popularized by tools like Hadoop), enterprises can perform highly complex analyses directly in the database environment without spreading the problem out across a team of servers. In-database analytics yields a host of benefits over traditional analytics including:

Faster analytics, on a scale of 10-100X faster than traditional analytics
Better analytic models as data scientists can now use full datasets versus sampling
No data duplication errors caused by moving data between servers
Stricter security policy enforcement, particularly for industries that regulate the movement of sensitive business data
Near real-time insights, as opposed to analytic insights that may be days or weeks old
Capex reduction by eliminating the need for additional hardware servers to process the analytics
Pervasive analytics that can flow freely to reporting tools and applications throughout the enterprise
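The difference between moving the data and moving the analytics can be sketched in a few lines. The example below uses SQLite purely as a stand-in for a data warehouse (it is an embedded, single-process database, not a massively parallel one), and the sales table and its columns are hypothetical. It only illustrates where the computation happens and how much data leaves the database in each case.

```python
import sqlite3

# SQLite stands in for the warehouse; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store_id INTEGER, sku TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "A", 5), (1, "B", 3), (2, "A", 7), (2, "B", 1), (3, "A", 4)],
)

# Traditional approach: pull every row out, then aggregate on the client.
moved_rows = conn.execute("SELECT store_id, units FROM sales").fetchall()
client_totals = {}
for store_id, units in moved_rows:
    client_totals[store_id] = client_totals.get(store_id, 0) + units

# In-database approach: the database aggregates; only the summary leaves it.
db_rows = conn.execute(
    "SELECT store_id, SUM(units) FROM sales GROUP BY store_id ORDER BY store_id"
).fetchall()
db_totals = dict(db_rows)

print(len(moved_rows), "rows moved vs", len(db_rows), "rows returned")
print(db_totals == client_totals)
```

Both approaches give the same totals, but the second one returns one row per store instead of one row per sale. With five rows the gap is trivial; with billions of rows it is the difference the list above describes.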

It’s Data Science, Not Rocket Science

As data has grown, so has the role of the “data scientist,” who is now cast as an Atlas of analytics. The data scientist, according to legend, is intimate with machine learning and statistics, can program in low-level languages (often with one hand), is a data domain expert and is an artist where data visualization is concerned. He or she can tease insights from otherwise unrecognizable patterns and, in some cases literally, can predict the future. Not surprisingly, this mythical unicorn comes with a commensurate price tag.

The problem with this model, beyond cost and scarcity, is a tendency to isolate innovation. In-database analytics solves this problem by reducing the complexity of analytics so that teams of “regular” data analysts can access and analyze data using familiar SQL queries. This essentially democratizes data-led discoveries in the enterprise and, to the relief of HR departments everywhere, eliminates the need to slather themselves in unicorn perfume just to attract the right talent.
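As a small illustration of analytics expressed in familiar SQL, the sketch below fits a straight line (units sold against price) using nothing but standard aggregate functions. SQLite is again only a stand-in for a warehouse, and the weekly_sales table, its columns and its four data points are hypothetical. Commercial in-database analytics products typically package models like this as ready-made functions; this example makes no claim about how any particular product does so.

```python
import sqlite3

# Hypothetical table: the data points lie exactly on units = 12 - 2 * price.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weekly_sales (price REAL, units REAL)")
conn.executemany(
    "INSERT INTO weekly_sales VALUES (?, ?)",
    [(1.0, 10.0), (2.0, 8.0), (3.0, 6.0), (4.0, 4.0)],
)

# Least-squares slope and intercept of units on price, using only SQL aggregates.
slope, intercept = conn.execute(
    """
    SELECT fit.slope, fit.avg_y - fit.slope * fit.avg_x AS intercept
    FROM (
        SELECT
            (COUNT(*) * SUM(price * units) - SUM(price) * SUM(units)) /
            (COUNT(*) * SUM(price * price) - SUM(price) * SUM(price)) AS slope,
            AVG(price) AS avg_x,
            AVG(units) AS avg_y
        FROM weekly_sales
    ) AS fit
    """
).fetchone()

print(f"units = {intercept:.1f} + ({slope:.1f}) * price")
```

The model is fitted where the data lives, and only two numbers come back to the analyst.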

That’s Nice, But Who Cares?

In-database analytics isn’t ideal for everyone. For example, some of the newer business cases for Big Data that require analysis of large sets of unstructured data are better served by tools designed specifically for those scenarios. But any business where large amounts of structured data need to be analyzed quickly can benefit from in-database analytics. Good candidates for in-database analytics include:

Healthcare organizations that need to analyze large amounts of patient data securely
Financial services companies that can benefit from real-time decision making in their investment strategies
Retail corporations that need to improve supply chain logistics or analyze product performance in a dynamic environment

Over the next 18 months, we expect to see more enterprise analytics forgo the data server farms of the past and move into the data warehouse. Higher performance and lower cost are the most important drivers for the move home, but there are other benefits to consider. As already noted, in-database analytics allows enterprises to use what they have today in terms of IT skills and infrastructure while dramatically improving their analytic capabilities. Also, the relative simplicity of in-database analytics allows enterprises to experiment more with their data analysis and test hypotheses that would have been impractical or excessively expensive in a traditional setting.

At least from my perspective, all signs point to in-database analytics becoming the next “in” thing for Big Data.



Partha Sen

A passion for solving complex business problems using quantitative methods, data mining and pattern recognition began as a hobby before leading Partha Sen to found Fuzzy Logix and develop its flagship product, DB Lytix, in 2007. Before Fuzzy Logix, Partha held senior management positions at Bank of America where his achievements included leading the initiative to build a quantitative model driven credit rating methodology for the entire commercial portfolio. In the portfolio strategies group, Partha led a team to devise various strategies for effectively hedging the credit risk for the bank’s commercial loan portfolio and for minimizing the impact of mark-to-market volatility of the portfolio of hedging instruments.
