Last updated November 7, 2021

Databricks Breaks Data Warehousing Performance Record

Databricks announced on its official blog that Databricks SQL has set a new record in 100TB TPC-DS, outperforming the previous best by 2.2 times.

Published on November 7, 2021

by Shraddha Goled

San Francisco-based data warehouse and data technology company Databricks announced that it had set a world record for data warehouse performance. In an official blog post, the company said that Databricks SQL set a new record in 100TB TPC-DS, outperforming the previous best by 2.2 times. 100TB TPC-DS is a gold-standard performance benchmark for data warehousing. The result has been formally audited and reviewed by the TPC council.

New Record Created

Barcelona Supercomputing Center’s team corroborated the results of the new record. The group, which routinely runs TPC-DS on popular data warehouses, benchmarked Databricks and Snowflake and found that Databricks was 2.7 times faster and 12 times better in terms of price performance.

Defined by the non-profit Transaction Processing Performance Council (TPC), TPC-DS is a data warehouse benchmark; the DS stands for decision support. It includes 99 queries of varying complexity, ranging from simple aggregations to complex pattern mining. It was introduced in the mid-2000s to reflect the growing complexity of analytics. Since then, almost all vendors have adopted TPC-DS as the de facto standard for data warehouses.

Passing the official benchmark is difficult, as it considers various parameters. Databricks claims in its blog that several established vendors often tweak official benchmarks to show their systems performing better. The tweaks include removing certain SQL features, such as rollups, and removing skew by changing the data distribution. “The tweaks also ostensibly explain why most vendors seem to beat all other vendors according to their own benchmarks,” Databricks has claimed.
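For readers unfamiliar with the rollups mentioned above: a rollup computes subtotals at every level of a grouping hierarchy, plus a grand total, in a single query. The Python sketch below is only an illustration of that idea. The function name, the sample rows, and the column names are hypothetical and are not taken from TPC-DS or from Databricks.

```python
from collections import defaultdict

def rollup(rows, keys, measure):
    """Sum `measure` at every prefix of `keys`, similar in spirit to SQL GROUP BY ROLLUP.

    Rolled-up columns are represented by None, as SQL represents them by NULL.
    """
    totals = defaultdict(float)
    for row in rows:
        # depth == len(keys) is the finest grouping; depth == 0 is the grand total.
        for depth in range(len(keys), -1, -1):
            group = tuple(row[k] for k in keys[:depth]) + (None,) * (len(keys) - depth)
            totals[group] += row[measure]
    return dict(totals)

# Hypothetical sales rows, for illustration only.
sales = [
    {"region": "east", "store": "a", "amount": 10.0},
    {"region": "east", "store": "b", "amount": 5.0},
    {"region": "west", "store": "c", "amount": 7.0},
]

result = rollup(sales, ["region", "store"], "amount")
for group, total in result.items():
    print(group, total)
```

Each input row contributes to one total per hierarchy level, which is part of why dropping rollups from a benchmark query reduces the work a system has to do.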

Databricks managed to address the following challenges:

Open vs proprietary data formats: It is argued that data warehouses that leverage proprietary data formats can evolve more quickly than those that rely on open formats (for example, Databricks, which is based on the Lakehouse, doesn’t change as quickly). Databricks argues that open formats have their own advantages, such as scope for standardization, avoiding vendor lock-in, and allowing tools to be developed independently of any vendor. The company also says that open formats can evolve; a case in point is Parquet, which has gone through several iterations.

Architecture: MPP (massively parallel processing) architecture is considered superior for SQL performance. Databricks SQL is not based on Apache Spark; instead, it is based on Photon, an engine built for SIMD architecture that does heavy parallel query processing. Photon can therefore be considered an MPP engine.

Throughput vs latency trade-off: Databricks has built key enabling technologies such as Photon and Delta Lake, which have improved the performance of both large and small queries.

Time: It is traditionally believed that a database system takes at least a decade to mature. Databricks managed to do it much faster thanks to factors such as investing in technologies that support SQL workloads and also benefit AI workloads on Databricks, using a SaaS model that accelerates the software development cycle, and better capital allocation.

Wrapping up

Notably, Databricks has been advancing its data warehousing capabilities. In November 2020, the company announced its full suite of data warehousing capabilities as Databricks SQL. The company says that the latest performance test has put to rest the initial doubts about whether an open architecture based on a Lakehouse can match the classical data warehouse’s performance, speed, and cost.

The blog further stated that the company had assembled the best team on the market, which is working to deliver the ‘next performance breakthrough’. The company is also working on a number of improvements in ease of use and governance.


Shraddha Goled

I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.
