How Google’s BigQuery ML Is Empowering Data Analysts

In case you were wondering, here’s another sign of the Google Cloud vs Amazon Web Services war heating up. Google has brought out the big guns in the analytical data warehousing space by embedding machine learning capabilities into Google BigQuery. BigQuery is Google’s low-cost enterprise data warehouse and analytics service, and the new capability is called BigQuery ML.

One of the key features of BigQuery is that it transforms SQL queries into complex execution plans and dispatches them onto execution nodes to provide insights into the data promptly. BigQuery lets developers run SQL as a massively parallel query across hundreds of CPU cores and ample disk storage, scanning and aggregating terabytes of data in seconds. BigQuery ML, a capability inside BigQuery, enables analysts and data scientists to build and deploy ML models on massive structured or semi-structured datasets.


Dremel Technology Is The Key

At a time when Hadoop was facing intense competition, Google released beta access to BigQuery in 2011. It was a new SQL processing system based on Dremel, Google’s distributed query engine, and it provides the core set of Dremel features to third-party developers. A key selling point of Dremel was its cost-to-value ratio: according to a technical paper, it can scan 35 billion rows without an index in tens of seconds, and users need no capital expenditure for the supporting infrastructure.

Although Dremel, the cloud-powered massively parallel query service, is immensely popular among developers, Amazon Redshift remains the leader in the data warehouse space in terms of performance, cost and usability. Both Amazon Redshift and BigQuery are based on columnar storage, which makes them better suited to analytics workloads than row-oriented relational databases like Postgres and MySQL. Where BigQuery scores is in Dremel’s tree architecture, which is used for dispatching queries and aggregating results across thousands of machines in a few seconds.
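
The combination of columnar storage and tree-style aggregation can be illustrated with a toy Python sketch. This is not Dremel's implementation: the shard layout, the function names (leaf_partial, merge_partials, tree_aggregate) and the fan-out value are all assumptions made for illustration. Each leaf scans only the one column the query needs and returns a small partial result, and intermediate nodes merge those partials until the root holds the final answer.

```python
from typing import Dict, List, Tuple

# Toy columnar shard: every column is stored as its own list, so an
# aggregate over "views" never has to read the "country" values.
Shard = Dict[str, List]


def leaf_partial(shard: Shard, column: str) -> Tuple[int, int]:
    """Leaf node: scan a single column and return (sum, row_count)."""
    values = shard[column]
    return sum(values), len(values)


def merge_partials(partials: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Intermediate node: combine the partial results of its children."""
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total, count


def tree_aggregate(shards: List[Shard], column: str, fan_out: int = 2) -> Tuple[int, int]:
    """Merge leaf results level by level until a single root result remains."""
    if fan_out < 2:
        raise ValueError("fan_out must be at least 2")
    if not shards:
        return 0, 0
    level = [leaf_partial(shard, column) for shard in shards]
    while len(level) > 1:
        level = [
            merge_partials(level[i:i + fan_out])
            for i in range(0, len(level), fan_out)
        ]
    return level[0]


# Hypothetical data spread over three shards (machines).
shards = [
    {"country": ["IN", "US"], "views": [10, 20]},
    {"country": ["US"], "views": [5]},
    {"country": ["IN", "IN", "DE"], "views": [1, 2, 3]},
]

total, rows = tree_aggregate(shards, "views")
average = total / rows  # equivalent in spirit to SELECT AVG(views)
```

Because sums and counts can be merged in any order, no single machine ever needs to see all the rows, which is the property that lets this kind of query spread across many machines.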





Are The ML Capabilities Giving An Advantage Over Traditional Data Warehouses?

Because ML usually requires programming and knowledge of ML frameworks, it has kept data analysts out and restricted the use of ML to a small set of users, mainly data scientists. BigQuery ML now enables data analysts to build and evaluate ML models in BigQuery with their existing SQL tools and skills. Since queries run directly against the data in BigQuery, no additional extract, transform, and load (ETL) tools are required, Rajen Sheth, senior director of Product Management at Google, said during the Google Cloud Next 2018 conference.

Key Advantages

Cleaning And Preprocessing Data In SQL: Users can create ML models in BigQuery with SQL queries. For example, analysts who want to train a logistic regression model can do so directly in BigQuery ML, which means they can slice the data and explore different processing options in the same query. One user on a forum pointed out that in most cases developers don’t need to train neural networks, especially for structured data. In such cases, cleaning and preprocessing data in SQL is relatively smooth.
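
As a rough sketch of what this looks like in practice, the Python snippet below assembles a BigQuery ML CREATE MODEL statement in which the cleaning and preprocessing happen inside the SELECT. The helper build_create_model_sql, the table my_dataset.sessions, and the column names are hypothetical and exist only for illustration. The snippet only builds the SQL text; running it would require a BigQuery client and a real dataset.

```python
def build_create_model_sql(model_name: str, model_type: str,
                           label_column: str, training_query: str) -> str:
    """Wrap a training SELECT in a BigQuery ML CREATE MODEL statement.

    Illustrative helper only: values are interpolated directly into the
    SQL text, so it must not be fed untrusted input.
    """
    supported = {"linear_reg", "logistic_reg"}  # the two types the article mentions
    if model_type not in supported:
        raise ValueError(f"unsupported model_type: {model_type}")
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS(model_type='{model_type}', input_label_cols=['{label_column}']) AS\n"
        f"{training_query.strip()}"
    )


# Hypothetical table and columns. The SELECT does the cleaning:
# it derives a binary label, fills missing values and normalises case.
training_query = """
SELECT
  IF(purchases > 0, 1, 0) AS purchased,
  IFNULL(country, 'unknown') AS country,
  LOWER(device_type) AS device_type,
  pageviews
FROM `my_dataset.sessions`
WHERE pageviews IS NOT NULL
"""

sql = build_create_model_sql(
    "my_dataset.purchase_model", "logistic_reg", "purchased", training_query
)
print(sql)
```

Trying a different preprocessing option means editing the SELECT and rerunning the statement, with no data leaving the warehouse.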

More Power To Data Analysts: Analysts who know SQL but have little knowledge of ML frameworks can develop models without further programming knowledge or additional tools.

Democratises ML: BigQuery ML democratises ML by allowing developers to build models using their existing tools and to increase development speed by eliminating the need for data movement.

Reduced Waiting Time: BigQuery ML significantly speeds up model development by eliminating the need to export data from the data warehouse. Instead, BigQuery ML brings ML to the data. Analysts no longer need to export small amounts of data to spreadsheets or other applications. The documentation also emphasises that there is no need to program an ML solution in Python or Java: models are trained and accessed in BigQuery using SQL, a language data analysts already know. Since BigQuery is designed to run queries on Big Data in as little as a few seconds, it is best suited to querying large datasets. However, BigQuery ML currently supports only two types of models: linear regression for forecasting and logistic regression for classification.
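
To make "trained and accessed in BigQuery using SQL" concrete, here is a minimal Python sketch that builds the two follow-up statements an analyst would typically run against a trained model, using BigQuery ML's ML.EVALUATE and ML.PREDICT functions. The helper names, the model name my_dataset.purchase_model, and the tables are hypothetical. The code only assembles SQL text and does not contact BigQuery.

```python
def build_evaluate_sql(model_name: str, eval_query: str) -> str:
    """Build a query that scores a trained model against held-out rows."""
    return (
        "SELECT *\n"
        f"FROM ML.EVALUATE(MODEL `{model_name}`, (\n"
        f"{eval_query.strip()}\n"
        "))"
    )


def build_predict_sql(model_name: str, input_query: str) -> str:
    """Build a query that returns the model's predictions for new rows."""
    return (
        "SELECT *\n"
        f"FROM ML.PREDICT(MODEL `{model_name}`, (\n"
        f"{input_query.strip()}\n"
        "))"
    )


# Hypothetical model and tables, for illustration only.
model = "my_dataset.purchase_model"

evaluate_sql = build_evaluate_sql(
    model,
    "SELECT purchased, country, pageviews FROM `my_dataset.sessions_holdout`",
)
predict_sql = build_predict_sql(
    model,
    "SELECT country, pageviews FROM `my_dataset.sessions_today`",
)

print(evaluate_sql)
print(predict_sql)
```

Both statements are ordinary queries, so their results can be joined, filtered or saved to a table like any other query output.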

Conclusion

Of late, Google BigQuery has emerged as the next viable option after Amazon Redshift, and it is suitable for both OLAP and BI use cases. In fact, 20th Century Fox tested the beta to understand its movie marketing data by taking a SQL query used for audience analysis and adding a “create model” statement to it. BigQuery ML returned a linear regression model for the query, which effectively predicted who would want to see a soon-to-be-released movie. The prediction was then used to reformulate the media planning for the movie.


Richa Bhatia
Richa Bhatia is a seasoned journalist with six years’ experience in reportage and news coverage and has had stints at Times of India and The Indian Express. She is an avid reader, mum to a feisty two-year-old and loves writing about the next-gen technology that is shaping our world.
