AWS Announces Amazon Redshift ML, A Cloud-based Service For Data Scientists To Use ML Technologies

At its recent AWS re:Invent event, AWS announced the launch of Amazon Redshift Machine Learning (Amazon Redshift ML). According to its developers, Amazon Redshift ML lets data scientists create, train, and deploy machine learning models in Amazon Redshift using SQL.

Amazon Redshift is one of the most widely used cloud data warehouses, where one can query and combine exabytes of structured and semi-structured data across a data warehouse, operational database, and data lake using standard SQL. The cloud data warehouse is well-known for its intuitive features, such as efficient storage, scalability, high-performance query processing, result caching and more.

Technology Behind Redshift ML

Amazon Redshift ML is powered by Amazon SageMaker, a fully managed ML service. Users write SQL statements to create and train machine learning models from data in Amazon Redshift. The models can then be used for applications such as churn prediction and fraud risk scoring, among others.

According to a blog post, with this release the data warehouse now supports supervised learning, the family of techniques most commonly used in enterprises for advanced analytics. It allows users to work with their data in Redshift without requiring in-depth knowledge of machine learning techniques.

When working with this ML application, users should keep the following in mind:

  • New Amazon Redshift clusters must be created with the SQL_PREVIEW maintenance track.
  • The Amazon Redshift cluster that is used to create the model and the Amazon S3 bucket that is used to stage the training data and model artefacts must be in the same AWS Region.
  • A user will not be able to switch an existing Amazon Redshift cluster from the current or trailing track to this preview track, or vice versa.

Why Use Redshift ML

Amazon Redshift is used to process exabytes of data every day to power the analytics workloads. This data can be leveraged by data scientists and analysts for training ML models. The models can then be used to generate insights into new data.

A key benefit of Amazon Redshift ML is that it automatically discovers and tunes the best-fitting model for the training data using Amazon SageMaker Autopilot. SageMaker Autopilot chooses among regression, binary classification, multi-class classification, and linear models.

Beyond automatic model selection, the application offers several other benefits:

  • Amazon Redshift ML allows a user to create and train ML models with simple SQL commands without having to learn external tools.
  • It provides flexibility to use automatic algorithm selection.
  • The application automatically preprocesses data, and creates, trains and deploys models.
  • It enables advanced users to specify problem type and generate predictions using SQL without having to ship data outside your data warehouse.
  • It also allows data scientists to select efficient algorithms such as XGBoost and specify hyperparameters and preprocessors.
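
The following is a minimal sketch of what the "simple SQL command" for creating a model might look like, assembled here by a small Python helper. The clause names (CREATE MODEL, TARGET, FUNCTION, IAM_ROLE, MODEL_TYPE, PROBLEM_TYPE, SETTINGS with S3_BUCKET) follow the Redshift ML CREATE MODEL statement as commonly documented, but the exact syntax should be checked against the current Amazon Redshift documentation. The helper name, table, columns, IAM role ARN, and bucket name are all hypothetical placeholders.

```python
from typing import Optional, Sequence


def build_create_model_sql(
    model_name: str,
    source_table: str,
    feature_columns: Sequence[str],
    target_column: str,
    function_name: str,
    iam_role_arn: str,
    s3_bucket: str,
    problem_type: Optional[str] = None,
    model_type: Optional[str] = None,
) -> str:
    """Assemble a CREATE MODEL statement as a string (hypothetical helper).

    With problem_type and model_type left as None, algorithm selection is
    left to the service. Identifiers are interpolated directly, so this
    sketch must not be fed untrusted input.
    """
    columns = ", ".join([*feature_columns, target_column])
    lines = [
        f"CREATE MODEL {model_name}",
        f"FROM (SELECT {columns} FROM {source_table})",
        f"TARGET {target_column}",
        f"FUNCTION {function_name}",
        f"IAM_ROLE '{iam_role_arn}'",
    ]
    # Optional clauses for advanced users (assumed clause names).
    if model_type is not None:
        lines.append(f"MODEL_TYPE {model_type}")
    if problem_type is not None:
        lines.append(f"PROBLEM_TYPE {problem_type}")
    # The S3 bucket stages training data and model artefacts.
    lines.append(f"SETTINGS (S3_BUCKET '{s3_bucket}');")
    return "\n".join(lines)


if __name__ == "__main__":
    # All names below are placeholders, not real resources.
    print(
        build_create_model_sql(
            model_name="customer_churn",
            source_table="customer_activity",
            feature_columns=["age", "plan", "monthly_minutes"],
            target_column="churned",
            function_name="predict_churn",
            iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftMLRole",
            s3_bucket="example-redshift-ml-bucket",
            problem_type="BINARY_CLASSIFICATION",
            model_type="XGBOOST",
        )
    )
```

Leaving out the two optional arguments corresponds to the automatic algorithm selection described above. Supplying them corresponds to the advanced path, where the user names the problem type or algorithm. Hyperparameters and preprocessors are omitted from this sketch.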

How It Works

When users run SQL commands to create a model, Amazon Redshift ML securely exports the specified data from Amazon Redshift to Amazon S3 and calls SageMaker Autopilot to prepare the data automatically. Autopilot then selects the appropriate pre-built algorithm and applies it to train the ML model.

According to its developers, the application manages all the interactions between Amazon Redshift, SageMaker and Amazon S3 while abstracting away the steps involved in training and compilation. Once the model is successfully trained, Redshift ML compiles it via Amazon SageMaker Neo and makes it available as a SQL function in the Amazon Redshift data warehouse.
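
Because the trained model is exposed as a SQL function, generating predictions amounts to calling that function in an ordinary query. The sketch below builds such a query in Python. The helper name, the function name `predict_churn`, the table, and the column names are hypothetical, and the argument order is assumed to match the feature columns used at training time.

```python
from typing import Sequence


def build_prediction_sql(
    function_name: str,
    input_columns: Sequence[str],
    source_table: str,
    id_column: str,
) -> str:
    """Build a SELECT that calls a model's SQL function (hypothetical helper).

    Identifiers are interpolated directly, so this sketch must not be
    fed untrusted input.
    """
    args = ", ".join(input_columns)
    return (
        f"SELECT {id_column}, {function_name}({args}) AS prediction\n"
        f"FROM {source_table};"
    )


if __name__ == "__main__":
    # Placeholder names only; the model function must already exist.
    print(
        build_prediction_sql(
            function_name="predict_churn",
            input_columns=["age", "plan", "monthly_minutes"],
            source_table="new_customers",
            id_column="customer_id",
        )
    )
```

The resulting statement runs inside the cluster like any other query, which is what keeps the data in the warehouse at prediction time.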

Wrapping Up

Amazon Redshift ML is a cloud-based service that makes it easy for analysts and data scientists to use machine learning technology. Amazon Redshift itself adds no extra charge for creating or using a model, and prediction happens locally in your Amazon Redshift cluster. In practice, this means you pay only for training; prediction is included in the cost of your cluster. The machine learning preview period is expected to run until March 31, 2021.

Ambika Choudhury
A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.
