AWS Announces Amazon Redshift ML, A Cloud-based Service For Data Scientists To Use ML Technologies

At its recent AWS re:Invent event, Amazon announced the launch of Amazon Redshift Machine Learning (Amazon Redshift ML). According to its developers, data scientists can now use SQL to create, train, and deploy machine learning models in Amazon Redshift.

Amazon Redshift is one of the most widely used cloud data warehouses. It lets users query and combine exabytes of structured and semi-structured data across a data warehouse, operational database, and data lake using standard SQL. It is well known for features such as efficient storage, scalability, high-performance query processing, and result caching.

Technology Behind Redshift ML

Amazon Redshift ML is powered by Amazon SageMaker, a fully managed ML service. Users write SQL statements to create and train machine learning models from data in Amazon Redshift. The models can then be used for applications such as churn prediction and fraud risk assessment.
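
As a minimal sketch of what such a statement might look like, the Python helper below assembles a CREATE MODEL statement as a string. The table, column, function, role, and bucket names are hypothetical placeholders, and the clause layout is an assumption based on the CREATE MODEL syntax in the AWS documentation; verify it against the current reference before running it on a cluster.

```python
def build_create_model_sql(model, table, target, function, iam_role, s3_bucket):
    """Assemble a Redshift ML CREATE MODEL statement (illustrative sketch only)."""
    return (
        f"CREATE MODEL {model}\n"
        f"FROM {table}\n"                # training data: a table or a subquery
        f"TARGET {target}\n"             # the column the model should predict
        f"FUNCTION {function}\n"         # name of the SQL function created after training
        f"IAM_ROLE '{iam_role}'\n"       # role that lets Redshift call SageMaker and S3
        f"SETTINGS (S3_BUCKET '{s3_bucket}');"  # bucket used to stage training data
    )


# All names below are hypothetical examples, not values from the article.
create_sql = build_create_model_sql(
    model="customer_churn",
    table="customer_activity",
    target="churned",
    function="predict_churn",
    iam_role="arn:aws:iam::123456789012:role/ExampleRedshiftMLRole",
    s3_bucket="example-redshift-ml-bucket",
)
print(create_sql)
```

The resulting string could be submitted through any SQL client connected to the cluster; the helper itself only formats text and does not contact AWS.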

According to a blog post, this release adds support for supervised learning, the type of machine learning most commonly used in enterprises for advanced analytics. It lets users apply machine learning to their data in Redshift without in-depth knowledge of machine learning techniques.

When working with this ML application, keep the following in mind:

  • New Amazon Redshift clusters must be created with the SQL_PREVIEW maintenance track.
  • The Amazon Redshift cluster that is used to create the model and the Amazon S3 bucket that is used to stage the training data and model artefacts must be in the same AWS Region.
  • A user will not be able to switch an existing Amazon Redshift cluster from the current or trailing track to this preview track, or vice versa.
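
The considerations above can be turned into a simple pre-flight check. The sketch below is purely illustrative: `ClusterConfig` and `check_preview_requirements` are hypothetical names, not part of any AWS SDK, and the region and track values would have to be supplied by the caller.

```python
from dataclasses import dataclass


@dataclass
class ClusterConfig:
    """Hypothetical description of a Redshift cluster, supplied by the caller."""
    region: str
    maintenance_track: str


def check_preview_requirements(cluster, bucket_region):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    # The cluster must have been created on the SQL_PREVIEW maintenance track.
    if cluster.maintenance_track.upper() != "SQL_PREVIEW":
        problems.append("cluster is not on the SQL_PREVIEW maintenance track")
    # The cluster and the S3 staging bucket must be in the same AWS Region.
    if cluster.region != bucket_region:
        problems.append("cluster and S3 bucket are in different AWS Regions")
    return problems


ok = check_preview_requirements(ClusterConfig("us-east-1", "SQL_PREVIEW"), "us-east-1")
bad = check_preview_requirements(ClusterConfig("us-east-1", "current"), "eu-west-1")
print(ok)
print(bad)
```

Because an existing cluster cannot be moved onto the preview track, a failed track check means a new cluster has to be created rather than the old one reconfigured.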

Why Use Redshift ML

Amazon Redshift processes exabytes of data every day to power analytics workloads. Data scientists and analysts can use this data to train ML models, which can then generate insights from new data.

The key benefit of Amazon Redshift ML is that it automatically finds and tunes the best-fitting model for the training data using Amazon SageMaker Autopilot. SageMaker Autopilot chooses the best model from among regression, binary classification, multi-class classification, and linear models.

Beyond automatic model selection, the application offers several other benefits:

  • Amazon Redshift ML lets users create and train ML models with simple SQL commands, without having to learn external tools.
  • It provides flexibility to use automatic algorithm selection.
  • The application automatically preprocesses data, and creates, trains and deploys models.
  • It enables advanced users to specify the problem type and generate predictions using SQL without having to move data outside the data warehouse.
  • It also allows data scientists to select efficient algorithms such as XGBoost and specify hyperparameters and preprocessors.
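
To illustrate the difference between automatic selection and explicit control, the sketch below adds optional PROBLEM_TYPE and MODEL_TYPE clauses to a CREATE MODEL statement. The clause names and their placement are assumptions based on the AWS CREATE MODEL syntax, the identifiers are hypothetical, and hyperparameter and preprocessor settings are left out for brevity.

```python
# Problem types named in the article; the SQL spellings are assumed.
PROBLEM_TYPES = {"REGRESSION", "BINARY_CLASSIFICATION", "MULTICLASS_CLASSIFICATION"}


def build_model_sql(model, table, target, function, iam_role, s3_bucket,
                    problem_type=None, model_type=None):
    """Build a CREATE MODEL statement with optional advanced clauses."""
    if problem_type is not None and problem_type not in PROBLEM_TYPES:
        raise ValueError(f"unsupported problem type: {problem_type}")
    lines = [
        f"CREATE MODEL {model}",
        f"FROM {table}",
        f"TARGET {target}",
        f"FUNCTION {function}",
        f"IAM_ROLE '{iam_role}'",
    ]
    # With neither option set, algorithm selection is left to the service.
    if model_type is not None:
        lines.append(f"MODEL_TYPE {model_type}")
    if problem_type is not None:
        lines.append(f"PROBLEM_TYPE {problem_type}")
    lines.append(f"SETTINGS (S3_BUCKET '{s3_bucket}');")
    return "\n".join(lines)


# Hypothetical names throughout.
common = dict(
    model="fraud_model",
    table="transactions",
    target="is_fraud",
    function="predict_fraud",
    iam_role="arn:aws:iam::123456789012:role/ExampleRedshiftMLRole",
    s3_bucket="example-redshift-ml-bucket",
)
auto_sql = build_model_sql(**common)
advanced_sql = build_model_sql(
    **common, problem_type="BINARY_CLASSIFICATION", model_type="XGBOOST"
)
print(advanced_sql)
```

Leaving both options unset keeps the statement as short as possible, while setting them trades that convenience for control over the algorithm.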

How It Works

When users run SQL commands to create the model, Amazon Redshift ML securely exports the specified data from Amazon Redshift to Amazon S3 and calls SageMaker Autopilot to prepare the data automatically. Autopilot then selects the appropriate pre-built algorithm and applies it to train the ML model.

According to its developers, the application manages all the interactions between Amazon Redshift, SageMaker, and Amazon S3, abstracting away the steps involved in training and compilation. Once the model is successfully trained, Redshift ML compiles it with Amazon SageMaker Neo and makes it available as a SQL function in the Amazon Redshift data warehouse.
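
Since the trained model is exposed as a SQL function, predictions can be requested with an ordinary SELECT. The following sketch builds such a query in Python; the function name `predict_churn`, the column names, and the table name are hypothetical, and the query would need to match the function name chosen when the model was created.

```python
def build_prediction_sql(function, feature_columns, source_table, id_column):
    """Build a SELECT that calls the model's SQL function on each row."""
    # Feature columns are passed to the function in the listed order.
    args = ", ".join(feature_columns)
    return (
        f"SELECT {id_column}, {function}({args}) AS prediction "
        f"FROM {source_table};"
    )


# Hypothetical function, columns, and table.
query = build_prediction_sql(
    function="predict_churn",
    feature_columns=["age", "plan", "monthly_spend"],
    source_table="new_customers",
    id_column="customer_id",
)
print(query)
# prints: SELECT customer_id, predict_churn(age, plan, monthly_spend) AS prediction FROM new_customers;
```

Because the function runs inside the cluster, a query like this can be combined with joins, filters, and aggregates like any other SQL expression.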

Wrapping Up

Amazon Redshift ML is a cloud-based service that makes it easy for analysts and data scientists to use machine learning technology. There is no additional charge for using a model, and prediction happens locally in your Amazon Redshift cluster. In practice, this means you pay only for training; prediction is included in the cost of your cluster. The machine learning preview period is expected to run until March 31, 2021.

Ambika Choudhury
A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.