Amazon recently announced the general availability of Redshift ML. It allows data scientists and developers to use SQL commands in Amazon Redshift data warehouses to create, train, and apply machine learning models.
Amazon Redshift is an enterprise-class relational database query and management system. Analytic queries typically retrieve, compare, and evaluate large amounts of data in multiple-stage operations to produce a final result. Amazon Redshift achieves efficient storage and optimum query performance through a combination of massively parallel processing (MPP), columnar data storage, and efficient, targeted data compression encoding schemes.
Amazon Redshift ML
Amazon Redshift ML allows you to take advantage of Amazon SageMaker, a fully managed machine learning service, without learning new tools or languages. Using only SQL statements, you can create and train Amazon SageMaker machine learning models on your Redshift data and use them to make predictions.
- No prior ML experience needed: Redshift ML works through standard SQL commands. It offers simple, optimised, and secure integration between Redshift and Amazon SageMaker and runs inference within the Redshift cluster, which makes predictions generated by ML models easy to use in queries and applications.
- Use ML on your Redshift data using standard SQL: To get started, use the CREATE MODEL SQL command in Redshift and specify the training data as a table or a SELECT statement. Once training finishes, Redshift ML compiles and imports the trained model into the Redshift data warehouse and prepares a SQL inference function that you can call in SQL queries.
- Predictive analytics with Amazon Redshift: With Redshift ML, you can embed predictions like fraud detection, risk scoring, and churn prediction directly in queries and reports. Use the SQL function to apply the ML model to your data in queries, reports, and dashboards.
- Bring-your-own-model (BYOM): Redshift ML supports BYOM for both local and remote inference. You can take a model trained outside Redshift with Amazon SageMaker and run inference on it locally, inside Amazon Redshift; models trained by SageMaker Autopilot and models trained directly in Amazon SageMaker can both be imported for local inference. You can also invoke custom ML models deployed to remote SageMaker endpoints.
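The CREATE MODEL and inference steps described above can be sketched as follows. This is a minimal Python helper that only renders the SQL text; you would submit the resulting statements to Redshift with any SQL client. The table `customer_activity`, its columns, the target column `churned`, the function name `predict_churn`, the IAM role ARN, and the S3 bucket are all hypothetical placeholders, and the clause layout (FROM, TARGET, FUNCTION, IAM_ROLE, SETTINGS with S3_BUCKET) reflects our reading of the basic CREATE MODEL form in the Redshift documentation, so check the current reference before relying on it.

```python
# Minimal sketch: render the two SQL statements a basic Redshift ML workflow needs.
# All table, column, function, role, and bucket names are hypothetical.

def create_model_sql(model: str, source: str, target: str, function: str,
                     iam_role: str, s3_bucket: str) -> str:
    """Render a basic CREATE MODEL statement.

    `source` is either a table name or a parenthesised SELECT statement.
    """
    return (
        f"CREATE MODEL {model}\n"
        f"FROM {source}\n"
        f"TARGET {target}\n"          # the column the model learns to predict
        f"FUNCTION {function}\n"      # name of the SQL inference function to create
        f"IAM_ROLE '{iam_role}'\n"
        f"SETTINGS (S3_BUCKET '{s3_bucket}');"  # where training data is exported
    )


def predict_sql(function: str, feature_cols: list[str], key_col: str, table: str) -> str:
    """Render a query that applies the generated inference function to each row."""
    args = ", ".join(feature_cols)
    return f"SELECT {key_col}, {function}({args}) AS prediction FROM {table};"


features = ["age", "tenure_months", "monthly_spend"]  # hypothetical feature columns
training = f"(SELECT {', '.join(features)}, churned FROM customer_activity)"

print(create_model_sql(
    model="customer_churn_model",
    source=training,
    target="churned",
    function="predict_churn",
    iam_role="arn:aws:iam::123456789012:role/RedshiftMLRole",  # placeholder ARN
    s3_bucket="my-redshift-ml-bucket",                          # placeholder bucket
))
print(predict_sql("predict_churn", features, "customer_id", "new_customers"))
```

The arguments passed to the inference function are the same feature columns, in the same order, as in the training query; the key column is left out of training so the model does not learn from an identifier.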
How Redshift ML works
To train a model, you provide Amazon Redshift with the data and the metadata associated with the data inputs. Redshift ML then creates a model from that input data, which you can use to generate predictions for new input data without additional costs.
To create an ML model, you use a simple SQL query to specify the data you want to train the model on and the output value you want to predict.
For instance, to create a model that predicts the success rate of your marketing activities, you define your inputs by selecting the columns (in one or more tables) that contain customer profiles and results from previous marketing activities, and you specify the output column you want to predict.
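The marketing example can be sketched as a training query that joins two tables, wrapped in a CREATE MODEL statement. This is a minimal Python sketch that renders SQL text only; the tables `customer_profiles` and `campaign_results`, their columns, the target `responded`, the model and function names, the IAM role, and the bucket are hypothetical placeholders. The optional PROBLEM_TYPE clause is shown for illustration; as we understand it, Autopilot infers the problem type when it is omitted.

```python
# Minimal sketch of the marketing example. Table names, column names, the
# target `responded`, and all AWS identifiers are hypothetical placeholders.

def marketing_training_query(profile_cols: list[str], campaign_cols: list[str],
                             target: str) -> str:
    """Join customer profiles with past campaign results into one training set."""
    cols = ([f"p.{c}" for c in profile_cols]
            + [f"c.{c}" for c in campaign_cols]
            + [f"c.{target}"])  # the label column goes last
    return (
        "SELECT " + ", ".join(cols) + "\n"
        "FROM customer_profiles p\n"
        "JOIN campaign_results c ON c.customer_id = p.customer_id"
    )


def marketing_model_sql(training_query: str, target: str, function: str,
                        iam_role: str, s3_bucket: str) -> str:
    """Wrap the training query in a CREATE MODEL statement."""
    return (
        "CREATE MODEL marketing_response_model\n"
        f"FROM ({training_query})\n"
        f"TARGET {target}\n"
        f"FUNCTION {function}\n"
        f"IAM_ROLE '{iam_role}'\n"
        # Optional: if omitted, Autopilot infers the problem type from the target.
        "PROBLEM_TYPE BINARY_CLASSIFICATION\n"
        f"SETTINGS (S3_BUCKET '{s3_bucket}');"
    )


query = marketing_training_query(["age", "region", "income_band"],
                                 ["channel", "offer_type"], "responded")
print(marketing_model_sql(query, "responded", "predict_response",
                          "arn:aws:iam::123456789012:role/RedshiftMLRole",
                          "my-redshift-ml-bucket"))
```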
After you run the SQL command to create the model, Redshift ML securely exports the specified data from Amazon Redshift to your Amazon S3 bucket and calls Amazon SageMaker Autopilot to prepare the data, select the appropriate pre-built algorithm, and apply it for model training. One such algorithm is XGBoost (eXtreme Gradient Boosting).
Workflow of Amazon Redshift ML (Source: AWS)
Redshift ML handles all interactions between Amazon Redshift, Amazon S3, and Amazon SageMaker. Once the model has been trained, Redshift ML uses Amazon SageMaker Neo to optimise the model for deployment and makes it available as a SQL function.
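By default, Autopilot picks the algorithm. As we understand the Redshift CREATE MODEL reference, you can instead turn Autopilot off and request XGBoost directly with AUTO OFF, MODEL_TYPE XGBOOST, an XGBoost OBJECTIVE, PREPROCESSORS 'none', and a HYPERPARAMETERS clause. The minimal Python sketch below renders such a statement; the clause set is our reading of the documentation and may have changed since general availability, and every model, table, function, role, and bucket name is a hypothetical placeholder.

```python
# Minimal sketch: render a CREATE MODEL statement that bypasses Autopilot and
# trains XGBoost directly. Clause names follow our reading of the Redshift docs;
# all identifiers passed in are hypothetical placeholders.

def xgboost_model_sql(model: str, source: str, target: str, function: str,
                      iam_role: str, s3_bucket: str,
                      objective: str = "binary:logistic",
                      num_round: int = 100) -> str:
    lines = [
        f"CREATE MODEL {model}",
        f"FROM {source}",
        f"TARGET {target}",
        f"FUNCTION {function}",
        f"IAM_ROLE '{iam_role}'",
        "AUTO OFF",                    # skip Autopilot's algorithm search
        "MODEL_TYPE XGBOOST",          # train XGBoost directly
        f"OBJECTIVE '{objective}'",    # an XGBoost objective, e.g. binary:logistic
        "PREPROCESSORS 'none'",        # no automatic preprocessing when AUTO OFF
        f"HYPERPARAMETERS DEFAULT EXCEPT (NUM_ROUND '{num_round}')",
        f"SETTINGS (S3_BUCKET '{s3_bucket}');",
    ]
    return "\n".join(lines)


print(xgboost_model_sql(
    model="churn_xgboost",
    source="(SELECT age, tenure_months, monthly_spend, churned FROM customer_activity)",
    target="churned",
    function="predict_churn_xgb",
    iam_role="arn:aws:iam::123456789012:role/RedshiftMLRole",  # placeholder ARN
    s3_bucket="my-redshift-ml-bucket",                          # placeholder bucket
))
```

The trade-off is control versus convenience: with AUTO OFF you choose the objective and hyperparameters yourself, while the default path lets Autopilot handle preprocessing and algorithm selection.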
Redshift ML includes several new features, such as Amazon Virtual Private Cloud (VPC) support.
For example, you can seamlessly import a SageMaker model into your Amazon Redshift cluster (local inference).
You can also create SQL functions that use existing SageMaker endpoints to make predictions (remote inference). In this case, Redshift ML batches calls to the endpoints to speed up processing.
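The two BYOM paths differ mainly in where the model comes from and where it runs. The minimal Python sketch below renders both statements as we understand them from the Redshift documentation: local inference imports a model by SageMaker training job name (or S3 path), and remote inference points at a SageMaker endpoint. Neither has FROM/TARGET training clauses, so the function's argument and return types are declared explicitly. The job name, endpoint name, function names, types, role, and bucket are hypothetical placeholders.

```python
# Minimal sketch: render BYOM CREATE MODEL statements for local and remote
# inference. All identifiers used below are hypothetical placeholders.

def byom_local_sql(model: str, source: str, function: str, arg_types: list[str],
                   return_type: str, iam_role: str, s3_bucket: str) -> str:
    """Import a SageMaker-trained model (job name or S3 path) for local inference."""
    args = ", ".join(arg_types)
    return (
        f"CREATE MODEL {model}\n"
        f"FROM '{source}'\n"                 # SageMaker training job name or S3 path
        f"FUNCTION {function}({args})\n"     # input types declared explicitly
        f"RETURNS {return_type}\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"SETTINGS (S3_BUCKET '{s3_bucket}');"
    )


def byom_remote_sql(model: str, function: str, arg_types: list[str],
                    return_type: str, endpoint: str, iam_role: str) -> str:
    """Create a SQL function that calls an existing SageMaker endpoint."""
    args = ", ".join(arg_types)
    return (
        f"CREATE MODEL {model}\n"
        f"FUNCTION {function}({args})\n"
        f"RETURNS {return_type}\n"
        f"SAGEMAKER '{endpoint}'\n"          # inference runs on this endpoint
        f"IAM_ROLE '{iam_role}';"
    )


role = "arn:aws:iam::123456789012:role/RedshiftMLRole"  # placeholder ARN
print(byom_local_sql("churn_byom_local", "churn-xgb-training-job", "predict_churn_local",
                     ["INT", "INT", "FLOAT"], "FLOAT", role, "my-redshift-ml-bucket"))
print(byom_remote_sql("churn_byom_remote", "predict_churn_remote",
                      ["INT", "INT", "FLOAT"], "FLOAT", "churn-endpoint", role))
```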
In terms of pricing, customers pay only for what they use. Amazon Redshift ML uses your existing cluster resources for prediction, so you avoid additional Amazon Redshift charges, and there is no separate charge for using Amazon Redshift ML itself.
However, Redshift ML uses Amazon SageMaker to train your model, which incurs an additional cost, and Amazon S3 charges apply for storing the training data.
Redshift ML is currently available in the following AWS Regions: US (Ohio, N. Virginia, Oregon, N. California), Canada (Central), Europe (Frankfurt, Ireland, Paris, Stockholm), Asia Pacific (Hong Kong, Tokyo, Singapore, Sydney), and South America (São Paulo).