Recently, at its AWS re:Invent event, Amazon announced the launch of Amazon Redshift Machine Learning (Amazon Redshift ML). According to its developers, Amazon Redshift ML lets data scientists create, train, and deploy machine learning models in Amazon Redshift using SQL.
Amazon Redshift is one of the most widely used cloud data warehouses. With it, one can use standard SQL to query and combine exabytes of structured and semi-structured data across a data warehouse, operational databases, and a data lake. The service is well known for features such as efficient storage, scalability, high-performance query processing, and result caching.
Technology Behind Redshift ML
Amazon Redshift ML is powered by Amazon SageMaker, a fully managed ML service. With Redshift ML, users write SQL statements to create and train machine learning models on data stored in Amazon Redshift. The models can then be used for applications such as customer churn prediction and fraud risk scoring.
According to a blog post, this release supports supervised learning, the technique most commonly used in enterprises for advanced analytics. Users can apply machine learning to their data in Redshift without in-depth knowledge of machine learning techniques.
When working with Redshift ML, keep the following in mind:
- New Amazon Redshift clusters must be created with the SQL_PREVIEW maintenance track.
- The Amazon Redshift cluster that is used to create the model and the Amazon S3 bucket that is used to stage the training data and model artefacts must be in the same AWS Region.
- An existing Amazon Redshift cluster cannot be switched from the current or trailing track to the preview track, or vice versa.
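The three preview requirements above can be expressed as a small pre-flight check. This is a hypothetical helper, not an AWS tool. The `PreviewSetup` fields are assumptions about what you would know when planning a setup. Only the track name `SQL_PREVIEW` and the rules themselves come from the article.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PreviewSetup:
    """Hypothetical description of a planned Redshift ML preview setup."""
    cluster_region: str
    bucket_region: str
    maintenance_track: str
    cluster_is_new: bool


def preview_problems(setup: PreviewSetup) -> List[str]:
    """Return human-readable violations of the Redshift ML preview requirements."""
    problems: List[str] = []
    if setup.maintenance_track != "SQL_PREVIEW":
        if setup.cluster_is_new:
            problems.append("create the new cluster on the SQL_PREVIEW maintenance track")
        else:
            # Existing clusters on the current or trailing track cannot be moved to the preview track.
            problems.append("an existing cluster cannot be switched to SQL_PREVIEW; create a new cluster")
    if setup.cluster_region != setup.bucket_region:
        problems.append(
            f"cluster ({setup.cluster_region}) and S3 bucket ({setup.bucket_region}) "
            "must be in the same AWS Region"
        )
    return problems


# Hypothetical setup that breaks two rules at once.
print(preview_problems(PreviewSetup("us-east-1", "eu-west-1", "current", False)))
```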
Why Use Redshift ML
Amazon Redshift processes exabytes of data every day to power analytics workloads. Data scientists and analysts can use this data to train ML models, which can then generate insights from new data.
The key benefit of Amazon Redshift ML is that it uses Amazon SageMaker Autopilot to automatically discover and tune the best model for the training data. Autopilot selects the best-performing option among regression, binary classification, multi-class classification, and linear models.
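The article does not describe how Autopilot makes its choice, and its real logic is considerably more involved. The sketch below is therefore only a hypothetical toy heuristic. It illustrates what distinguishing the three problem types means in terms of the target column:

- Two distinct labels suggest binary classification.
- Fractional or very numerous numeric values suggest regression.
- Anything else is treated as multi-class.

The labels match the `PROBLEM_TYPE` values we understand Redshift ML to accept. The `max_classes` threshold is an arbitrary assumption.

```python
from typing import Hashable, Sequence


def _is_number(value: Hashable) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def guess_problem_type(target: Sequence[Hashable], max_classes: int = 20) -> str:
    """Toy heuristic (NOT Autopilot's algorithm) for labelling a target column."""
    distinct = set(target)
    if len(distinct) < 2:
        raise ValueError("the target column needs at least two distinct values")
    if len(distinct) == 2:
        return "BINARY_CLASSIFICATION"
    if all(_is_number(v) for v in distinct):
        has_fraction = any(not float(v).is_integer() for v in distinct)
        # Fractional values or many distinct numbers suggest a continuous target.
        if has_fraction or len(distinct) > max_classes:
            return "REGRESSION"
    return "MULTICLASS_CLASSIFICATION"


print(guess_problem_type(["yes", "no", "no", "yes"]))    # BINARY_CLASSIFICATION
print(guess_problem_type([12.5, 40.0, 7.25, 19.9]))      # REGRESSION
print(guess_problem_type(["basic", "plus", "premium"]))  # MULTICLASS_CLASSIFICATION
```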
Redshift ML offers several other benefits:
- Users can create and train ML models with simple SQL commands, without having to learn external tools.
- It offers the flexibility of automatic algorithm selection.
- It automatically preprocesses data and creates, trains, and deploys models.
- Advanced users can specify the problem type and generate predictions using SQL, without shipping data outside their data warehouse.
- Data scientists can also choose a specific algorithm, such as XGBoost, and specify hyperparameters and preprocessors.
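For the advanced path in the last bullet, Redshift ML's documentation describes turning automatic mode off and naming XGBoost directly. The sketch below assembles such a statement. The clause names (`AUTO OFF`, `MODEL_TYPE XGBOOST`, `OBJECTIVE`, `PREPROCESSORS 'none'`, `HYPERPARAMETERS DEFAULT EXCEPT (...)`) reflect our understanding of the preview-era syntax and should be checked against the current CREATE MODEL reference. The helper, the table, and the other names are hypothetical. `binary:logistic` and `num_round` are standard XGBoost objective and parameter names.

```python
from typing import Dict, Optional


def _quote(literal: str) -> str:
    """Wrap a value as a SQL string literal, doubling any embedded single quotes."""
    return "'" + literal.replace("'", "''") + "'"


def build_xgboost_model(model_name: str, table: str, target: str, function_name: str,
                        iam_role_arn: str, s3_bucket: str, objective: str,
                        hyperparameters: Optional[Dict[str, object]] = None) -> str:
    """Compose a CREATE MODEL statement that bypasses Autopilot and uses XGBoost."""
    lines = [
        f"CREATE MODEL {model_name}",
        f"FROM {table}",
        f"TARGET {target}",
        f"FUNCTION {function_name}",
        f"IAM_ROLE {_quote(iam_role_arn)}",
        "AUTO OFF",                      # user takes over algorithm selection
        "MODEL_TYPE XGBOOST",
        f"OBJECTIVE {_quote(objective)}",  # an XGBoost objective, e.g. 'binary:logistic'
        "PREPROCESSORS 'none'",
    ]
    if hyperparameters:
        # Override only the listed hyperparameters; the rest keep their defaults.
        pairs = ", ".join(f"{k.upper()} {_quote(str(v))}" for k, v in hyperparameters.items())
        lines.append(f"HYPERPARAMETERS DEFAULT EXCEPT ({pairs})")
    else:
        lines.append("HYPERPARAMETERS DEFAULT")
    lines.append(f"SETTINGS (S3_BUCKET {_quote(s3_bucket)});")
    return "\n".join(lines)


# Hypothetical example; all names are placeholders.
print(build_xgboost_model("churn_xgb", "customer_activity", "churn", "ml_fn_churn_xgb",
                          "arn:aws:iam::123456789012:role/RedshiftML",
                          "my-redshift-ml-bucket", "binary:logistic", {"num_round": 100}))
```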
How It Works
When users run the SQL command to create a model, Amazon Redshift ML securely exports the specified data from Amazon Redshift to Amazon S3 and calls SageMaker Autopilot to prepare the data automatically. Autopilot then selects a suitable pre-built algorithm and applies it to train the ML model.
According to its developers, Redshift ML manages all the interactions between Amazon Redshift, Amazon SageMaker, and Amazon S3, abstracting away the steps involved in training and compilation. Once the model is trained, Redshift ML compiles it with Amazon SageMaker Neo and makes it available as a SQL function in the Amazon Redshift data warehouse.
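Once the model is exposed as a SQL function, prediction is an ordinary `SELECT` that calls that function on feature columns. The sketch below builds such a query. The function name, table, and columns are hypothetical placeholders. We assume the arguments are passed in the same order as the feature columns used at training time.

```python
from typing import Optional, Sequence


def build_prediction_query(function_name: str, feature_columns: Sequence[str], table: str,
                           id_column: Optional[str] = None, where: Optional[str] = None) -> str:
    """Compose a SELECT that scores rows with a Redshift ML prediction function."""
    if not feature_columns:
        raise ValueError("at least one feature column is required")
    args = ", ".join(feature_columns)  # assumed to match the training column order
    select = [id_column] if id_column else []
    select.append(f"{function_name}({args}) AS prediction")
    sql = f"SELECT {', '.join(select)}\nFROM {table}"
    if where:
        sql += f"\nWHERE {where}"
    return sql + ";"


# Hypothetical example: score recent customers with a trained churn model.
print(build_prediction_query(
    "ml_fn_customer_churn",
    ["state", "account_length", "total_charge"],
    "customer_activity",
    id_column="customer_id",
    where="record_date >= '2020-01-01'",
))
```

Because the function runs inside the cluster, the scored rows never leave the data warehouse. This matches the article's point that prediction happens locally.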
Amazon Redshift ML is a cloud-based service that makes it easy for analysts and data scientists to use machine learning. Amazon Redshift adds no separate charge for creating or using a model, and prediction happens locally in your Amazon Redshift cluster. As a result, you pay only for training; predictions are included in the cost of your cluster. The machine learning preview period is expected to run until March 31, 2021.