Once an ML model has been built, fine-tuned on historical data and deployed to production, it enters the inference phase, in which it is run against live data to calculate an outcome. According to Amazon, inference accounts for up to 90 per cent of the total computing cost in the machine learning lifecycle. Amazon launched its serverless inference service in preview at its re:Invent event in December last year, aiming to help clients deploy machine learning models for inference without having to configure or manage the underlying infrastructure.
Benefits of serverless inference
SageMaker, Amazon’s fully managed ML service, aims to help in use cases where traffic patterns are unpredictable. Its headline claim is that it reduces the total cost of ownership (TCO). Users who deploy machine learning models for inference with SageMaker do not have to configure or manage the underlying infrastructure, because SageMaker automatically provisions and scales compute capacity based on the number of inference requests. The feature was introduced in preview last year. Last week at the AWS Summit in San Francisco, SageMaker’s serverless inference was announced as generally available (GA).
Serverless inference suits organisations that have no infrastructure management capability or that want to avoid dealing with auto scaling and instance management. Adopting it also reduces operational overhead considerably. According to an AWS report, SageMaker is the most cost-effective option for end-to-end machine learning support, with a 54 per cent lower TCO than its alternatives over three years.
The service can also turn off compute capacity entirely when it is not in use, so that the user is not charged for idle time. Amazon appears to be growing its range of serverless offerings. SageMaker now offers four options for inference: Serverless Inference; Real-Time Inference, for workloads where low latency is a requirement; SageMaker Batch Transform, which works on batches of data; and SageMaker Asynchronous Inference, for workloads with large payloads that need longer processing times. Among its announcements at re:Invent last year, AWS also launched the SageMaker Inference Recommender.
Serverless Inference can be used to deploy any ML model, regardless of whether it was trained in SageMaker. Users also get built-in metrics in Amazon CloudWatch, such as invocation count, faults, latency, host metrics and errors.
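As a rough illustration of how those CloudWatch metrics might be read, the sketch below builds the parameters for a metric query. It assumes the `AWS/SageMaker` namespace, the `Invocations` metric and the `EndpointName` and `VariantName` dimensions; the helper name, the endpoint name and the default variant name are hypothetical, and the actual call is left as a comment because it needs AWS credentials.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def build_endpoint_metric_query(
    endpoint_name: str,
    variant_name: str = "AllTraffic",   # hypothetical variant name
    metric_name: str = "Invocations",   # assumed metric in the AWS/SageMaker namespace
    statistic: str = "Sum",
    hours: int = 1,
    period_seconds: int = 300,
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for a CloudWatch metric-statistics query."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    return {
        "Namespace": "AWS/SageMaker",
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "StartTime": start,
        "EndTime": end,
        "Period": period_seconds,   # width of each datapoint, in seconds
        "Statistics": [statistic],
    }


if __name__ == "__main__":
    query = build_endpoint_metric_query("demo-serverless-endpoint")
    print(query["MetricName"], query["StartTime"], query["EndTime"])
    # With AWS credentials configured, the query could be sent as:
    #   import boto3
    #   boto3.client("cloudwatch").get_metric_statistics(**query)
```

Keeping the query builder separate from the network call makes it easy to check the parameters locally before pointing it at a real endpoint.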
SageMaker has raised the limit on concurrent invocations per endpoint to 200, so that serverless endpoints can now handle high-traffic workloads, which was not possible earlier. The service is available in every AWS region where SageMaker is available, with the exception of AWS GovCloud, which is reserved for the US Government, and AWS China.
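To make the concurrency limit concrete, here is a minimal sketch of a serverless endpoint configuration. It assumes the `ServerlessConfig` block (`MemorySizeInMB`, `MaxConcurrency`) accepted by the SageMaker `create_endpoint_config` API; the list of memory sizes is an assumption to verify against current AWS documentation, and the helper, configuration and model names are hypothetical.

```python
from typing import Any, Dict

# The limit of 200 is the figure quoted at general availability; the memory
# sizes are an assumption and should be checked against current AWS docs.
MAX_CONCURRENCY_LIMIT = 200
ALLOWED_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def build_serverless_endpoint_config(
    config_name: str,
    model_name: str,
    memory_mb: int = 2048,
    max_concurrency: int = 20,
) -> Dict[str, Any]:
    """Build keyword arguments for a serverless endpoint configuration."""
    if memory_mb not in ALLOWED_MEMORY_MB:
        raise ValueError(f"memory_mb must be one of {ALLOWED_MEMORY_MB}")
    if not 1 <= max_concurrency <= MAX_CONCURRENCY_LIMIT:
        raise ValueError(
            f"max_concurrency must be between 1 and {MAX_CONCURRENCY_LIMIT}"
        )
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",  # hypothetical variant name
                "ModelName": model_name,
                # No instance type or instance count: capacity is managed
                # by the service and scaled with the request volume.
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_mb,
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    }


if __name__ == "__main__":
    request = build_serverless_endpoint_config(
        "demo-serverless-config", "demo-model", max_concurrency=200
    )
    print(request)
    # With AWS credentials and an existing SageMaker model, this could be sent as:
    #   import boto3
    #   boto3.client("sagemaker").create_endpoint_config(**request)
```

The point to notice is what is missing: the variant names no instance type or count, only a memory size and a concurrency ceiling.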
Boom in AWS SageMaker
Demand for SageMaker has grown as companies struggle to adopt AI. While there seems to be a general understanding that employing AI is necessary, the difficulties companies face in building and deploying models are holding them back. Quite simply, the biggest obstacle for companies is moving models into production.
An annual study published by O’Reilly showed that just 26 per cent of organisations have AI projects in production, the same rate as last year. The report also stated that 31 per cent of organisations are not currently using AI, up from the 13 per cent reported last year. Together, the figures point to a plateau in AI adoption among organisations.
According to a report by Gartner, only 53 per cent of projects move from pilot into production, and it takes eight months on average to create scalable models.
Autopilot functionality alone cannot remove these bottlenecks. An autopilot also needs to inspect raw data automatically, select the most appropriate features and choose the best algorithms. DevOps teams can use SageMaker’s built-in Autopilot to improve model tuning and data accuracy.
SageMaker sits in the middle of the AWS ML stack, where it is designed to integrate AI services, ML frameworks and infrastructure. It can adapt to changes in how models are built, trained and deployed.
Source: Research on ‘Risks checked for during development’
Despite the feature improvements made by AWS, a survey of Kaggle data scientists showed that usage of AWS (48.2 per cent) still exceeded usage of SageMaker (16.5 per cent) by a wide margin, which may reflect AWS’ direct access to EC2. The only competitor that resembles Serverless Inference is Google Cloud’s Vertex Pipelines. And even though Google Cloud was placed third behind Microsoft Azure and AWS, Kaggle respondents voted it a strong second. This indicates that despite SageMaker’s evident advances, there is still ground to cover.