Moving from prototype to production is one of the most challenging tasks in AI and machine learning. Research shows that over 87 per cent of data science projects never make it to production, for reasons that include fragmentation across frameworks, languages, and ecosystems; a shortage of specialised compute for large-scale training; variable workloads on ML serving; and the need for smarter infrastructure management.
Against this backdrop, Google, which strongly believes that an MLOps platform is critical to a faster and repeatable time to market for ML models, has announced its flagship ML platform, Vertex AI. It caters to data science needs from experimentation (with Managed Notebooks) and training (using scalable ML pipeline frameworks) to deployment (with GKE autoscaling and more), model management, and monitoring, in a systematic, scalable, and unified manner.
The global machine learning-as-a-service (MLaaS) market is expected to touch USD 302.66 billion by 2030, growing at a CAGR of 36.2 per cent from 2021 to 2030. The growth is driven by higher demand for cloud-based solutions and cloud computing, the growth of the AI and cognitive computing market, rising adoption of analytical solutions, and a widening range of application areas, among other factors.
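To make the growth figures concrete, the compound annual growth rate (CAGR) relation can be run backwards to estimate the starting market size the forecast implies. This is only a sketch: it assumes the USD 302.66 billion figure is the 2030 value and that growth compounds over nine annual periods (2021 to 2030), neither of which the forecast states explicitly. The function names are illustrative.

```python
def implied_base_value(final_value: float, cagr: float, years: int) -> float:
    """Work backwards from a final value to the starting value implied by a CAGR."""
    return final_value / (1.0 + cagr) ** years


def project_value(base_value: float, cagr: float, years: int) -> float:
    """Compound a starting value forward at a constant annual growth rate."""
    return base_value * (1.0 + cagr) ** years


# Figures quoted in the article; the nine-period count is an assumption.
final_2030 = 302.66   # USD billion
cagr = 0.362          # 36.2 per cent
years = 2030 - 2021   # assumed number of compounding periods

base_2021 = implied_base_value(final_2030, cagr, years)
print(f"Implied 2021 market size: USD {base_2021:.2f} billion")
```

If the forecast actually counts ten periods, or measures from a different base year, the implied starting figure changes accordingly.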
Leading the wave, Google Cloud helps customers use data to understand their supply chains and products better. It also helps them run applications, both existing and new, on an open architecture, and brings people together to improve productivity and creativity while ensuring that systems, users, and data are protected.
Google Cloud believes in democratising ML across the organisation. In this endeavour, it provides its ML capabilities in four distinct ways:
API: Google Cloud provides APIs that let companies incorporate pre-trained models directly into their applications. Examples include Video AI, Google Cloud Vision API, Speech-to-Text API, and Cloud Document AI API.
AutoML: It helps companies create custom models for their own data. Thanks to the zero-code ML capabilities in AutoML, organisations (especially business analysts) can conveniently build state-of-the-art models.
AI Platform: For organisations that want to adopt data science as their strategic differentiator, Google Cloud equips data scientists with the most sophisticated tools. Google Cloud supports all the popular ML frameworks (TensorFlow, PySpark, PyTorch, scikit-learn, etc.). Plus, companies can use TPUs to build complex models of the scale of BERT (tuning billions of parameters) in a matter of hours or days. Elsewhere, the same work could run into weeks or months.
BQML: For teams that are well versed in SQL, BigQuery ML (BQML), Google Cloud's machine learning capability built into BigQuery, opens up ML to millions of data analysts because it lets them build and operationalise models directly from within BigQuery, using simple SQL.
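As an illustration of the SQL-only workflow BQML offers, the sketch below composes a `CREATE MODEL` statement and an `ML.PREDICT` query as plain strings. It does not connect to BigQuery or run anything. The dataset, table, model, and column names (`demo.churn_model`, `demo.customers`, `tenure_months`, and so on) are hypothetical, and the helper functions are illustrative, not part of any Google library.

```python
from typing import List


def build_create_model_sql(model_name: str, model_type: str, label_col: str,
                           source_table: str, feature_cols: List[str]) -> str:
    """Compose a BigQuery ML CREATE MODEL statement (string only, not executed)."""
    columns = ", ".join(feature_cols + [label_col])
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS(model_type='{model_type}', input_label_cols=['{label_col}']) AS\n"
        f"SELECT {columns}\n"
        f"FROM `{source_table}`"
    )


def build_predict_sql(model_name: str, source_table: str) -> str:
    """Compose an ML.PREDICT query that scores rows of a table with a trained model."""
    return f"SELECT * FROM ML.PREDICT(MODEL `{model_name}`, TABLE `{source_table}`)"


# Hypothetical names, for illustration only.
create_sql = build_create_model_sql(
    model_name="demo.churn_model",
    model_type="logistic_reg",
    label_col="churned",
    source_table="demo.customers",
    feature_cols=["tenure_months", "monthly_spend"],
)
predict_sql = build_predict_sql("demo.churn_model", "demo.new_customers")

print(create_sql)
print(predict_sql)
```

An analyst would paste statements like these into the BigQuery console or submit them through a client library. Either way, both training and prediction stay inside SQL.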
The future of MLaaS – strategic collaboration
Google Cloud has several enhancements planned for its ML services. The company plans to extend Recommendations AI beyond retail into other domains such as media and entertainment. It also intends to add features such as training, to allow greater customisation of the pre-built models based on clients' data.
Recently, Google Cloud made its state-of-the-art TPU v4 available to its enterprise clients. These high-performance chips have been driving the development of Google's large language models.
Customers who have used TPU v4 report strong results. For example, Erik Nijkamp, a research scientist at Salesforce, said that access to TPU v4 enabled his team to achieve breakthroughs in conversational AI programming with their CodeGen project, a 16-billion-parameter auto-regressive language model that turns simple English prompts into executable code.
He said the large size of the model is driven by an empirical observation: scaling the number of model parameters in proportion to the number of training samples appears to improve the model's performance. This phenomenon is also known as the ‘scaling law’.
“TPU v4 is an outstanding platform for this kind of scale-out machine learning training, providing significant performance over other comparable AI hardware alternatives,” said Nijkamp.
LG AI Research recently collaborated with Google Cloud to test TPU v4 before commercialisation. It used Google's latest machine learning supercomputer to train LG EXAONE, a multimodal super-giant AI with 300 billion parameters.
LG EXAONE was trained with TPU v4 on a huge amount of data, comprising a text corpus more than 600 billion strong and 250 million images. Equipped with multimodal capabilities, it aims to surpass human experts in communication, creativity, productivity, and other areas. “Not only did the performance of TPU v4 outperform other best-in-class computing architectures, but also the customer-oriented support was beyond our expectations,” said Kyunghoon Bae, chief of LG AI Research.
Data engineering to multi-cloud play
Google Cloud provides technologies across AI, machine learning, analytics, and databases. It has been instrumental in helping organisations like Exabeam, Deutsche Bank, and PayPal to break down silos, increase agility, derive more value from data, and innovate faster.
In the last few years, the multi-cloud approach has become more popular than the single-cloud approach because it gives organisations significantly greater flexibility, capabilities, and pricing options. Many companies also see multi-cloud as a way to address the core issues of cloud computing.
Apart from flexibility, capabilities, and price, two key drivers for adopting a multi-cloud approach are risk diversification and the ability to leverage the best-in-class capabilities each cloud service provider (CSP) offers.
While multi-cloud seems like an obvious strategic decision when designing a tech architecture, one cautionary note is to factor in, upfront in your technology choices, the integration challenges and data gravity that are inevitable in any cloud. Doing so helps avoid expensive re-architecture at a later stage and helps control the egress costs you may incur to operate across clouds effectively.
The team believes that Google has always had ‘open’ at the heart of its design philosophy. It is core to the company's DNA, as seen in its contributions to open-source projects such as Kubernetes (the basis of GKE) and TensorFlow. Google has over 20K projects and 2 million lines of code contributed to open source. This philosophy has also shaped how the company has designed its cloud offerings; products like Anthos, Dataplex, and BigQuery's BQ Omni are good examples.
For those unaware, Anthos is designed to help teams operate, orchestrate and govern their apps across multiple clouds. Dataplex allows teams to govern data and build a central metadata repository irrespective of which cloud the data is on. BigQuery, one of their most successful analytics products, recently launched BQ Omni to act as a wrapper that can query data regardless of which cloud it resides on.
Google Cloud positions itself as a customer-centric cloud provider. The company works with the client's tech and data architecture, regardless of whether it is built on Google Cloud's underlying technology or someone else's.