In his keynote speech at the first edition of the Data Engineering Summit (DES ‘22) hosted by Analytics India Magazine, Subram Natarajan, director of customer engineering at Google Cloud India, spoke about how cloud infrastructure can be transformative in a company’s journey. Natarajan discussed the evolution of cloud computing, describing two major shifts in data transformation. In the first phase, organisations used virtual machines (VMs) to spin up their infrastructure requirements on the cloud. The second phase did not restrict itself to basic infrastructure frameworks but moved everything, from compute to networking to storage, onto the cloud.
“Early cloud migrations weren’t transformative enough for organisations or the industry on the whole. It did not change how business leaders worked outside of the IT industry,” he said. With data now so widespread, Natarajan noted, organisations need new ways to use and democratise it. But that was only one aspect. There was also an urgent need to migrate data in a way that was responsible and secure while maintaining accessibility. “If data science teams need to continuously request the IT team for permission to view datasets, the process becomes inefficient,” he said. This wasn’t how it should be, according to Natarajan.
Source: Statista, The growth in data
New trends in data transformation
Data engineering has progressed so swiftly that most industry leaders now want to move to data warehousing as a first step towards machine learning, with business intelligence (BI) as the final goal. For machine learning to contribute real business value, companies need a large volume of data to work with.
According to Natarajan, there are three angles to consider when shifting to a new platform.
- Serverless computing: Increases productivity and keeps the new platform simple. It also reduces infrastructure management and overhead costs (see the sketch after this list).
- Portability: This ensures that as the system slowly matures, it doesn’t have to restrict itself to the platform it was built on. “The compute engines should be running closer to the data instead of the other way around,” Natarajan added.
- Democratisation: Natarajan said that data shouldn’t be limited to the hands of a few people. “Everyone who can use data to leverage their decision-making should be able to access it,” he stated.
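To make the serverless and democratisation points concrete: in a serverless warehouse such as Google’s BigQuery, an analyst can submit SQL directly, with no clusters to provision or request from the IT team. The sketch below is a minimal, hedged illustration; the project, dataset and table names are hypothetical, and it assumes the google-cloud-bigquery client library is installed and credentials are configured.

```python
# Minimal sketch of serverless analytics with BigQuery's Python client:
# compute is allocated on demand, so there is no infrastructure to manage.
# Assumes google-cloud-bigquery and application-default credentials.
from google.cloud import bigquery

client = bigquery.Client()  # picks up project and credentials from the environment

query = """
    SELECT region, COUNT(*) AS orders
    FROM `my_project.sales.orders`   -- hypothetical table
    GROUP BY region
    ORDER BY orders DESC
"""

# The query runs on the serverless engine; rows stream back to the caller.
for row in client.query(query).result():
    print(f"{row.region}: {row.orders}")
```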
Source: Apache Beam testing playground
Natarajan then went on to describe the benefits of Apache Beam, an “open source unified programming model to define and execute data processing pipelines, including ETL.” Google donated the project to the open-source community to help simplify the mechanics of large-scale data processing and continues to contribute to it. The most development is currently happening around the popular Apache Spark, backed by companies like IBM and Huawei, Natarajan noted. The Spark Runner executes Beam pipelines on top of Apache Spark, supporting both batch and streaming pipelines.
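As a rough illustration of Beam’s unified model, here is a minimal word-count-style pipeline in the Python SDK. The file paths and column logic are hypothetical; by default it runs on the local DirectRunner, and passing runner='SparkRunner' (with a Spark job service available) would execute the same pipeline on Spark without changing the pipeline code.

```python
# Minimal Apache Beam pipeline sketch (Python SDK); paths are hypothetical.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Defaults to the local DirectRunner; PipelineOptions(runner='SparkRunner')
# would hand the very same pipeline to the Spark Runner instead.
options = PipelineOptions()

with beam.Pipeline(options=options) as p:
    (p
     | 'Read'   >> beam.io.ReadFromText('events.csv')
     | 'Parse'  >> beam.Map(lambda line: line.split(','))
     | 'Key'    >> beam.Map(lambda fields: (fields[0], 1))  # key on first column
     | 'Count'  >> beam.CombinePerKey(sum)                  # count records per key
     | 'Format' >> beam.MapTuple(lambda key, n: f'{key},{n}')
     | 'Write'  >> beam.io.WriteToText('counts'))
```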
Source: Google Cloud, Cloud Foundation toolkit
Companies are now realising the perils of dealing with bad data. Quite simply, Natarajan says, “Bad data leads to poor insights.” Other aspects must be taken care of too, such as data cataloguing, data security and bringing in business taxonomy. Infrastructure as code is becoming increasingly important in the process, so that larger workloads can be processed at lower cost. Natarajan also said that, like Amazon and Microsoft, Google has built Cloud Foundation Toolkit modules, ready-made templates that greatly ease the lives of data practitioners.
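The Cloud Foundation Toolkit modules themselves are Terraform and Deployment Manager templates; purely to illustrate the infrastructure-as-code idea in Python, here is a hedged sketch using Pulumi, a different IaC tool. The resource name is hypothetical, and it assumes the pulumi and pulumi-gcp packages plus a configured GCP project.

```python
# Infrastructure as code sketched with Pulumi's Python SDK (a stand-in;
# the Cloud Foundation Toolkit modules are Terraform/Deployment Manager
# templates). Running `pulumi up` reconciles this declared state.
import pulumi
import pulumi_gcp as gcp

# Hypothetical data-lake bucket; the name and settings are illustrative.
data_lake = gcp.storage.Bucket(
    'analytics-data-lake',
    location='US',
    uniform_bucket_level_access=True,
)

pulumi.export('bucket_url', data_lake.url)
```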
Business engagement
It is essential, Natarajan said, that business leaders are kept in the mix when shifting data platforms. “It is important to break down the silos for any meaningful change in the organisation. No CEO cares about moving to a shiny, new cloud database. They care about business outcomes that the project will bring by helping analyse data better so that better decisions are made to increase the value in customers’ lives,” he explained.
Source: Google, Features of a successful Cloud COE
In this regard, Cloud Centers of Excellence, or COEs, have become a vital part of initiating a transformational journey in organisations. “They can be a huge enabler and a conduit to power change. A well-formed COE team can have huge benefits when you want to oversee an outage and scale,” he said. The COE team, Natarajan added, does not consist of philosophical architects envisioning a perfect future, but of passionate individuals who work at it consistently.
Natarajan recommended that, instead of focusing on a perfect result, businesses should start small and then grow. “It is important to collect evidence of your successes and strengthen your case for hybrid adoption. This also brings sceptics from the sidelines to the fore. Once the business notices an impact, they start caring about it,” he added.
Finally, Natarajan said it was important to celebrate even the small wins, building a culture that recognises and encourages the work being done.