For this week’s ML practitioners series, Analytics India Magazine (AIM) got in touch with Valliappa Lakshmanan (Lak), Director for Data Analytics and AI Solutions at Google Cloud, where he also founded Google’s Advanced Solutions Lab ML Immersion program. He currently leads a team that builds software solutions for business problems using Google Cloud’s data analytics and machine learning products. He is also the author of several popular ML books. In this interview, Lak gives us a glimpse of how data gets put to use from his vantage point.
AIM: Could you please tell us about your educational background?
Lak: I earned my PhD in Electrical and Computer Engineering from the University of Oklahoma and my Master’s in Biomedical Engineering from The Ohio State University in Columbus, Ohio. I did my BTech in Electronics and Communications Engineering at the Indian Institute of Technology, Madras.
AIM: How did your journey in machine learning, and your fascination with algorithms, begin?
Lak: My interest in Digital Signal Processing and Image Processing started in college, when I was introduced to these subjects. My undergraduate project was to build a transducer to monitor heart rates. When applying for my Master’s, I was keen to pursue my interest in this area, and a lot of my coursework focused on image processing and computer vision. My MS project was on identifying mitral valves in ultrasound images.
My experience of building image processing for ultrasound helped me get my first job, in weather forecasting. It turns out that much of the signal processing math is similar between ultrasound and ground-based weather radar. The research lab where I worked, at the University of Oklahoma and the National Oceanic and Atmospheric Administration (NOAA), was a great opportunity and helped me build my skills and knowledge in weather detection algorithms.
As datasets grew over time, I continued to explore new technologies and started to apply machine learning methods like genetic algorithms and neural networks to those problems. In the 2010s, when deep learning was well on its way to becoming the backbone technology for all sorts of big data problems, it was a natural transition for the research areas I was already involved in.
AIM: What books and other resources have you used in your journey?
Lak: One of the books I learned the most from is Neural Networks for Pattern Recognition by Professor Christopher Bishop. It provides a solid foundation and great intuition. I have learnt and researched new topics as and when needed. I also learn new topics when I mentor junior colleagues. Usually, this is by reading papers and articles and trying new techniques.
AIM: What were the initial challenges and how did you address them?
Lak: In the ancient days, I had to implement every algorithm we needed myself in C++. The tools we had, like Weka or the Stuttgart Neural Network Simulator (SNNS), wouldn’t scale to the size and complexity of the weather data. It was quite a task managing these, and I wish we had solutions like TensorFlow or Keras back then!
AIM: Can you talk about the role at your current company? What does a typical day look like?
Lak: I lead the Analytics and AI solutions team at Google Cloud. My team drives client roadmaps, builds programs that incorporate intelligent processes into existing business processes, and plans solutions. These solutions are best practice guides and reference implementations for common customer problems.
It’s hard to think of a typical day! So, let’s take a typical month. A lot of my work revolves around providing and receiving information. I spend a substantial amount of time working with Google Cloud customers. This entails providing strategic advice to executives, helping a data science team identify the kinds of problems they can tackle, and helping troubleshoot problems and loop in the appropriate teams at Google. I also spend a lot of time helping my teams prioritise the solutions they are building, helping them when they run into issues, and in general, doing what I need to do to make them successful, while looking at newer ways of growth and development for the team and myself. Besides, I am also responsible for business operations and coordinating with internal stakeholders.
AIM: How does your team approach a data science problem?
Lak: We spend as much time designing the data collection and data preparation as we do training, evaluating, and deploying models. The key tip here is to visualize the data at every step of the way so that you know what’s happening to it. It’s important to look not just at aggregate statistics, but also at individual samples.
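As an illustration of looking at both aggregates and individual samples, here is a minimal Python sketch. It is not taken from the interview: the records, field names, and helper functions (`summarize`, `sample_rows`) are hypothetical, and a real project would likely plot the data as well as print it.

```python
import random
import statistics

# Hypothetical records: (station_id, temperature_c); None marks a missing reading.
records = [
    ("A", 21.5), ("A", 22.1), ("B", None), ("B", 19.8),
    ("C", 250.0),  # suspicious value that a mean alone can hide
    ("C", 20.4),
]

def summarize(rows):
    """Return aggregate statistics over the non-missing values."""
    values = [v for _, v in rows if v is not None]
    return {
        "count": len(rows),
        "missing": len(rows) - len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }

def sample_rows(rows, k=3, seed=0):
    """Return k randomly chosen rows for manual inspection."""
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))

summary = summarize(records)
print(summary)
for row in sample_rows(records):
    print(row)
```

In this toy data, the gap between the mean and the median hints at a problem, but it is the individual row with the 250.0 reading that shows what the problem actually is.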
A recent project by our team was to build an end-to-end system for real-time matching of documents. The solution we have published is a generalization of one that we built for a major publisher. Given a short synopsis of a book or article, they wanted to be able to retrieve similar articles from their very extensive index. In the original project, we spent quite a bit of time getting data from across all their brands and from public repositories, and building a matching system that would work with missing data. When it came to the ML model and the different components of the system, we went with options that were both state-of-the-art and proven. Since many of the systems we used were serverless and fully managed, it was possible for us to move pretty fast.
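The general idea of retrieving similar documents from a synopsis, while tolerating missing fields, can be sketched in a few lines of Python. This is only an illustrative sketch and not the system described above: it swaps in simple bag-of-words vectors and cosine similarity where a production system would more likely use learned text embeddings, and the index contents, field names, and functions (`to_vector`, `cosine`, `most_similar`) are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical index; some documents are missing a title or a synopsis.
index = [
    {"id": "weather", "title": "Radar storm tracking",
     "synopsis": "Detecting storms in weather radar data"},
    {"id": "cooking", "title": "Weeknight pasta", "synopsis": None},
    {"id": "heart", "title": None,
     "synopsis": "Ultrasound images of heart valves"},
]

def to_vector(doc):
    """Build a bag-of-words vector from whichever text fields are present."""
    text = " ".join(doc.get(field) or "" for field in ("title", "synopsis"))
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def most_similar(query, docs, top_k=2):
    """Return ids of the top_k documents with non-zero similarity to the query."""
    q = to_vector({"synopsis": query})
    scored = [(cosine(q, to_vector(doc)), doc["id"]) for doc in docs]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored[:top_k] if score > 0]

print(most_similar("tracking storms with radar", index))
```

Treating a missing field as an empty string lets every document be scored with the same code path, which is one simple way to keep matching working when data is incomplete.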
AIM: What does your machine learning toolkit look like?
Lak: My preference is to develop structured data models and time-series models in BigQuery ML. For models on unstructured data, I prefer using Keras/TensorFlow, and we use state-of-the-art (SOTA) models rather than reinventing the wheel. We usually use EfficientNet for most image models, ARIMA for time series, BERT for text embeddings, etc.
For cloud services, we use Google Cloud. We use BigQuery as our data warehouse, Apache Beam on Dataflow for ETL, Cloud AI Platform Notebooks for development, Cloud AI Platform Training and Predictions for managed training and deployment, and Cloud AI Platform Pipelines (Kubeflow Pipelines) for managing the end-to-end system.
AIM: How has the ML landscape changed over the years?
Lak: Over the years, machine learning has truly evolved, becoming more powerful and much more reusable. It is very rare that one has to develop ML models from scratch; instead, you can opt for pre-designed models and train them on your dataset.
AIM: From a global perspective, where do you think India stands in the AI/ML ecosystem? What are the potential areas of improvement?
Lak: I have been very impressed by the growth of ML applications, especially when it comes to mobile applications, text readers, and medical technology, among others. There is a huge talent pool of young Indians with a strong appetite for upskilling. The degree and pace of mobile connectivity in India are very impressive.
AIM: There is a lot of hype around machine learning. So, when the dust settles down, what techniques, use cases and applications do you think will stand the test of time?
Lak: The greatest opportunity to create value using ML is in personalization and recommendation systems. Great opportunities also lie in digitizing industries, for example, enabling inventory management in physical stores that are exploring online fulfilment. These use cases will remain because customer satisfaction and cost savings will remain business priorities for the long term. The way we solve these problems, whether using voice assistants or scanning shelves using image classification, will change over time. The real differentiator will be the kind of data we can collect and put to use to solve them.
AIM: Which domain of AI, do you think, will come out on top in the next 10 years?
Lak: AI is just a tool. It will be used to solve a wide range of problems. I don’t think it is useful to think of domains of AI as being in competition with each other. The area of AI that will become more and more useful is designing AI with humans in mind, similar to the field of human factors research in user interface design.
AIM: What do outsiders get wrong about this field?
Lak: The word AI often brings to mind humanoid robots that possess general intelligence: they can walk, talk, and do a wide variety of tasks. But AI today is extremely specific in what it can do.
AIM: Any additional tips/resources for the beginners?
Lak: I would recommend that beginners read books that provide a comprehensive introduction to the topic and make sure to get practice actually doing machine learning. I’m a little biased, but I do recommend my books:
- Data Science on the Google Cloud Platform: In this book, I explain a typical data science project, from planning it, to collecting data, to deploying an ML model and consuming its predictions.
- BigQuery: The Definitive Guide: This book will help you acquire a very important skill, the ability to easily handle large datasets and quickly gain insight from them.
- Machine Learning Design Patterns: Finally, this book helps you learn what experienced people in ML already know.
That is the order in which I wrote the books, and the order in which I suggest you read them as well.