How I Created an AutoML Library: Ram Seshadri from Google

“And it does it in a few minutes, versus the hours and days other libraries take on a real-world dataset.”

Ram Seshadri, program manager at Google ML, spoke about his deep learning library, Deep_AutoViML, at DevCon 2021. An expert in Python and SQL, Seshadri has a strong quantitative background, with a bachelor’s degree in mechanical engineering and an MBA in finance and economics.

“Build deep learning models for NLP, structured data and images with a single line of code,” is how Seshadri described the library. In the talk, he discussed Deep_AutoViML and its features and demonstrated it on several examples.

Deep_AutoViML is an extension of Seshadri’s past open-source libraries: AutoViz, Auto-TS and Auto_ViML. AutoViz visualises datasets automatically; Auto_ViML automates building pipelines and is used especially for hackathons and exploratory model building; and Auto-TS builds time series models. Each does its job with a single line of code.

Seshadri’s ‘secret sauce’ behind the three libraries is Featurewiz, which, he explained, allows “you to build a better model by adding additional features or removing unwanted features from your dataset”.

“This is a good way to build a machine learning model without much effort”, Seshadri claimed. “I never recommend that people go and build a machine learning model on the very first day they get a dataset. They should spend some time understanding the data.”

Built using TensorFlow, Deep_AutoViML extends these libraries to build tf.keras models and pipelines in a single line of code.

The Working Process

Deep_AutoViML allows users to have control over the level of automation while building the model. For example, users can use a single line of code and let the software do the work or write some lines of code to build a custom model. 

Essentially, the library’s main design goal is to reduce the time from experimentation to production. Therefore, Deep_AutoViML’s workflow focuses on data acquisition, model building and predictions, bypassing several other steps in the ML workflow.

Once the input files are supplied, the library creates a data pipeline using TensorFlow and converts the data into numeric features in the preprocessing step.
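As a rough illustration of what converting data into numeric features can involve, the sketch below indexes a string column through a vocabulary and standardises a numeric column in plain Python. It is not Deep_AutoViML’s code: the function names and the sample rows are hypothetical, and the library itself does this work inside a TensorFlow pipeline.

```python
from statistics import mean, pstdev

def build_vocab(values):
    """Map each distinct string to an integer index; 0 is reserved for unseen values."""
    return {v: i + 1 for i, v in enumerate(sorted(set(values)))}

def encode_rows(rows, cat_col, num_col):
    """Turn raw dict rows into lists of numeric features (hypothetical helper)."""
    vocab = build_vocab(r[cat_col] for r in rows)
    nums = [r[num_col] for r in rows]
    mu = mean(nums)
    sigma = pstdev(nums) or 1.0  # fall back to 1.0 when the column is constant
    features = [[vocab.get(r[cat_col], 0), (r[num_col] - mu) / sigma] for r in rows]
    return features, vocab

# Made-up sample rows, loosely modelled on Titanic-style columns.
rows = [
    {"sex": "male", "fare": 10.0},
    {"sex": "female", "fare": 30.0},
    {"sex": "male", "fare": 20.0},
]
features, vocab = encode_rows(rows, "sex", "fare")
```

Each row comes out as a category index followed by a standardised fare, which is the kind of all-numeric input a neural network expects.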

Source: DevCon 2021

The framework uses two tuners, Storm and Optuna, to find the best model architecture for the data. For a structured dataset, a neural network will generally consist of three to four deep layers; here, the tuner considers multiple parameters to fine-tune and find the best model architecture. “This does not guarantee the best performing model, but it will be pretty close. And it does it in a few minutes, versus the hours and days other libraries take on a real-world dataset,” according to Seshadri. After finding a suitable model architecture, the library trains the model on the user’s data so that it can make predictions. The user can host the model anywhere, including the cloud or a local machine.
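The idea of a tuner searching over architectures can be sketched with a plain random search. This is a simplified stand-in and not how Storm or Optuna work internally; the search space, the `toy_score` function and every name below are assumptions made for illustration, and a real tuner would train and validate a model for each candidate.

```python
import random

# Hypothetical search space over a few architecture parameters.
SEARCH_SPACE = {
    "num_layers": [1, 2, 3, 4],
    "units": [32, 64, 128, 256],
    "dropout": [0.0, 0.1, 0.3, 0.5],
}

def toy_score(config):
    """Stand-in for a validation loss (lower is better)."""
    return (abs(config["num_layers"] - 3)
            + abs(config["units"] - 128) / 128
            + abs(config["dropout"] - 0.1))

def random_search(space, score_fn, trials=50, seed=0):
    """Sample configurations at random and keep the one with the lowest score."""
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(trials):
        config = {name: rng.choice(choices) for name, choices in space.items()}
        score = score_fn(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score

best_config, best_score = random_search(SEARCH_SPACE, toy_score)
```

Like the tuners described in the talk, a search of this kind does not guarantee the best configuration; it returns the best one it happened to try within its budget of trials.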

What sets Deep_AutoViML apart

Next, Seshadri talked about the benefits of this library and what sets it apart from other deep learning libraries in the market.

  • Deep_AutoViML allows users to experiment with multiple deep learning architectures. 
  • Unlike most popular libraries, Deep_AutoViML lets users keep the same syntax and change a single option to switch the model between structured data, NLP and images. This works best for beginners who have yet to learn complicated syntax or multiple libraries.
  • Deep_AutoViML can handle very large datasets because TensorFlow never loads files completely. Batches are loaded only when needed, which prevents the process from running out of memory.
  • Deep_AutoViML can create a performant model in less training time.
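The batch-loading point above can be illustrated without TensorFlow. The sketch below reads a CSV a few rows at a time with a generator, so only one batch is held in memory at once. The helper name and the in-memory sample file are hypothetical; TensorFlow’s own data pipeline is considerably more sophisticated than this.

```python
import csv
import io
from itertools import islice

def read_in_batches(file_obj, batch_size):
    """Yield lists of rows, pulling only batch_size rows from the file at a time."""
    reader = csv.DictReader(file_obj)
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch

# In-memory stand-in for a large CSV file on disk.
csv_text = "id,fare\n" + "\n".join(f"{i},{i * 1.5}" for i in range(10))
batch_sizes = [len(b) for b in read_in_batches(io.StringIO(csv_text), batch_size=4)]
```

Because the generator only advances when the consumer asks for the next batch, memory use depends on the batch size and not on the size of the file.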


The main difference between building a model using other libraries and Deep_AutoViML is automation. The preprocessing layers in Deep_AutoViML are tied to the model’s middle layers – meaning, the model can handle raw data. This allows the user to take a model from Deep_AutoViML and put it in the MLOps layer without extra preprocessing.
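To see why tying preprocessing to the model matters for deployment, consider this minimal sketch of a model object that accepts raw records. The class, its column names and its weights are all hypothetical and only illustrate the design idea; Deep_AutoViML achieves this with preprocessing layers inside the Keras model.

```python
class RawInputModel:
    """Bundle preprocessing and scoring so callers can pass raw records."""

    def __init__(self, vocab, weights, bias=0.0):
        self.vocab = vocab        # string category -> integer index
        self.weights = weights    # one weight per numeric feature
        self.bias = bias

    def _preprocess(self, record):
        # Unknown categories fall back to index 0.
        return [float(self.vocab.get(record["sex"], 0)), float(record["age"])]

    def predict(self, record):
        features = self._preprocess(record)
        return self.bias + sum(w * x for w, x in zip(self.weights, features))

# Made-up vocabulary and weights for illustration only.
model = RawInputModel(vocab={"female": 1, "male": 2}, weights=[0.5, 0.01])
score = model.predict({"sex": "female", "age": 30})
```

The serving code never needs to know about the vocabulary or the feature order, so the preprocessing cannot drift out of sync with the model it was trained with.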

Multiple Model Service

Deep_AutoViML allows the users to build several different model types. The main way to control the type of model is keras_model_type. The multiple models that can be built are: 

  • Tabular Data: Auto – tabular datasets, automatic hyperparameter tuning
  • Tabular Data: Fast – suggested for very large datasets and less time-consuming
  • Tabular Data: Fast 1 – provides better results than Fast but is slower
  • Tabular Data: Fast 2 – suggested for deep and cross model architecture
  • NLP: *NLP* – building a simple NLP model
  • NLP: *text* – for a more advanced model
  • BERT: *BERT* – for a pre-trained model using BERT
  • Image: *image* – for image classification models. This uses MobileNet since it can run on heavyweight apps as well as lightweight mobile phone cameras.
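The “same syntax, one option changed” idea from the list above might look like the following sketch. Only the `keras_model_type` name comes from the talk; the string spellings of the option values, the `project_name` argument, the `fit_options` helper and the variant labels are assumptions, so check the library’s documentation for the exact values it accepts.

```python
# Option names taken from the list above; the exact string spellings are assumptions.
MODEL_TYPES = {
    ("tabular", "tuned"): "auto",
    ("tabular", "quick"): "fast",
    ("text", "simple"): "nlp",
    ("text", "pretrained"): "BERT",
    ("image", "classification"): "image",
}

def fit_options(data_kind, variant, project_name="demo_project"):
    """Build keyword arguments for a hypothetical one-line fit call."""
    try:
        model_type = MODEL_TYPES[(data_kind, variant)]
    except KeyError:
        raise ValueError(f"no model type for {data_kind!r}/{variant!r}") from None
    return {"keras_model_type": model_type, "project_name": project_name}

tabular_opts = fit_options("tabular", "tuned")
image_opts = fit_options("image", "classification")
```

The two option dictionaries differ only in `keras_model_type`, which is the point of the design: the rest of the call stays the same whatever the data type.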

The Neural Network

Seshadri discussed the problem that tools such as Kaggle, pandas and NumPy don’t easily support large datasets. For data in terabytes and petabytes, he suggests using Deep_AutoViML or something like TensorFlow/Python.

For the Titanic dataset, the pipeline uses a deep and wide neural network. “I can tell you that it will take you a while to build a model architecture like this in five hours or less, whereas it takes Deep_AutoViML one minute or less to build a model like this for a dataset – that’s how powerful it is,” explained Seshadri.

Automation: Types of models 

  • Automatic feature transformation
  • Automatic feature crosses
  • Automatic data type
  • Automatic feature renaming
  • Automatic missing value filling
  • Automatic label encoding
  • Automatic multi-label predictions 
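Three of the automated steps listed above (missing value filling, feature crosses and label encoding) can be sketched in plain Python. These helpers are hypothetical and only show what each step does conceptually; the fill strategy (median) and the crossed-name format are assumptions, not the library’s documented behaviour.

```python
from statistics import median

def fill_missing(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def cross(a, b):
    """Combine two categorical columns into one crossed feature."""
    return [f"{x}_x_{y}" for x, y in zip(a, b)]

def label_encode(labels):
    """Map each distinct label to an integer, in sorted order."""
    mapping = {lab: i for i, lab in enumerate(sorted(set(labels)))}
    return [mapping[lab] for lab in labels], mapping

# Made-up sample columns for illustration.
ages = fill_missing([22.0, None, 30.0, 26.0])
crossed = cross(["male", "female"], ["S", "C"])
encoded, mapping = label_encode(["survived", "died", "survived"])
```

A feature cross gives the model a single feature for each combination of two categories, which lets it learn interactions that the separate columns cannot express on their own.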

Lastly, Seshadri demonstrated the Deep_AutoViML library on the Titanic dataset. He has also provided demonstration resources for other models: an NLP model and a COVID-19 X-ray image classifier. The resources can be found below:

⚡⏰Titanic_keras_0.7926_score_81ROC | Kaggle

⚡⏰Disaster_NLP_Tweets_with_Swivel_84%Accuracy | Kaggle

⚡⏰ Covid-19 Image Classification 
Find the GitHub repository here.


Avi Gopani
Avi Gopani is a technology journalist that seeks to analyse industry trends and developments from an interdisciplinary perspective at Analytics India Magazine. Her articles chronicle cultural, political and social stories that are curated with a focus on the evolving technologies of artificial intelligence and data analytics.
