
Build an end-to-end machine learning model with DataRobot

DataRobot, powered by recent open-source algorithms and available in the cloud, on-premise, or as a managed AI service, offers features to build and deploy ML models with ease.

DataRobot is a business AI platform that democratises data science and automates the end-to-end process of designing, implementing, and sustaining machine learning and artificial intelligence at scale. It is powered by recent open-source algorithms and is accessible in the cloud, on-premise, or as a fully managed AI service. In this article, we will learn to build an end-to-end ML classifier model with DataRobot. The following topics are covered.

Table of contents

  1. Data upload
  2. Selecting the model
  3. Computing the results
  4. Deploy the final model

This article uses data from the direct marketing campaigns of a banking institution. The campaigns were carried out through phone calls. More than one contact with the same customer was frequently necessary to determine whether the product (a bank term deposit) would be subscribed to or not. We will build a classification model that predicts whether or not a customer should be called.

Let’s start with uploading the data since the model can’t learn without data.

Data upload

Once you have registered and logged in to the DataRobot web page, you are asked to choose from options such as data visualization, AI model building, and deployment. After you make a selection, the page redirects to a screen where the data can be uploaded.

The data must satisfy the following conditions.

  • The correct file format
  • Size less than 200 MB
  • Minimum 20 rows
  • No more than one missing column header
  • No duplicate column headers
  • No unsupported or inconsistent encoding

If your data file is larger than 200 MB, you need to create a job id before the file can be used, since DataRobot restricts direct uploads to 200 MB.
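The conditions listed above can be roughly checked on a local CSV file before uploading. The following is a minimal sketch, not DataRobot's own validation: the function names `check_csv_text` and `check_csv_file` are hypothetical, the file is assumed to be a CSV, 200 MB is assumed to mean 200 × 1024 × 1024 bytes, and the 20-row minimum is assumed to count data rows without the header.

```python
import csv
import io
import os
from collections import Counter

MAX_BYTES = 200 * 1024 * 1024  # assumption: 200 MB read as 200 * 1024 * 1024 bytes
MIN_ROWS = 20                  # assumption: counts data rows, excluding the header


def check_csv_text(text, size_bytes=None):
    """Return a list of problems found in CSV content (empty list means OK)."""
    problems = []
    if size_bytes is None:
        size_bytes = len(text.encode("utf-8"))
    if size_bytes >= MAX_BYTES:
        problems.append("file is not smaller than 200 MB")

    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return problems + ["file is empty"]

    header, data = rows[0], rows[1:]
    if len(data) < MIN_ROWS:
        problems.append(f"only {len(data)} data rows, need at least {MIN_ROWS}")

    names = [h.strip() for h in header]
    missing = sum(1 for h in names if h == "")
    if missing > 1:
        problems.append(f"{missing} column headers are missing")

    # Empty headers are handled above, so they are skipped here.
    dupes = [h for h, n in Counter(names).items() if h and n > 1]
    if dupes:
        problems.append(f"duplicate column headers: {sorted(dupes)}")
    return problems


def check_csv_file(path, encoding="utf-8"):
    """Read a file from disk and run the same checks; flags decoding errors."""
    try:
        with open(path, encoding=encoding, newline="") as fh:
            text = fh.read()
    except UnicodeDecodeError:
        return [f"file is not valid {encoding}"]
    return check_csv_text(text, size_bytes=os.path.getsize(path))
```

Calling `check_csv_file("bank.csv")` (a hypothetical file name) returns an empty list when none of the checked conditions is violated, and a list of human-readable problems otherwise.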

Then click on “Data” in the top taskbar of the web page to access the data. Once the data is uploaded, the target column needs to be selected. If the target column is discrete, DataRobot generates a count plot of its categories.
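To get a feel for what such a count plot summarises, the class counts of a discrete target can be computed by hand. This is only an illustrative sketch: the `labels` list holds made-up values, not the actual campaign data, and `class_distribution` is a hypothetical helper name.

```python
from collections import Counter


def class_distribution(labels):
    """Map each class label to (count, share of all rows)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in counts.items()}


# Hypothetical target values, for illustration only.
labels = ["no"] * 8 + ["yes"] * 2

# Print a crude text version of a count plot, one bar per class.
for label, (n, share) in sorted(class_distribution(labels).items()):
    print(f"{label:>3}  {n:>3}  {share:.0%}  {'#' * n}")
```

A strongly uneven split between the classes is worth noting early, because it affects how metrics such as precision and sensitivity should be read later on.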

Selecting the model

Once the target column is selected, move on to the modelling mode. There are four modes: quick, autopilot, manual, and comprehensive.

  • The quick mode is a kick-starter mode that generates base models, which can then be improved as needed.
  • The autopilot mode builds all the possible models offered by DataRobot with different cross-validation settings, train-test sample sizes, and feature selections.
  • The manual mode is user-defined: you select the model yourself and train it accordingly.
  • The comprehensive mode goes one step further than the autopilot mode. If you are not satisfied with the autopilot’s model recommendation, you can use the comprehensive mode.

In this article, we are going to use the “autopilot mode”.

Once the mode is selected, click on the start button and the page moves on to analyzing the data.

Here the data is analyzed, and you can select the number of features to be used for training. Once done with this part, you can move on to selecting models, depending on the mode chosen earlier in the modelling section, by clicking on “Models”.

Since we are using the autopilot mode, model training starts automatically once the data analysis is complete. So, we just need to sit back and wait for the process to finish.

Computing the results

The process was initialized with 31 models. These 31 models are different versions of the base models, which were tree-based classifiers and linear classifiers.

In the end, there were a total of 63 models, with different sample sizes, different combinations of tree-based and linear algorithms, different hyperparameter tuning, etc.

After completion, DataRobot’s autopilot mode recommended the “Light Gradient Boosted Trees Classifier with Early Stopping” as the best model to deploy.

Let’s look at the performance of the final model. Clicking on the model’s name shows different metrics for evaluating its performance. These metrics help explain the reason behind the recommendation.

In the right panel there is a confusion matrix and, below that, the sensitivity and precision. The precision is 0.52 and the sensitivity is 0.70. For this article, we focus on the rate of positively predicted values since those customers could be targeted by the sales team. In the left panel, we can see the ROC curve and an AUC score of 0.92. This is a well-performing model.
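For readers who want to see how precision and sensitivity follow from a confusion matrix, here is a small sketch. The four counts are hypothetical: they were chosen only so that the results match the 0.52 and 0.70 reported above, and they are not the counts from the actual DataRobot run.

```python
def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp)


def sensitivity(tp, fn):
    """Share of actual positives that the model finds (also called recall)."""
    return tp / (tp + fn)


# Hypothetical confusion-matrix counts, picked to reproduce the reported metrics.
# tn is shown for completeness; neither metric uses it.
tp, fp, fn, tn = 91, 84, 39, 786

print(f"precision   = {precision(tp, fp):.2f}")
print(f"sensitivity = {sensitivity(tp, fn):.2f}")
```

Read this way, a precision of 0.52 means that roughly half of the customers flagged for a call would actually subscribe, while a sensitivity of 0.70 means the model catches about 70% of all customers who would subscribe.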

Let’s analyze further by looking at the processing time, because one important consideration when deploying a model is how fast it can process user inputs. A faster model with slightly lower predictive performance can be preferable to a slower model with higher performance.

If you click on the “Speed vs Accuracy” tab, a scatter plot shows each model’s performance against its processing time.

The final model is the best because it is the fastest, taking just 67.1 milliseconds to process the data.
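The speed-versus-accuracy reasoning can be written down as a simple selection rule: keep every model whose AUC is within a small tolerance of the best one, then take the fastest of those. The sketch below is one possible way to express that rule, not DataRobot's recommendation logic. Only the recommended model's AUC (0.92) and time (67.1 ms) come from this article; the other two rows, the `Candidate` type, `pick_model`, and the `max_auc_drop` tolerance are hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    auc: float
    ms: float  # prediction time in milliseconds


def pick_model(candidates, max_auc_drop=0.01):
    """Fastest model whose AUC is within max_auc_drop of the best AUC."""
    best_auc = max(c.auc for c in candidates)
    eligible = [c for c in candidates if best_auc - c.auc <= max_auc_drop]
    return min(eligible, key=lambda c: c.ms)


candidates = [
    # Hypothetical slower model with a slightly higher AUC.
    Candidate("Model A (hypothetical)", 0.925, 410.0),
    # Figures reported in this article.
    Candidate("Light Gradient Boosted Trees Classifier with Early Stopping", 0.92, 67.1),
    # Hypothetical model that is both slower and less accurate.
    Candidate("Model B (hypothetical)", 0.90, 150.0),
]

print(pick_model(candidates).name)
```

With these illustrative numbers the rule selects the Light Gradient Boosted Trees model: it gives up very little AUC compared with the hypothetical Model A while answering several times faster.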

Now we are satisfied that the recommendation is good. Let’s deploy the model.

Deploy the final model

Deploying the model is easy: select the model from the Models tab by clicking the checkbox beside its name. Then click on the “deploy” button and the model is deployed.

Once the model is deployed, you can view it by clicking on the “ML Ops” tab in the top taskbar.

Conclusions

DataRobot can create predictions one at a time or in large batches by importing a file.

Any machine learning model can be converted into an AI application using DataRobot, allowing anyone in your ecosystem to interact with the predictive insight of the underlying model. Such an application allows you to compare a forecast to historical outcomes, examine prediction reasons, and change input parameters to observe how they affect the results. With this article, we have learned how to build and deploy a predictive model with DataRobot.


Sourabh Mehta

Sourabh has worked as a full-time data scientist for an ISP organisation, experienced in analysing patterns and their implementation in product development. He has a keen interest in developing solutions for real-time problems with the help of data both in this universe and metaverse.