
Guide To Building And Deploying ML Web Applications Using Pycaret, Streamlit and Heroku

As a beginner in the field of machine learning, many of you would have successfully built multiple models using regression, classification and clustering algorithms. But these models become far more useful if they are accessible to others. With rapid advancements in the fields of machine learning and DevOps, it is now possible to build and deploy your trained models with very minimal code.

In this article, we will introduce a library called PyCaret to build the machine learning model, a library called Streamlit to efficiently build a dashboard for the project, and finally, we will deploy this application to Heroku.

Building a Model 

We will build a simple classification model using PyCaret that predicts whether an employee is likely to leave a company.

PyCaret is an open-source Python library built for low-code model building. With its help, you can pre-process, train, validate and save your model results within minutes. It is a popular choice for developing business solutions because of its ease of use and efficiency. You can use this library either on your local machine with a Jupyter notebook or on the cloud with Google Colab. The library also ships with several built-in datasets that you can explore. To get started, install it with the command

pip install pycaret
Once the installation is done, we will load the dataset as follows. 
from pycaret.datasets import get_data
dataset = get_data('employee')

The dataset has 10 attributes, and the target is labelled ‘left’, which takes the values 1 and 0: 1 indicates that the person will leave the company and 0 indicates otherwise.
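
Before modelling, it can help to glance at how the two classes are distributed; a quick pandas check (not part of the original walkthrough) is enough:

dataset['left'].value_counts()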

Next, we will drop the two columns we are not going to use as features and split the dataset into seen and unseen data.

# Drop the two columns that will not be used as input features
dataset = dataset.drop(['department', 'average_montly_hours'], axis=1)
# Keep 95% of the rows for modelling and hold out 5% as unseen data
data_seen = dataset.sample(frac=0.95, random_state=780).reset_index(drop=True)
data_unseen = dataset.drop(data_seen.index).reset_index(drop=True)
print('Data for Modeling: ' + str(data_seen.shape))
print('Unseen Data For Predictions: ' + str(data_unseen.shape))
# Initialise the PyCaret classification experiment
from pycaret.classification import *
setting_up = setup(data = data_seen, target = 'left', session_id=123)

Upon running this, we get a table summarising the properties of our dataset, such as the presence of missing values, whether PCA or other transformations are applied, the handling of outliers and so on. Our data has no missing values and requires no transformation, so we will proceed with training.
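
If you want to inspect what setup() actually produced, recent PyCaret versions (2.x) expose the experiment's internal variables through get_config; for example, the pre-processed training features can be pulled out as follows (an optional check, not required for the rest of the tutorial):

from pycaret.classification import get_config
X_train = get_config('X_train')  # features after PyCaret's pre-processing
X_train.head()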


For the training process, PyCaret allows us to compare the accuracies of all classification models fitted to our dataset and pick the most appropriate one.

compare_models()
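
As a side note, in recent PyCaret versions compare_models() also returns the best-performing model object, so you could capture it directly instead of re-creating it by name in the next step:

best_model = compare_models()  # returns the top model from the comparison table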

As seen here, the random forest classifier performs best among the compared algorithms. Hence we will pick that model and train it on the seen data.

rf = create_model('rf')

Once the model is created, its hyperparameters can be tuned automatically with the command

tuned_model = tune_model(rf)
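
By default, tune_model optimises accuracy; if a different metric matters more for your use case, it can be passed through the optimize argument (shown here only as an illustration, not something the original walkthrough does):

tuned_model = tune_model(rf, optimize='AUC')  # tune hyperparameters against AUC instead of Accuracy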

Now, we will finalize our training process and predict the results of unseen data. 

final = finalize_model(tuned_model)
unseen_predictions = predict_model(final, data=data_unseen)
unseen_predictions.head()
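
Since data_unseen still contains the true ‘left’ column, you can also compute the hold-out accuracy yourself. A minimal check with scikit-learn, assuming the predicted class lands in the 'Label' column as in PyCaret 2.x, looks like this:

from sklearn.metrics import accuracy_score
accuracy_score(unseen_predictions['left'], unseen_predictions['Label'].astype(int))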

Finally, we will save our model, which writes the entire preprocessing pipeline along with the trained model into a pickle file.

save_model(final,'Final_model')

After you run this, you will notice that a pickle file named ‘Final_model.pkl’ has been created in your working directory. With these simple steps and around 15 lines of code, we have successfully built our classification model.

Building a Dashboard 

Python provides us with another robust, low-code library called Streamlit. Streamlit is a minimal framework for building and serving data apps: a dashboard can be laid out with a few lines of code, and it works across environments. For our application, all we need to do is load the pickle file and create fields for entering inputs. Create a new Python file for the implementation of the dashboard.

Let us load our saved model first.

from pycaret.classification import load_model, predict_model
import streamlit as st
import pandas as pd
import numpy as np
model = load_model('Final_model')

Now, let us write a function to predict the output for different inputs we get through the web interface. 

def predict(model, input_df):
    predictions_df = predict_model(estimator=model, data=input_df)
    predictions = predictions_df['Label'][0]
    return predictions
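
Before wiring this into the UI, you can sanity-check the helper with a hand-built one-row DataFrame; the column names must match the training columns, and the values below are arbitrary examples:

sample = pd.DataFrame([{'satisfaction_level': 0.2, 'last_evaluation': 0.55,
                        'number_project': 6, 'time_spend_company': 4,
                        'Work_accident': 0, 'promotion_last_5years': 0,
                        'salary': 'low'}])
print(predict(model, sample))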

The inputs can either come from an online form, where users enter values in the fields provided and get predictions, or from an uploaded CSV file. We will implement both features. Keep in mind that the input fields have to be created in the same order as the columns of the training data.

def run():
    from PIL import Image
    image = Image.open('Employee.png')
    image_office = Image.open('office.jpg')
    st.image(image, use_column_width=False)
    add_selectbox = st.sidebar.selectbox(
        "How would you like to predict?",
        ("Online", "Batch"))
    st.sidebar.info('This app is created to predict if an employee will leave the company')
    st.sidebar.success('https://www.pycaret.org')
    st.sidebar.image(image_office)
    st.title("Predicting employee leaving")
    if add_selectbox == 'Online':
        # Input fields, created in the same order as the training columns
        satisfaction_level = st.number_input('satisfaction_level', min_value=0.1, max_value=1.0, value=0.1)
        last_evaluation = st.number_input('last_evaluation', min_value=0.1, max_value=1.0, value=0.1)
        number_project = st.number_input('number_project', min_value=0, max_value=50, value=5)
        time_spend_company = st.number_input('time_spend_company', min_value=1, max_value=10, value=3)
        Work_accident = st.number_input('Work_accident', min_value=0, max_value=50, value=0)
        promotion_last_5years = st.number_input('promotion_last_5years', min_value=0, max_value=50, value=0)
        salary = st.selectbox('salary', ['low', 'high', 'medium'])
        output = ""
        input_dict = {'satisfaction_level': satisfaction_level,
                      'last_evaluation': last_evaluation,
                      'number_project': number_project,
                      'time_spend_company': time_spend_company,
                      'Work_accident': Work_accident,
                      'promotion_last_5years': promotion_last_5years,
                      'salary': salary}
        input_df = pd.DataFrame([input_dict])
        if st.button("Predict"):
            output = predict(model=model, input_df=input_df)
            output = str(output)
        st.success('The output is {}'.format(output))

Let us now create the option for uploading CSV files as input.

    if add_selectbox == 'Batch':
        file_upload = st.file_uploader("Upload csv file for predictions", type=["csv"])
        if file_upload is not None:
            data = pd.read_csv(file_upload)
            predictions = predict_model(estimator=model,data=data)
            st.write(predictions)
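
Note that nothing in the script calls run() yet; for the dashboard to actually render when Streamlit executes the file, the function has to be invoked at the end, for example:

if __name__ == '__main__':
    run()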

Once you have implemented this, it is time to run the application. The command to do this is 

streamlit run applicationname.py

This will take us directly to the dashboard.


You can enter different values in the fields provided and click on the predict button for the prediction to appear. Here I have entered a set of values that indicate the person might leave the company; changing the values yields different results.

We have successfully run our application on localhost. Now, with the help of Heroku, we can deploy the model so that other people can access and use it.

To do this, create a GitHub repository and upload all of the files that we just created. Along with these files, you need to create two other files:

  1. requirements.txt: this file lists all the Python packages and versions that Heroku needs to install. To generate it, use the command pip freeze > requirements.txt in your command prompt under the working directory.
  2. Procfile: this file tells Heroku what command it has to run in order to serve the app. In our case, we need Streamlit to run our Python file, so the Procfile contains web: streamlit run applicationname.py (a sketch of both files is given after this list).
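
For reference, the two files could look roughly like the sketch below; the exact package list depends on your environment (pin the versions that pip freeze reports), and applicationname.py is a placeholder for your own file name. Depending on your setup, you may also need to bind Streamlit to the port Heroku assigns through the $PORT variable.

requirements.txt
pycaret
streamlit
pandas
numpy

Procfile
web: streamlit run applicationname.py --server.port $PORT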

Once these files are ready in the directory, you can upload them to GitHub. Once done, go to Heroku and set up an account. Create a name for your app; I have used the name leavepredictions. After you have created the app, you can scroll down and select the option of linking your GitHub account.


Enter the name of the repository that contains your files and select the option of deploying the branch. Heroku automatically installs all the requirements from requirements.txt and runs the command in the Procfile. When the process is complete, the deployment is done! Here is the link for the project given above. In the same way, you can build your models in PyTorch, Keras or TensorFlow and deploy them using this workflow.

Conclusion 

The ability to build and deploy a machine learning model in around 30 lines of code is remarkable and shows how fast the world of AI is developing. This article shows a simple and efficient way of using different Python libraries to build and deploy a model.

Bhoomika Madhukar

I am an aspiring data scientist with a passion for teaching. I am a computer science graduate from Dayananda Sagar Institute. I have experience in building models in deep learning and reinforcement learning. My goal is to use AI in the field of education to make learning meaningful for everyone.