
Guide To Building And Deploying ML Web Applications Using Pycaret, Streamlit and Heroku

As a beginner in the field of machine learning, many of you would have successfully built multiple models using regression, classification and clustering algorithms. But these models become far more useful if they are accessible to others. With rapid advancements in the fields of machine learning and DevOps, it is now possible to build and deploy your trained models with very minimal code.

In this article, we will introduce a library called PyCaret to build the machine learning model, a library called Streamlit to efficiently build a dashboard for the project, and finally, we will deploy this application to Heroku.

Building a Model 

We will build a simple classification model using PyCaret that predicts whether an employee is likely to leave a company.

PyCaret is an open-source Python library built for low-code model building. With its help, you can pre-process, train, validate and save your model results within minutes. It is a popular choice for developing business solutions because of its ease of use and efficiency. You can use this library either on your local machine with a Jupyter notebook or on the cloud with Google Colab. The library also ships with several built-in datasets that you can explore. To get started, install it with the command

pip install pycaret
Once the installation is done, we will load the dataset as follows. 
from pycaret.datasets import get_data
dataset = get_data('employee')

The dataset has 10 attributes, and the target is labelled ‘left’, which takes the values 1 and 0: 1 indicates that the person will leave the company and 0 indicates otherwise.
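
Before modelling, it can help to glance at how the two classes are distributed; a quick pandas check (not part of the original walkthrough) is enough:

dataset['left'].value_counts()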

Next, we will drop the two columns we are not going to use as features and split the dataset into seen and unseen data.

# Drop the two columns that will not be used as input features
dataset = dataset.drop(['department', 'average_montly_hours'], axis=1)
# Keep 95% of the rows for modelling and hold out 5% as unseen data
data_seen = dataset.sample(frac=0.95, random_state=780).reset_index(drop=True)
data_unseen = dataset.drop(data_seen.index).reset_index(drop=True)
print('Data for Modeling: ' + str(data_seen.shape))
print('Unseen Data For Predictions: ' + str(data_unseen.shape))
# Initialise the PyCaret classification experiment
from pycaret.classification import *
setting_up = setup(data = data_seen, target = 'left', session_id=123)

Upon running this, we get a table summarising the properties of our dataset, such as the presence of missing values, whether PCA or other transformations are applied, the handling of outliers and so on. Our data has no missing values and requires no transformation, so we will proceed with training.
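
If you want to inspect what setup() actually produced, recent PyCaret versions (2.x) expose the experiment's internal variables through get_config; for example, the pre-processed training features can be pulled out as follows (an optional check, not required for the rest of the tutorial):

from pycaret.classification import get_config
X_train = get_config('X_train')  # features after PyCaret's pre-processing
X_train.head()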


For the training process, PyCaret allows us to compare the accuracies of all classification models fitted to our dataset and pick the most appropriate one.

compare_models()
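
As a side note, in recent PyCaret versions compare_models() also returns the best-performing model object, so you could capture it directly instead of re-creating it by name in the next step:

best_model = compare_models()  # returns the top model from the comparison table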

As seen here, the random forest classifier performs best among the compared algorithms. Hence we will pick that model and train it on the seen data.

rf = create_model('rf')

Once the model is created, its hyperparameters can be tuned automatically with the command

tuned_model = tune_model(rf)
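
By default, tune_model optimises accuracy; if a different metric matters more for your use case, it can be passed through the optimize argument (shown here only as an illustration, not something the original walkthrough does):

tuned_model = tune_model(rf, optimize='AUC')  # tune hyperparameters against AUC instead of Accuracy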

Now, we will finalize our training process and predict the results of unseen data. 

final = finalize_model(tuned_model)
unseen_predictions = predict_model(final, data=data_unseen)
unseen_predictions.head()
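
Since data_unseen still contains the true ‘left’ column, you can also compute the hold-out accuracy yourself. A minimal check with scikit-learn, assuming the predicted class lands in the 'Label' column as in PyCaret 2.x, looks like this:

from sklearn.metrics import accuracy_score
accuracy_score(unseen_predictions['left'], unseen_predictions['Label'].astype(int))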

Finally, we will save our model, which writes the entire preprocessing pipeline along with the trained model into a pickle file.

save_model(final,'Final_model')

After you run this, you will notice that a pickle file named ‘Final_model.pkl’ has been created in your working directory. With these simple steps and around 15 lines of code, we have successfully built our classification model.

Building a Dashboard 

Python provides us with another robust, low-code library called Streamlit. Streamlit is a minimal framework for building and serving data apps: a dashboard can be laid out with a few lines of code, and it works across environments. For our application, all we need to do is load the pickle file and create fields for entering inputs. Create a new Python file for the implementation of the dashboard.

Let us load our saved model first.

from pycaret.classification import load_model, predict_model
import streamlit as st
import pandas as pd
import numpy as np
model = load_model('Final_model')

Now, let us write a function to predict the output for different inputs we get through the web interface. 

def predict(model, input_df):
    predictions_df = predict_model(estimator=model, data=input_df)
    predictions = predictions_df['Label'][0]
    return predictions
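
Before wiring this into the UI, you can sanity-check the helper with a hand-built one-row DataFrame; the column names must match the training columns, and the values below are arbitrary examples:

sample = pd.DataFrame([{'satisfaction_level': 0.2, 'last_evaluation': 0.55,
                        'number_project': 6, 'time_spend_company': 4,
                        'Work_accident': 0, 'promotion_last_5years': 0,
                        'salary': 'low'}])
print(predict(model, sample))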

The inputs can either come from an online form, where users enter values in the fields provided and get predictions, or from an uploaded CSV file. We will implement both features. Keep in mind that the input fields have to be created in the same order as the columns of the training data.

def run():
    from PIL import Image
    image = Image.open('Employee.png')
    image_office = Image.open('office.jpg')
    st.image(image, use_column_width=False)
    add_selectbox = st.sidebar.selectbox(
        "How would you like to predict?",
        ("Online", "Batch"))
    st.sidebar.info('This app is created to predict if an employee will leave the company')
    st.sidebar.success('https://www.pycaret.org')
    st.sidebar.image(image_office)
    st.title("Predicting employee leaving")
    if add_selectbox == 'Online':
        # Input fields, created in the same order as the training columns
        satisfaction_level = st.number_input('satisfaction_level', min_value=0.1, max_value=1.0, value=0.1)
        last_evaluation = st.number_input('last_evaluation', min_value=0.1, max_value=1.0, value=0.1)
        number_project = st.number_input('number_project', min_value=0, max_value=50, value=5)
        time_spend_company = st.number_input('time_spend_company', min_value=1, max_value=10, value=3)
        Work_accident = st.number_input('Work_accident', min_value=0, max_value=50, value=0)
        promotion_last_5years = st.number_input('promotion_last_5years', min_value=0, max_value=50, value=0)
        salary = st.selectbox('salary', ['low', 'high', 'medium'])
        output = ""
        input_dict = {'satisfaction_level': satisfaction_level,
                      'last_evaluation': last_evaluation,
                      'number_project': number_project,
                      'time_spend_company': time_spend_company,
                      'Work_accident': Work_accident,
                      'promotion_last_5years': promotion_last_5years,
                      'salary': salary}
        input_df = pd.DataFrame([input_dict])
        if st.button("Predict"):
            output = predict(model=model, input_df=input_df)
            output = str(output)
        st.success('The output is {}'.format(output))

Let us now create the option for uploading CSV files as input.

    if add_selectbox == 'Batch':
        file_upload = st.file_uploader("Upload csv file for predictions", type=["csv"])
        if file_upload is not None:
            data = pd.read_csv(file_upload)
            predictions = predict_model(estimator=model,data=data)
            st.write(predictions)
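
Note that nothing in the script calls run() yet; for the dashboard to actually render when Streamlit executes the file, the function has to be invoked at the end, for example:

if __name__ == '__main__':
    run()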

Once you have implemented this, it is time to run the application. The command to do this is 

streamlit run applicationname.py

This will take us directly to the dashboard.


You can enter different values in the fields provided and click on the predict button for the prediction to appear. Here I have entered a set of values that indicate the person might leave the company; changing the values yields different results.

We have successfully run our application on localhost. Now, with the help of Heroku, we can deploy the model so that other people can access and use it.

To do this, create a GitHub repository and upload all of the files that we just created. Along with these files, you need to create two other files:

  1. requirements.txt: this file lists all the Python packages and versions that Heroku needs to install. To generate it, use the command pip freeze > requirements.txt in your command prompt under the working directory.
  2. Procfile: this file tells Heroku what command it has to run in order to serve the app. In our case, we need Streamlit to run our Python file, so the Procfile contains web: streamlit run applicationname.py (a sketch of both files is given after this list).
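
For reference, the two files could look roughly like the sketch below; the exact package list depends on your environment (pin the versions that pip freeze reports), and applicationname.py is a placeholder for your own file name. Depending on your setup, you may also need to bind Streamlit to the port Heroku assigns through the $PORT variable.

requirements.txt
pycaret
streamlit
pandas
numpy

Procfile
web: streamlit run applicationname.py --server.port $PORT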

Once these files are ready in the directory, you can upload them to GitHub. Once done, go to Heroku and set up an account. Create a name for your app; I have used the name leavepredictions. After you have created the app, you can scroll down and select the option of linking your GitHub account.


Enter the name of the repository that contains your files and select the option of deploying the branch. Heroku automatically installs all the requirements from requirements.txt and runs the command in the Procfile. When the process is complete, the deployment is done! Here is the link for the project given above. In the same way, you can build your models in PyTorch, Keras or TensorFlow and deploy them using this workflow.

Conclusion 

The ability to build and deploy a machine learning model in around 30 lines of code is remarkable and shows how fast the world of AI is developing. This article shows a simple and efficient way of using different Python libraries to build and deploy a model.

Bhoomika Madhukar

I am an aspiring data scientist with a passion for teaching. I am a computer science graduate from Dayananda Sagar Institute. I have experience in building models in deep learning and reinforcement learning. My goal is to use AI in the field of education to make learning meaningful for everyone.