KNIME – A Primer To Automate Machine Learning ‘No Code’ Workflows

Beginners in the field of machine learning face numerous challenges in trying to keep up with the fast-paced nature of AI. It is especially difficult for people with no coding experience, since they have to learn both the math behind an algorithm and how to code it. To make things a little easier for them, a no-code machine learning GUI called KNIME was developed.

In this article, we will learn about KNIME and discuss how to use this tool for building a machine learning model from scratch. 

What is KNIME?

KNIME is a GUI-based workflow platform that can be used to build machine learning models effectively without having to code. Here, you simply define a workflow between pre-defined nodes. These nodes may perform data cleaning, data visualization or model training. Once the workflow is defined, the model can be trained to get the desired output. Everything from basic input-output operations to data mining can be performed with KNIME.

Installing KNIME

To download KNIME, go to the downloads page on the KNIME website and select the build that matches your operating system.

Windows users can select the Windows installer and the download will begin. Once the download is complete, follow the installation steps and you will see the KNIME dashboard.

Creating a workflow

To create the machine learning model we first need to set up a workflow. For this, select File -> New and choose a new KNIME workflow.

You will get a popup where you can type in the name of the project. 

Click Finish to open the new workflow.

On the right-hand side, you can type in a description of the project and add any reference links. The left-hand side is where you will create the workflow.

Getting the dataset

Now that we have created our workspace, let us get the dataset. To do this, first download the dataset that you want to use for the project. I have used the tips dataset from Kaggle. The dataset contains columns such as smoker, time, day and total_bill, which are used to predict how much of a tip a waiter will get. It is a simple regression problem. After downloading the data, go to the node repository and search for 'file reader'. Drag and drop this node onto the workspace.

Then, double-click the node, browse to the dataset on your local system and select the file. Once you do that, you will get a preview of the dataset.

Here you can select options such as ignoring tab spaces, reading the column headers, etc. After you have chosen the desired options, click Apply and OK. Once done, right-click the node and select 'Execute' so that it runs.
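For readers curious what this step corresponds to in code, the snippet below is a rough Python equivalent of the File Reader node; the file name tips.csv is an assumption based on where you saved the Kaggle download.

```python
import pandas as pd

# Rough equivalent of the File Reader node: load the CSV and preview it.
# "tips.csv" is an assumed path to the dataset downloaded from Kaggle.
df = pd.read_csv("tips.csv")

print(df.head())    # similar to the preview pane in the node dialog
print(df.dtypes)    # the column types the reader has inferred
```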

Correlation

The next step is to identify the correlation that exists between the features. To do this, search for 'Linear Correlation' in the node repository and drag and drop it onto the workspace. Then, connect the file reader node to this node.

Now, right-click on this and click on ‘execute’. After executing this, right-click again and click on ‘view correlation matrix’. Once you select this you will see the matrix. 

Most columns are not strongly related to one another, but it is clear that tip and total_bill have a very high correlation. Let us select these two columns to build the model.
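As a point of comparison, the Linear Correlation node computes pairwise correlation coefficients between columns. A minimal pandas sketch of the same idea, restricted to numeric columns and again assuming the tips.csv file name, looks like this.

```python
import pandas as pd

df = pd.read_csv("tips.csv")  # assumed file name, as above

# Pearson correlation between the numeric columns, similar in spirit to the
# correlation matrix shown by the Linear Correlation node.
corr = df.select_dtypes("number").corr(method="pearson")
print(corr)

# The pair this tutorial focuses on:
print(corr.loc["total_bill", "tip"])
```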

Data visualization

The next step in model building is to visualize the dataset. To do this, search for the type of plot you want to use; I have selected the scatter plot for the visualization. Drag and drop this node onto the workspace and connect the file reader node to it. Once done, right-click and select Execute.

Here you can change the columns to see how the data is scattered. There are other visualization nodes as well, such as pie charts.
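If you would like to reproduce this plot in code, the following is a small matplotlib sketch of the same scatter plot, again assuming the tips.csv file name.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("tips.csv")  # assumed file name

# Scatter plot of total_bill against tip, analogous to the Scatter Plot node.
plt.scatter(df["total_bill"], df["tip"], alpha=0.6)
plt.xlabel("total_bill")
plt.ylabel("tip")
plt.title("tip vs. total_bill")
plt.show()
```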

Data manipulation

To deal with missing values in the dataset, search for 'missing values' in the node repository. Drag and drop this node and connect it to the file reader.

Next, double-click the missing values node. Here you will find a dialog that lets you impute values in the dataset.

The options allow you to impute values for both number and string columns. I have selected to impute the missing values with the mean, but you can choose from the other options according to your requirements.

After selecting this, click Apply and OK, and the missing values are filled automatically. Finally, right-click the node and select Execute to run it.
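For reference, mean imputation of the numeric columns can be sketched in a couple of lines of pandas; this is only an approximation of what the node does, and again assumes the tips.csv file name.

```python
import pandas as pd

df = pd.read_csv("tips.csv")  # assumed file name

# Fill missing values in numeric columns with the column mean, roughly what
# the missing values node does when the mean option is selected for numbers.
numeric_cols = df.select_dtypes("number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())

print(df.isna().sum())  # confirm no numeric column still has missing values
```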

Model building

Now that we have pre-processed and visualized the data, it is time to build a model. I will use a simple linear regression model on this dataset. To do this, search for 'linear regression learner' in the node repository and drag and drop it onto the workspace. Connect the missing values node to it, since that node carries the pre-processed data.

Now double-click the linear regression node to open its configuration dialog.

At the top, you need to set the target column. Once you set this, the target is automatically removed from the list of input columns. You can choose to exclude some of the features as well; I will exclude a few features since they showed little correlation with the target. Just select the column to be excluded and click the left-arrow button.

Once done, click Apply. Next, right-click the node and select Execute. When the execution finishes, you can see the output on the screen.

The different error metrics and the R-squared value are shown, and the results are quite good here.
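If you want to sanity-check numbers like these outside KNIME, the snippet below is a minimal scikit-learn sketch of the same regression, fitting tip on total_bill. It assumes the tips.csv file from earlier and will not match the node's output exactly, since the configuration and data handling are not identical.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score

df = pd.read_csv("tips.csv")  # assumed file name

# Single feature chosen via the correlation step above; tip is the target.
X = df[["total_bill"]]
y = df["tip"]

# Fit on the full dataset to mirror the simple workflow above; a real project
# would normally hold out a test set before reporting metrics.
model = LinearRegression().fit(X, y)
pred = model.predict(X)

print("coefficient:", model.coef_[0])
print("intercept:", model.intercept_)
print("MAE:", mean_absolute_error(y, pred))
print("R^2:", r2_score(y, pred))
```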

Thus we have built a machine learning model without coding. 

Conclusion

In this article, we saw how simple it is to use the KNIME GUI to build a machine learning model. There is a lot left to explore in this tool for building better and more complex models. KNIME also supports building neural networks and clustering algorithms, making machine learning easy and accessible to everyone.

Bhoomika Madhukar

I am an aspiring data scientist with a passion for teaching. I am a computer science graduate from Dayananda Sagar Institute. I have experience in building models in deep learning and reinforcement learning. My goal is to use AI in the field of education to make learning meaningful for everyone.