KNIME – A Primer on Automating Machine Learning with ‘No Code’ Workflows

In this article, we will learn about KNIME and discuss how to use this tool for building a machine learning model from scratch.

Beginners in the field of machine learning face numerous challenges in trying to keep up with the fast-paced nature of AI. It is especially difficult for people with no coding experience, since they have to learn both the mathematics behind an algorithm and how to implement it in code. To make things a little easier for them, no-code machine learning GUIs such as KNIME were developed.

What is KNIME?

KNIME is a GUI-based workflow platform that can be used to build machine learning models effectively without having to code. You simply define a workflow by connecting pre-defined nodes, which handle tasks such as data cleaning, data visualization and model training. Once the workflow is defined, the model can be trained to produce the desired output. Everything from basic input-output operations to data mining can be performed with KNIME.

Installing KNIME

To download KNIME, click here and select the installer that matches your operating system.

Windows users can select the first option and the download will begin. Once the download is complete, follow the installation steps and the KNIME dashboard will open.


Creating a workflow

To create the machine learning model, we first need to set up a workflow. To do this, select File -> New and choose a new workflow.

You will get a popup where you can type in the name of the project. 

Click Finish and the new workflow will open.

On the right-hand side, you can add a description of the project and any reference links. The left-hand side is where you will build the workflow.

Getting the dataset

Now that we have created our workspace, let us get the dataset. First, download the dataset you want to use for the project. I have used the tips dataset from Kaggle, which can be downloaded from here. The dataset contains features such as smoker, time, day and total_bill, which are used to predict the tip a waiter will receive, making this a simple regression problem. After downloading the data, go to the node repository, search for ‘File Reader’, and drag and drop the node onto the workspace.

Then, double-click the node, browse to the dataset on your local system and load the file. Once you do that, you will get a preview of the dataset.

Here you can set options such as ignoring tab spaces, reading the column headings and so on. After you have chosen the desired options, select Apply and OK. Once done, right-click on the node and select ‘Execute’ to run it.
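For readers curious what this step corresponds to in code, the File Reader node is essentially loading a delimited file into a table. Below is a minimal pandas sketch of the same operation; it is not how KNIME works internally, and the local filename tips.csv is an assumption.

```python
import pandas as pd

# Load the tips dataset; "tips.csv" is an assumed local filename.
# header=0 mirrors the File Reader option to read column headings.
df = pd.read_csv("tips.csv", header=0)

# Quick look at the data, comparable to the node's preview pane.
print(df.head())
print(df.dtypes)
```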

Correlation

The next step is to identify the correlation that exists between the features. To do this, search for ‘Linear Correlation’ in the node repository and drag and drop it onto the workspace. Then, connect your dataset node to this node.

Now, right-click on this node and click ‘Execute’. After it has executed, right-click again and click ‘View correlation matrix’ to see the matrix.

Most columns are not strongly related to one another, but tip and total_bill clearly have a very high correlation. Let us use these two columns to build the model.
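As a point of comparison, the correlation matrix that the Linear Correlation node displays can be reproduced with pandas. This is only an illustrative sketch, again assuming the dataset is saved locally as tips.csv.

```python
import pandas as pd

df = pd.read_csv("tips.csv")  # assumed local filename

# Pearson correlation between the numeric columns,
# analogous to the node's correlation matrix view.
corr = df.select_dtypes(include="number").corr()
print(corr)

# How strongly each numeric feature correlates with the target "tip".
print(corr["tip"].sort_values(ascending=False))
```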

Data visualization

The next step in model building is to visualize the dataset. To do this, search for the type of plot you want in the node repository. I have chosen a scatter plot for the visualization. Drag and drop this node onto the workspace and connect the File Reader node to it. Once done, right-click and select ‘Execute’.

Here you can also change the columns to see how the data is scattered. Other visualizations, such as pie charts, are available as well.
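For reference, the scatter plot produced by the node corresponds to a few lines of matplotlib. The sketch below plots the two highly correlated columns identified earlier; the filename is again an assumption.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("tips.csv")  # assumed local filename

# Scatter of tip against total_bill, the two strongly correlated columns.
plt.scatter(df["total_bill"], df["tip"], alpha=0.6)
plt.xlabel("total_bill")
plt.ylabel("tip")
plt.title("Tip vs. total bill")
plt.show()
```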

Data manipulation

To find out which values are missing, search for ‘Missing Value’ in the node repository. Drag and drop this node and connect it to the File Reader node.

Next, double-click the Missing Value node. Here you will find a dialog that lets you impute values in the dataset.

These options let you impute numeric and string values separately. I have chosen to impute the missing numeric values with the mean, but you can choose from the available options according to your requirements.

After selecting this, click Apply and OK and the missing values are filled automatically. Finally, right-click the node and select ‘Execute’ to run it.
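Mean imputation of the kind configured here can be sketched in pandas as follows. This is a rough equivalent, not the Missing Value node’s implementation; the handling of string columns below (most frequent value) is just one of the options the node offers.

```python
import pandas as pd

df = pd.read_csv("tips.csv")  # assumed local filename

# Numeric columns: replace missing values with the column mean,
# mirroring the "Mean" option chosen in the Missing Value node.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())

# String columns: fill with the most frequent value as one possible choice.
for col in df.select_dtypes(include="object").columns:
    if df[col].isna().any():
        df[col] = df[col].fillna(df[col].mode().iloc[0])

# Confirm no missing values remain.
print(df.isna().sum())
```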

Model building

Now that we have pre-processed and visualized the data, it is time to build a model. I will use a simple linear regression model on this dataset. To do this, search for ‘Linear Regression Learner’ in the node repository and drag and drop it onto the workspace. Connect the Missing Value node to it, since that node carries the pre-processed data.

Now double-click the Linear Regression Learner node. The following configuration dialog is displayed.

At the top, you need to set the target column. Once you set this, the target is automatically removed from the input columns shown below. You can also exclude some of the features. I will exclude a few features since they did not show much correlation with the target. Just select the column to be excluded and click the left arrow button.

Once done, click Apply. Next, right-click the node and select ‘Execute’. Once execution is complete, you can see the output on the screen.

The different error measures and the R-squared value are shown, and the results are quite good here.
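To see what the Linear Regression Learner roughly corresponds to in code, here is a scikit-learn sketch that regresses tip on total_bill, the column pair suggested by the correlation step. The train/test split and the exact metrics are assumptions made for the sake of a self-contained example, not a reproduction of the KNIME output.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, r2_score

df = pd.read_csv("tips.csv")  # assumed local filename

# Predict tip from total_bill, following the correlation analysis.
X = df[["total_bill"]]
y = df["tip"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)
pred = model.predict(X_test)

print("coefficient:", model.coef_[0])
print("intercept:  ", model.intercept_)
print("MAE:        ", mean_absolute_error(y_test, pred))
print("R-squared:  ", r2_score(y_test, pred))
```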

Thus, within KNIME itself, we have built a machine learning model without writing a single line of code.

Conclusion

In this article, we saw how simple it is to build a machine learning model with the KNIME GUI. There is a lot left to explore in this tool for building better and more complex models. KNIME also supports building neural networks and clustering algorithms, making machine learning easy and accessible to everyone.

Bhoomika Madhukar
I am an aspiring data scientist with a passion for teaching. I am a computer science graduate from Dayananda Sagar Institute. I have experience in building models in deep learning and reinforcement learning. My goal is to use AI in the field of education to make learning meaningful for everyone.
