Orange is an open-source, GUI-based platform that is popular for rule mining and easy data analysis. Much of its popularity comes from the fact that it can be used without writing any code. Researchers, students, non-developers and business analysts use platforms like Orange to get a good understanding of the data at hand and to quickly build machine learning models that reveal the relationships between data points.
In this article, we will cover:
- Why Orange?
- Installing and Setting up the tool
- Training your first machine learning model using Orange
Why Orange?
Orange is a platform built on Python that lets you do everything required to build machine learning models without code. It includes a wide range of data visualisation, exploration, preprocessing and modelling techniques. It is handy not only for machine learning but also for association rule mining on numeric and text data, and even for network analysis.
Orange comes with multiple classification and regression algorithms, all of which are available as drag-and-drop components. It can be used through its intuitive user interface or, for more advanced users, as a module for the Python programming language.
Installing and Setting up the tool
There are a few ways to install the tool. First, you can download an installer from the Orange download page, which lists the options available for each operating system.
Select the latest version of Orange for your OS and the download will begin automatically. Once it has finished, run the installer (the .exe file on Windows) and launch the platform on your computer.
The next option is to install it with conda, if Anaconda is already set up on your system.
To install the platform with conda, use the commands:
conda config --add channels conda-forge
conda install orange3
This adds the conda-forge channel to your conda configuration and installs Orange from it. You can then launch the tool from Anaconda Navigator.
Since Orange is built with Python, the last option is to install it with pip.
It is better to create a virtual environment first, since a lot of dependencies need to be downloaded.
pip install orange3
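Whichever route you take, it can be useful to confirm that the package is visible to the Python interpreter you are actually using. The snippet below is a minimal sketch of such a check. It assumes the import name is `Orange` (capitalised, unlike the `orange3` package name), and `is_installed` is a helper name made up for this illustration.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "Orange" is the import name assumed for the orange3 package.
    if is_installed("Orange"):
        print("Orange is available in this environment.")
    else:
        print("Orange was not found; check that the right environment is active.")
```

If the check fails right after a successful install, the most likely cause is that the virtual environment used for the install is not the one currently active.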
Training your first machine learning model using Orange
Once the tool has been installed and set up, launching it takes you to the main page of the platform.
On the left you have options such as file, CSV file and datasets. To keep this demonstration simple, I will choose a built-in dataset. A number of datasets are readily available, and you can browse them by double-clicking the datasets option. You can also add your own CSV file by clicking the CSV option.
After deciding which dataset you want to work with, we can proceed with file creation. I have selected a simple heart disease dataset. To create the file, drag and drop the ‘File’ component onto the canvas on the right-hand side.
Double-click the file component to select and view the dataset.
If you have loaded the dataset using the CSV option, it will automatically show up in the file component.
Now, view the dataset as a table to understand the features and the target better. To do this, drag a connecting line from the file component and you will see a list of options. Select the option called ‘Data table’ and double-click the new component.
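If you prefer to peek at a CSV file in a script before loading it into Orange, the same kind of overview can be approximated with the standard library. The sketch below is not how Orange reads files. The column names and values in `SAMPLE_CSV` are invented for illustration, and `summarize_columns` is a hypothetical helper that counts filled and missing cells per column.

```python
import csv
import io

# Invented sample data; a real file would be opened with open(path, newline="").
SAMPLE_CSV = """age,cholesterol,target
63,233,1
41,,0
57,354,1
"""


def summarize_columns(csv_text: str) -> dict:
    """Count filled and empty cells for every column of a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    summary = {name: {"filled": 0, "missing": 0} for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            # An empty cell (or a short row) is treated as a missing value.
            is_missing = value is None or value.strip() == ""
            summary[name]["missing" if is_missing else "filled"] += 1
    return summary


if __name__ == "__main__":
    for column, counts in summarize_columns(SAMPLE_CSV).items():
        print(column, counts)
```

A column with a non-zero missing count is a candidate for the cleaning step that follows.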
After loading the data, it is important to clean it before implementing a machine learning model. To clean the data, drag a connecting line from the file component again and select the impute option.
In the impute component, you can clean the data by filling in missing values either with the mean or with random values.
I have selected the option to fill missing data (if any) with the average values. You can also remove the rows with missing values or impute the data with a model-based imputer.
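To make the two simplest choices concrete, here is a plain-Python sketch of mean imputation and of dropping incomplete rows. It only illustrates the idea and is not Orange's implementation. Missing values are represented as `None`, and the function names and cholesterol values are made up for the example.

```python
from statistics import mean
from typing import List, Optional, Sequence


def impute_with_mean(column: Sequence[Optional[float]]) -> List[float]:
    """Replace None entries with the mean of the observed values."""
    observed = [value for value in column if value is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if value is None else value for value in column]


def drop_incomplete_rows(rows: Sequence[Sequence[Optional[float]]]) -> list:
    """Keep only the rows that contain no missing value."""
    return [list(row) for row in rows if all(value is not None for value in row)]


if __name__ == "__main__":
    cholesterol = [200.0, None, 240.0, 220.0]
    print(impute_with_mean(cholesterol))  # [200.0, 220.0, 240.0, 220.0]
```

Mean imputation keeps every row but pulls the column towards its average, while dropping rows keeps the values untouched at the cost of losing data.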
Once we have clean data, it is time to visualize the data distribution. To do this, drag a connecting line from the impute component and select the scatter plot option.
Double-click the scatter plot component.
Here you can change which attributes are shown on the x and y axes and adjust the colour and opacity of the points, among other options. You can explore the visualizations here to understand how your data is distributed.
After visualizing the data, it is time to build a model.
Our dataset poses a classification problem and therefore requires a classification algorithm. Since the target contains only the values 0 and 1, we can use a simple logistic regression model.
Note: Since the heart disease data is a built-in dataset, the target is marked automatically. If you are using a custom dataset, you can set the target by opening the file component, double-clicking the column you want to use and selecting the target option.
Drag a connecting line from the impute component and select the logistic regression option.
Double-click the component. Here you can choose the regularization type, lasso (L1) or ridge (L2), and adjust the C value that controls the regularization strength.
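To get a feel for what the ridge option and the C value do, here is a small from-scratch sketch of logistic regression with an L2 penalty, trained by gradient descent. It is an illustration under stated assumptions, not the solver Orange uses. It assumes C acts as an inverse regularization strength, as in scikit-learn, so a smaller C means a stronger penalty. The function names, learning rate and toy data are made up, and the lasso variant is not shown.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic_ridge(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    C: float = 1.0,
    lr: float = 0.1,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent with an L2 penalty."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            # Averaged data gradient plus the ridge term w / (C * n);
            # the bias is left unpenalized.
            w[j] -= lr * (grad_w[j] / n + w[j] / (C * n))
        b -= lr * grad_b / n
    return w, b


def predict(X: Sequence[Sequence[float]], w: Sequence[float], b: float) -> List[int]:
    """Predict class 1 when the estimated probability is at least 0.5."""
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
        for xi in X
    ]


if __name__ == "__main__":
    X_toy = [[0.0], [1.0], [2.0], [3.0]]
    y_toy = [0, 0, 1, 1]
    weights, bias = fit_logistic_ridge(X_toy, y_toy, C=1.0)
    print(predict(X_toy, weights, bias))
```

Lowering C in this sketch shrinks the learned weight towards zero, which is the effect of stronger regularization. With a very small C the penalty term dominates the update, so the step size `lr` would need to be reduced as well.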
Once this is done, drag a connecting line from the logistic regression component and select the test and score option. Be sure to connect the impute component to the test and score component as well, so that it receives the data.
Once done, the model will begin to train and display the results.
If you have a separate test dataset, you can connect it here as well and select the test on test data option.
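When no separate test set is available, cross-validation is one of the sampling schemes the test and score component offers: the data is cut into k folds and each fold takes a turn as the test set. The sketch below shows only the index bookkeeping behind that idea. `k_fold_indices` is a hypothetical helper, it does not shuffle or stratify, and it is not Orange's code.

```python
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k (train, test) index pairs."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    folds = []
    start = 0
    for fold in range(k):
        # The first n_samples % k folds receive one extra sample.
        size = n_samples // k + (1 if fold < n_samples % k else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


if __name__ == "__main__":
    for train_idx, test_idx in k_fold_indices(10, 3):
        print("train:", train_idx, "test:", test_idx)
```

Every row ends up in a test fold exactly once, so the averaged score uses all of the data without ever testing a model on rows it was trained on.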
To see where the model makes mistakes, drag a connecting line from the test and score component and select the option to display a confusion matrix.
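For a binary target like ours, the confusion matrix boils down to four counts. The following sketch computes them, plus the accuracy, from two lists of 0/1 labels. The labels in the example are invented and the function names are our own; the code only mirrors what the matrix summarises.

```python
from typing import Dict, Sequence


def confusion_counts(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for 0/1 labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1
        elif a == 0 and p == 1:
            counts["fp"] += 1
        elif a == 0 and p == 0:
            counts["tn"] += 1
        elif a == 1 and p == 0:
            counts["fn"] += 1
        else:
            raise ValueError("labels must be 0 or 1")
    return counts


def accuracy(counts: Dict[str, int]) -> float:
    """Share of predictions that match the actual label."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total


if __name__ == "__main__":
    actual = [1, 0, 1, 1, 0, 0]
    predicted = [1, 0, 0, 1, 1, 0]
    print(confusion_counts(actual, predicted))  # {'tp': 2, 'fp': 1, 'tn': 2, 'fn': 1}
```

In a heart disease setting the false negatives, patients with the condition who are predicted healthy, are usually the count to watch most closely.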
Thus, we have trained a machine learning model without writing code. You can train multiple models and compare them as well. Finally, to save your work, press CTRL+S and save the entire Orange workflow on the local system.
In this article, we saw the advantages that the Orange platform provides, especially for non-coders, and implemented a simple machine learning model from scratch. Orange can be used for almost any kind of analysis, and it is especially good at producing beautiful visuals with little effort.