No-code environments for machine learning have become increasingly popular because almost anyone who needs machine learning, whatever their field, can use these tools to build models for themselves. WEKA is one of the earliest no-code tools, yet it remains efficient and powerful. WEKA can be used to implement state-of-the-art machine learning and deep learning models and supports numerous file formats.
In this article, we will learn how to use WEKA to pre-process data and build a machine learning model without writing code.
Installing and setting up WEKA
WEKA runs on Linux, Windows and macOS, and you can download it from the official website. Select your operating system and click on download; the download will begin automatically. Once it finishes, follow the installer steps and WEKA is ready to be used.
When you launch the application on your local system, you are presented with a dashboard that offers the following environments.
Explorer: This environment is WEKA’s graphical user interface. You can find datasets and many machine learning models here along with visualization and pre-processing tools.
Experimenter: This environment is used for conducting experiments on the data or for performing certain statistical operations on the learning dataset.
KnowledgeFlow: This environment provides essentially the same functionality as the Explorer but through a drag-and-drop interface.
Workbench: Workbench is an all-in-one application that combines the other environments as user-selectable perspectives.
Simple CLI: This provides a command-line interface for running WEKA commands directly and uses less memory than the other environments. It is preferred for larger deep learning models.
For this article, we will use the Explorer environment to build a machine learning model.
Selecting this environment gives a dashboard that looks like this.
You can use a dataset of your own, and the tool will be able to read it, but here I have selected one of the built-in datasets. The built-in datasets are in the .arff format. WEKA also supports CSV, JSON, Excel, bsi etc.
To select a built-in dataset, click on the ‘Open file...’ button in the Preprocess tab and navigate to the folder where you have installed WEKA. Select the folder named data and you can see the following datasets.
I have selected the dataset called vote.arff. This dataset contains the 1984 U.S. House of Representatives voting records on 16 key issues, with each member labelled as either democrat or republican.
Once you select the dataset you can see the different features of the dataset.
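Although WEKA reads the file for you, it can help to know what an .arff file looks like inside. The following is a minimal sketch in plain Python, not WEKA's own loader: it parses a small, made-up ARFF snippet written in the style of vote.arff. The sample contents and the parse_arff helper are assumptions for illustration, and the parser only handles unquoted nominal attributes.

```python
# A tiny, hypothetical ARFF snippet in the style of vote.arff (not the real file).
SAMPLE_ARFF = """\
% lines starting with a percent sign are comments
@relation vote-sample
@attribute immigration {n,y}
@attribute education-spending {n,y}
@attribute Class {democrat,republican}
@data
y,n,republican
n,?,democrat
y,y,democrat
"""


def parse_arff(text):
    """Parse a minimal ARFF text that contains only unquoted nominal attributes."""
    attributes = []  # list of (attribute name, list of allowed values)
    rows = []        # one list of values per instance; None marks a missing value
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            # ARFF writes missing values as a question mark
            rows.append([None if v == "?" else v for v in values])
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            allowed = [v.strip() for v in spec.strip("{}").split(",")]
            attributes.append((name, allowed))
        elif lower.startswith("@data"):
            in_data = True
    return attributes, rows


attributes, rows = parse_arff(SAMPLE_ARFF)
print([name for name, _ in attributes])
print(rows)
```

The header lines (@relation, @attribute) describe the columns, and everything after @data is one comma-separated instance per line.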
After the dataset is loaded, we see that it contains some missing values. To eliminate them, click the ‘Choose’ button in the Filter panel on the top left, which shows a list of filters that can be used for data manipulation.
Under this, select filters -> unsupervised -> attribute -> ReplaceMissingValues.
Once this filter is selected, click ‘Apply’. The missing values are replaced with the mode (for nominal attributes) or the mean (for numeric attributes) of each attribute, and the dataset no longer contains any missing values.
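For readers curious about what this filter does behind the click, here is a rough sketch of mode imputation for nominal columns in plain Python. It is an illustration of the idea, not WEKA's implementation; the function name and the sample rows are made up, and it assumes every column has at least one observed value.

```python
from collections import Counter


def replace_missing_with_mode(rows):
    """Return a copy of rows where each None is replaced by its column's most common value."""
    n_cols = len(rows[0])
    modes = []
    for col in range(n_cols):
        # Only observed (non-missing) values take part in the count
        observed = [row[col] for row in rows if row[col] is not None]
        modes.append(Counter(observed).most_common(1)[0][0])
    return [
        [modes[col] if value is None else value for col, value in enumerate(row)]
        for row in rows
    ]


# Hypothetical instances: two yes/no votes followed by the class label
rows = [
    ["y", "n", "republican"],
    ["n", None, "democrat"],
    [None, "n", "democrat"],
    ["y", "y", "democrat"],
]
filled = replace_missing_with_mode(rows)
print(filled)
```

Numeric attributes would be handled the same way, with the column mean taking the place of the mode.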
To visualize all features against the label, select the ‘Visualize All’ button; this shows bar charts of every feature's distribution against the target. Blue represents the Democratic party and red represents the Republican party.
But, for other forms of visualization, you can select the ‘visualize’ tab on the top.
You can click on each box and identify the distributions according to each instance of the data.
As shown above, the x-axis contains immigration, the y-axis contains the classes and, to the right, you can see each instance of the dataset. You can explore different types of visualization here.
The final step is to build a classification model that can predict which party (democrat or republican) each instance in the dataset belongs to. To build a classification model, select the ‘Classify’ tab at the top of the Explorer dashboard.
As you can see, multiple algorithms are available here. I have decided to use a decision tree classification model. Click ‘Choose’, select J48 (WEKA's decision tree learner) under trees, and click ‘Start’ to see the results.
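To give a feel for how a decision tree decides which attribute to split on, the sketch below computes entropy and information gain in plain Python. This is a simplified illustration with made-up data and function names, not what WEKA runs internally; C4.5-style learners typically use gain ratio, a normalized variant of the information gain shown here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr_index, label_index=-1):
    """Reduction in label entropy obtained by splitting rows on one attribute."""
    labels = [row[label_index] for row in rows]
    groups = {}
    for row in rows:
        # Group the class labels by the value of the chosen attribute
        groups.setdefault(row[attr_index], []).append(row[label_index])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical data: the first vote separates the parties perfectly, the second does not
rows = [
    ["y", "y", "republican"],
    ["y", "n", "republican"],
    ["n", "y", "democrat"],
    ["n", "n", "democrat"],
]
gain_first = information_gain(rows, 0)
gain_second = information_gain(rows, 1)
print(gain_first, gain_second)
```

A tree learner would place the attribute with the higher score nearer the root and then repeat the process on each branch.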
The output shows how many instances were classified correctly, how many were classified incorrectly, and the different error measures that were calculated. It ends with the confusion matrix.
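The accuracy and confusion matrix in that output can be reproduced by hand. Here is a minimal sketch in plain Python, assuming made-up actual and predicted labels rather than real WEKA results; as in WEKA's output, rows are the actual classes and columns are the predicted ones.

```python
def confusion_matrix(actual, predicted, labels):
    """Count instances per (actual class, predicted class) pair; rows are actual classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of correctly classified instances (the diagonal) out of all instances."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical labels for eight instances, not taken from an actual WEKA run
labels = ["democrat", "republican"]
actual = ["democrat", "democrat", "democrat", "republican",
          "republican", "republican", "democrat", "republican"]
predicted = ["democrat", "democrat", "republican", "republican",
             "republican", "democrat", "democrat", "republican"]

matrix = confusion_matrix(actual, predicted, labels)
print(matrix)
print(accuracy(matrix))
```

The diagonal holds the correctly classified instances, so the off-diagonal cells show exactly which party was mistaken for which.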
You can also train multiple models here by making other alterations like dropping a few columns, applying PCA etc. All these results can be logged and saved in the workspace that is created.
Thus, we create a classification model with just a few clicks of the mouse, without writing any code.
In this article, we saw how to build a machine learning model with WEKA and looked at the different environments available in the tool. WEKA is widely used by researchers and business analysts for faster model building and data analysis.