Learning plays out in the space between remembering and forgetting. Without memories, we would have no record of what we knew, know, or might come to know. Rote learning is one such process: memorizing specific new occurrences as they are encountered. Instance-based learning algorithms borrow this idea, have gained wide popularity in machine learning and deep learning, and are now used to build artificially intelligent systems. The fundamental idea is simple: each time the program encounters a new and useful piece of information, it stores that instance away for future use. Because the stored instances accumulate, the model's predictions can improve over time. But what if I told you that you could build such a model without writing a single line of code? Weka comes to the rescue.
What is Weka?
Weka is an open-source data science tool developed by the University of Waikato. It provides a framework for data preprocessing, machine learning algorithms, and visualization, letting you develop machine learning models and apply them to data mining problems. With no code to write and results just a few clicks away, it is easy to see why Weka remains popular.
What’s more in store? Several models can be applied to the same dataset and compared side by side so that you can select the one that fits best, which speeds up model development on the whole. Depending on the kind of ML model you want to build, you can choose an option such as Classify, Cluster, or Associate. The attribute selection feature automatically selects features to create a reduced dataset, and Weka also provides statistical output for each model it runs.
About The Algorithm To Be Used
K-Nearest Neighbours (KNN) is a supervised machine learning algorithm well suited to instance-based classification problems: the stored instances themselves represent the model's knowledge, and classification is performed directly on the instance space. KNN captures the idea of similarity, sometimes called distance, proximity, or closeness, by calculating the distance between points. The algorithm stores all the available data and classifies each new data point based on its similarity to the stored ones, so when new data appears it can easily be assigned to a well-suited category. We'll use the straight-line distance between points, also called the Euclidean distance, which is a popular and familiar choice. In KNN, the parameter K is the number of nearest neighbours that take part in the majority vote. For example, let K = 5 and suppose a new data point is to be classified as one of two colours, red or green. If four of its five nearest neighbours are red, the majority vote classifies it as red; had the majority been green, it would be classified as green.
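To make the idea concrete before we touch Weka, here is a minimal from-scratch sketch of KNN in Python. This is not Weka's implementation; the points, labels, and function names are invented purely for illustration:

```python
from collections import Counter
import math

def euclidean(a, b):
    # Straight-line (Euclidean) distance between two points
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train_points, train_labels, query, k=5):
    # Sort stored instances by distance to the query and keep the k nearest
    neighbours = sorted(zip(train_points, train_labels),
                        key=lambda pl: euclidean(pl[0], query))[:k]
    # Majority vote among the labels of the k nearest neighbours
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy 2-D data: four "red" points near the origin, one "green" farther away
points = [(0, 0), (1, 0), (0, 1), (1, 1), (5, 5)]
labels = ["red", "red", "red", "red", "green"]
print(knn_classify(points, labels, (0.5, 0.5), k=5))  # red wins 4 votes to 1
```

Note that there is no training step at all: the "model" is just the stored instances, which is exactly what makes KNN an instance-based learner.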
About The Dataset
We will be using the weather.nominal.arff dataset, a small dataset whose attributes describe weather conditions and whether it is desirable to play outdoors. We will first build a decision tree for this data and then create a simple instance-based learning model that helps us classify the instances and improve our accuracy.
You can download the dataset from here.
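For orientation, the dataset is a plain-text ARFF file. Abridged, it looks roughly like this (attribute names and values as shipped with the standard Weka distribution):

```
@relation weather.symbolic

@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
...
```

The last attribute, play, is the class we will try to predict from the other four.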
Creating our Model
To start things, we will first load our dataset into the Weka Tool.
Go to Explorer -> Preprocess -> Open File -> Choose our dataset -> Click Open.
We’ll get a result screen such as this after the successful loading of the dataset.
From the bottom left, choose the attributes to be worked upon.
From the top right, select the Classify tab.
From the Classifier module selector, choose your classifier. We'll be using J48, Weka's classification package for building decision trees (if it is not present, you can search for and download it from Weka's package manager). Then press the Start button.
The following results will be generated in the Classifier output.
Building a decision tree around the following
We can see that the counts of correctly and incorrectly classified instances do not lead us to a firm conclusion: the accuracy stands at 50 per cent either way. We can therefore switch this classification model to KNN and try to improve its accuracy.
This time, we’ll choose the IBk classifier package, Weka’s own KNN implementation, which uses Euclidean distance by default.
It will provide us with the following output.
We can tune the model further by increasing the number of neighbours, which helps improve the share of correctly classified instances. Since an instance-based learner keeps all of its training instances in memory, trying different values costs little. We have increased the number of neighbours to 5 this time (the default value is 1).
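The effect of this hyperparameter can be seen outside Weka too. Below is a hypothetical from-scratch sketch (again, not Weka's IBk) that scores KNN with leave-one-out evaluation on a toy 2-D dataset containing one mislabeled point; the data and function names are invented for illustration:

```python
from collections import Counter
import math

def knn_predict(train, query, k):
    # train: list of ((x, y), label); vote among the k nearest by Euclidean distance
    near = sorted(train, key=lambda pl: math.dist(pl[0], query))[:k]
    return Counter(label for _, label in near).most_common(1)[0][0]

def loo_accuracy(data, k):
    # Leave-one-out: predict each point from all the others
    hits = sum(knn_predict(data[:i] + data[i + 1:], p, k) == lbl
               for i, (p, lbl) in enumerate(data))
    return hits / len(data)

# Two clean clusters plus one noisy "yes" planted inside the "no" cluster
data = [((0, 0), "no"), ((0, 1), "no"), ((1, 0), "no"), ((1, 1), "no"),
        ((0.5, 0.5), "yes"),
        ((4, 4), "yes"), ((4, 5), "yes"), ((5, 4), "yes"), ((5, 5), "yes")]

for k in (1, 3, 5):
    # k = 1 is fooled by the noisy label; k >= 3 outvotes it
    print(k, loo_accuracy(data, k))
```

With k = 1 every point near the noisy instance is misclassified, while k = 3 or 5 lets the honest neighbours outvote the noise, mirroring the accuracy jump we observe in Weka.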
Every instance is saved on the result list on the bottom left.
Checking our new output
We can see that accuracy improves as we tune the number-of-neighbours hyperparameter. It can be tweaked further for better accuracy, and the model can be saved for later use.
In this article, using the Weka tool and a few clicks, we created a simple instance-based learning model, performed data mining operations with it, and saw how it can be further improved. I recommend trying other, similar techniques and exploring the tool even further for its numerous capabilities. Happy Learning!
- Data Mining Using Weka Tutorial
- KNN for Machine Learning
- Download the newest version of Weka (no Java installation required)
Victor is an aspiring Data Scientist and holds a Master of Science in Data Science & Big Data Analytics. He is a researcher, a data science influencer, and a former university football player. A keen learner of new developments in Data Science and Artificial Intelligence, he is committed to growing the Data Science community.