Interpretable Student Performance Predictions

This is one of the top-voted thesis papers from upGrad's online working professional programs, run in partnership with one of the UK's leading universities.

Digital transformation has swept the world in the last few years. From Artificial Intelligence (AI) to Big Data, digital technologies play an essential role in everyday life. With all this technological growth, vast amounts of data are generated every second. This data yields beneficial results and actionable insights for economies, governments and societies, and it is already used extensively in retail, energy, banking and finance, among other industries.

One key industry that can benefit from this data revolution is Education, a sector that remains deeply important today. Data science, and AI in particular, can be used to automate repetitive tasks or create interactive study aids. There are many more uses, one of which is discussed in this article in detail: interpreting student performance predictions. These predictions are especially helpful for educators as well as policy formulators.



Data Analytics and Mining in Education

Various data analytics techniques are used to gather and interpret insights from educational information. However, most prediction models face a significant problem – they can’t explain ‘why’ they arrived at a particular prediction and ‘how’ they did it. 

Understanding the interpretation capabilities of AI models is essential when researchers deal with complex, massive datasets. Analytics and insights need substantial proof or backing, and if models can't provide it, human trust in AI declines. This research project aims to change this perception of AI in education by proposing a new prediction model built on existing technologies.

Recent digital progress has encouraged the adoption of AI in data mining, which extracts relevant data and identifies patterns to provide actionable insights. In the education domain, the process is called Educational Data Mining (EDM). It uses tools and techniques to gather information from records, logs and examination results, and it covers dimensions such as student performance, dropout rates, individual intelligence and cognizance, teacher and administrator performance, and more.

But like any other technology, EDM is still an emerging discipline, and the International Educational Data Mining Society agrees. Going by the definition the society provides for EDM, methods to explore educational data and use them for student betterment are still developing. This project is a step forward in improving EDM, expanding on recent technological progress.

What’s Progress Without Challenges?

Past data and literature were extensively reviewed for this research, and the information gathered revealed a set of challenges.

1.    Black box approach

When dealing with sensitive information, AI often adopts a black box approach: it shares results, but the end-user doesn't know 'how' or 'why' it arrived at them. This approach is still prevalent in many industries, including defense, health and autonomous vehicles.

While the black box approach is changing today, several industries remain wary because of this past opacity. For models and their interpreted results to be acceptable, they need to be more transparent. This lack of confidence in black box AI persists in the educational field, which poses challenges.

2.    Less trustworthy model prediction

Every research model has to pass a certain confidence level, arrived at after extensive testing and data validation. However, not everyone believes that all confidence levels are trustworthy.

For a model to succeed, it's crucial that it provides unbiased results. If the confidence levels are not sound, the model will be less trustworthy and will lack solid reasoning for its results.

3.    Gaps in past research

In a field that's continuously developing, getting accurate data to support research is difficult and time-consuming. The project extensively reviewed past work in the education sector, focusing on predictions of student performance using AI or Machine Learning (ML) models. None of these works, however, achieved interpretable and trustworthy model predictions.

Moreover, the techniques used still had undefined aspects such as their plot delivery, intuitiveness, accuracy and trustworthiness. Further research is necessary to support the model proposed in this project. 

Construction of the Model

The prediction model for this project was proposed by considering students’ behavioral, academic and demographic feature attributes. Data mining extracted these attributes from a learning management system.

Common EDM techniques use popular, performant algorithms such as Logistic Regression, Decision Trees, Support Vector Machines, Artificial Neural Networks and Random Forest. More often than not, however, these models are opaque. This research therefore focused on building a model that:

●      Provided insights on features and the reasoning behind the ones selected

●      Used interpretable AI to identify the marginal contribution and correlation of the various features 

●      Analyzed and understood contributing factors for classification or misclassification of an instance 

The model was examined through a set of classifiers – XGBoost, Naïve Bayes and Random Forest – but achieving credible confidence levels was challenging. To improve the confidence level and trustworthiness, Explainable Artificial Intelligence (XAI) methods, which gave the model interpretability, were used. By sharing results that show both advantages and limitations, XAI improved the trustworthiness of AI in education. SHAP, the XAI method used for this research, addressed the problem of non-interpretable results and the black box approach followed by previous prediction models.
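The thesis applies SHAP to trained classifiers; the Shapley-value idea underlying it can be sketched in pure Python for a toy scoring model. Everything below – the feature names, coefficients and baseline values – is an illustrative assumption, not the thesis's actual features or model:

```python
from itertools import combinations
from math import factorial

# Hypothetical toy "model": a pass-probability score from three
# made-up student features. Attendance and assignments interact,
# so the model is deliberately non-additive.
def model(attendance, assignments, prior_grade):
    return 0.4 * attendance * assignments + 0.3 * prior_grade

FEATURES = ["attendance", "assignments", "prior_grade"]
BASELINE = {"attendance": 0.5, "assignments": 0.5, "prior_grade": 0.5}

def value(subset, x):
    """Model output with features in `subset` taken from x and the
    rest held at baseline values (a common SHAP convention)."""
    args = {f: (x[f] if f in subset else BASELINE[f]) for f in FEATURES}
    return model(**args)

def shapley_values(x):
    """Exact Shapley attribution: weighted marginal contribution of
    each feature over all subsets of the remaining features."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(set(S) | {f}, x) - value(set(S), x))
        phi[f] = total
    return phi

x = {"attendance": 0.9, "assignments": 0.8, "prior_grade": 0.7}
phi = shapley_values(x)
# Efficiency property: contributions sum to prediction minus baseline.
print(round(sum(phi.values()), 6) ==
      round(value(set(FEATURES), x) - value(set(), x), 6))  # → True
```

For a real classifier over a full dataset this enumeration is replaced by the SHAP library's efficient estimators, but the attribution being computed – each feature's fair share of the gap between a prediction and the baseline – is the same.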

To build the model, the project undertook a comprehensive comparative analysis of highly performant models. Of these, the Random Forest, XGBoost and Naïve Bayes classification algorithms were found most suitable. Data visualization and feature selection techniques further helped to eliminate irrelevant features; elimination was done by ranking features using an evidence-based approach. For the model to be adopted on a larger scale, making the process as transparent as possible by sharing interpretable insights was vital.
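The article does not detail the exact ranking procedure; one common evidence-based signal for eliminating irrelevant features is permutation importance, sketched here on synthetic data with a stand-in rule-based "model". The feature names, the data-generating rule and the predictor are all illustrative assumptions:

```python
import random

# Hypothetical toy data: rows of (attendance, assignments, noise);
# the label depends only on the first two columns.
random.seed(0)
rows, labels = [], []
for _ in range(200):
    a, b, n = random.random(), random.random(), random.random()
    rows.append([a, b, n])
    labels.append(1 if a + b > 1.0 else 0)

def predict(row):
    # Stand-in "trained model": thresholds the two informative features.
    return 1 if row[0] + row[1] > 1.0 else 0

def accuracy(data, labels):
    return sum(predict(r) == y for r, y in zip(data, labels)) / len(labels)

def permutation_importance(data, labels, col):
    """Drop in accuracy when one column is shuffled: a feature the
    model never uses scores zero, giving an evidence-based ranking."""
    shuffled = [r[:] for r in data]
    values = [r[col] for r in shuffled]
    random.shuffle(values)
    for r, v in zip(shuffled, values):
        r[col] = v
    return accuracy(data, labels) - accuracy(shuffled, labels)

scores = {name: permutation_importance(rows, labels, i)
          for i, name in enumerate(["attendance", "assignments", "noise"])}
print(scores)  # the 'noise' column scores exactly zero here
```

Ranking features by such a score and dropping those near zero is one transparent way to perform the kind of evidence-based feature elimination described above; tree-ensemble impurity importances from Random Forest or XGBoost are another.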

As a result, the model presented key observations of experiments by denoting the importance of features and using multiple approaches based on Random Forest and XGBoost. It also covered feature comparisons to rank them based on importance. All the while, it provided interpretable results to make the model trustworthy and credible. 

Key Outcomes of the Model

New technologies, particularly AI and ML, are changing the way the world functions. However, due to a previously rampant black box approach, human faith in these tools was low. This model aims to change perspectives and throw light on a new direction for AI in education. 

Some of the key outcomes were:

  • Deriving human-interpretable results has become increasingly important. Business decision-makers want to feel empowered in their capabilities, and opaque AI doesn't help. With the new model employing XAI and sharing interpretable results, decision making becomes an empowering, data-driven process.

Elsewhere, GDPR policies and country-specific data regulations have pushed organizations to use XAI so that it aligns with privacy guidelines.

  • In all of the previous literature covered, an interpretable model in the education domain was missing. This research establishes a new model based on SHAP XAI – a first step toward restoring trust in AI in the field. With SHAP XAI, stakeholders in the education industry can use data backed by reasoning for better policy making and for improving students' performance.
  • The model has generated proven results through consistent testing and application to various black box models. As a next step, demonstrating the efficacy of a scalable, interpretable and explainable model will boost adoption among stakeholders.


Manjari Chitti is an upGrad learner, and as a part of her program, she has developed the thesis report titled — Interpretable Student Performance Predictions.


Manjari Chitti
An Enterprise Architect and an avid AI enthusiast, Manjari Chitti is focused on applying cognitive technologies to businesses across industry domains. She holds a Master's degree in AI and ML from Liverpool John Moores University and has over 15 years of experience in the IT industry.
