Application Of Data Science In The School Education Sector

The objective of the study is to employ data science in an attempt to find solutions to some of the pressing and long-pending problems of the school education sector.

Four high-priority problems have been chosen, and various machine learning algorithms have been applied to them. Although some of the proposed data collection techniques (the use of images and data) are highly ambitious, we have been able to arrive at a practical solution and a recommended course of action for each problem.

The four problems we have analysed are: i) the high dropout rate of students at various levels of school education, ii) the inability of the current education system to identify learning disabilities in children, iii) one-size-fits-all pedagogy and an ineffective traditional method of evaluation, and iv) the lack of focus on evaluating teachers' performance and teaching methods. The machine learning techniques used to address them are classification, clustering and regression. The results indicate that data science has considerable scope for finding innovative solutions to many issues in the school and higher education sectors.


The high dropout rate of students at various levels of school education:

A child dropping out of school is a huge waste of resources for our country, so finding a way to bring down the dropout rate benefits both society and the nation. With the help of data science, we attempt to predict which students will drop out in an academic year, and also the number of students who will drop out of a particular school or region. This would, in turn, help the government address the reasons for dropping out.

To do this, data has to be collected on the various factors that contribute to a student dropping out, such as the family's financial status, the educational background of family members, family health history, the child's gender and the student's academic performance.

A classification algorithm is used to predict whether a student will drop out in the current academic year. Possible models include decision trees, Naïve Bayes and k-nearest neighbours (KNN).
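
To make the classification step concrete, here is a minimal k-nearest-neighbours sketch in plain Python. The four features, their units and the six labelled students are hypothetical placeholders chosen for illustration, not the study's data or feature set. Features are min-max scaled so that, for example, income and attendance percentage contribute comparably to the distance.

```python
import math
from collections import Counter

# Hypothetical feature vector per student (illustrative values, not from the study):
# (family_income_lakh_per_year, parents_years_of_schooling, attendance_pct, avg_exam_score)
# Label: 1 = dropped out, 0 = continued.
TRAIN = [
    ((1.0, 4, 55, 38), 1),
    ((1.2, 5, 60, 42), 1),
    ((0.8, 2, 50, 35), 1),
    ((4.5, 12, 92, 78), 0),
    ((3.8, 10, 88, 70), 0),
    ((5.0, 14, 95, 85), 0),
]

def min_max_scaler(rows):
    """Return a function that maps a row into [0, 1] per feature, fitted on `rows`."""
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]

    def scale(row):
        return tuple((v - lo) / (hi - lo) if hi > lo else 0.0
                     for v, lo, hi in zip(row, lows, highs))
    return scale

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training students."""
    scale = min_max_scaler([features for features, _ in train])
    q = scale(query)
    # Sort training students by Euclidean distance to the scaled query.
    neighbours = sorted(train, key=lambda item: math.dist(scale(item[0]), q))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

print(knn_predict(TRAIN, (1.1, 3, 52, 40)))  # 1 -> flagged as at risk of dropping out
```

With real data, a library implementation such as scikit-learn's `KNeighborsClassifier` would normally replace the hand-rolled version, and k would be tuned by cross-validation.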

Further, a regression model is used to predict the number of dropouts in a school or region in the upcoming year. Possible models include linear regression and polynomial regression.
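
A minimal sketch of the regression step, assuming the only predictor is the calendar year (a simple trend model) fitted by ordinary least squares. The yearly counts below are made-up numbers for illustration, not figures from the study.

```python
# Hypothetical, made-up yearly dropout counts for one region (not data from the study).
YEARS = [2016, 2017, 2018, 2019, 2020]
DROPOUTS = [120, 115, 112, 104, 99]

def fit_linear(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_linear(YEARS, DROPOUTS)
forecast_2021 = intercept + slope * 2021
print(f"Expected dropouts in 2021: {forecast_2021:.1f}")  # Expected dropouts in 2021: 94.1
```

A polynomial variant could be fitted with NumPy's `numpy.polyfit(x, y, deg)`, though with only a few years of data higher degrees overfit easily. In practice, region-level versions of the factors listed above could be added as further predictors.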

The inability of the current education system to identify learning disabilities in a child at an early stage:

Learning disability is one of the most ignored, unattended and unpredictable problems in children. This is primarily for four reasons: a lack of awareness of such problems among elders, the inability of a child with such a disability to express the situation properly, the lack of an appropriate mechanism to observe or measure such disabilities directly in children, and over-identification under the current methods, which rely on faculty observation and various checklists.

Early identification and intervention are important for the long-term success of individuals with learning disabilities. Early identification includes the evaluation and counselling provided to families and their children under three years of age who have, or are at risk of having, a disability.

Some of the learning disabilities are Dyscalculia, Dysgraphia, Dyslexia, Non-Verbal Learning Disabilities, Oral/Written Language Disorder, Specific Reading Comprehension Deficit, ADHD and Dyspraxia. Using a multiclass classification algorithm, students can be classified into different disability classes, for example Dyslexia, Dysgraphia or Dyscalculia.
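
The article does not name a specific multiclass model, so the sketch below assumes a nearest-centroid classifier, one of the simplest multiclass methods. The three screening features and the labelled children are hypothetical. A real screening system would also need a "no disability" class and careful validation, given the over-identification problem described above.

```python
import math
from collections import defaultdict

# Hypothetical screening scores per child on a 0-1 scale (illustrative only):
# (reading_errors, handwriting_errors, arithmetic_errors); higher = more difficulty.
TRAIN = [
    ((0.80, 0.20, 0.10), "Dyslexia"),
    ((0.75, 0.30, 0.15), "Dyslexia"),
    ((0.20, 0.85, 0.10), "Dysgraphia"),
    ((0.25, 0.80, 0.20), "Dysgraphia"),
    ((0.10, 0.15, 0.90), "Dyscalculia"),
    ((0.20, 0.10, 0.85), "Dyscalculia"),
]

def fit_centroids(samples):
    """Average the feature vectors of each class into one centroid per class."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: tuple(sum(col) / len(col) for col in zip(*rows))
        for label, rows in grouped.items()
    }

def predict(centroids, features):
    """Return the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))

centroids = fit_centroids(TRAIN)
print(predict(centroids, (0.15, 0.20, 0.80)))  # Dyscalculia
```

Because each class is summarised by a single average profile, the basis of each prediction is easy to explain to parents, which speaks to the transparency concern raised in this section; more flexible models trade some of that interpretability for accuracy.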

Other challenges include the high computational power required to model the collected data and the need to analyse images and videos with complex models that are not transparent. Since school authorities are answerable to parents when their children are identified as having a learning disability, this lack of transparency poses a problem.

One-size-fits-all pedagogy & ineffective traditional method of evaluation:

Some of the flaws of the current education system in pedagogy and evaluation are: evaluating students solely on examination marks, a lack of methods to identify curious and innovative minds, too little weightage for extra-curricular activities and soft skills, and the inability to adapt teaching methods to each student's aptitude, interests, strengths and weaknesses. All of this has put an unnecessary burden on students, who focus only on scoring good marks by any means and miss out on the primary aim of education.

The aim of the education system should be the development of students' skills, aptitude, knowledge and perception, all combined in a single frame. The National Education Policy 2020 emphasises a shift from mark-based evaluation to ‘Continuous and comprehensive evaluation’. It also places a strong focus on critical thinking and on more holistic, inquiry-based, discovery-based, discussion-based and analysis-based learning, and it emphasises building character and creating holistic, well-rounded individuals. In this context, this paper explores the role data science can play in improving student evaluation and thereby in designing student-tailored pedagogy.

Instead of evaluating a subject as a whole through a written examination, each subject is evaluated regularly on multiple parameters. Each parameter is scored on a scale of 1-5 using a scientifically prepared evaluation rubric, with data collected from written tests, practical sessions, projects, group discussions, class participation and so on. Students' behaviour is evaluated in the same way: teachers score behavioural parameters on a scale of 1-10 using an evaluation rubric, with data drawn from group activities, sports, debates and similar sources. A clustering algorithm is then applied to both the academic and the behavioural evaluations.
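
A minimal k-means sketch of the clustering step, assuming k-means as the algorithm (the article only says "a clustering algorithm") and five hypothetical academic rubric parameters scored 1-5. Each student is a point in five-dimensional space, and students with similar score profiles end up in the same cluster, which could then guide tailored teaching.

```python
import math

# Hypothetical 1-5 rubric scores per student (illustrative only):
# (written_test, practical, project, group_discussion, class_participation)
STUDENTS = {
    "S1": (5, 4, 5, 4, 5),
    "S2": (4, 5, 4, 5, 4),
    "S3": (2, 2, 3, 2, 1),
    "S4": (1, 2, 2, 1, 2),
    "S5": (3, 5, 5, 2, 2),
    "S6": (3, 4, 5, 1, 2),
}

def assign(point, centroids):
    """Index of the centroid nearest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, initial_centroids, iterations=10):
    """Plain k-means: alternately assign points to clusters and recompute centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[assign(p, centroids)].append(p)
        # New centroid = mean of its members; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids

points = list(STUDENTS.values())
# Deterministic initialisation from three chosen students, for reproducibility.
centroids = kmeans(points, [STUDENTS["S1"], STUDENTS["S3"], STUDENTS["S5"]])
labels = {name: assign(scores, centroids) for name, scores in STUDENTS.items()}
print(labels)  # {'S1': 0, 'S2': 0, 'S3': 1, 'S4': 1, 'S5': 2, 'S6': 2}
```

On this toy data the clusters separate all-round high scorers (S1, S2), students who need broad support (S3, S4), and students who are strong in practical and project work but weaker in discussion and participation (S5, S6). With real data, several random starts (scikit-learn's `KMeans` exposes this as `n_init`) and a principled choice of k would be needed; the same code applies unchanged to the 1-10 behavioural scores.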

Lack of focus on the evaluation of teacher’s performance and teaching methods:

We find that today, as never before, teachers are dangerously overloaded. Their traditional functions of instruction, socialisation, evaluation and classroom management are no longer sufficient to make them effective. Moreover, the present poses challenges that traditional school teachers never faced: teachers work in a constantly changing educational landscape full of numerous and complex situations.

The key problems are that teachers are evaluated solely on their problem-solving skills, that experience is given more weightage without considering whether the teacher has shown any self-improvement, and that teachers' character traits are not evaluated at all.

Teacher evaluations are often designed to serve two purposes: to measure teacher competence and to foster professional development and growth. A teacher evaluation system also gives teachers useful feedback on classroom needs and allows them to learn new teaching techniques so they can change their classroom practice accordingly. The role of teacher evaluation in bringing about improvement is now widely recognised and accepted, and the main aim of evaluation is the continuous improvement of education.

Using a multiclass classification algorithm, a teacher's performance is classified into one of several classes, for example excellent, good, moderate, poor and very poor.
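
The sketch below assumes a Gaussian Naïve Bayes classifier for the five performance classes; the article does not specify the model. The two input features (a student feedback score and a peer observation score, both on a 1-5 scale) and the ten labelled teachers are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical teacher records (illustrative only):
# (student_feedback 1-5, peer_observation 1-5) -> performance class
TRAIN = [
    ((4.8, 4.9), "excellent"), ((4.6, 4.7), "excellent"),
    ((4.0, 3.9), "good"),      ((3.8, 4.1), "good"),
    ((3.1, 3.0), "moderate"),  ((2.9, 3.2), "moderate"),
    ((2.1, 2.2), "poor"),      ((2.3, 1.9), "poor"),
    ((1.2, 1.1), "very poor"), ((1.0, 1.3), "very poor"),
]

def fit_gaussian_nb(samples, var_floor=1e-3):
    """Estimate a class prior plus a per-feature mean and variance for every class."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    model = {}
    for label, rows in grouped.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # Population variance, floored so a feature with no spread cannot divide by zero.
        variances = [max(sum((v - m) ** 2 for v in col) / len(col), var_floor)
                     for col, m in zip(columns, means)]
        model[label] = (len(rows) / len(samples), means, variances)
    return model

def predict(model, features):
    """Pick the class with the highest log prior plus summed Gaussian log-likelihoods."""
    def log_posterior(label):
        prior, means, variances = model[label]
        total = math.log(prior)
        for x, m, var in zip(features, means, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)

model = fit_gaussian_nb(TRAIN)
print(predict(model, (3.0, 3.1)))  # moderate
```

Because the five classes are ordered, an ordinal model, or regression on a numeric score followed by thresholds, may be a natural alternative; scikit-learn's `GaussianNB` is a library implementation of the classifier sketched here.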


In this paper, we have employed data science to try to find solutions to four key problems we identified in the education sector. For example, if the factors that lead to school dropout can be identified and the data collected through an appropriate mechanism, the students with a high probability of dropping out in an academic year can be predicted. Similarly, a learning disability in a student can be correctly identified if an appropriate machine learning algorithm is used. Although evaluating teachers and students using data science has certain limitations, further study in the area and more advanced, scientific methods of data collection could yield more insight into these problems. Through this paper, we have attempted to showcase the potential of data science in solving some of the issues in the education sector, and this work should be treated as a precursor to more in-depth studies in the area of education.


Group Details

Yash Aswani

Yash is pursuing a nine-month Data Science Program (PGPDS, Jan 2021 Batch) at Praxis Business School. He has three and a half years of work experience as a Software Auditor at 63 Moons Technologies and holds a bachelor's degree in engineering.
Geetha Joseph

Geetha is pursuing a nine-month Data Science Program (PGPDS, Jan 2021 Batch) at Praxis Business School. She holds a bachelor's degree in engineering and has two years of work experience as an Assistant Systems Engineer at TCS. She has also completed a PGDM in Finance from XIME, Bangalore. Out of her passion for teaching, she volunteered as a teacher in a school for a year (2018-2019).
Shahrukh Gouhar

Shahrukh is a student of the Post Graduate Program in Data Science at Praxis Business School, Jan 2021 batch. He completed his bachelor's degree in Mechanical Engineering at Jadavpur University in 2018 and has a year of experience as a Graduate Engineer Trainee at TIL Ltd.
Syed Mirak Wajahat Kirmani

Syed is a student of Praxis Business School, currently pursuing a nine-month Data Science Program (PGPDS, Jan 2021 Batch). He holds a bachelor's degree in computer applications (BCA) from Kashmir University. His family has been in the education sector for generations, and a passion for improving the education system comes naturally to him.

Praxis Business School
Praxis is driven by the purpose of creating resources that will lead India’s transformation into the digital world. Praxis offers a transformational learning experience and exciting career opportunities to its students across its postgraduate programs in Management, Data Science, Cybersecurity and Data Engineering.
