“The IB’s model has glaring methodological issues and completely disregards the ethical considerations which should accompany its adoption.”
The International Baccalaureate (IB), whose programmes are taught in more than 5,000 schools across 158 countries, has amended its grading process, using a statistical model to award grades for the May 2020 Diploma Programme. So how fair is it to let a statistical model decide students’ futures?
Condemning this approach, a data scientist from Berkeley wrote a detailed blog post explaining the perils of the haphazard practices employed by the reputed organisation.
Analytics India Magazine got in touch with the blogger behind positivelysemidefinite to know more. Although the blogger admits he is no expert on pedagogical practices, he calls IB’s decision to make everything proprietary, with no scope for oversight, an absolutely terrible idea. He says that IB should acknowledge the cases in which its model fails systematically and allow greater leeway for appeals in those cases. “I think that such standard disclosures should be a part of any machine learning model in sensitive domains,” said the blogger.
What’s Wrong With IB’s Approach
“A model can choose to assign female students to systematically lower grades in STEM subjects and/or incorrectly fail Black students at higher rates than Asian students.”
Bias in algorithmic solutions remains an ongoing problem. Last month, we witnessed the Twitter meltdown sparked by a tweet from Turing Awardee Yann LeCun. However, there has probably never been an experiment on the scale of IB’s, where flaws in the model can put 160,000 students’ futures at stake.
Due to curriculum disruptions, the IB has been forced to cancel final exams for its current student cohort and to use a model instead to assign final grades.
“Together we have developed a method that uses data, both historical and from the present session, to arrive at the subject grades for each student,” said IB, backing up their decision.
A three-step process, combining coursework marks, teacher-predicted grades and historical assessment data, will be used to assign final grades to each student. I will refer to this entire process as the ‘model’.
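The IB has not published the model’s internals, but the described idea, learning a historical relationship between coursework marks, predicted grades and final grades and then applying it to the current cohort, can be sketched roughly as follows. Everything here (function names, weighting scheme, data) is a hypothetical illustration, not the IB’s actual formula.

```python
# Hypothetical sketch (NOT the IB's actual method): learn a weighting
# between coursework marks and teacher-predicted grades from invented
# historical records, then apply it to a current student.

def fit_weights(history, step=0.05):
    """Grid-search a weight w so that final ~ w*coursework + (1-w)*predicted,
    minimising squared error over historical (coursework, predicted, final)
    triples."""
    best_w, best_err = 0.0, float("inf")
    w = 0.0
    while w <= 1.0:
        err = sum((w * c + (1 - w) * p - f) ** 2 for c, p, f in history)
        if err < best_err:
            best_w, best_err = w, err
        w = round(w + step, 10)
    return best_w

def award_grade(w, coursework, predicted):
    """Apply the learned relationship and clamp to the IB's 1-7 scale."""
    raw = w * coursework + (1 - w) * predicted
    return max(1, min(7, round(raw)))

# Invented historical triples: (coursework mark, predicted grade, final grade).
history = [(6, 7, 6), (5, 5, 5), (4, 5, 4), (7, 7, 7), (3, 4, 3), (5, 6, 5)]
w = fit_weights(history)
grade = award_grade(w, coursework=6, predicted=5)
```

Even this toy version makes the blog’s concern concrete: whatever relationship is learned from past cohorts is imposed wholesale on the current one, so any systematic bias in the historical data flows directly into the awarded grades.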
Addressing the role of historical bias in grading practices, the blog notes that, according to a study by the National Center for Education Statistics, secondary school teachers tend to report lower expectations for students of colour and students from disadvantaged backgrounds. This is problematic because predicted grades play a prominent role in the model.
“How ethical is it to tell a 17-year-old kid that they were unable to graduate with their peers because their inaccurate prediction was ‘the cost of doing business’?”
Regarding the reliability of the model, the blog raises the following concerns:
- How was the error adjusted for classrooms with varying capacities?
- Was the socioeconomic status of the schools considered?
- What about the schools with new teacher recruitments within the academic year? Will the historical relationship match the current relationship?
These queries barely scratch the surface. “There are many other nuanced problems which may arise depending on the sort of model that the IB decides to use,” lamented the blogger. He also believes IB has possibly overlooked ethical considerations while making its operational choices.
“IB should seriously re-evaluate the manner in which it has chosen to manoeuvre this very delicate situation.”
To demonstrate the gravity of IB’s decision, the blogger built a model simulating IB’s approach. This model, given no data about the race, socioeconomic status or gender of any high school’s student body, predicted a high school’s majority race (Black/Hispanic) with higher accuracy than it predicted the school’s graduation rate!
The blogger warns that one can be fooled into thinking that a model unaware of gender, race or socioeconomic status cannot possibly discriminate on these attributes, a fallacy known as ‘fairness through unawareness’.
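The fallacy can be illustrated with a toy example (entirely invented data, not the blogger’s actual simulation): a model that never sees a protected attribute can still disadvantage one group through a correlated proxy, such as which school a student attends.

```python
# Toy illustration of 'fairness through unawareness' (invented data).
# The model never sees the protected attribute 'group', only the proxy
# 'school'. Because school and group are correlated, a school-level
# adjustment still produces a gap between groups.

# Each record: (school, coursework mark, group); 'group' is hidden from the model.
students = [
    ("A", 6, "g1"), ("A", 5, "g1"), ("A", 6, "g1"), ("A", 5, "g2"),
    ("B", 6, "g2"), ("B", 5, "g2"), ("B", 6, "g2"), ("B", 5, "g1"),
]

# Hypothetical historical adjustment: school B's past cohorts scored lower,
# so the model docks one point from every school-B student.
school_offset = {"A": 0, "B": -1}

def predict(school, coursework):
    """Grade prediction using only features the model is allowed to see."""
    return coursework + school_offset[school]

def group_mean(group):
    """Average predicted grade for a (hidden) group, computed after the fact."""
    grades = [predict(s, c) for s, c, g in students if g == group]
    return sum(grades) / len(grades)

# The 'blind' model still produces a gap between the two groups.
gap = group_mean("g1") - group_mean("g2")
```

Here the model is formally ‘unaware’ of group membership, yet group g2, which is concentrated in school B, ends up with systematically lower predicted grades, which is exactly the failure mode the blogger describes.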
Discussing the effectiveness of the new tool, he points out that, historically, predicted grades were never factored into final grades; final grades were always based on an evaluation of anonymised, randomly assigned exams.
“Open source the model & results on the extent of the bias.”
We all agree that humans are biased, and most of us try to act sensibly in delicate matters. But when a mathematical model is tasked with something as critical as students’ careers, we expect it to be perfect. Those who deploy these models cannot shift the blame onto a model’s lacklustre performance. At the very least, the approach should be made transparent so that the concerned parties know what they are getting into.
“What was the point of fitting a 2-dimensional model altogether? They should have simply based the grades on submitted coursework with some additional processes to appeal the grades,” asks the blogger. That said, he agrees that the IB is a good organisation with good intentions. However, he thinks its decision was shortsighted and disregards the ethical considerations.
Update: IB’s Statement
International Baccalaureate contacted Analytics India Magazine to respond to the criticism in this article. Here’s their statement:
“The decision to cancel the May 2020 examinations due to the COVID-19 pandemic was incredibly difficult, and as the IB responds to these exceptional circumstances, it has endeavoured to be as transparent as possible. The final grades for the May 2020 DP and CP session are based on the student’s coursework throughout the two-year programmes, predicted grades provided by schools, and historic assessment data. For the subjects where students would normally sit exams, historic data was analysed to determine the global relationship between coursework marks, predicted grades and final subject marks. The IB has applied this calculation to determine the final grade for the May 2020 DP and CP session. Prior to the attribution of final grades, this process was subjected to rigorous testing by educational statistical specialists to ensure our methods were robust. It was also checked against the last five years’ sets of results data, to ensure that it would provide reliable and valid grades for students. The stability of results for students has been maintained for the May 2020 session. The mean total points for May 2020 DP students show small increases in the average grade achieved compared to previous years. The grade distribution level is also in line with the previous four years of results data. The IB is confident that it has awarded grades in the fairest and most robust way possible in the absence of examinations, and the grades awarded to students are of equal value to those awarded in any other year. This level of confidence means that the DP and CP certification documents awarded to students for the May 2020 session will be the same as any other session. IB World Schools can request re-marks of students’ work in the May 2020 session through the Enquiry Upon Results (EUR) services. 
There are some changes to the EUR services in this exceptional session due to the fact that marks have been calculated in the absence of examinations in some subjects, and these have also been communicated to IB World Schools.”