(31/12/2020: This article has been updated with a statement provided to us by a Google spokesperson)
Data from an ongoing Kaggle competition has reportedly been exposed. Experts on the platform are calling it a “data breach”.
“I am not sure if it is the right leaderboard, but I am able to see my private leaderboard rank,” wrote one user, who also posted screenshots in a discussion thread for the competition titled “RANZCR CLiP”.
“I was able to get it by looking at the network requests and responses on the leaderboard page,” added the user.
“At this point Kaggle should make a public statement,” demanded Luca Massaron, former top-10 Kaggle master. “Explaining for how long that has been exposed and if that has been exploited by someone. When such a thing happens in a company, we call it a ‘data breach’, and this is truly a data breach for the importance of the critical information that leaked out without anyone being aware of it.”
“This is an important bug please fix it,” warned Kaggle Master Hiroki Yamamoto. He also gave a quick guide on how to check for it:
- Go to Leaderboard
- open Devtool -> Network -> Reload
- XHR only->select getUser request
- result->team, maybe you can see your public/private score and ranking.
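The check described in these steps amounts to reading a JSON response and looking for private leaderboard values in it. A minimal sketch of that idea follows; it does not call Kaggle, and the payload shape and field names (`teamName`, `publicScore`, `privateScore`, `privateRank`) are invented for illustration, since the article only says that the `getUser` response exposed scores and rankings under `result` -> `team`.

```python
import json


def find_private_fields(payload, path=""):
    """Recursively collect (path, value) pairs whose key mentions 'private'."""
    hits = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}" if path else key
            if "private" in key.lower():
                hits.append((child, value))
            # Keep descending so nested objects are inspected too.
            hits.extend(find_private_fields(value, child))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            hits.extend(find_private_fields(item, f"{path}[{index}]"))
    return hits


# Hypothetical response body: every field name and value here is made up.
sample_response = json.loads("""
{
  "result": {
    "team": {
      "teamName": "example-team",
      "publicScore": 0.951,
      "publicRank": 12,
      "privateScore": 0.947,
      "privateRank": 30
    }
  }
}
""")

for field_path, value in find_private_fields(sample_response):
    print(field_path, "=", value)
```

A scan like this only shows what a client can already see in its own responses, which matches the reports that the values were visible simply by watching network traffic on the leaderboard page.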
Kaggle is yet to release any public statement. However, one of the staffers at Kaggle tried to douse the fire. “Please know we’re aware of this, looking into it, and working on it! It may take a little bit of time for us to release a full statement as we evaluate the full breadth and depth of the issue, including which competitions may have been impacted and the full extent of the impact on leaderboard results. Thanks for your patience! We’ll let you know when we know more,” requested Addison Howard, Project Manager at Kaggle.
“It is a testament to the integrity of the community that the Kaggler who found the bug reported it publicly in the forum.” - Martin Henze, Kaggle GM
The RANZCR CLiP competition tasks participants with using machine learning models to detect the presence and position of catheters and lines on chest X-rays, and to categorize tubes that are poorly placed. The prize money is a hefty $50,000 and there are still three months to go. Private leaderboards are kept hidden to discourage participants from overfitting their models. For Kaggle, this is a huge deal from an information security standpoint, and top performers on the platform are already demanding a full statement on how long the bug was exposed and whether it was exploited.
The statement provided to us by Google’s spokesperson follows:
On December 29, 2020, a user disclosed a vulnerability whereby a user’s own private leaderboard score and rank was being returned in the HTTP response on certain Kaggle pages. This information was not intended to be accessible prior to the competition deadline due to the advantage it provides in submission selection. The bug was patched approximately five hours after it was disclosed. The bug only made a user’s own private leaderboard score and ranking available to the user—not to others.
An investigation into the origin of the issue revealed this bug may have been live as early as November 18, 2019. Our preliminary findings indicate that this was not widely accessed. We have identified one individual who we believe used this bug to gain an advantage in a past competition and will continue our investigation and take appropriate actions if warranted. Thankfully, we expect the results of the majority of closed competitions to remain unchanged.
We deeply apologize for the loss in trust this causes. There is no sugarcoating how long this issue was live and we are profoundly disappointed this bug slipped past our checks and tests. Your hard work depends on the integrity of our platform and we do not take this responsibility lightly.
Lastly, thank you to @user123454321 for disclosing the issue. In the event you find a security issue on Kaggle, you can report it via https://kaggle.com/contact or https://goo.gl/vulnz. As this is an ongoing situation, we may not be able to field specific questions about the cause of the bug or impacted teams.
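The statement describes private scores being included in an HTTP response before the competition deadline. One common guard against this class of bug is to strip such fields on the server before the response is built. The sketch below illustrates that general idea only; it is not Kaggle's actual fix, and the field names, the `redact_team` helper, and the dates are all hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical names for fields that must stay hidden until the deadline.
PRIVATE_FIELDS = {"privateScore", "privateRank"}


def redact_team(team, deadline, now=None):
    """Return a copy of `team` without private fields if the deadline has not passed."""
    now = now or datetime.now(timezone.utc)
    if now >= deadline:
        # Competition is over: private results may be shown.
        return dict(team)
    return {key: value for key, value in team.items() if key not in PRIVATE_FIELDS}


# Made-up data and dates, used only to exercise the helper.
team = {"teamName": "example-team", "publicScore": 0.951, "privateScore": 0.947, "privateRank": 30}
deadline = datetime(2030, 1, 1, tzinfo=timezone.utc)

before = redact_team(team, deadline, now=datetime(2029, 12, 1, tzinfo=timezone.utc))
after = redact_team(team, deadline, now=datetime(2030, 1, 2, tzinfo=timezone.utc))
print(sorted(before))
print(sorted(after))
```

Filtering at the point where the response is assembled means the browser never receives the hidden values, so nothing depends on the page choosing not to display them.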