ImageNet Gets A Privacy Overhaul, What About Other Datasets?

Last week, ImageNet, one of the world’s most influential AI datasets, decided to blur the faces of people in its images to protect the privacy of those pictured. Elsewhere, researchers at the University of Erlangen-Nürnberg in Germany discovered that X-ray datasets used to train AI classification systems are not as anonymous as initially assumed. As we grow more dependent on artificial intelligence and machine learning, will that dependence come at the cost of our privacy? Let’s delve deeper into these incidents.

ImageNet’s Privacy Overhaul

ImageNet began in 2009 as a project to test whether the progress of artificial intelligence was being held back by a lack of sufficient data. At its inception, ImageNet drew its categories from WordNet, a database of English words grouped into sets of synonyms. ImageNet used the nouns from WordNet as categories and scraped the internet for matching images. To label them, it relied on Amazon Mechanical Turk workers, and the resulting collection included images of thousands of objects and of people who never gave their explicit consent.

The ImageNet Large Scale Visual Recognition Challenge, launched in 2010, went on to herald a new age in the field of AI, spurring its development by giving researchers and companies access to a vast repository of labelled images for deep learning. The dataset used for the challenge contains about 1.4 million images across 1,000 categories. Around 17% of these images contain human faces, yet only three of the categories are related to people.

However, in a recent move, the team behind ImageNet announced its decision to blur the faces of people in its images over fears that facial recognition systems could be misused, compromising the privacy of the people pictured. Moreover, many, if not all, of these faces were collected without consent.

Although the images were publicly available online, including on social media, using them to train AI systems without consent raises serious ethical concerns, particularly where facial recognition is involved. The researchers behind ImageNet have said that blurring the faces will not affect object recognition algorithms or benchmarking. Still, models trained on the blurred images may struggle to recognise unblurred faces when they encounter them.

Compromised Medical AI Dataset

The advancement of AI has also been intertwined with the field of medicine. AI-driven disease classification and X-ray analysis systems are no longer uncommon. However, they rely on large datasets for deep learning to ensure accurate diagnoses, so hospitals often contribute anonymised patient data.

However, researchers at the University of Erlangen-Nürnberg in Germany have found that these AI datasets may not be as private as initially thought. They devised a deep learning-based re-identification technique that can determine whether two X-ray scans belong to the same person with nearly 96% accuracy. The dataset they examined contains around 112,000 records. This means that if even a single X-ray scan falls into the hands of malicious attackers, they could use it to trace that individual's other scans and the patient records linked to them.

According to the researchers, X-rays that share the same deformities are easier to match. Even when there are virtually no shared identifying markers, the tool can still find the match. As a result, an attacker holding even a partial image of a patient could extend the re-identification technique to locate that patient's records across multiple datasets.
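
The article does not describe the researchers' model in detail. As a rough illustration only, the matching step in this kind of re-identification is often framed as comparing learned feature vectors ("embeddings") for two scans and declaring a match when they are similar enough. The sketch below assumes such embeddings already exist; the toy vectors, the `same_patient` and `best_match` helpers and the 0.9 threshold are all hypothetical, and the neural network that would actually produce the embeddings is not shown.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical similarity cut-off; a real system would tune this on held-out data.
MATCH_THRESHOLD = 0.9


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_patient(emb_a: Sequence[float], emb_b: Sequence[float],
                 threshold: float = MATCH_THRESHOLD) -> bool:
    """Verification: treat two scans as the same person if similarity >= threshold."""
    return cosine_similarity(emb_a, emb_b) >= threshold


def best_match(query: Sequence[float],
               gallery: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Identification: return the gallery record most similar to the query scan."""
    if not gallery:
        raise ValueError("gallery is empty")
    return max(
        ((record_id, cosine_similarity(query, emb)) for record_id, emb in gallery.items()),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for a (hypothetical) network's outputs.
    leaked_scan = [0.9, 0.1, 0.4]
    gallery = {"record_A": [0.88, 0.12, 0.41], "record_B": [0.1, 0.9, 0.2]}
    record_id, score = best_match(leaked_scan, gallery)
    print(record_id, round(score, 3), same_patient(leaked_scan, gallery[record_id]))
```

The two helpers mirror the two risks described above: `same_patient` answers whether two scans come from one person, while `best_match` searches a larger collection for the record that best fits a single leaked scan.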

In 2017, over 27% of identity thefts were linked to medical data breaches, and 15 million patient records were breached in 2018. In just the first half of 2019, that number doubled.

The Way Forward

While dataset pioneers such as ImageNet have become increasingly aware of the privacy risks they pose, tech giants like Facebook continue to brazenly ignore them. Researchers at Facebook AI announced that their SEER model can now outperform existing models on object recognition benchmarks. However, there is a catch.

This was achieved by training the model on over a billion Instagram images posted by users. The only images excluded were those from the EU, where the GDPR protects user privacy. Several lawsuits have already been filed in the US: against IBM over its Diversity in Faces dataset, against FaceFirst, a SaaS-based facial surveillance company, and against Google.

Therein lies one solution: stricter regulation. Social media advertising algorithms are already under scrutiny for privacy-related threats, and AI datasets should be held to the same standard.

David B. Shrestha
I have a Bachelor's degree in Journalism, and I enjoy writing about tech policy, cryptocurrency, and the latest advancements in artificial intelligence.