With so much data now available, data science professionals can extract detailed insights into human behaviour, and that is where the question of ethics arises. Because data science focuses on analysing data generated by people, the way data scientists use that data should be subject to ethical conditions.
Analytics is far easier to carry out today than it once was, and that changes the ethical picture. The scale of the data also raises several ethical issues, especially for companies that monetise it. Existing ethical frameworks do not make clear which rules to follow, but some basic principles should always be addressed when looking at data science ethics:
The Autonomous Paradigm
With advances in robotics and automation, the gap between humans and artificial intelligence is closing. Over time, sophisticated data-driven algorithms may come to overtake human decision making.
To put this in perspective, consider an autonomous vehicle, which is meant to make people’s lives safer by reducing traffic and accidents. But what happens in the unlikely event of an accident involving autonomous vehicles? Who is held accountable for the machine’s autonomous decision in an emergency?
Here, data science personnel may be involved in building the model that steers the machine but not in its implementation. Closing the gap between human beings and artificial intelligence therefore raises several ethical questions.
The Age-Old Question Of Privacy
Data privacy has probably received more attention than any other topic in data science ethics, and few issues today attract as much concern. The digital transformation has changed the way we interact with the world. Our online interactions, whether on social media or in other aspects of life, increasingly define what we think, what we do and, most importantly, who we are. The companies that gather this data therefore have to be responsible for how it is used.
With data being the new currency, an organisation that works with data has to be able to answer ethical questions such as: Who owns the data? To what extent will personal data be used? Who controls the data? To what extent should data controllers be held responsible for misuse or loss of the data?
“Our information is being weaponised against us with military efficiency. Every day, billions of dollars change hands, and countless decisions are made based on our likes and dislikes, our friends and families, our relationships and conversations, our wishes and fears, our hopes and dreams. These scraps of data, each one harmless enough on its own, are carefully assembled, synthesised, traded and sold.”
– Tim Cook
Healthcare is one alarming example where data privacy is by far the prime concern. Public trust took a significant hit when news broke that Google’s Project Nightingale had access to the healthcare information of millions of people, including names and other data, without their knowledge. Google released a statement saying it was well within the regulations. Still, news like this cannot be dismissed easily, especially when it concerns the health data of so many people, which reveals a person at an intimate level.
Micro-Targeting
Data analytics can be used to identify patterns in the beliefs, values and attitudes of large groups of people. Micro-targeting has the potential to influence such groups in marketing, politics and economics.
What is tricky about micro-targeting is that it can be used to influence an individual’s free choice. Feeding consumers only information they agree with, or showing specific information to specific groups of people, can lead to manipulation. The real danger comes when data science is used less to improve an organisation’s services and more to turn the client into a product and an object of manipulation.
The most significant example is Cambridge Analytica, which used Facebook data to target a specific group of the public with the purpose of influencing the 2016 US elections. Facebook later denied having consented to the data being shared with third parties.
At the very least, a data scientist should warn of possible misuse. Data scientists should understand the dangers micro-targeting poses for society and communicate them.
Discrimination and Bias
Bias is an issue where data science knowledge is particularly important for understanding the problem. Algorithms such as those used for pattern recognition can exhibit prejudice.
The goal of such an algorithm is to predict the future behaviour of a person or an object from previously observed behaviour data. It then assesses the individual according to the characteristics of others in the same group.
In many cases the negative impact outweighs the possible benefit. One example is COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), software used in the US judicial system to classify a defendant’s risk of committing further crimes. It was shown to rate white defendants as less likely to reoffend than darker-skinned defendants.
Such algorithmic biases can endanger people’s lives. Technically, there is no clear solution to this type of problem. Monitoring the data fed to the algorithm is one approach, but how the system uses that data is largely a black box. Beyond that, prevention depends on morality and law.
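One simple form of monitoring is to compare a model’s error rates across groups. The sketch below is only an illustration and is not how COMPAS works or was audited: the records are made up, and the function name `false_positive_rate_by_group` and the group labels are hypothetical. It computes, for each group, the share of people who did not reoffend but were still flagged as high risk.

```python
from collections import defaultdict

def false_positive_rate_by_group(records):
    """Return {group: false positive rate}.

    Each record is a (group, predicted_high_risk, reoffended) tuple.
    The false positive rate is the fraction of people who did NOT
    reoffend but were still flagged as high risk.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, predicted_high_risk, reoffended in records:
        if not reoffended:
            negatives[group] += 1
            if predicted_high_risk:
                false_positives[group] += 1
    # Skip groups with no non-reoffenders to avoid dividing by zero.
    return {g: false_positives[g] / n for g, n in negatives.items() if n > 0}

# Made-up records, for illustration only.
records = [
    ("A", True, False), ("A", False, False), ("A", False, False), ("A", True, True),
    ("B", True, False), ("B", True, False), ("B", False, False), ("B", True, True),
]
rates = false_positive_rate_by_group(records)
for group, rate in sorted(rates.items()):
    print(f"group {group}: false positive rate {rate:.2f}")
```

A large gap between groups does not explain why the model behaves as it does, but it is a signal that the data or the model needs closer review.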
Distributed Ledgers and Encryption
Trust is declining between consumers and the intermediaries who capture, collect and monetise their data. With the spread of the internet, that distrust is growing further.
Blockchain and data encryption in machine learning offer some support for making data more private, traceable and transparent. Blockchain in particular makes data more traceable and transparent. Other approaches, such as homomorphic encryption implemented in Julia, promise more data security for organisations. The only problem is that these are relatively new technologies, so questions remain about whether they can be trusted and whether they scale.
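The traceability that ledgers offer comes from chaining records together with hashes, so that changing an old record breaks every link after it. The sketch below shows only that hash-chaining idea, using Python’s standard `hashlib`; it is not a distributed blockchain (there is no network or consensus), and the record fields and function names are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry

def entry_hash(record, previous_hash):
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps({"record": record, "prev": previous_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append_entry(ledger, record):
    """Append a record, linking it to the last entry in the ledger."""
    previous_hash = ledger[-1]["hash"] if ledger else GENESIS_HASH
    ledger.append({
        "record": record,
        "prev": previous_hash,
        "hash": entry_hash(record, previous_hash),
    })

def verify_ledger(ledger):
    """Return True only if every link and every stored hash is consistent."""
    previous_hash = GENESIS_HASH
    for entry in ledger:
        if entry["prev"] != previous_hash:
            return False
        if entry["hash"] != entry_hash(entry["record"], entry["prev"]):
            return False
        previous_hash = entry["hash"]
    return True

# Hypothetical audit trail of how one user's data was handled.
ledger = []
append_entry(ledger, {"user": "u1", "action": "consent_given"})
append_entry(ledger, {"user": "u1", "action": "data_shared"})
intact = verify_ledger(ledger)

# Quietly rewriting history is detected because the stored hash no longer matches.
ledger[0]["record"]["action"] = "consent_withdrawn"
still_valid_after_tampering = verify_ledger(ledger)
print(intact, still_valid_after_tampering)
```

In a real system the hashes would also need to be replicated or anchored somewhere the record keeper cannot rewrite them, which is the part a distributed ledger adds.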
“There’s another attack on our civil liberties that we see heating up every day — it’s the battle over encryption. Some in Washington are hoping to undermine the ability of ordinary citizens to encrypt their data.”
– Tim Cook
Data Curation
When it comes to data curation, data scientists must anticipate potential biases and abnormalities in their algorithms. Many algorithms cannot be corrected because of their black-box nature; they have to be either taken offline or completely replaced.
One such example is Tay, Microsoft’s Twitter bot. Tay was deployed on Twitter as a regular user, with the intention that it would behave like a normal user. But in less than a day, the algorithm had learned to spout fascist and racist slogans. The biggest reason was that people on Twitter had learned to manipulate it that way, and there was no curation.
Ethical Review and Adhering to Legal Frameworks
Products and research should be subjected to internal and external ethical review. Organisations should prioritise establishing an actionable ethical review process for new practices, services and research. Data scientists should also adhere to legal frameworks such as the GDPR (Europe), the Personal Data Protection Bill, 2018 (India) and the California Consumer Privacy Act of 2018 (California), which specifically set out the rights of citizens and address the dangers of the commercial use of their data.
These frameworks try to balance power and influence between private organisations and individuals. They provide ethical benchmarks such as the right to object, the right of access, the right to be forgotten and the right to rectification. The importance of such legislation extends beyond data privacy: it can be used to define and shape data practices that respond to many ethical questions.