Council Post: Overcoming the cyclical challenge of data utility and data privacy through Federated Learning

The increased availability of data has transformed the way companies work, and building data-driven strategies has become a key differentiator. Organisations leverage data from their own platforms and from multiple outside sources to build models that make predictions and inferences.

When it comes to data, the issue of privacy cannot be neglected. Data privacy concerns not only the direct extraction of information but also the inferential compromise of personal or sensitive information. With increased accessibility to technology, the democratisation of software and, more importantly, rising cloud migration, our digital footprints are no longer in our control.

Users do not have full transparency into how their data is collected, how it will be used, what inferences will be drawn from it, how decisions based on those inferences will affect them, and where the data is shared. As a result, data privacy has become an institutional responsibility and a governance issue.


Data collection and utility

Decision making is a tricky yet critical task, and data is often key to making business decisions. Data analysis and management have therefore become a vital vertical for businesses: they help in the general management of daily operations, in improvement and growth, and in identifying the variables on which to build future strategies.

Data is also used to improve customer experience. Experts believe data analytics is the most important emerging technology for enhancing customer experience and that it should become an integral part of customer experience strategy. Data can be a differentiating factor in marketing as well. Data-driven marketers use customer data to predict customers' needs and future behaviours, which helps them develop personalised marketing strategies with a high RoI.

The data that is collected can be classified into four categories:

  • Personal data: This category includes personally identifiable information such as social security numbers and gender as well as non-personally identifiable information, including your IP address, web browser cookies, and device IDs (which both your laptop and mobile device have).
  • Engagement data: This type of data details how consumers interact with a business’s website, mobile apps, social media pages, emails, paid ads and customer service routes.
  • Behavioural data: This category includes transactional details such as purchase histories, product usage information (e.g., repeated actions), and qualitative data (e.g., mouse movement information).
  • Attitudinal data: This data type encompasses metrics on consumer satisfaction, purchase criteria, product desirability and more. 

Why do we need data privacy?

For many companies, data sharing, interlinking, and mining are essential parts of the business model. Insights derived from rich data become the driving force behind a company's growth. They draw not only on data that discloses individual user behaviours, social interactions, and spending, but also on extremely sensitive information such as geographic location, healthcare records, and day-to-day routines. The insights derived from mining user preferences are used to provide recommendations and enhance user experiences.

That being said, users have at times been exposed to unwanted personalised advertising, data sharing with third parties, and other privacy breaches. A fine balance is needed between monetising data and respecting and safeguarding user privacy. But what is data privacy, and why does it need to be preserved?

As the name suggests, data privacy refers to the proper handling of data in compliance with data protection regulations. It covers data collection, storage, management, and sharing.

Europe’s General Data Protection Regulation (GDPR) is widely regarded as one of the toughest privacy laws in the world. It is a large, far-reaching regulation that imposes obligations on organisations anywhere in the world that target or collect data related to people in the European Union. Firms that fail to comply with its privacy and security standards face harsh fines, which can reach €20 million or 4% of global annual turnover, whichever is higher.

Another regulation worth mentioning is the California Consumer Privacy Act of 2018 (CCPA), which gives consumers more control over the personal data that businesses collect about them. This includes the right to know how businesses collect, use, and share this data; the right to delete such data; and the right to opt out of the sale of their personal data.

Federated learning and data privacy

Federated learning, also known as collaborative learning, is a machine learning technique in which training takes place across multiple decentralised edge devices (clients) or servers, each using its own local data, without sending that data to a central server. The goal is to train a shared model, for example a deep neural network, across many clients with local datasets without explicitly exchanging data samples, which keeps the raw data private.

One of the defining characteristics of federated learning is that raw data stays decentralised: each client trains the model locally, and only the resulting model updates are aggregated. Unlike traditional data-centre-based distributed learning, where data is arbitrarily distributed and any node within the network can access it, federated learning works with heterogeneous data that remains with its owner, which helps protect privacy.
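The "train locally, then aggregate" loop can be illustrated with a minimal sketch. It assumes a federated-averaging style of aggregation (client models averaged with weights proportional to their sample counts), which the article does not specify; the one-feature linear model, the synthetic client datasets, and the function names `local_update`, `federated_average`, and `make_client_data` are all hypothetical and chosen only to keep the example small.

```python
import random

def local_update(weights, data, lr=0.05, epochs=5):
    """Run a few gradient-descent passes on one client's private data.
    Only the updated (w, b) pair leaves the client, never the data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return (w, b)

def federated_average(client_weights, client_sizes):
    """Server-side step: average client models, weighted by sample count."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)

def make_client_data(rng, n, lo, hi):
    """Hypothetical local dataset: y = 2x + 1 plus small noise,
    drawn from a client-specific range of x (heterogeneous data)."""
    data = []
    for _ in range(n):
        x = rng.uniform(lo, hi)
        data.append((x, 2.0 * x + 1.0 + rng.gauss(0, 0.01)))
    return data

rng = random.Random(0)
clients = [
    make_client_data(rng, 40, 0.0, 1.0),
    make_client_data(rng, 60, 0.5, 1.5),
    make_client_data(rng, 20, -1.0, 0.0),
]

global_weights = (0.0, 0.0)
for _ in range(200):  # communication rounds
    updates = [local_update(global_weights, data) for data in clients]
    global_weights = federated_average(updates, [len(d) for d in clients])

print("learned slope and intercept:", global_weights)
```

In this sketch the server only ever sees the `(w, b)` pairs returned by each client. A production system would typically add further protections, such as secure aggregation, which are outside the scope of this illustration.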

Credit: Author

Consider a real-world use case: anti-money laundering (AML) in banks. Traditionally, banks use rule-based models to filter out records that are obviously not money laundering and manually review the rest. However, determining whether a transaction record represents money laundering is tedious and error-prone, and traditional methods offer little insight into previously unseen cases. A machine learning model is a possible solution because it can find hidden associations among all the features. Different organisations encounter different kinds of cases, so a model built collaboratively would draw on much richer data. Federated learning enables multiple institutions to build a common model without physically sharing their data. In this example, all the banks provide homogeneous data, meaning they have the same features but different sample IDs (a setting often called horizontal federated learning).
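The data layout in this example, same features but different sample IDs, can be made concrete with a small sketch. Everything here is hypothetical: the bank names, the feature schema in `FEATURES`, the record values, and the helper functions are invented for illustration. In a real deployment each bank would hold only its own table, and no single party would see all the sample IDs as this toy check does.

```python
# Hypothetical shared feature schema that every bank agrees on in advance.
FEATURES = ("amount", "hour_of_day", "is_cross_border")

# Each bank's local table: sample id -> feature tuple (invented values).
bank_records = {
    "bank_a": {"a-001": (120.0, 14, 0), "a-002": (9800.0, 3, 1)},
    "bank_b": {
        "b-001": (45.5, 10, 0),
        "b-002": (15000.0, 2, 1),
        "b-003": (300.0, 18, 0),
    },
}

def check_horizontal_partition(records, n_features):
    """Confirm every row follows the shared schema and no sample id
    appears at more than one bank (same features, different samples)."""
    seen = set()
    for bank, rows in records.items():
        for sample_id, features in rows.items():
            if len(features) != n_features:
                raise ValueError(f"{bank}/{sample_id}: schema mismatch")
            if sample_id in seen:
                raise ValueError(f"{sample_id} appears at more than one bank")
            seen.add(sample_id)
    return True

def aggregation_weights(records):
    """Share of the total samples held by each bank, usable as the
    weight of that bank's model update during aggregation."""
    total = sum(len(rows) for rows in records.values())
    return {bank: len(rows) / total for bank, rows in records.items()}

check_horizontal_partition(bank_records, len(FEATURES))
weights = aggregation_weights(bank_records)
print(weights)
```

Because every bank uses the same feature columns, a single model architecture fits all of them, and each bank's contribution can be weighted by how many records it holds.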

This article is written by a member of the AIM Leaders Council. AIM Leaders Council is an invitation-only forum of senior executives in the Data Science and Analytics industry.

Aishwarya Srinivasan
Aishwarya is an AI & ML Innovation Leader at IBM. She works cross-functionally with the product team, data science team and sales to research AI use-cases for clients by conducting discovery workshops and building assets to showcase the business value of the technology. Aishwarya has founded a nonprofit organisation Illuminate AI that aims to provide mentorship, career guidance, and educational support to thousands in the community. She is also a board member for nonprofit organisations like AI for Good Foundation and AI Education project.
