Council Post: Overcoming the cyclical challenge of data utility and data privacy through Federated Learning

The increased availability of data has transformed the way companies work. Building data-driven strategies has now become a key differentiator. Organisations leverage data from their platform and multiple outside sources to build models that make predictions and inferences. 

When it comes to data, the issue of privacy cannot be neglected. Data privacy concerns not only the direct extraction of information but also the inferential compromise of personal or sensitive information. With increased accessibility to technology, democratisation of software and, more importantly, rising cloud migration, our digital footprints are no longer in our control.

Users do not have full transparency about how their data is collected, how it will be used, what inferences will be drawn from it, how decisions based on those inferences will affect them, and with whom the data is shared. As a result, data privacy has become an institutional responsibility and a governance issue.

Data collection and utility

Decision making is a tricky yet critical task, and data is often key to making business decisions. Data analysis and management have therefore become a vital vertical for businesses. They help in the general management of daily operations, in improvement and growth, and in identifying variables on which to build future strategies.

Data is used for improving customer experience. Experts believe data analytics is the most important emerging technology for enhancing customer experience and that it should become an integral part of customer experience strategy. Data could also be a differentiating factor in marketing. Data-driven marketers use customer data to predict their needs and future behaviours, which helps in developing personalised marketing strategies for high RoI.

The data that is collected can be classified into four categories:

  • Personal data: This category includes personally identifiable information such as social security numbers and gender as well as non-personally identifiable information, including your IP address, web browser cookies, and device IDs (which both your laptop and mobile device have).
  • Engagement data: This type of data details how consumers interact with a business’s website, mobile apps, social media pages, emails, paid ads and customer service routes.
  • Behavioral data: This category includes transactional details such as purchase histories, product usage information (e.g., repeated actions), and qualitative data (e.g., mouse movement information).
  • Attitudinal data: This data type encompasses metrics on consumer satisfaction, purchase criteria, product desirability and more. 

Why do we need data privacy

For many companies, data sharing, interlinking, and mining are essential parts of the business model. Insights derived from rich data become the driving force behind a company’s growth: they draw not only on data that discloses individual user behaviours, social interactions, and spending, but also on extremely sensitive information such as geographic location, health care records, and day-to-day routines. The insights derived from mining user preferences are used to provide recommendations and enhance user experiences.

That being said, users have at times been exposed to unwanted personalised advertising, sharing of their data with third parties, and other privacy breaches. A fine balance is needed between monetising data and respecting and safeguarding user privacy. But what is data privacy, and why does it need to be preserved?

As the name suggests, data privacy refers to the proper handling of data in compliance with data protection regulations. It covers data collection, storage, management, and sharing.

Europe’s General Data Protection Regulation (GDPR) is widely described as the toughest privacy law in the world. It is a far-reaching regulation that imposes obligations on organisations anywhere that target or collect data related to people in the European Union. Harsh fines are levied on firms that fail to comply with its privacy and security standards.

Another regulation worth mentioning is the California Consumer Privacy Act of 2018 (CCPA), which gives consumers more control over the personal data that businesses collect about them. This includes the right to know how businesses collect, use, and share this data; the right to delete it; and the right to opt out of the sale of their personal data.

Federated learning and data privacy

Federated learning, also known as collaborative learning, is a machine learning technique in which training takes place across multiple decentralised edge devices (clients) or servers, each using its own local data, without sending that data to a central server, which helps keep the data private. The goal is to train a model, for example a deep neural network, on many clients’ local datasets without explicitly exchanging the data samples.

One of the defining characteristics of federated learning is that raw data stays decentralised: models are trained locally where the data resides, and only the resulting model updates are aggregated. In traditional data centre-based distributed learning, data is arbitrarily distributed and any node within the network can access it. Federated learning instead works with heterogeneous data that remains distributed across its owners, which helps protect privacy.
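The train-locally-then-aggregate loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: it assumes a simple federated averaging scheme, a one-feature linear model, and synthetic client data, and every function name and hyperparameter below (`local_train`, `federated_average`, `lr`, `epochs`, `rounds`) is hypothetical.

```python
import random

def local_train(weights, data, lr=0.5, epochs=20):
    """Gradient descent for y ~ w*x + b using only one client's own data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in data) / n
        grad_b = sum((w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def federated_average(client_weights, client_sizes):
    """Average the clients' parameters, weighted by their sample counts."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)

def run_federated_training(clients, rounds=100):
    """The server only ever sees model parameters, never the clients' rows."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        # Each client starts from the current global model and trains locally.
        updates = [local_train(global_weights, data) for data in clients]
        sizes = [len(data) for data in clients]
        # The server combines the local models into a new global model.
        global_weights = federated_average(updates, sizes)
    return global_weights

def make_client(rng, n, lo, hi):
    """Synthetic (assumed) data from y = 2x + 1 plus small noise.
    Each client draws x from its own range, so the data is heterogeneous."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0, 0.005)) for x in xs]

rng = random.Random(0)
clients = [
    make_client(rng, 40, 0.0, 0.5),
    make_client(rng, 60, 0.3, 1.0),
    make_client(rng, 20, 0.6, 1.0),
]
w, b = run_federated_training(clients)
print("global model: w =", round(w, 2), "b =", round(b, 2))
```

Weighting the average by sample count is one common design choice; it lets clients with more data influence the global model more. Production systems typically add protections such as secure aggregation on top of this basic loop.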

Let’s illustrate the concept with a real-world use case: anti-money laundering in banks. Traditionally, banks use rule-based models to filter out records that are obviously not money laundering and manually review the rest. However, determining whether a transaction record reflects money laundering is tedious and prone to error. Traditional methods also offer little insight into previously unseen cases. A machine learning model is therefore a possible solution, as it can find hidden associations among all the features. Different organisations encounter different kinds of cases, so a model built collaboratively would draw on much richer data. Federated learning enables multiple institutions to build a common model without physically sharing their data. In this use case, all the banks provide homogeneous data: they have the same features but different sample IDs, a setting commonly called horizontal federated learning.
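To make the "same features, different sample IDs" structure concrete, here is a small Python sketch. It does not train a model; it only shows how banks with an identical schema could contribute aggregates instead of raw rows. The bank records, feature names, and helper functions are all invented for illustration.

```python
# Hypothetical records: both banks use the same feature names (same schema)
# but hold different transactions (different sample IDs).
bank_a = {
    "a-001": {"amount": 120.0, "transfers_24h": 1, "cross_border": 0},
    "a-002": {"amount": 9800.0, "transfers_24h": 7, "cross_border": 1},
}
bank_b = {
    "b-001": {"amount": 45.0, "transfers_24h": 2, "cross_border": 0},
    "b-002": {"amount": 300.0, "transfers_24h": 1, "cross_border": 0},
    "b-003": {"amount": 15000.0, "transfers_24h": 9, "cross_border": 1},
}

def is_horizontal_partition(banks):
    """True if every record has the same features and no sample ID repeats."""
    feature_sets = {frozenset(rec) for bank in banks for rec in bank.values()}
    all_ids = [sample_id for bank in banks for sample_id in bank]
    return len(feature_sets) == 1 and len(all_ids) == len(set(all_ids))

def local_summary(bank, feature):
    """What a bank shares in this sketch: a (sum, count) pair, not its rows."""
    values = [rec[feature] for rec in bank.values()]
    return sum(values), len(values)

def combine(summaries):
    """Coordinator-side step: merge per-bank summaries into a shared mean."""
    total = sum(s for s, _ in summaries)
    count = sum(n for _, n in summaries)
    return total / count

banks = [bank_a, bank_b]
print("horizontal partition:", is_horizontal_partition(banks))
shared_mean = combine([local_summary(bank, "amount") for bank in banks])
print("mean amount across banks:", shared_mean)
```

The combined mean matches what a pooled dataset would give, even though no bank reveals individual transactions. Sharing aggregates alone is not a formal privacy guarantee; it only illustrates the data layout that makes a common model possible.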

This article is written by a member of the AIM Leaders Council. AIM Leaders Council is an invitation-only forum of senior executives in the Data Science and Analytics industry.

Aishwarya Srinivasan

Aishwarya is an AI & ML Innovation Leader at IBM. She works cross-functionally with the product team, data science team and sales to research AI use-cases for clients by conducting discovery workshops and building assets to showcase the business value of the technology. Aishwarya has founded a nonprofit organisation Illuminate AI that aims to provide mentorship, career guidance, and educational support to thousands in the community. She is also a board member for nonprofit organisations like AI for Good Foundation and AI Education project.