# Benford’s Law: A Cloak-and-Dagger tool for Data Scientists

Benford’s law, also known as the Newcomb-Benford law, is an observation about the frequency distribution of leading digits in unconstrained, real-world numeric data.

The intuition behind the law dates to the 1880s, when the astronomer Simon Newcomb spotted a pattern in books of log tables. He noticed that the pages for numbers starting with small digits such as 1 and 2 were far more worn than the rest, which suggested that people looked up those numbers more often. He did not pursue the observation much further. More than 50 years later, Frank Benford picked up the phenomenon and tested it on many kinds of data, such as populations and the lengths of rivers. Let’s see how to apply Benford’s law and what its possible applications are.

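$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right), \quad d = 1, 2, \dots, 9$$

Figure 1: Probability to follow Benford’s Law

The formula above gives the expected probability that a number has leading digit d if the data complies with Benford’s law. The probabilities are as follows:

| Leading digit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Probability | 30.1% | 17.6% | 12.5% | 9.7% | 7.9% | 6.7% | 5.8% | 5.1% | 4.6% |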
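As a minimal sketch, the expected share of each leading digit can be computed in Python from the standard statement of the law, P(d) = log10(1 + 1/d) for d from 1 to 9. The function name `benford_probability` is chosen here purely for illustration.

```python
import math


def benford_probability(d: int) -> float:
    """Expected share of numbers whose leading digit is d (1 to 9)."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


# Expected distribution over all nine possible leading digits.
expected = {d: benford_probability(d) for d in range(1, 10)}

for digit, probability in expected.items():
    print(f"{digit}: {probability:.1%}")
```

The nine probabilities sum to 1, because the terms log10((d + 1) / d) telescope to log10(10).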

If the first digits of a dataset do not follow this probability distribution, the deviation is worth investigating: the dataset may be too small, it may not span several orders of magnitude, or someone may have tried to manipulate it. Even a small amount of manipulation can shift the first-digit distribution enough to flag the data for a closer look. The law applies to data that results from an unconstrained process, so not every set of numbers can be used with it. Telephone numbers, for example, are assigned rather than measured, and their first digits do not follow the law.

The law can serve as a first filter on any unconstrained dataset to check that the data has not been tampered with. Below are some of the applications of Benford’s law.
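The comparison described here can be sketched in Python as follows. This is an illustration under stated assumptions, not a calibrated fraud test: the helper names (`leading_digit`, `first_digit_shares`, `mean_abs_deviation`) are made up for this example, the mean absolute deviation is just one simple way to measure the gap, and both datasets are synthetic. Powers of two stand in for unconstrained data, and random ten-digit numbers starting with 6 to 9 stand in for telephone numbers.

```python
import math
import random
from collections import Counter

# Expected first-digit shares under Benford's law.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """First non-zero digit of a non-zero int or float."""
    text = str(abs(x)).lstrip("0.")  # drop sign, leading zeros and the point
    if not text:
        raise ValueError("zero has no leading digit")
    return int(text[0])


def first_digit_shares(values):
    """Observed share of each leading digit 1-9 in values."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


def mean_abs_deviation(values):
    """Average gap between observed and expected shares over the nine digits."""
    observed = first_digit_shares(values)
    return sum(abs(observed[d] - BENFORD[d]) for d in range(1, 10)) / 9


# Synthetic data (assumption): powers of two span many orders of magnitude.
powers_of_two = [2 ** k for k in range(1, 500)]

# Synthetic data (assumption): phone-like numbers with a constrained first digit.
rng = random.Random(0)
phone_like = [rng.randint(6_000_000_000, 9_999_999_999) for _ in range(500)]

print("powers of two:", round(mean_abs_deviation(powers_of_two), 4))
print("phone-like:   ", round(mean_abs_deviation(phone_like), 4))
```

The powers of two should land very close to the expected distribution, while the phone-like numbers should deviate strongly because their first digit is constrained. A large deviation is a reason to look more closely, not proof of tampering.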

• Financial Data:

The financial world relies heavily on Benford’s law to identify fraud. It can be applied to loan data, stock prices, tax returns and so on. Most such datasets follow the probability distribution; if one does not, either someone has manipulated the data or the dataset may be too small.

• Election Data:

You could take the number of votes a party received in different cities and compare their first digits to the probability distribution. A mismatch does not prove anything on its own, but it can be a useful first check on whether a party has tried to buy votes or pressured people to vote for it.

• Image Forensics:

At a time when tutorials for creating deepfakes are openly available on the internet, it is becoming difficult to rely on images as evidence when proving a crime. Benford’s law can act as a first-pass filter for authenticating an image. For example, take a photo with your phone and apply Benford’s law to the pixel intensities; you should see roughly the same probability distribution as the law. But if you add a filter to the image and save it, the result will violate the law because it is no longer an original image. The same process can be used to spot fake videos. This check makes it difficult for amateur forgers to fool the law.

• Social Networks:

A researcher in the US wanted to test Benford’s law on social network data, so she scraped the number of friends each Twitter account has, along with the number of friends each of those friends has. She found that most accounts followed Benford’s law, but some did not follow the distribution. On closer inspection, those accounts turned out to be bots, and by continuing her research she exposed an entire network of bots on Twitter. Such bots could be used to manipulate elections and send fraudulent messages to people.
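A rough sketch of this kind of per-account check is shown below. It is not the researcher’s actual method: the account names, the synthetic friend counts, the helper names and the `threshold` value of 0.06 are all assumptions made for illustration. Each account maps to the friend counts of its friends, and an account is flagged when the first digits of those counts sit far from Benford’s distribution.

```python
import math
import random
from collections import Counter

# Expected first-digit shares under Benford's law.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def deviation_from_benford(friend_counts):
    """Mean absolute gap between observed first-digit shares and Benford's."""
    digits = Counter(int(str(n)[0]) for n in friend_counts if n > 0)
    total = sum(digits.values())
    if total == 0:
        raise ValueError("need at least one positive count")
    return sum(abs(digits.get(d, 0) / total - BENFORD[d]) for d in range(1, 10)) / 9


def flag_accounts(accounts, threshold=0.06):
    """Names of accounts whose deviation exceeds the (assumed) threshold."""
    return sorted(
        name
        for name, counts in accounts.items()
        if deviation_from_benford(counts) > threshold
    )


rng = random.Random(42)

# Synthetic "organic" accounts: friend counts spread over several orders of magnitude.
accounts = {
    f"user_{i}": [int(10 ** rng.uniform(0, 5)) for _ in range(300)]
    for i in range(5)
}

# Synthetic "bot" accounts: friend counts squeezed into a narrow range.
accounts["bot_1"] = [rng.randint(4000, 4999) for _ in range(300)]
accounts["bot_2"] = [rng.randint(700, 799) for _ in range(300)]

print(flag_accounts(accounts))
```

With only a few hundred counts per account, sampling noise alone produces some deviation, so the threshold here is deliberately loose. A real analysis would need a properly chosen statistical test and a manual review of anything it flags.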

## Conclusion:

The law is simple yet powerful, and it can flag potential fraud within seconds. It already has many applications and researchers are actively looking for more, but the question remains: “Why does so much data follow Benford’s law?” I’ll leave that for you to explore.

I am a final year Data Science student with good experience in working with startups across India and Australia in the Machine Learning and AI space. I am always in search of tasks that challenge me to broaden my vision and enhance the level of experience. Looking for a full-time position after my graduation in April 2021. Hit me up if you have an opportunity for me.
