
Benford’s Law: A Cloak-and-Dagger tool for Data Scientists

Benford’s law, also known as the Newcomb-Benford law, is an observation about the frequency distribution of leading digits in unconstrained, real-world numeric data.

The intuition behind the law dates to the 1880s, when the American scientist Simon Newcomb noticed a pattern in books of logarithm tables: the pages for numbers starting with small digits such as 1 and 2 showed far more wear and markings than the rest. He did not pursue the observation much further. More than 50 years later, Frank Benford took up the phenomenon and found many interesting results by applying it to populations, lengths of rivers and other datasets. Let’s see how to apply Benford’s law and what its possible applications are.

Figure 1: Probability to follow Benford’s Law

The formula P(d) = log10(1 + 1/d), where d is a leading digit from 1 to 9, gives the probability with which each digit is expected to appear first under Benford’s Law. The probabilities are as follows:

d    P(d)
1    30.1%
2    17.6%
3    12.5%
4    9.7%
5    7.9%
6    6.7%
7    5.8%
8    5.1%
9    4.6%
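As a quick check, the table can be reproduced in a few lines of Python from the law's usual formula, P(d) = log10(1 + 1/d). This is only a minimal sketch; the function name benford_probability is our own and not part of any library.

```python
import math


def benford_probability(d: int) -> float:
    """Expected share of numbers whose leading digit is d (1 to 9)."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


# Expected probability for every possible leading digit
expected = {d: benford_probability(d) for d in range(1, 10)}

for d, p in expected.items():
    print(f"{d}    {p:.1%}")
```

The nine probabilities sum to 1, because the product of the terms (1 + 1/d) for d from 1 to 9 is exactly 10.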

If the first digits of a dataset do not follow the above probability distribution, then either the dataset is too small or someone may have manipulated it. Even a small amount of manipulation in real data can shift the first-digit distribution away from the one given above, which flags the data for a closer look. The law can be applied to almost any set of numbers produced by an unconstrained process. Not every set of numbers is suitable, however: telephone numbers, for example, are assigned rather than generated by such a process, so they do not follow the law.

The law can serve as a first screening step for any unconstrained dataset to check that the data has not been tampered with. Below are some applications of Benford’s law.
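One way to make "does not follow the distribution" concrete is a chi-square goodness-of-fit statistic over the nine leading digits. The sketch below is an illustration under our own assumptions, not a prescribed method: the helper names are hypothetical, the powers of 2 stand in for an unconstrained process, and a block of consecutive seven-digit numbers stands in for telephone numbers.

```python
import math
from collections import Counter


def leading_digit(x: float) -> int:
    """First significant digit of a non-zero number."""
    # Scientific notation always starts with the first significant digit,
    # e.g. 314.0 -> '3.140000000000000e+02'.
    return int(f"{abs(x):.15e}"[0])


def chi_square_vs_benford(values) -> float:
    """Chi-square statistic of observed first digits against Benford's law."""
    digits = [leading_digit(v) for v in values if v != 0]
    n = len(digits)
    counts = Counter(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts.get(d, 0) - expected) ** 2 / expected
    return stat


# Assumed example data (not from the article)
powers_of_two = [2 ** k for k in range(1, 301)]   # spans many orders of magnitude
phone_like = list(range(5_550_000, 5_550_300))    # every number starts with 5

print(chi_square_vs_benford(powers_of_two))
print(chi_square_vs_benford(phone_like))
```

With nine digits there are 8 degrees of freedom, so a statistic above roughly 15.5 would reject conformity at the 5% level. A large value is a prompt for investigation, not proof of manipulation.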

  • Financial Data:

The financial world relies heavily on Benford’s law to identify fraud. It can be applied to loan data, stock prices, tax returns and so on. Most such datasets follow the probability distribution; if one does not, either someone has manipulated the data or the dataset may be too small.

  • Election Data:

You could take the number of votes a party received in different cities and compare the first digits with the probability distribution. This could be a useful first check on whether the party has tried to buy votes or pressured people to vote for it.

  • Image Forensics:

At a time when tutorials for creating deepfakes are openly available on the internet, it is becoming difficult to rely on images as evidence of a crime. Benford’s law can serve as a basic first check of an image’s authenticity. For example, take a photo with your phone and apply Benford’s law to its pixel intensities; according to this approach, the leading digits should follow roughly the same probability distribution as the law (published forensic work typically uses transformed values such as JPEG coefficients rather than raw pixels). If you then add a filter to the image and save it, the distribution tends to shift because the image is no longer original. The same process can be applied to spot fake videos. A check like this makes it harder for amateur forgers to pass off an edited image as genuine.

  • Twitter Bot Identification:

A researcher in the US wanted to explore uses of Benford’s law, so she looked at the number of friends each Twitter account has and the number of friends that each of those friends has, scraping these counts from the accounts. She found that most accounts followed Benford’s law, but some did not. On closer inspection, those accounts turned out to be bots. Continuing her research, she exposed an entire network of bots on Twitter. Such bots could be used to manipulate elections and send fraudulent messages to people.
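The screening idea shared by the applications above can be sketched with synthetic numbers: measure how far the observed first-digit shares sit from the Benford shares, then compare a clean series with a doctored copy. Everything in this example is assumed for illustration, including the "ledger" of amounts growing 3% per step, the tampering rule and the helper names.

```python
import math
from collections import Counter

# Expected first-digit shares under Benford's law
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x: float) -> int:
    """First significant digit, read from scientific notation."""
    return int(f"{abs(x):.15e}"[0])


def mean_abs_deviation(values) -> float:
    """Average gap between observed and expected first-digit shares."""
    digits = [first_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    n = len(digits)
    return sum(abs(counts.get(d, 0) / n - BENFORD[d]) for d in range(1, 10)) / 9


# Hypothetical clean ledger: amounts spanning many orders of magnitude
clean = [100 * 1.03 ** k for k in range(1000)]

# Hypothetical tampering: every tenth amount replaced by a figure just under 5,000
tampered = [4_900.0 if k % 10 == 0 else v for k, v in enumerate(clean)]

print(f"clean:    {mean_abs_deviation(clean):.4f}")
print(f"tampered: {mean_abs_deviation(tampered):.4f}")
```

The doctored series shows a visibly larger deviation because the digit 4 is over-represented. In practice the cut-off between "conforms" and "suspicious" is a judgment call that depends on sample size.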

Conclusion: 

The law is simple yet powerful, and it can flag possible fraud within seconds. It has many applications, and researchers are actively looking for more. But the question remains: “Why does so much data follow Benford’s law?” I’ll leave that for you to explore.


Rithwik Chhugani

I am a final year Data Science student with good experience in working with startups across India and Australia in the Machine Learning and AI space. I am always in search of tasks that challenge me to broaden my vision and enhance the level of experience. Looking for a full-time position after my graduation in April 2021. Hit me up if you have an opportunity for me.
