# Benford’s Law: A Cloak-and-Dagger tool for Data Scientists

Benford’s law, also known as the Newcomb-Benford law, is an observation about the frequency distribution of leading digits in unconstrained, real-world numeric data.

The intuition behind the law dates to the 1880s, when the American scientist Simon Newcomb noticed a pattern in books of logarithm tables: the pages for numbers starting with small digits such as 1 and 2 carried far more markings than the rest. He did not pursue the observation much further. About 50 years later, Frank Benford picked up the phenomenon and tested it on many kinds of data, including populations and the lengths of rivers. Let’s see how to apply Benford’s law and what it can be used for.

Figure 1: Probability to follow Benford’s Law

The formula in Figure 1 gives the expected probability that a number’s leading digit is d under Benford’s law: P(d) = log10(1 + 1/d), for d from 1 to 9. The probabilities fall steadily from about 30.1% for a leading 1 to about 4.6% for a leading 9.
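The full set of expected probabilities is easy to compute from the formula. The snippet below is a minimal sketch in Python; the function name `benford_probabilities` is only an illustrative choice.

```python
import math


def benford_probabilities():
    """Return {digit: expected probability} for leading digits 1 to 9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


# Print the expected share of each leading digit as a percentage.
for digit, probability in benford_probabilities().items():
    print(f"{digit}: {probability:.1%}")
```

The nine values sum to 1, because the terms log10((d + 1) / d) telescope to log10(10).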

If the observed first digits do not follow this probability distribution, then either the dataset is too small or the data may have been manipulated. Even a small amount of manipulation can shift the distribution enough to flag the dataset for a closer look. The law applies to data produced by an unconstrained process, so not every set of numbers is suitable. Telephone numbers, for example, are assigned rather than generated naturally and do not follow it.
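One way to turn this comparison into a number is a chi-square statistic on the first-digit counts. The sketch below is one possible check, not the only one. The helper names, the two demo datasets, and the 5% significance level are all assumptions made for illustration. Powers of two stand in for unconstrained data, and a block of three-digit IDs stands in for assigned numbers such as telephone numbers.

```python
import math
from collections import Counter

# Expected first-digit probabilities under Benford's law.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# 5% critical value of the chi-square distribution with 8 degrees of freedom.
CRITICAL_5_PERCENT = 15.507


def leading_digit(x):
    """Return the first significant digit of x, or None if it has none."""
    x = abs(float(x))
    if x == 0 or not math.isfinite(x):
        return None
    # Scientific notation puts the first significant digit first: '3.14...e+02'.
    return int(f"{x:.16e}"[0])


def chi_square_statistic(values):
    """Chi-square statistic of observed first digits against Benford's law."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("no usable values")
    counts = Counter(digits)
    return sum((counts[d] - n * p) ** 2 / (n * p) for d, p in BENFORD.items())


# Illustrative data only: a sequence known to follow the law closely,
# and a block of assigned identifiers that cannot follow it.
powers_of_two = [2 ** k for k in range(1000)]
three_digit_ids = list(range(100, 1000))

print(chi_square_statistic(powers_of_two) < CRITICAL_5_PERCENT)    # True
print(chi_square_statistic(three_digit_ids) < CRITICAL_5_PERCENT)  # False
```

A statistic above the critical value only says the digits are unlikely to come from a Benford distribution. Whether that points to tampering, a small sample, or simply unsuitable data still needs a human look.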

The law can serve as a first filter on any unconstrained dataset to check that the data has not been tampered with. Below are some applications of Benford’s law.

• Financial Data:

The financial world relies heavily on Benford’s law to identify fraud. It can be applied to loan data, stock prices, tax returns and similar records. Most such datasets follow the probability distribution; if one does not, either the data may have been manipulated or the dataset is too small.

• Election Data:

You could take the number of votes a party received in different cities and compare the first digits with the probability distribution. This can be a useful check for signs that a party has tried to buy votes or pressured people into voting for it.

• Image Forensics:

At a time when tutorials for creating deepfakes are openly available on the internet, it is becoming harder to rely on images as evidence of a crime. Benford’s law can serve as a basic first check of an image’s authenticity. For example, take a photo with your phone and apply Benford’s law to the pixel intensities; the leading digits should roughly follow the distribution predicted by the law. If you then add a filter to the image and save it, the distribution is likely to deviate from the law because the image is no longer the original. The same process can be applied to spot fake videos. This check makes it harder for amateur forgers to slip through.

• Social Networks:

A researcher in the US wanted to explore uses of Benford’s law, so she looked at the number of friends each Twitter account has and the number of friends each of those friends has. She found that most accounts followed Benford’s law, but some did not. On closer inspection, the accounts that deviated turned out to be bots, and further work exposed an entire network of bots on Twitter. Such bots could be used to manipulate elections or send fraudulent messages to people.
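The per-account check described above could be sketched roughly as follows. This is not the researcher’s actual method. The data layout (a dict mapping each account to the friend counts of its friends), the account names, the synthetic numbers, the minimum sample size and the chi-square threshold are all hypothetical choices made for illustration.

```python
import math

# Expected first-digit probabilities under Benford's law.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# 5% critical value of the chi-square distribution with 8 degrees of freedom.
CHI2_CRITICAL = 15.507


def chi_square_vs_benford(counts):
    """Chi-square statistic of positive integer counts against Benford's law."""
    digits = [int(str(c)[0]) for c in counts if c > 0]
    n = len(digits)
    if n == 0:
        raise ValueError("no positive counts to test")
    return sum(
        (digits.count(d) - n * p) ** 2 / (n * p) for d, p in BENFORD.items()
    )


def flag_accounts(friend_counts_by_account, min_sample=100):
    """Return accounts whose friends' friend counts deviate from the law."""
    flagged = []
    for account, counts in friend_counts_by_account.items():
        positive = [c for c in counts if c > 0]
        if len(positive) < min_sample:
            continue  # too few values for a meaningful comparison
        if chi_square_vs_benford(positive) > CHI2_CRITICAL:
            flagged.append(account)
    return flagged


# Synthetic, made-up data: one account whose friends' counts span several
# orders of magnitude, and one whose friends all sit in a narrow band.
network = {
    "organic_user": [int(10 * 10 ** (k / 50)) for k in range(200)],
    "suspect_user": [900 + k for k in range(100)],
}

print(flag_accounts(network))  # ['suspect_user']
```

Flagged accounts are only candidates for manual review, in the same way the researcher took a closer look before concluding the accounts were bots.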

## Conclusion:

The law is simple yet powerful, and it can help spot potential fraud within seconds. It already has many applications, and researchers are actively looking for more. But the question remains: “Why does so much data follow Benford’s law?” I’ll leave that for you to explore.
