Faker Tutorial, A Python Library To Create Your Own Dataset

Faker is an open-source Python library for creating your own dataset: it generates random data for attributes such as name, age, and location. It supports many locales and languages, which is useful when you need data that looks local to a particular region.

Faker data can be used to tune, validate, and test machine learning models, including stress testing them. You can generate whatever data best fits your needs. Faker data is also useful for learning, for example for practicing operations on different data types.

In this article we will:

  1. Explore Faker and its functions
  2. Create our own dataset

Implementation

To explore Faker, we first need to install it using pip install faker.

a. Importing useful libraries

We will explore different Faker functions, so we import Faker. We will also perform some operations on the dataset, for which we import pandas.

from faker import Faker
import pandas as pd 

b. Using different functions

Now we will explore the different functions in the Faker library. To do this, we create an instance of the Faker class and assign it to a variable.

exp = Faker()

Now we will use this variable to generate different attributes.

print('Name:', exp.name())
print('Address:', exp.address())
print('DOB:', exp.date_of_birth())

We can generate data for different regions and languages by passing the locale codes we want to Faker. Let’s generate some names in Japanese and Hindi.

exp = Faker(['ja_JP', 'hi_IN'])
for i in range(5):
    print(exp.name())

We can also generate random text using the sentence and text functions.

exp.text()

exp.sentence()

We can also generate sentences from our own word list: we pass a list of words of our choice, and Faker builds fake sentences using only those words.

words = ['Hello', 'Abhishek', 'all', 'are', 'where', 'why']
exp.sentence(ext_word_list=words)

Besides names and addresses, we can generate complete profiles of people who do not exist, using the profile function.

exp.profile()

Faker can also be used to generate an entire random dataset.

c. Create a fake dataset using faker

Now we will use the profile function to generate a dataset containing the profiles of 100 fake people, and use pandas to store these profiles in a data frame. We will create the profiles using the Hindi (hi_IN) locale. Note that each profile is generated independently at random, so uniqueness is not strictly guaranteed.

exp = Faker(['hi_IN'])
data = [exp.profile() for i in range(100)]
df = pd.DataFrame(data)
df

The dataset we have created contains attributes such as residence, current location, and website. We can use this dataset according to our needs.

We have stored these profiles in a data frame so that we can perform operations on them, such as visualization and analysis.

Conclusion

In this article, we saw how to use Faker, an open-source Python library, to generate fake data, and how to create a fake dataset containing profiles of fake people in different languages and locales. The resulting dataset can be used for purposes such as training a machine learning model or practicing data operations.


Himanshu Sharma

An aspiring Data Scientist currently pursuing an MBA in Applied Data Science, with an interest in the financial markets. I have experience in data analytics, data visualization, machine learning, creating dashboards, and writing articles related to data science.
