Faker Tutorial, A Python Library To Create Your Own Dataset

Faker is an open-source Python library that lets you create your own dataset, i.e., you can generate random data with attributes such as name, age, location, etc. It supports many locales (combinations of language and region), which is useful for generating data that matches a particular locality.

Faker data can be used to tune machine learning models, stress test a model, and so on. Depending on your needs, you can generate data that best fits your use case. Faker data is also useful for learning, for example practicing different operations on different data types.

Generated datasets can likewise be used to tune, validate, and test machine learning models.


In this article, we will:

  1. Explore Faker and its functions
  2. Create our own Dataset

Implementation

To explore Faker, we first need to install it by running pip install faker.

a. Importing useful libraries

We will explore different functions of Faker, so we import the Faker class. We will also perform some operations on the generated dataset, for which we need to import pandas.

from faker import Faker
import pandas as pd 

b. Using different functions

Now we will explore the different functions in the Faker library. To do this, we first create an instance of the Faker class and assign it to a variable.

exp = Faker()

Now we will use this variable to generate different attributes.

print('Name:', exp.name())
print('Address:', exp.address())
print('DOB:', exp.date_of_birth())

We can also generate data for different regions and localities, in different languages. We just need to pass the locale codes we want to Faker. Let's generate some names using the Japanese (ja_JP) and Hindi (hi_IN) locales.

exp = Faker(['ja_JP', 'hi_IN'])
for _ in range(5):
    print(exp.name())

We can also generate random text and sentences using the text() and sentence() methods.

exp.text()

exp.sentence()

We can also generate sentences from our own word list: Faker will build fake sentences using only the words we provide.

words = ['Hello', 'Abhishek', 'all', 'are', 'where', 'why']
exp.sentence(ext_word_list=words)

Besides names and addresses, we can generate complete profiles of people who do not exist. We will use the profile() method to generate a fake profile of a person.

exp.profile()

Because each profile is a dictionary, we can also combine many profiles into a random dataset.

c. Create a fake dataset using faker

Now we will use the profile() method to generate a dataset containing the profiles of 100 fake people. We will also use pandas to store these profiles in a DataFrame. We will create these profiles with the Hindi (hi_IN) locale.

exp = Faker(['hi_IN'])
data = [exp.profile() for i in range(100)]
df = pd.DataFrame(data)
df

The dataset we created contains attributes such as job, company, residence, current location, website, etc. We can use this dataset however our task requires.

We stored these profiles in a DataFrame so that we can perform operations on them, such as visualization and analysis.

Conclusion

In this article, we saw how to use Faker, an open-source Python library, to generate fake data, and how to create a fake dataset containing profiles of fake people in different languages and locales. Such a dataset can be used for many purposes, such as training a machine learning model or practicing data operations.


Himanshu Sharma
An aspiring Data Scientist currently pursuing an MBA in Applied Data Science, with an interest in the financial markets. I have experience in data analytics, data visualization, machine learning, creating dashboards, and writing articles related to data science.
