Faker Tutorial, A Python Library To Create Your Own Dataset

Faker is an open-source Python library for generating fake data, so you can build your own dataset with attributes such as name, age, and location. It supports many locales and languages, which helps when the data needs to look like it comes from a particular region.

Faker data can be used to tune machine learning models, to stress-test a model, and so on. You can generate whatever data best fits your need. It is also handy for learning, for example when practising operations on different data types.

The generated datasets can also be used to validate and test a machine learning model.

In this article we will:

  1. Explore Faker and its functions
  2. Create our own dataset

Implementation

To explore Faker, we first need to install it with pip install faker.

a. Importing useful libraries

We will explore several Faker functions, so we import Faker. We will also perform some operations on the dataset, for which we import pandas.

from faker import Faker
import pandas as pd 

b. Using different functions

Now we will explore the functions available in the Faker library. To do this, we create an instance of the Faker class and assign it to a variable.

exp = Faker()

Now we will use this variable to generate different attributes.

print('Name:', exp.name())
print('Address:', exp.address())
print('DOB:', exp.date_of_birth())

We can generate data for different regions and languages; we just need to pass the locale we want. Let's generate some names using the Japanese and Hindi locales.

exp = Faker(['ja_JP', 'hi_IN'])
for i in range(5):
    print(exp.name())

We can also generate single sentences and longer passages of text using the sentence and text functions.

exp.text()

exp.sentence()

We can also generate sentences from our own word list, in which case Faker builds fake sentences using only the words we supply.

words = ['Hello', 'Abhishek', 'all', 'are', 'where', 'why']
exp.sentence(ext_word_list=words)

Beyond names and addresses, we can generate whole profiles for people who do not exist. We will use the profile function to generate a fake profile of a person.

exp.profile()

Faker can also be used to generate an entire random dataset.

c. Create a fake dataset using faker

Now we will use the profile function to generate a dataset containing the profiles of 100 fake people (Faker does not guarantee that every profile is distinct). We will use pandas to store these profiles in a data frame, and we will create them with the Hindi locale.

exp = Faker(['hi_IN'])
data = [exp.profile() for i in range(100)]
df = pd.DataFrame(data)
df

The dataset we have created contains attributes such as residence, current_location, and website. We can use this dataset according to our needs.

We stored these profiles in a data frame so that we can perform operations on them, such as visualization and analysis.

Conclusion

In this article, we saw how to use Faker, an open-source Python library, to generate fake data, and how to create a fake dataset containing profiles of fake people in different languages and locales. The resulting dataset can be used for purposes such as training a machine learning model or practising data operations.

Himanshu Sharma
An aspiring Data Scientist currently Pursuing MBA in Applied Data Science, with an Interest in the financial markets. I have experience in Data Analytics, Data Visualization, Machine Learning, Creating Dashboards and Writing articles related to Data Science.
