Faker is an open-source Python library that lets you create your own dataset, i.e., generate random data with attributes such as name, age, and location. It supports many locales (combinations of language and region), which is useful when you need data that matches a particular locality.
Faker data can be used to tune, validate, and test machine learning models, or to stress-test a model with large volumes of input. Depending on your needs, you can generate the kind of data that best fits your use case. Faker data is also handy for learning, for example to practice different operations on different data types.
In this article we will:
- Explore Faker and its functions
- Create our own dataset
To explore Faker, we first need to install it using pip:
pip install faker
a. Importing useful libraries
We will explore different functions of Faker, so we import the Faker class. We will also perform some operations on the generated dataset, for which we need pandas.
from faker import Faker
import pandas as pd
b. Using different functions
Now we will explore some of the functions available in the Faker library. For this, we first create an instance of the Faker class and store it in a variable.
exp = Faker()
Now we will use this instance to generate different attributes.
print('Name: ', exp.name())
print('Address: ', exp.address())
print('DOB: ', exp.date_of_birth())
Faker can also generate data for different regions and languages. We just need to pass the locale codes we want when creating the instance. Let's generate some names in Japanese (ja_JP) and Hindi (hi_IN).
exp = Faker(['ja_JP', 'hi_IN'])
for i in range(5):
    print(exp.name())
We can also generate random sentences and longer passages of text using the sentence() and text() functions.
We can also generate sentences from our own word list: Faker then builds fake sentences using only the words we provide.
words = ['Hello', 'Abhishek', 'all', 'are', 'where', 'why']
exp.sentence(ext_word_list=words)
Besides names and addresses, we can generate complete profiles of people who do not exist. We will use the profile() function to generate a fake profile of a person.
By generating many such profiles, Faker can also produce an entire random dataset.
c. Create a fake dataset using faker
Now we will use the profile() function to generate a dataset containing the profiles of 100 fake people, and we will use pandas to store these profiles in a DataFrame. We will create the profiles with the Hindi (hi_IN) locale.
exp = Faker(['hi_IN'])
data = [exp.profile() for i in range(100)]
df = pd.DataFrame(data)
df
The dataset we have created contains attributes such as job, company, residence, current location, blood group, website, name, sex, address, email, and birth date. We can adapt it to our needs, for example by keeping only the columns we want.
We have stored these profiles in a DataFrame so that we can perform operations on them, such as visualization and analysis.
In this article, we saw how to use Faker, an open-source Python library, to generate fake data, and how to create a fake dataset of profiles of fictional people in different languages and locales. Such a dataset can be used for many purposes, such as training a machine learning model or practicing data operations.
An aspiring data scientist currently pursuing an MBA in Applied Data Science, with an interest in the financial markets. I have experience in data analytics, data visualization, machine learning, creating dashboards, and writing articles related to data science.