Last updated March 3, 2021

How To Implement Drag And Drop Feature In Jupyter Notebook With Pivot Table



Published on August 28, 2020

by Bhoomika Madhukar

Data visualization is a technique used by data analysts and scientists to present data in a graphical format. It helps in drawing meaningful insights from the data and in understanding how the data is distributed. Alongside visualization, data analysis is important for bringing order and structure to raw data. The main advantage of both is that they let you make sense of a large dataset without going through every column line by line.

Pandas is a powerful tool for data analysis and data cleaning. Python also provides a range of visualization libraries such as Matplotlib, Seaborn and Plotly. Pivottablejs is a Python package that wraps the JavaScript pivot table library PivotTable.js and combines data analysis of the kind done in Pandas with visualization of the kind offered by these libraries in one widget.

In this article, we will explore the different features of pivottablejs and use it to analyse the data in our dataset.

Implementation of Pivottablejs

PivotTable.js is an open-source JavaScript pivot table component. It is a customizable widget that can be adapted to suit your needs. It can turn your Jupyter notebook into a drag and drop tool for visualizing and analysing data. Its renderers include tables, heatmaps, row heatmaps, column heatmaps, charts and treemaps.

Let us look into how to implement this in our notebook.

This component can be installed as a Python package using pip.

pip install pivottablejs

Importing libraries

We will need to import pandas to read our CSV file and pivot_ui from pivottablejs to implement the drag and drop functionality.

import pandas as pd

from pivottablejs import pivot_ui

Loading the data

I have selected a waiter's tips dataset. The features of this dataset include the gender of the waiter, the time of service and whether the customer was a smoker, and the target is the tip that the waiter receives. The dataset is available for download here. Load your data into the notebook.

tips = pd.read_csv("tips.csv")

tips.head()

Data visualization and analysis

To implement the drag and drop functionality, all we have to do is run this command.

pivot_ui(tips)

An interactive interface appears in the notebook output, showing the columns of your dataset along with other options.

You can get a count of how many rows match the categories you care about. Drag and drop the target, which is the tip, and any feature you want into the row and column areas. The display updates automatically.

This displays the count of rows for each combination of the smoker and sex features against the day. Clicking on the Count box gives you a list of the possible aggregations that can be applied to your dataset.
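The count view can be cross-checked in pandas. The following is a minimal sketch using pd.crosstab on a few made-up rows; the column names sex, smoker, day and tip are assumptions about how tips.csv is laid out, and the values are illustrative only, not taken from the real dataset.

```python
import pandas as pd

# Hypothetical toy rows that mimic the assumed tips columns; not the real data.
tips = pd.DataFrame({
    "sex":    ["Female", "Male", "Male", "Female", "Male", "Female"],
    "smoker": ["No", "No", "Yes", "Yes", "No", "No"],
    "day":    ["Sun", "Sun", "Sat", "Sat", "Thur", "Thur"],
    "tip":    [3.0, 4.0, 2.0, 3.0, 1.0, 2.0],
})

# Rows: sex and smoker; columns: day; each cell: number of matching records.
counts = pd.crosstab(index=[tips["sex"], tips["smoker"]], columns=tips["day"])
print(counts)
```

With the real file, replacing the toy DataFrame with the tips frame loaded earlier should give a table comparable to the Count view in the widget.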

Let us explore some of these aggregation options now. Since the aim is to predict the tip amount, it would be useful to look at the sum of the tips in comparison to the total bill amount. I will use the heatmap option for this.
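A starting layout like this can also be preset in code instead of being dragged into place. The sketch below assumes that pivot_ui forwards extra keyword arguments to PivotTable.js as pivotUI options, and that "Sum" and "Heatmap" are valid aggregator and renderer names in the bundled version. The choice of rows and columns is illustrative, and the toy rows and column names are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows that mimic the assumed tips columns; not the real data.
tips = pd.DataFrame({
    "sex":        ["Female", "Male", "Male", "Female"],
    "smoker":     ["No", "No", "Yes", "Yes"],
    "day":        ["Sun", "Sun", "Sat", "Sat"],
    "total_bill": [20.0, 30.0, 25.0, 15.0],
    "tip":        [3.0, 4.0, 2.0, 3.0],
})

# Assumed PivotTable.js options: which fields start in the row and column
# areas, which value is aggregated, and how the result is rendered.
options = {
    "rows": ["sex", "smoker"],
    "cols": ["day"],
    "vals": ["tip"],
    "aggregatorName": "Sum",
    "rendererName": "Heatmap",
}

try:
    from pivottablejs import pivot_ui
    widget = pivot_ui(tips, **options)
except ImportError:
    widget = None  # pivottablejs (or IPython) is not installed here

widget  # as the last expression in a notebook cell, this renders the widget
```

The fields can still be rearranged by hand afterwards; the options only set where the widget starts.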

The data indicate that a waitress would get higher tips on Sundays. There are different options available for plotting graphs like bar charts, line graphs and scatter plots. Here is what they look like.

With the aggregator set to Average, that is, the average tip a person gets in a day broken down by gender and smoking table, here is what the bar chart looks like.

This indicates that women get more tips than men do, especially on the weekends.
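An average like this can be reproduced outside the widget with a pandas pivot table. This is a minimal sketch on made-up rows, assuming columns named sex, smoker, day and tip; the numbers are illustrative and say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical toy rows that mimic the assumed tips columns; not the real data.
tips = pd.DataFrame({
    "sex":    ["Female", "Female", "Male", "Male", "Female", "Male"],
    "smoker": ["No", "No", "No", "No", "Yes", "Yes"],
    "day":    ["Sun", "Sun", "Sun", "Sun", "Sat", "Sat"],
    "tip":    [3.0, 5.0, 2.0, 4.0, 3.0, 2.0],
})

# Mean tip per (sex, smoker) row and per day column.
# Combinations that never occur in the data show up as NaN.
mean_tips = tips.pivot_table(
    values="tip", index=["sex", "smoker"], columns="day", aggfunc="mean"
)
print(mean_tips)
```

Calling mean_tips.plot(kind="bar") would draw a comparable bar chart if Matplotlib is installed.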

Next, let us look at which time of day brings the lowest tips. Change the aggregator to Minimum and the graph type to a line chart.

The graph shows that a male waiter gets the lowest tips at Thursday dinners. Although line graphs are great for showing relationships between features and the target, area charts are more helpful for understanding the data distribution.

Here is an area chart that tells us the median distribution of data during the day for the waiters.

All of this information is useful when designing a robust machine learning model. There are multiple ways to explore the dataset using this widget, and it proves to be of much help because of its ease of use and the time it saves during analysis. You can also use statistical aggregations like median, standard deviation and lower and upper bounds for visualization of the data.
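The minimum, median and standard deviation aggregations have direct pandas counterparts, which can be handy for checking a number read off a chart. The sketch below uses made-up rows with assumed column names day, time and tip. Note that pandas computes the sample standard deviation by default.

```python
import pandas as pd

# Hypothetical toy rows that mimic the assumed tips columns; not the real data.
tips = pd.DataFrame({
    "day":  ["Thur", "Thur", "Thur", "Sun", "Sun", "Sun"],
    "time": ["Lunch", "Lunch", "Lunch", "Dinner", "Dinner", "Dinner"],
    "tip":  [1.0, 2.0, 3.0, 2.0, 4.0, 6.0],
})

# One row per (day, time) group with three summary statistics of the tip.
summary = tips.groupby(["day", "time"])["tip"].agg(["min", "median", "std"])
print(summary)

# The (day, time) group that contains the smallest single tip.
lowest = summary["min"].idxmin()
print(lowest)
```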

You can save each analysis to an HTML file for further use as well.
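One way to control where the HTML file is written is the outfile_path argument of pivot_ui. This is a minimal sketch; the file name tips_pivot.html and the toy rows are hypothetical, and it assumes the installed version of pivottablejs accepts outfile_path.

```python
from pathlib import Path

import pandas as pd

# Hypothetical toy rows that mimic the assumed tips columns; not the real data.
tips = pd.DataFrame({
    "sex": ["Female", "Male"],
    "day": ["Sun", "Sat"],
    "tip": [3.0, 2.0],
})

out_path = Path("tips_pivot.html")  # hypothetical output file name

try:
    from pivottablejs import pivot_ui
    # Writes a standalone HTML page to out_path and returns a notebook widget.
    widget = pivot_ui(tips, outfile_path=str(out_path))
except ImportError:
    widget = None  # pivottablejs (or IPython) is not installed here

widget  # as the last expression in a notebook cell, this renders the widget
```

The resulting file can be opened in a browser outside the notebook. Since it should contain the data passed to pivot_ui, treat it with the same care as the CSV itself.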

Conclusion

With only one line of code, we are able to perform data analysis and visualization easily and efficiently using pivottablejs. This tool is helpful for obtaining useful insights and extracting meaning from the data with a few clicks. There is a lot to explore using this tool, and its biggest advantage is the amount of time it saves in the visualization process.


Bhoomika Madhukar

I am an aspiring data scientist with a passion for teaching. I am a computer science graduate from Dayananda Sagar Institute. I have experience in building models in deep learning and reinforcement learning. My goal is to use AI in the field of education to make learning meaningful for everyone.