In data science, exploratory data analysis (EDA) is one of the most important steps to complete before modeling or any other use of the data. EDA is also required in fields besides data science, such as business analytics and graph representations. It enables us to find insights and patterns in the data that we can't see by looking at it in raw formats like CSV files, Excel sheets and pandas DataFrames.
EDA itself is a combination of many processes: validating the data, finding the null values in it, making graphs to understand its patterns, finding dependencies in the data, extracting features from it, and so on. The deeper we go, the more we learn about the data, but completing all of these steps takes a lot of time. There are various Python libraries for these tasks. Several tools, such as Tableau and Microsoft Power BI, are also available for visualizing the data, which helps us in EDA.
Mito is an open-source Python library. Just as we create interactive dashboards and graphs with Python libraries like matplotlib, seaborn, plotly and geoplotlib, we can complete most of the steps of our EDA using Mito. What makes it different from these libraries is that it helps us go through EDA faster and without much coding knowledge. So in this article, we are going to explore the features and functionalities of Mito.
Implementations using Mito
Let’s start with the installation of the library in JupyterLab.
Before installing Mito, we need to satisfy the requirements of the library. The only requirement of the library is Python 3.6 or above.
We can check the version from the command prompt or Anaconda Prompt using the following command.
python --version
If you have Python 3.6 or above, the requirement is met. If your version is lower than Python 3.6, you can upgrade it using the following command.
conda update python
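The version requirement can also be checked from inside Python itself. The snippet below is a minimal sketch that uses only the standard library; the helper name meets_requirement is made up for illustration and is not part of Mito.

```python
import sys

# Minimum version stated in this article for Mito.
REQUIRED = (3, 6)


def meets_requirement(version_info=sys.version_info, required=REQUIRED):
    """Return True if the given interpreter version is at least `required`."""
    # Compare only (major, minor), e.g. (3, 10) >= (3, 6).
    return tuple(version_info[:2]) >= required


status = "OK" if meets_requirement() else "too old for Mito"
print("Python", sys.version.split()[0], status)
```

Comparing tuples rather than version strings avoids the common mistake of treating 3.10 as lower than 3.6.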
We can download the Mito installer using the following command in the command prompt.
python -m pip install mitoinstaller
Now we can install the Mito library from the command prompt, using the Mito installer we just downloaded:
python -m mitoinstaller install
Finally, we can launch JupyterLab:
python -m jupyter lab
If any other JupyterLab notebook is already open, you need to refresh or restart it. After launching JupyterLab, we are ready to make our new mitosheet. Mito also provides a free hosted version for practicing and making visualisations, where datasets like airport-pets and Zipcode-data are already present.
In this article, we are using Mito locally.
After the installation, we can start the Mito sheet by using this command.
# Run command to create a Mito sheet
import mitosheet
mitosheet.sheet()
We can directly import our data using the import button, which is the second button from the left in the upper left corner.
For basic graph work, I am importing the alcohol dataset, a time series dataset that contains sales values along with their dates.
For visualisation, we need to change the Date column to the datetime type. When we click on the column, a panel appears on the right side of the window where we can select the data type we want to proceed with.
From here, we can select our required data type.
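For readers who want to see the same type change in code, here is a minimal pandas sketch. The column name Date comes from the dataset described above; the Sales column name and the sample rows are made up for illustration, since the real dataset is not reproduced here.

```python
import pandas as pd

# Hypothetical sample rows standing in for the alcohol sales dataset.
df = pd.DataFrame(
    {
        "Date": ["2020-01-01", "2020-02-01", "2020-03-01"],
        "Sales": [100, 120, 90],
    }
)

# Before conversion the Date column holds plain strings.
print(df.dtypes)

# Convert the string column to pandas datetime values.
df["Date"] = pd.to_datetime(df["Date"])

# After conversion the column has a datetime64 dtype.
print(df.dtypes)
```

Once the column is a datetime type, plotting libraries can treat it as a proper time axis instead of a list of labels.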
Above the sheet, we have a toolbar with several options.
Using the graph option.
After selecting the graph option, a panel appears on the right side of the screen where we can select the chart type, and by providing the axes, we can quickly make an interactive graph.
Here I have generated a scatter plot of our time series to visualize the data. If you generate the graph on your own computer, it will be interactive, and you can see the value at a particular point by moving your cursor over it. Using Mito, we can also perform most of the operations available in Excel and Google Sheets. Next, I am performing an addition on the UNITS column, where I am adding 1 to every UNITS value.
In Mito the task was simple: I just needed to click on any data point and define the task. We can also pass data to Mito as a pandas DataFrame.
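As a point of comparison, the same column addition can be sketched in plain pandas. The UNITS column name comes from the article; the sample values and the new column name UNITS_PLUS_ONE are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample values for the UNITS column.
df = pd.DataFrame({"UNITS": [10, 25, 7]})

# Vectorised addition: adds 1 to every value in the column at once.
df["UNITS_PLUS_ONE"] = df["UNITS"] + 1

print(df)
```

Writing the result to a new column keeps the original values intact, which makes the step easy to check and undo.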
import pandas as pd
import mitosheet

# initialize list of lists
data = [['tom', 10], ['nick', 15], ['juli', 14]]

# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Open the DataFrame in a Mito sheet
mitosheet.sheet(df)
We can also pivot and merge the data, and download or save the results, using Mito. All the functionality and formulas supported by Mito are listed in its documentation, which can be opened from the doc option in the right corner of the page.
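To make the pivot and merge operations concrete, here is a minimal pandas sketch of the same ideas. All table names, column names and values are made up for illustration, and this is ordinary pandas code, not code generated by Mito.

```python
import pandas as pd

# Hypothetical sales table: one row per store and month.
sales = pd.DataFrame(
    {
        "store": ["A", "A", "B", "B"],
        "month": ["Jan", "Feb", "Jan", "Feb"],
        "units": [10, 12, 7, 9],
    }
)

# Hypothetical lookup table with one row per store.
stores = pd.DataFrame({"store": ["A", "B"], "city": ["Pune", "Delhi"]})

# Pivot: total units per store.
pivot = sales.pivot_table(index="store", values="units", aggfunc="sum").reset_index()

# Merge: attach each store's city to the pivoted totals.
report = pivot.merge(stores, on="store", how="left")

# Export: CSV text that could be written to a file and shared.
csv_text = report.to_csv(index=False)
print(csv_text)
```

A left merge keeps every store from the pivoted table even if the lookup table has no matching row for it.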
As we can see, performing EDA with the Mito package is easy. If we want to learn from or inspect the code, the next code cell shows the code for every step we performed.
From this cell, we can see how each step was performed using the pandas and NumPy libraries. We can also copy and paste the code and run it separately to manipulate and visualize the data using Python.
Mito claims that it lets us perform easy, powerful and flexible EDA. It is easy because it reduces the coding involved, and we can repeat the process after saving the file. In terms of power, it generates graphs quickly and accurately, which saves time and effort. It is flexible because the work is not tied to one application. For example, if you perform an analysis in Power BI, you always need to open that application to redo it, but with Mito we have code for every step. During modeling, this helps us focus on the model rather than on EDA, because we only need to take the generated code and place it before the model fitting step.