# Exploratory Data Analysis: Functions, Types & Tools

Exploratory data analysis (EDA) is a method of analysing and investigating data sets to summarise their main characteristics.

Exploratory data analysis was developed in the 1970s by American mathematician John Tukey. Analysts often use data visualisation methods to discover patterns, spot anomalies, check assumptions or test a hypothesis through summary statistics and graphical representations.

EDA goes beyond formal modelling or hypothesis testing to give maximum insight into a data set and its structure, and to identify influential variables. It can also help in selecting the most suitable data analysis technique for a given project. Specific knowledge, such as a ranked list of relevant factors to be used as guidelines, can also be obtained using EDA.

### Types of EDA

EDA techniques are either graphical or quantitative (non-graphical). Graphical methods summarise the data in a diagrammatic or visual way, whereas quantitative methods involve the calculation of summary statistics. Both types are further divided into univariate and multivariate methods.

Univariate methods consider one variable (data column) at a time, while multivariate methods consider two or more variables at a time to explore relationships. Thus, there are four types of EDA in all: univariate graphical, multivariate graphical, univariate non-graphical, and multivariate non-graphical. Graphical methods tend to give a more subjective analysis, while quantitative methods are more objective.

• Univariate non-graphical: This is the simplest of the four types. The data being analysed consists of a single variable, and the main purpose is to describe the data and find patterns in it.
• Univariate graphical: Where non-graphical methods reduce a variable to a few numbers, graphical methods give a fuller picture of its distribution. The three main methods under this type are the histogram, the stem and leaf plot, and the box plot. A histogram shows the count of cases falling within each range of values. A stem and leaf plot shows the data values themselves along with the shape of the distribution. A box plot graphically depicts a summary of the minimum, first quartile, median, third quartile, and maximum.
• Multivariate non-graphical: This type of EDA generally shows the relationship between multiple variables through cross-tabulation or summary statistics.
• Multivariate graphical: This type of EDA displays the relationship between two or more sets of data. A common example is a grouped bar chart, in which each group represents a level of one variable and each bar within the group represents a level of another variable.
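
The four types can be illustrated with a small sketch that uses only Python's standard library. The data (`heights_cm` and `records`) is invented purely for illustration, and the helper names `five_number_summary`, `stem_and_leaf`, `cross_tab`, and `grouped_bars` are hypothetical rather than taken from any library. The two "graphical" helpers produce plain-text output as a stand-in for real charts.

```python
from collections import Counter, defaultdict
from statistics import quantiles

# Hypothetical sample data, invented for illustration only.
heights_cm = [150, 152, 158, 161, 163, 165, 167, 170, 172, 181, 185]
records = [
    ("north", "basic"), ("north", "premium"), ("south", "basic"),
    ("south", "basic"), ("north", "basic"), ("south", "premium"),
]


def five_number_summary(values):
    """Univariate non-graphical: the five numbers a box plot depicts."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def stem_and_leaf(values):
    """Univariate graphical (text form): stems are tens, leaves are units."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return [f"{stem:>3} | {' '.join(str(leaf) for leaf in leaves)}"
            for stem, leaves in sorted(stems.items())]


def cross_tab(pairs):
    """Multivariate non-graphical: count of each (level, level) combination."""
    return Counter(pairs)


def grouped_bars(pairs):
    """Multivariate graphical (text form): one group per first variable,
    one bar per level of the second variable."""
    counts = cross_tab(pairs)
    lines = []
    for group in sorted({first for first, _ in pairs}):
        lines.append(group)
        for level in sorted({second for _, second in pairs}):
            lines.append(f"  {level:<8} {'#' * counts[(group, level)]}")
    return lines


print(five_number_summary(heights_cm))
print("\n".join(stem_and_leaf(heights_cm)))
print(dict(cross_tab(records)))
print("\n".join(grouped_bars(records)))
```

The quartiles here use the `"inclusive"` method of `statistics.quantiles`; other quartile conventions give slightly different values on small samples.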

### EDA Tools

Python and R are the two most commonly used data science tools for performing EDA.

Python: EDA in Python can be used to identify missing values in a data set. Other tasks include describing the data, handling outliers, and drawing insights from plots. Python's high-level built-in data structures, dynamic typing, and dynamic binding make it an attractive tool for EDA. Analysing a data set by hand is a laborious task that takes a lot of time, and Python offers open-source modules that can automate much of the EDA process and save time.
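
As a rough sketch of the tasks just mentioned (finding missing values, describing the data, and flagging outliers), the example below uses only the standard library. The `daily_sales` column is invented, the function names are hypothetical, and the 1.5 × IQR rule is one common rule of thumb for outliers, not the only option. In practice, libraries such as pandas offer comparable operations (for example `DataFrame.isna()` and `DataFrame.describe()`).

```python
from statistics import mean, median, quantiles, stdev

# Hypothetical column of daily sales; None marks a missing value.
daily_sales = [12, 15, None, 14, 13, 95, 16, None, 14, 15]


def count_missing(column):
    """Number of entries that are missing (None)."""
    return sum(value is None for value in column)


def describe(column):
    """Basic summary statistics over the non-missing values."""
    present = [v for v in column if v is not None]
    return {
        "count": len(present),
        "mean": mean(present),
        "median": median(present),
        "stdev": stdev(present),
        "min": min(present),
        "max": max(present),
    }


def iqr_outliers(column, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] among non-missing entries."""
    present = [v for v in column if v is not None]
    q1, _, q3 = quantiles(present, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in present if v < low or v > high]


print("missing:", count_missing(daily_sales))
print("summary:", describe(daily_sales))
print("outliers:", iqr_outliers(daily_sales))
```

Whether a flagged value such as 95 is an error or a genuine spike is a judgment the analyst still has to make; the rule only points to where to look.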

R: The R language is widely used by data scientists and statisticians for statistical modelling and data analysis. R is an open-source programming language and free software environment for statistical computing and graphics, supported by the R Foundation for Statistical Computing.

### Wrapping Up

Apart from the functions described above, EDA tools can also be used to:

• Perform k-means clustering, an unsupervised learning algorithm in which data points are assigned to k clusters (groups). K-means clustering is commonly used in market segmentation, image compression, and pattern recognition.
• Support predictive models such as linear regression, which are used to predict outcomes.
• Produce univariate, bivariate, and multivariate visualisations for summary statistics, for establishing relationships between variables, and for understanding how different fields in the data interact with each other.
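
To make the k-means idea concrete, here is a minimal sketch of the standard iterative procedure (often called Lloyd's algorithm) on one-dimensional data. The `monthly_spend` values, the starting centroids, and the function name `k_means_1d` are all assumptions made for illustration; real projects would normally rely on a library implementation such as scikit-learn's `KMeans`.

```python
def k_means_1d(points, centroids, max_iter=100):
    """Cluster 1-D points around k centroids, starting from the given guesses."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        updated = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:  # no centroid moved, so we have converged
            break
        centroids = updated
    return centroids, clusters


# Hypothetical customer spend figures, invented for illustration.
monthly_spend = [10, 12, 11, 50, 52, 51, 90, 95]
centres, groups = k_means_1d(monthly_spend, centroids=[10, 50, 90])
print(centres)
print(groups)
```

The result depends on the starting centroids; with poorly chosen starting points the algorithm can settle on a different, worse grouping, which is why library implementations usually try several initialisations.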

I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.
