Hands-On Guide to Vaex – Tool to Overcome Drawbacks of Pandas

Pandas is an open-source data analysis and manipulation tool built on Python. It is generally used for manipulating numerical and time-series data, and its central data structure is the DataFrame. Pandas is one of the most widely used Python libraries, but it has certain drawbacks: many of its operations become slow on larger datasets, and it can only handle data that fits in memory, which fills up quickly.

To overcome these drawbacks of Pandas, let us explore Vaex, a high-performance Python library for lazy, out-of-core DataFrames that is used to visualize and manipulate big tabular datasets. It can compute statistics and visualizations on very large datasets within seconds. Vaex uses lazy computation and memory mapping, so data is read from disk only when it is needed and no memory is wasted on copies. When the data is stored in a memory-mappable format such as HDF5, it can open a dataset with billions of rows in a few seconds.
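To make the idea of memory mapping concrete, here is a minimal sketch that uses only Python's standard library. It is not how Vaex is implemented; the file layout (a flat array of 8-byte floats) and the helper name `mapped_mean` are assumptions made for illustration. The point is that the operating system pages data in on demand, so the whole file never has to be loaded at once.

```python
import mmap
import os
import struct
import tempfile

def mapped_mean(path, chunk_values=1024):
    """Mean of a file of little-endian float64 values, read through a memory map."""
    total, count = 0.0, 0
    with open(path, "rb") as f:
        # Map the file read-only; pages are loaded by the OS only when touched.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            step = chunk_values * 8  # 8 bytes per float64
            for start in range(0, len(mm), step):
                chunk = mm[start:start + step]
                values = struct.unpack(f"<{len(chunk) // 8}d", chunk)
                total += sum(values)
                count += len(values)
    return total / count

# Write a small hypothetical data file, then compute its mean via the map.
tmp_path = os.path.join(tempfile.mkdtemp(), "values.bin")
with open(tmp_path, "wb") as f:
    f.write(struct.pack("<5d", 1.0, 2.0, 3.0, 4.0, 5.0))

print(mapped_mean(tmp_path))
```

Only one chunk of values is held in Python at a time, which is the same reason an out-of-core library can work on files larger than the available RAM.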

In this article we will explore:


  1. How to use Vaex in Python
  2. Visualization using Vaex
  3. Comparing Vaex and Pandas

Implementation of Vaex in Python

Before we start exploring Vaex, we need to install it using pip install vaex

a. Importing libraries

We will import both pandas and Vaex, since we need to compare their performance.

import vaex

import pandas as pd

b. Using Vaex

We will explore how to load a dataset in Vaex and perform different operations on it. The dataset used here is the NYC Motor Vehicle Collision dataset, which is around 350 MB. We will load this dataset using Vaex.

df = vaex.open('motor_nyc.csv')


Basic Operations on the dataset:




df['CONTRIBUTING FACTOR VEHICLE 1'].value_counts()



df['CROSS STREET NAME'].count()

c. Visualization with Vaex

Now we will create some plots from the DataFrame loaded with Vaex and note the time taken using the %%time notebook magic.







Here we can see that, despite the large size of the dataset, Vaex did not take much time to create the plots.

Similarly, we can create several other plots and note that the time taken by Vaex is much less than that taken by pandas.

Vaex has several other features like:

  1. It can read data from a large number of sources like CSV, HDF5, Astropy tables, etc.
  2. It supports major types of visualization like heatmaps, scatter plots, etc.
  3. It supports common statistical functions like variance, covariance, etc.
  4. It is fast because it relies on lazy computation and a zero-memory-copy policy.

d. Vaex vs. Pandas

Now let’s compare the time taken by pandas and vaex for different operations.

  1. Loading the same dataset

#Using Pandas


df = pd.read_csv('motor_nyc.csv')

#Using Vaex


df1 = vaex.open('motor_nyc.csv')


Here we can see that while pandas took 18 seconds, Vaex loaded the same dataset in 23 milliseconds.

  2. Performing statistical analysis

#Using Pandas



print(df['NUMBER OF PEDESTRIANS KILLED'].value_counts())

#Using Vaex



print(df1['NUMBER OF PEDESTRIANS KILLED'].value_counts())


Here too, we can see that Vaex is considerably faster than pandas.

Similarly, we can try different operations with both pandas and Vaex and see that Vaex is faster.
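Since %%time only works inside Jupyter/IPython, a small helper based on `time.perf_counter` can be used to repeat such comparisons in a plain script. The helper name `timed` and the toy workload are assumptions for illustration; in practice you would pass in the pandas and Vaex calls used above.

```python
import time

def timed(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Toy workload standing in for e.g. a value_counts() call.
result, seconds = timed(sum, range(1_000_000))
print(f"sum={result}, took {seconds:.4f} s")
```

For example, `timed(pd.read_csv, 'motor_nyc.csv')` and `timed(vaex.open, 'motor_nyc.csv')` would time the two loading calls. A single run is noisy, so repeating each measurement a few times gives a fairer comparison.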


In this article we discussed:

  1. How we can use Vaex in Python for larger datasets.
  2. Visualization using Vaex.
  3. A comparison of pandas and Vaex, which showed that Vaex is much faster than pandas.

Himanshu Sharma
An aspiring Data Scientist currently Pursuing MBA in Applied Data Science, with an Interest in the financial markets. I have experience in Data Analytics, Data Visualization, Machine Learning, Creating Dashboards and Writing articles related to Data Science.
