Hands-On Guide to Vaex – Tool to Overcome Drawbacks of Pandas


Pandas is an open-source data analysis and manipulation tool built on Python. It is generally used for manipulating numerical and time-series data, and it provides data structures such as the DataFrame. Pandas is one of the most widely used Python libraries, but it has certain drawbacks: many of its operations become slow on bigger datasets, and it can only handle data that fits in memory, which large datasets can easily fill.

To overcome these drawbacks of Pandas, let us explore Vaex, a high-performance Python library for lazy, out-of-core DataFrames that is used to visualize and manipulate big tabular datasets. It can compute statistics and create visualizations on very large datasets within seconds. Vaex uses lazy computation and memory mapping, so no memory is wasted on unnecessary copies of the data. It can open a dataset with billions of rows in a few seconds.
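Memory mapping lets a program work with a file without first reading all of it into RAM: the operating system maps the file into the process's address space and loads pages only when they are accessed. As a rough, stdlib-only sketch of that idea (not Vaex's implementation), the code below counts the rows of a CSV by scanning a memory-mapped file. The helper `count_rows_mmap` and the generated file `demo.csv` are hypothetical.

```python
import mmap
import os
import tempfile

def count_rows_mmap(path: str) -> int:
    """Count newline-terminated rows by scanning a memory-mapped file.

    The file is never read into one big bytes object; the OS loads
    pages on demand as mm.find() scans through them.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            rows = 0
            pos = mm.find(b"\n")
            while pos != -1:
                rows += 1
                pos = mm.find(b"\n", pos + 1)
            return rows

# Write a small hypothetical CSV to stand in for a large file.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(demo_path, "w", newline="") as f:
    f.write("id,value\n")
    for i in range(1000):
        f.write(f"{i},{i % 7}\n")

print(count_rows_mmap(demo_path))  # 1001 (header + 1000 data rows)
```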

In this article we will explore:

  1. How to use Vaex in Python
  2. Visualization using Vaex
  3. Comparing Vaex and Pandas

Implementation of Vaex in Python

We will start exploring Vaex, but before that we need to install it using pip install vaex

  a. Importing libraries

We will import both the pandas and Vaex libraries, as we need to compare the performance of the two.

import vaex

import pandas as pd

  b. Using Vaex

We will explore how to load a dataset in Vaex and perform different operations on it. The dataset we are using here is the NYC Motor Vehicle Collisions dataset, which is around 350 MB. We will load this dataset using Vaex.

df = vaex.read_csv('motor_nyc.csv')
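Out-of-core processing generally means working through a file in pieces and combining per-piece results, rather than holding the whole file in memory at once. The sketch below shows that general pattern with Python's `csv` module; it illustrates the technique and does not describe how Vaex loads files internally. The helper `read_in_chunks`, its `chunk_size` parameter, and the tiny sample CSV are hypothetical.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterator, List, TextIO

def read_in_chunks(fileobj: TextIO, chunk_size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield successive lists of at most `chunk_size` rows from a CSV file.

    Only one chunk is held in memory at a time.
    """
    reader = csv.DictReader(fileobj)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk

# A tiny in-memory CSV (hypothetical values) standing in for a large file.
sample = io.StringIO("BOROUGH,INJURED\nBROOKLYN,1\nQUEENS,0\nBRONX,2\nBROOKLYN,0\nQUEENS,3\n")
total_injured = 0
chunk_sizes = []
for chunk in read_in_chunks(sample, chunk_size=2):
    chunk_sizes.append(len(chunk))
    # Combine a per-chunk partial result into the running total.
    total_injured += sum(int(row["INJURED"]) for row in chunk)

print(chunk_sizes, total_injured)  # [2, 2, 1] 6
```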


Basic Operations on the dataset:




df['CONTRIBUTING FACTOR VEHICLE 1'].value_counts()



df['CROSS STREET NAME'].count()
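Operations like `value_counts()` and `count()` need only one pass over the data plus a small amount of state per distinct value, which is why they can scale to files larger than memory. The sketch below computes both in a single streaming pass over CSV rows with `collections.Counter`; it is illustrative only. `streaming_value_counts` is a hypothetical helper, the sample rows are made up, and treating empty or absent values as missing is an assumption of this sketch.

```python
import csv
import io
from collections import Counter
from typing import Iterable, Tuple

def streaming_value_counts(rows: Iterable[dict], column: str) -> Tuple[Counter, int]:
    """Tally each value of `column` and count non-missing entries in one pass.

    Only the running Counter is kept in memory, never the full dataset.
    Empty or absent values are treated as missing (an assumption here).
    """
    counts: Counter = Counter()
    non_missing = 0
    for row in rows:
        value = row.get(column)
        if value:
            counts[value] += 1
            non_missing += 1
    return counts, non_missing

# Made-up rows that reuse two column names from the NYC collisions file.
sample = (
    "CONTRIBUTING FACTOR VEHICLE 1,CROSS STREET NAME\n"
    "Unspecified,AVENUE J\n"
    "Driver Inattention/Distraction,\n"
    "Unspecified,BROADWAY\n"
)
factor_counts, _ = streaming_value_counts(
    csv.DictReader(io.StringIO(sample)), "CONTRIBUTING FACTOR VEHICLE 1")
_, cross_street_count = streaming_value_counts(
    csv.DictReader(io.StringIO(sample)), "CROSS STREET NAME")

print(factor_counts.most_common())  # [('Unspecified', 2), ('Driver Inattention/Distraction', 1)]
print(cross_street_count)           # 2
```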

  c. Visualization with Vaex

Now we will create some plots from the DataFrame loaded with Vaex and measure the time taken using the '%%time' magic command.







Here we can see that, despite the large size of the dataset, Vaex did not take much time to create the plots.

Similarly, we can create several other plots and see that Vaex takes much less time than pandas.
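Part of what keeps plotting fast is that Vaex's documentation describes computing statistics such as counts on a regular N-dimensional grid, so a density plot or heatmap is drawn from a small grid of aggregates rather than from every row. The plain-Python sketch below illustrates only that binning step for a 2D count grid; it is not Vaex's code, and `count_grid_2d` and the sample points are hypothetical.

```python
from typing import Iterable, List, Tuple

def count_grid_2d(points: Iterable[Tuple[float, float]],
                  x_limits: Tuple[float, float],
                  y_limits: Tuple[float, float],
                  shape: int) -> List[List[int]]:
    """Bin (x, y) points into a shape-by-shape grid of counts in one pass.

    Points outside the half-open limits [min, max) are skipped. A heatmap
    would then draw this small grid instead of every raw point.
    """
    (x_min, x_max), (y_min, y_max) = x_limits, y_limits
    grid = [[0] * shape for _ in range(shape)]
    for x, y in points:
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue
        # Scale into [0, shape) and clamp to guard against float rounding.
        i = min(int((x - x_min) / (x_max - x_min) * shape), shape - 1)
        j = min(int((y - y_min) / (y_max - y_min) * shape), shape - 1)
        grid[j][i] += 1  # row index = y bin, column index = x bin
    return grid

points = [(0.1, 0.1), (0.2, 0.3), (0.9, 0.9), (0.6, 0.1), (1.5, 0.5)]
print(count_grid_2d(points, (0.0, 1.0), (0.0, 1.0), shape=2))  # [[2, 1], [0, 1]]
```

The work grows with the number of rows only once, during binning; drawing cost depends on the grid size.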

Vaex has several other features:

  1. It can read data from many sources, such as CSV, HDF5, and Astropy tables.
  2. It supports common types of visualization, such as heatmaps and scatter plots.
  3. It supports statistical functions such as variance and covariance.
  4. It is very fast because it relies on lazy computation and a zero memory copy policy.
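Lazy computation means that defining a derived column records how to compute it but does no work until the values are actually needed. In Vaex, for example, assigning an expression such as `df['y'] = df.x ** 2` creates a virtual column that is computed on the fly. The toy class below models only this deferral idea in plain Python; `ToyLazyFrame`, `define`, and `values` are hypothetical names, not Vaex's API, and the column names are made up.

```python
from typing import Callable, Dict, List, Tuple

class ToyLazyFrame:
    """Toy model of lazy evaluation: derived columns are stored as recipes
    and computed only when their values are requested (not Vaex's API)."""

    def __init__(self, data: Dict[str, List[float]]):
        self.data = data  # materialized columns
        self.recipes: Dict[str, Tuple[Callable[..., float], Tuple[str, ...]]] = {}
        self.computed = 0  # number of derived values computed so far

    def define(self, name: str, func: Callable[..., float], *inputs: str) -> None:
        # Only record the recipe; no values are computed here.
        self.recipes[name] = (func, inputs)

    def values(self, name: str) -> List[float]:
        if name in self.data:
            return self.data[name]
        func, inputs = self.recipes[name]
        # Resolve inputs (possibly other derived columns), then apply func row by row.
        # Results are not cached, so a derived column holds no memory between requests.
        columns = [self.values(col) for col in inputs]
        result = [func(*row) for row in zip(*columns)]
        self.computed += len(result)
        return result

frame = ToyLazyFrame({"injured": [1, 0, 2], "killed": [0, 0, 1]})
frame.define("casualties", lambda a, b: a + b, "injured", "killed")
print(frame.computed)              # 0: defining the column did no work
print(frame.values("casualties"))  # [1, 0, 3]
print(frame.computed)              # 3
```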

d. Vaex vs. Pandas

Now let’s compare the time taken by pandas and vaex for different operations.

  1. Loading the same dataset

#Using Pandas


df = pd.read_csv('motor_nyc.csv')

#Using Vaex


df1 = vaex.read_csv('motor_nyc.csv')


Here we can see that pandas took 18 seconds to load the dataset, while Vaex loaded the same dataset in 23 milliseconds.
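The '%%time' magic only works inside Jupyter or IPython. Outside a notebook, a similar comparison can be made with `time.perf_counter`. The sketch below uses a hypothetical `timed` helper; the two timed blocks are stand-ins that contrast doing work eagerly with merely setting it up, and they are not measurements of pandas or Vaex.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator

@contextmanager
def timed(label: str, results: Dict[str, float]) -> Iterator[None]:
    """Store the wall-clock seconds spent inside the with-block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start

timings: Dict[str, float] = {}

# Stand-ins, not pandas or Vaex: eager work vs. merely setting work up.
with timed("eager: build a full list", timings):
    eager_rows = [str(i) for i in range(200_000)]
with timed("lazy: create a generator", timings):
    lazy_rows = (str(i) for i in range(200_000))

for label, seconds in timings.items():
    print(f"{label}: {seconds * 1000:.2f} ms")
```

In a script, the pandas and Vaex loading calls could be wrapped in `with timed(...)` blocks the same way; absolute numbers depend on hardware, disk caching, and library versions.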

  2. Performing statistical analysis

#Using Pandas



print(df['NUMBER OF PEDESTRIANS KILLED'].value_counts())

#Using Vaex



print(df1['NUMBER OF PEDESTRIANS KILLED'].value_counts())


Here too, we can see that Vaex is considerably faster than pandas.
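Statistics such as the variance and covariance listed among Vaex's features can be computed in a single pass while keeping only a few running totals, which is the kind of computation that suits out-of-core data. The sketch below uses Welford's online algorithm to illustrate the technique; it is not a claim about how Vaex computes these internally, and `streaming_mean_var_cov` is a hypothetical helper.

```python
from typing import Iterable, Tuple

def streaming_mean_var_cov(
    pairs: Iterable[Tuple[float, float]],
) -> Tuple[float, float, float, float, float]:
    """Single-pass (Welford-style) means, population variances and covariance.

    Returns (mean_x, mean_y, var_x, var_y, cov_xy). Divides by n (population);
    divide by n - 1 instead for the sample versions.
    """
    n = 0
    mean_x = mean_y = 0.0
    m2_x = m2_y = c_xy = 0.0
    for x, y in pairs:
        n += 1
        dx = x - mean_x            # deviation from the old mean of x
        mean_x += dx / n
        dy = y - mean_y            # deviation from the old mean of y
        mean_y += dy / n
        m2_x += dx * (x - mean_x)  # old deviation times new deviation
        m2_y += dy * (y - mean_y)
        c_xy += dx * (y - mean_y)  # running co-moment of x and y
    if n == 0:
        raise ValueError("no data")
    return mean_x, mean_y, m2_x / n, m2_y / n, c_xy / n

print(streaming_mean_var_cov([(1, 2), (2, 4), (3, 6), (4, 8)]))  # (2.5, 5.0, 1.25, 5.0, 2.5)
```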

Similarly, we can try other operations in both pandas and Vaex and find that Vaex is faster than pandas.


In this article we discussed:

  1. How we can use Vaex in Python for larger datasets.
  2. Visualization using a Vaex DataFrame.
  3. A comparison of pandas and Vaex, which showed that Vaex is much faster than pandas.


Himanshu Sharma
An aspiring data scientist currently pursuing an MBA in Applied Data Science, with an interest in the financial markets. I have experience in data analytics, data visualization, machine learning, creating dashboards, and writing articles related to data science.
