Pandas 2.0 is Finally Here!

The new version of Pandas has added the ability to use any numpy numeric dtype in an Index, and removed Int64Index, UInt64Index, and Float64Index.

The much-awaited Pandas 2.0 is finally here. The release brings new features, bug fixes, and performance improvements, along with breaking changes. About 253 people contributed patches to this release.

The release notes recommend that users with existing code first upgrade to pandas 1.5.3 and make sure their code runs without FutureWarning or DeprecationWarning messages before moving to Pandas 2.0. The release is available on conda-forge and PyPI.
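One way to surface such warnings before upgrading is to promote them to errors while exercising existing code. The sketch below uses only Python's standard warnings module; run_strict and legacy_code are hypothetical names used for illustration, and legacy_code merely stands in for real code that calls a deprecated pandas API.

```python
import warnings


def run_strict(func, *args, **kwargs):
    """Call func with FutureWarning and DeprecationWarning raised as errors."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", FutureWarning)
        warnings.simplefilter("error", DeprecationWarning)
        return func(*args, **kwargs)


def legacy_code():
    # Hypothetical stand-in for code that hits a deprecated API.
    warnings.warn("this behaviour will change in a future version", FutureWarning)
    return "ok"


try:
    run_strict(legacy_code)
    outcome = "clean"
except (FutureWarning, DeprecationWarning) as exc:
    outcome = f"needs fixing: {exc}"

print(outcome)
```

Code that finishes as "clean" under pandas 1.5.3 is a more reasonable candidate for the upgrade.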

What’s new? 

There have been significant improvements compared to previous versions: 

Improved Performance 

In Pandas 2.0, an Index can hold any NumPy numeric dtype, and Int64Index, UInt64Index, and Float64Index have been removed. Operations that previously forced the creation of 64-bit indexes can now create indexes with smaller bit sizes, such as 32-bit indexes.

This has changed some Pandas behaviour. In particular, an Index instantiated from a NumPy numeric array now follows the dtype of that array instead of being forced to 64 bits.
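A minimal sketch of the Index change, assuming pandas 2.x is installed (the variable names are illustrative only):

```python
import numpy as np
import pandas as pd

# Build indexes from 32-bit NumPy arrays.
idx32 = pd.Index(np.array([1, 2, 3], dtype=np.int32))
idx_f32 = pd.Index(np.array([0.5, 1.5], dtype=np.float32))

# On pandas 2.x the NumPy dtype is preserved; pandas 1.x widened to 64 bits.
print(idx32.dtype)
print(idx_f32.dtype)

# The result is a plain Index rather than Int64Index or Float64Index.
print(type(idx32).__name__)
```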

Significant behaviour changes 

The bug fixes in the latest version of Pandas have brought some notable behaviour changes. For instance, the DataFrameGroupBy.cumsum() and DataFrameGroupBy.cumprod() methods now overflow instead of lossily casting to float. Previously, the cast to float could give incorrect results even when the result could be held by the int64 dtype. The methods now keep the integer dtype and overflow once the int64 limit is reached, which is consistent with NumPy and with the regular DataFrame.cumsum() and DataFrame.cumprod() methods.
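The effect can be sketched with a small frame, assuming pandas 2.x; the column names are illustrative:

```python
import pandas as pd

# Seven rows in one group, each holding the value 625.
df = pd.DataFrame({"key": ["b"] * 7, "value": 625})

result = df.groupby("key")["value"].cumprod()

# On pandas 2.x the result stays int64 instead of being cast to float64.
print(result.dtype)

# 625**6 still fits in int64, so the sixth running product is exact.
print(int(result.iloc[5]) == 625**6)

# 625**7 exceeds the int64 limit, so the last value wraps around
# rather than becoming an imprecise float.
print(int(result.iloc[6]) == 625**7)
```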

Further, the SeriesGroupBy.nth() and DataFrameGroupBy.nth() methods now behave as filters instead of aggregations. In other words, they may return zero or multiple rows per group, and the index of the result is derived from the input by selecting the appropriate rows. For example, when n is larger than the group, no rows are returned for that group instead of NaN.
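A short sketch of the filter-like behaviour, assuming pandas 2.x and an illustrative two-group frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 1, 2], "b": [np.nan, 2.0, 3.0, 4.0, 5.0]})
gb = df.groupby("a")

# Second row of each group: on pandas 2.x the original row labels are kept.
second = gb.nth(1)
print(second)

# Third row of each group: group 2 has only two rows, so it contributes nothing.
third = gb.nth(2)
print(third)
```

On pandas 2.x, second is indexed by the original row labels 1 and 4, and third contains only row 3, which comes from group 1.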

The release notes stated that these bug fixes may bring notable behaviour changes, so it is important to be aware of them when upgrading to Pandas 2.0.

There is more

The new version of Pandas also changes how datetime64 and timedelta64 data types with unsupported resolutions are handled. In previous versions, constructing a Series or DataFrame with a resolution other than nanoseconds silently replaced the requested data type with its nanosecond equivalent. Pandas 2.0 supports the “s”, “ms”, “us”, and “ns” resolutions: a supported data type is kept exactly as requested, and an unsupported one now raises an error instead of being silently replaced with a supported one.
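A minimal sketch of the constructor behaviour, assuming pandas 2.x. The exact exception type is an assumption here, so the example catches both TypeError and ValueError:

```python
import pandas as pd

# A supported resolution is kept as requested on pandas 2.x.
seconds = pd.Series(["2016-01-01"], dtype="datetime64[s]")
print(seconds.dtype)

# An unsupported resolution such as day ("D") is rejected on pandas 2.x.
try:
    pd.Series(["2016-01-01"], dtype="datetime64[D]")
    constructor_rejected = False
except (TypeError, ValueError) as exc:
    constructor_rejected = True
    print(f"rejected: {exc}")
```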

In addition to this, Pandas 2.0 has changed the result name and index of the Series.value_counts() method. In previous versions, the result inherited the name of the original object and the result index was unnamed. This caused confusion when resetting the index, because the column names did not correspond to the column values. In the new version, the result name will be ‘count’ (or ‘proportion’ if normalize=True was passed), and the index will be named after the original object.
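The naming change can be sketched as follows, assuming pandas 2.x; the Series contents are illustrative:

```python
import pandas as pd

animals = pd.Series(["quetzal", "quetzal", "elk"], name="animal")

counts = animals.value_counts()
shares = animals.value_counts(normalize=True)

# On pandas 2.x the result is named after what it measures,
# and the index carries the name of the original Series.
print(counts.name, counts.index.name)
print(shares.name, shares.index.name)

# After reset_index(), the column names line up with the column values.
print(list(counts.reset_index().columns))
```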

Pandas 2.0 also disallows astype conversion to non-supported datetime64/timedelta64 data types and raises an error instead. In previous versions, converting a Series or DataFrame from datetime64[ns] to a different datetime64[X] dtype would return datetime64[ns] instead of the requested dtype.
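The same rule applied through astype, again as a sketch that assumes pandas 2.x and catches both TypeError and ValueError because the exact exception type is not stated above:

```python
import pandas as pd

ser = pd.Series(pd.date_range("2016-01-01", periods=3))

# Converting to a supported resolution returns the requested dtype on pandas 2.x.
converted = ser.astype("datetime64[s]")
print(converted.dtype)

# Converting to an unsupported resolution raises on pandas 2.x.
try:
    ser.astype("datetime64[D]")
    astype_rejected = False
except (TypeError, ValueError):
    astype_rejected = True

print(astype_rejected)
```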

Tasmia Ansari
Tasmia is a tech journalist at AIM, looking to bring a fresh perspective to emerging technologies and trends in data science, analytics, and artificial intelligence.
