The long-awaited Pandas 2.0 is finally here. The release brings new features, bug fixes, and improved performance, alongside breaking changes. Close to 253 people contributed patches to this release.
Check out the GitHub repository here.
The release notes advise users with existing code to first upgrade to pandas 1.5.3 and make sure their code does not generate FutureWarning or DeprecationWarning messages before moving to Pandas 2.0. The release is available on conda-forge and PyPI.
There have been significant improvements compared to previous versions:
The new version of Pandas has added the ability to use any numpy numeric dtype in an Index, and removed Int64Index, UInt64Index, and Float64Index. Also, the operations that previously forced the creation of 64-bit indexes can now create indexes with lower-bit sizes, such as 32-bit indexes.
The ability for Index to hold numpy numeric dtypes has brought some changes in Pandas functionality. Now, an Index instantiated from a numpy numeric array follows the dtype of that array, where previous versions forced it to a 64-bit dtype.
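The dtype-following behaviour described above can be checked with a small sketch. It assumes pandas 2.0 or later is installed; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Build indexes from numpy arrays with explicit non-64-bit dtypes.
idx_int32 = pd.Index(np.array([1, 2, 3], dtype="int32"))
idx_float32 = pd.Index(np.array([1.0, 2.0, 3.0], dtype="float32"))

# On pandas 2.0 the Index keeps the numpy dtype; 1.x versions forced 64-bit.
print(idx_int32.dtype)
print(idx_float32.dtype)
```

Code that relied on every numeric Index being 64-bit (for example, when comparing dtypes) may need a second look after upgrading.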
Significant behaviour changes
The bug fixes in the latest version of Pandas have brought some notable behaviour changes. For instance, the DataFrameGroupBy.cumsum() and DataFrameGroupBy.cumprod() methods now overflow instead of lossily casting to float. Previously, the cast to float could produce incorrect results even when the result could be held by the int64 dtype. The results are now correct in that case, and the methods overflow consistently with numpy and the regular DataFrame.cumprod() and DataFrame.cumsum() methods when the limit of int64 is reached.
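As a rough illustration of the cumprod change, the sketch below multiplies 625 seven times within a single group. It assumes pandas 2.0 or later; the column names are made up for the example. The sixth product still fits in int64, while the seventh exceeds the int64 limit and wraps around the same way numpy does.

```python
import numpy as np
import pandas as pd

# Seven rows of 625 in one group: 625**6 fits in int64, 625**7 does not.
df = pd.DataFrame({"key": ["b"] * 7, "value": 625})

grouped = df.groupby("key")["value"].cumprod()
reference = np.cumprod(df["value"].to_numpy())  # numpy's int64 behaviour

print(grouped.dtype)                             # stays an integer dtype
print(grouped.iloc[5] == 625**6)                 # sixth value is exact
print((grouped.to_numpy() == reference).all())   # matches numpy, overflow included
```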
Further, SeriesGroupBy.nth() and DataFrameGroupBy.nth() methods now behave as filtrations instead of aggregations. In other words, they may return either zero or multiple rows per group, and the index of the result is derived from the input by selecting the appropriate rows. For example, when n is larger than the group, no rows are returned instead of NaN.
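A minimal sketch of nth() acting as a filter, assuming pandas 2.0 or later and a small made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"group": ["a", "a", "b"], "value": [1, 2, 3]})
gb = df.groupby("group")

# nth(0) selects the first row of each group and keeps the original row labels.
first_rows = gb.nth(0)
print(first_rows)

# No group has a third row, so nothing is selected (no NaN-filled rows).
third_rows = gb.nth(2)
print(len(third_rows))
```

Because the result keeps the original row labels, code that expected the group keys as the index of nth() output will need adjusting.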
The release notes flag these as notable behaviour changes, so it is important to be aware of them when upgrading to Pandas 2.0.
There is more
The new version of Pandas also changes how it handles datetime and timedelta data types with unsupported resolutions. In previous versions, when a Series or DataFrame was constructed with a datetime64 or timedelta64 dtype of any resolution other than nanoseconds, Pandas would silently replace it with the nanosecond equivalent. The new version supports the “s”, “ms”, “us”, and “ns” resolutions: a supported dtype is kept exactly as requested, and an unsupported one raises an error instead of being silently replaced with a supported one.
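The construction behaviour can be sketched as follows, assuming pandas 2.0 or later. The day resolution "datetime64[D]" is used here as an example of an unsupported resolution; pandas 2.0 reports this as a TypeError, and ValueError is caught as well purely as a precaution.

```python
import pandas as pd

# A supported resolution is kept exactly as requested.
ser = pd.Series(["2016-01-01"], dtype="datetime64[s]")
print(ser.dtype)

# An unsupported resolution (days) is rejected instead of becoming nanoseconds.
try:
    pd.Series(["2016-01-01"], dtype="datetime64[D]")
    rejected = False
except (TypeError, ValueError) as err:
    rejected = True
    print(f"rejected: {err}")
```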
In addition to this, Pandas 2.0 has made changes related to the result name and index of the Series.value_counts() method. In previous versions, the result inherited the name of the original object and the result index was nameless. This used to cause a lot of confusion when resetting the index. In the new version, the result name will be ‘count’ (or ‘proportion’ if normalize=True was passed), and the index will be named after the original object.
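A short sketch of the new naming, assuming pandas 2.0 or later; the Series contents and its name are invented for the example:

```python
import pandas as pd

animals = pd.Series(["quetzal", "quetzal", "elk"], name="animal")

counts = animals.value_counts()
print(counts.name)        # name of the values
print(counts.index.name)  # name of the index, taken from the original Series

shares = animals.value_counts(normalize=True)
print(shares.name)

# Resetting the index now yields two distinctly named columns.
print(list(counts.reset_index().columns))
```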
Pandas 2.0 also changes astype conversions between datetime64 and timedelta64 resolutions. In previous versions, converting a Series or DataFrame from datetime64[ns] to a different datetime64[X] dtype would return datetime64[ns] instead of the requested dtype. Now, converting to a supported resolution returns exactly the requested dtype, while converting to a non-supported datetime64/timedelta64 dtype raises an error.
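The astype behaviour can be sketched in the same way, again assuming pandas 2.0 or later. The day resolution stands in for a non-supported dtype, and the exact exception type is treated as an assumption by catching both TypeError and ValueError.

```python
import pandas as pd

ser = pd.Series(pd.date_range("2016-01-01", periods=3))

# Converting to a supported resolution returns exactly that resolution.
as_seconds = ser.astype("datetime64[s]")
print(as_seconds.dtype)

# Converting to a non-supported resolution raises instead of being ignored.
try:
    ser.astype("datetime64[D]")
    raised = False
except (TypeError, ValueError) as err:
    raised = True
    print(f"rejected: {err}")
```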
For more details on the latest version of Pandas, click here.