With each passing day, the threat of climate change becomes a more pressing concern. Emissions of greenhouse gases, chiefly carbon dioxide and methane, drive global warming and drastic changes in weather. The main sources are the burning of fossil fuels, deforestation and industrial effluents. In recent years, Earth's surface temperature has risen sharply and heat waves have intensified. At the same time, glaciers are melting, which reduces the available land area. Not only humans but also plants and animals are being severely affected.
Scientists say this will continue to damage the Earth unless action is taken as early as possible. Major organisations are now joining hands to make decisions on tackling climate change for the sake of future generations. WHO and NASA have introduced many measures relating to climate change for countries worldwide.
Source: Wikipedia
About the Dataset
The Berkeley Earth Surface Temperature Study combines 1.6 billion temperature records. The data is well packaged and offers interesting subsets (by country, by city and so on). Berkeley Earth has also published the source data for the transformations it applied, and its methods allow weather observations from shorter time spans to be included. The dataset contains several files. The main file holds global land and land-and-ocean temperature records from 1750 to 2015.
The other files cover global average land temperature by country, global average land temperature by state, global land temperatures by major city, and global land temperatures by city.
Time Series
The raw data collected from Berkeley Earth has been processed and cleaned by many developers into a tidy dataset that researchers can work with to draw further insights. Dataset used: Link. We will demonstrate time series analysis on this dataset.
# importing libraries
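import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot
import cufflinks as cf  # provides the DataFrame.iplot method used later
init_notebook_mode(connected=True)
cf.go_offline()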
# read dataset
temp = pd.read_csv('../input/climate-change-earth-surface-temperature-data/GlobalTemperatures.csv', parse_dates=["dt"], index_col="dt")
temp.index
DatetimeIndex(['1750-01-01', '1750-02-01', '1750-03-01', '1750-04-01',
               '1750-05-01', '1750-06-01', '1750-07-01', '1750-08-01',
               '1750-09-01', '1750-10-01',
               ...
               '2015-03-01', '2015-04-01', '2015-05-01', '2015-06-01',
               '2015-07-01', '2015-08-01', '2015-09-01', '2015-10-01',
               '2015-11-01', '2015-12-01'],
              dtype='datetime64[ns]', name='dt', length=3192, freq=None)
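Parsing `dt` as dates and using it as the index is what makes date-based selection possible later on. The sketch below illustrates the idea on a tiny in-memory CSV; the `csv_text` contents and the `toy` name are made up for illustration and are not values from the real file.

```python
import io
import pandas as pd

# Hypothetical three-row CSV with the same "dt" column layout as the real file
csv_text = """dt,LandAverageTemperature
1750-01-01,1.0
1750-02-01,2.0
1751-01-01,3.0
"""

# Same arguments as in the article: parse "dt" as dates and make it the index
toy = pd.read_csv(io.StringIO(csv_text), parse_dates=["dt"], index_col="dt")

# With a DatetimeIndex, a year string selects every row in that year
rows_1750 = toy.loc["1750"]
print(rows_1750)
```

Selecting by a partial date string such as `"1750"` works only because the index is a `DatetimeIndex`.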
temp.shape, temp.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3192 entries, 1750-01-01 to 2015-12-01
Data columns (total 8 columns):
 #   Column                                     Non-Null Count  Dtype
---  ------                                     --------------  -----
 0   LandAverageTemperature                     3180 non-null   float64
 1   LandAverageTemperatureUncertainty          3180 non-null   float64
 2   LandMaxTemperature                         1992 non-null   float64
 3   LandMaxTemperatureUncertainty              1992 non-null   float64
 4   LandMinTemperature                         1992 non-null   float64
 5   LandMinTemperatureUncertainty              1992 non-null   float64
 6   LandAndOceanAverageTemperature             1992 non-null   float64
 7   LandAndOceanAverageTemperatureUncertainty  1992 non-null   float64
dtypes: float64(8)
memory usage: 224.4 KB
((3192, 8), None)
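The `info()` output shows that the columns do not all hold the same number of non-null values (3180 versus 1992 out of 3192 rows). One way to quantify such gaps is sketched below, using a small synthetic frame in place of the real file; the `toy` frame and all of its values are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for GlobalTemperatures.csv: 24 monthly rows, hypothetical values
idx = pd.date_range("1849-01-01", periods=24, freq="MS", name="dt")
toy = pd.DataFrame(
    {
        "LandAverageTemperature": np.linspace(2.0, 14.0, 24),
        # first 12 months have no maximum-temperature reading
        "LandMaxTemperature": [np.nan] * 12 + list(np.linspace(8.0, 20.0, 12)),
    },
    index=idx,
)
toy.iloc[3, 0] = np.nan  # one missing monthly average

# Number of missing values in each column
missing_per_column = toy.isna().sum()

# First date at which each column has a usable value
first_valid = {col: toy[col].first_valid_index() for col in toy.columns}

print(missing_per_column)
print(first_valid)
```

Run against the real `temp` frame, the same two calls would show where each measurement series starts and how many months are missing.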
# generating heatmap
sns.heatmap(temp.corr())
Trends
# visualisation for all the attributes
# Yearly Average Land Temperature
new_df = pd.read_csv('../input/climate-change-earth-surface-temperature-data/GlobalTemperatures.csv')
new_df['year'] = pd.to_datetime(new_df['dt']).dt.year
by_new = new_df.groupby(['year'])['LandAverageTemperature'].mean().reset_index()
new_pivot = by_new.pivot_table(values='LandAverageTemperature', index='year')
new_pivot.iplot(kind='scatter')
After 1900, the average land temperature rises steeply.
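Monthly readings swing strongly with the seasons, so a long-term trend is easier to judge after averaging per year and smoothing. The following is a minimal sketch on a synthetic series; the warming after 1900 is built into the made-up data, so it illustrates the technique and is not a result from the dataset.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (hypothetical values): seasonal cycle plus a trend from 1900
idx = pd.date_range("1850-01-01", "1949-12-01", freq="MS", name="dt")
months = idx.month.to_numpy()
years = idx.year.to_numpy()
seasonal = 5.0 * np.sin(2 * np.pi * (months - 4) / 12)
trend = np.where(years >= 1900, 0.02 * (years - 1900), 0.0)
series = pd.Series(8.0 + seasonal + trend, index=idx, name="LandAverageTemperature")

# Average the 12 months of each calendar year
yearly = series.groupby(series.index.year).mean()

# 10-year centred rolling mean to smooth year-to-year variation
smoothed = yearly.rolling(window=10, center=True).mean()

# Compare the average level before and from 1900
before = yearly.loc[:1899].mean()
after = yearly.loc[1900:].mean()
print(f"mean before 1900: {before:.2f}, from 1900 on: {after:.2f}")
```

The window length of 10 years is an arbitrary choice here; a longer window gives a smoother curve at the cost of losing more points at both ends.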
# dates with the highest average land temperature
# (GlobalTemperatures.csv names this column 'LandAverageTemperature')
ax = temp.groupby(['dt'])['LandAverageTemperature'].last().sort_values(ascending=False).head(10).sort_values().plot(kind='barh')
ax.set_xlabel("avg temp")
plt.title("Date Wise Highest Average Temperature")
# Average Temperature in all Seasons
# assumes `ax` is the Axes of the seasonal plot (the code that builds it is not shown here)
ax.set_ylabel('Average temperature')
ax.set_xlabel('Year')
ax.set_title('Average temperature in each season')
legend = plt.legend(loc='center left', bbox_to_anchor=(1, 0.5), frameon=True, borderpad=1, borderaxespad=1)
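The labelling code above needs a per-season series to plot. One possible way to compute seasonal averages is sketched here on synthetic data. The month-to-season mapping (Northern Hemisphere meteorological seasons), the `SEASON_BY_MONTH` name and the values are all assumptions, not the notebook's own method.

```python
import numpy as np
import pandas as pd

# Assumed mapping: Northern Hemisphere meteorological seasons
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}

# Synthetic monthly data (hypothetical values) covering two years
idx = pd.date_range("2000-01-01", periods=24, freq="MS", name="dt")
toy = pd.DataFrame({"LandAverageTemperature": np.tile(np.arange(1.0, 13.0), 2)}, index=idx)

toy["year"] = toy.index.year
toy["season"] = [SEASON_BY_MONTH[m] for m in toy.index.month]

# One row per year, one column per season
# (simplification: December is grouped with January and February of the same calendar year)
seasonal = toy.pivot_table(
    values="LandAverageTemperature", index="year", columns="season", aggfunc="mean"
)
print(seasonal)
```

Calling `seasonal.plot()` would return a Matplotlib Axes to which the labels and legend can then be applied.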
# Countries with Highest temperature Differences
temp_country = pd.read_csv('../input/climate-change-earth-surface-temperature-data/GlobalLandTemperaturesByCountry.csv')
countries = temp_country['Country'].unique()
# collect the (max, min) average temperature for each country
max_min_list = []
for country in countries:
    curr_temps = temp_country[temp_country['Country'] == country]['AverageTemperature']
    max_min_list.append((curr_temps.max(), curr_temps.min()))
# difference between max and min; countries with no valid readings are skipped
pairs = [(mx - mn, c) for (mx, mn), c in zip(max_min_list, countries) if not np.isnan(mx)]
diff, countries = (list(x) for x in zip(*sorted(pairs, key=lambda pair: pair[0], reverse=True)))
f, ax = plt.subplots(figsize=(8, 8))
sns.barplot(x=diff[:15], y=countries[:15], palette=sns.color_palette("coolwarm", 25), ax=ax)
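The per-country loop can also be written as a single `groupby`, which avoids filtering the frame once per country. This is an alternative sketch on synthetic data, not the notebook's code; the `toy` frame, the country names and the values are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for GlobalLandTemperaturesByCountry.csv (hypothetical values)
toy = pd.DataFrame(
    {
        "Country": ["A", "A", "A", "B", "B", "C", "C"],
        "AverageTemperature": [-10.0, 5.0, 20.0, 24.0, 28.0, np.nan, np.nan],
    }
)

# Max and min per country in one pass; countries with no valid readings are dropped
stats = toy.groupby("Country")["AverageTemperature"].agg(["max", "min"]).dropna()

# Temperature range per country, largest first
temp_range = (stats["max"] - stats["min"]).sort_values(ascending=False)
print(temp_range.head(15))
```

The resulting Series can be passed to `sns.barplot` through its `values` and `index`, in the same way as the `diff` and `countries` lists.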
For the complete notebook, visit the link here.
Conclusion
In this article, we have shown some of the trends that time series analysis reveals in the climate change dataset over the roughly 265 years from 1750 to 2015. Many insights can be drawn from it, and the results can be cross-checked against other datasets of a similar kind.