Analyzing Climate Change Using the Earth Surface Temperature Dataset

In this article, we show some time series analyses of a climate change dataset spanning 265 years (1750-2015). The insights drawn from it can also be cross-checked against other data of a similar kind.

With each passing day, the threat of climate change has become a more pressing concern. Greenhouse gas emissions are driving global warming and drastic changes in weather. The rise in greenhouse gases is mostly due to carbon dioxide and methane emissions, whose sources include the burning of fossil fuels, deforestation and industrial effluents. In recent years, Earth's surface temperature has risen sharply, with more frequent heat waves. At the same time, glaciers are melting, raising sea levels and shrinking land area. Not only humans but also plants and animals are being severely affected.

Scientists warn that this will continue to damage the planet unless action is taken as early as possible. Major organisations are now joining hands to make decisions that tackle climate change for the sake of future generations, and bodies such as WHO and NASA have introduced many climate change measures that apply to all countries.

Source: Wikipedia

About the Dataset

The Berkeley Earth Surface Temperature Study contains 1.6 billion temperature records. The data is well packaged and can be sliced into interesting subsets (such as countries or cities). Berkeley Earth has published the source data along with the transformations applied to it, and its methods allow weather observations from short time spans to be included. The dataset consists of several files; the main one, GlobalTemperatures.csv, records global land and land-and-ocean temperatures from 1750 to 2015.

The other files record average land temperature by country, by state, by major city and by city.

Time Series

The raw data collected by Berkeley Earth has been processed and cleaned by many developers into a ready-to-use dataset, so that researchers can work on it and draw further insights. Dataset used: Link. We will demonstrate time series analysis on this dataset.

# importing libraries

 import pandas as pd
 import seaborn as sns
 import numpy as np
 import matplotlib.pyplot as plt
 %matplotlib inline
 from plotly.offline import download_plotlyjs, init_notebook_mode, iplot 

# read dataset

 temp = pd.read_csv('../input/climate-change-earth-surface-temperature-data/GlobalTemperatures.csv', parse_dates=["dt"], index_col="dt")
 temp.index  # monthly DatetimeIndex parsed from the "dt" column
 DatetimeIndex(['1750-01-01', '1750-02-01', '1750-03-01', '1750-04-01',
                '1750-05-01', '1750-06-01', '1750-07-01', '1750-08-01',
                '1750-09-01', '1750-10-01',
                ...
                '2015-03-01', '2015-04-01', '2015-05-01', '2015-06-01',
                '2015-07-01', '2015-08-01', '2015-09-01', '2015-10-01',
                '2015-11-01', '2015-12-01'],
               dtype='datetime64[ns]', name='dt', length=3192, freq=None)


 temp.shape, temp.info()
 <class 'pandas.core.frame.DataFrame'>
 DatetimeIndex: 3192 entries, 1750-01-01 to 2015-12-01
 Data columns (total 8 columns):
  #   Column                                     Non-Null Count  Dtype  
 ---  ------                                     --------------  -----  
  0   LandAverageTemperature                     3180 non-null   float64
  1   LandAverageTemperatureUncertainty          3180 non-null   float64
  2   LandMaxTemperature                         1992 non-null   float64
  3   LandMaxTemperatureUncertainty              1992 non-null   float64
  4   LandMinTemperature                         1992 non-null   float64
  5   LandMinTemperatureUncertainty              1992 non-null   float64
  6   LandAndOceanAverageTemperature             1992 non-null   float64
  7   LandAndOceanAverageTemperatureUncertainty  1992 non-null   float64
 dtypes: float64(8)
 memory usage: 224.4 KB
 ((3192, 8), None) 
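The info() output shows 3180 non-null values for the land average columns but only 1992 for the other six, and 1992 monthly values is exactly 166 years. Before plotting, it helps to see where each series actually starts and ends. The helper below is a minimal sketch; `column_coverage` is a hypothetical name, not part of the original notebook, and it assumes a DataFrame with a DatetimeIndex like `temp`.

```python
import numpy as np
import pandas as pd


def column_coverage(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: first/last non-null timestamp and missing count per column.

    Assumes `df` has a DatetimeIndex, like `temp` above.
    """
    return pd.DataFrame(
        {
            # Index label of the first and last non-NaN value in each column.
            "first_valid": df.apply(lambda col: col.first_valid_index()),
            "last_valid": df.apply(lambda col: col.last_valid_index()),
            # Number of NaN entries in each column.
            "missing": df.isna().sum(),
        }
    )


# Small synthetic example: column "b" only starts one year after column "a".
idx = pd.date_range("1750-01-01", periods=36, freq="MS")
demo = pd.DataFrame(
    {"a": np.arange(36.0), "b": [np.nan] * 12 + list(np.arange(24.0))},
    index=idx,
)
print(column_coverage(demo))
```

On the real data, calling `column_coverage(temp)` lists the start and end date of every column, which tells you which period each later chart can honestly cover.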

# generating heatmap
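The code for this heatmap is not reproduced here. Assuming it is a correlation heatmap over the temperature columns of `temp` (the original may have plotted something else, such as missing values), a minimal sketch of the data preparation could look like this; `correlation_matrix` is a hypothetical helper, and leaving out the uncertainty columns is an arbitrary choice.

```python
import numpy as np
import pandas as pd


def correlation_matrix(df: pd.DataFrame, drop_uncertainty: bool = True) -> pd.DataFrame:
    """Pairwise Pearson correlations between the numeric columns of `df`.

    pandas drops NaNs pairwise, so columns with shorter histories still contribute.
    If drop_uncertainty is True, columns ending in "Uncertainty" are left out.
    """
    numeric = df.select_dtypes("number")
    if drop_uncertainty:
        numeric = numeric.loc[:, ~numeric.columns.str.endswith("Uncertainty")]
    return numeric.corr()


# Synthetic check: "b" is a linear function of "a", "c" is its negation.
demo = pd.DataFrame({
    "a": np.arange(10.0),
    "b": 2 * np.arange(10.0) + 1,
    "c": -np.arange(10.0),
    "aUncertainty": np.ones(10),
})
print(correlation_matrix(demo).round(2))
```

The resulting matrix can then be drawn with `sns.heatmap(correlation_matrix(temp), annot=True, fmt='.2f', cmap='coolwarm')`.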



# visualisation for all the attributes
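The plots for this step are not shown here. Every temperature attribute in GlobalTemperatures.csv comes with a matching `...Uncertainty` column, so one way to visualise all of them is to draw each attribute as a line with a shaded band around it. A minimal sketch, assuming the uncertainty value can be treated as a symmetric half-width around the measurement; `uncertainty_band` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def uncertainty_band(df: pd.DataFrame, measure: str) -> pd.DataFrame:
    """Return the value and its lower/upper bounds for one measure.

    Assumes the companion column is named f"{measure}Uncertainty" and treats it
    as a symmetric half-width around the value.
    """
    value = df[measure]
    half_width = df[f"{measure}Uncertainty"]
    return pd.DataFrame(
        {"value": value, "lower": value - half_width, "upper": value + half_width}
    )


# Synthetic three-month example using the real column names.
idx = pd.date_range("1850-01-01", periods=3, freq="MS")
demo = pd.DataFrame(
    {"LandAverageTemperature": [1.0, 2.0, 3.0],
     "LandAverageTemperatureUncertainty": [0.5, 0.25, 0.1]},
    index=idx,
)
print(uncertainty_band(demo, "LandAverageTemperature"))
```

With matplotlib, `ax.fill_between(band.index, band['lower'], band['upper'], alpha=0.3)` shades the band around `band['value']`; repeating this for each of the four measures covers all eight columns.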

# Yearly Average Land Temperature

 new_df = pd.read_csv('../input/climate-change-earth-surface-temperature-data/GlobalTemperatures.csv')
 new_df['year'] = pd.to_datetime(new_df['dt']).dt.year
 # mean land temperature per calendar year (NaN months are skipped)
 by_new = new_df.groupby('year')['LandAverageTemperature'].mean()
 ax = by_new.plot(figsize=(12, 5))
 ax.set_ylabel('Land average temperature (°C)')
 plt.title('Yearly Average Land Temperature');

After 1900, the yearly average land temperature increases steeply.
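To put a number on the post-1900 rise rather than reading it off the chart, one option is a least-squares linear trend on the yearly means, together with a 10-year rolling mean to smooth year-to-year noise. This is a sketch: `smooth` and `trend_per_century` are hypothetical helpers, and a straight line is only a rough summary of a curve that is not linear.

```python
from typing import Optional

import numpy as np
import pandas as pd


def smooth(yearly: pd.Series, window: int = 10) -> pd.Series:
    """Centred rolling mean over `window` years; edges without a full window stay NaN."""
    return yearly.rolling(window=window, center=True).mean()


def trend_per_century(yearly: pd.Series, start_year: Optional[int] = None) -> float:
    """Least-squares linear slope of a year-indexed series, in degrees per 100 years."""
    s = yearly.dropna()
    if start_year is not None:
        s = s[s.index >= start_year]
    # Fit value = slope * year + intercept.
    slope, _intercept = np.polyfit(s.index.to_numpy(dtype=float), s.to_numpy(dtype=float), deg=1)
    return slope * 100


# Synthetic yearly series rising by 0.01 degrees per year.
years = np.arange(1750, 2016)
demo = pd.Series(8.0 + 0.01 * (years - 1750), index=years)
print(trend_per_century(demo), trend_per_century(demo, start_year=1900))
```

On the real data, a year-indexed Series such as `temp['LandAverageTemperature'].groupby(temp.index.year).mean()` can be passed in, and comparing `start_year=1900` against the pre-1900 slice quantifies the change in slope.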

# highest temperature dates

 # temp has one row per month, so the ten largest values are the ten hottest months
 ax = temp['LandAverageTemperature'].nlargest(10).sort_values().plot(kind='barh');
 ax.set_xlabel("avg temp");
 plt.title("Date Wise Highest Average Temperature"); 
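Global land temperature has a strong seasonal cycle, so a ranking of raw monthly values tends to be dominated by northern-hemisphere summer months. Ranking anomalies instead, i.e. each month's departure from that calendar month's average over a baseline period, compares months on an equal footing. A sketch, assuming a 1951-1980 baseline (a common choice, not one taken from the original notebook); `monthly_anomalies` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def monthly_anomalies(series: pd.Series, base_start: int = 1951, base_end: int = 1980) -> pd.Series:
    """Departure of each monthly value from its calendar-month mean over a baseline period.

    Assumes `series` has a monthly DatetimeIndex, like temp["LandAverageTemperature"].
    """
    years = series.index.year
    base = series[(years >= base_start) & (years <= base_end)]
    # Mean value for each calendar month (1..12) during the baseline period.
    climatology = base.groupby(base.index.month).mean()
    # Look up the matching month's baseline mean for every row, then subtract.
    return series - climatology.reindex(series.index.month).to_numpy()


idx = pd.date_range("1940-01-01", "1999-12-01", freq="MS")
# Synthetic series: a seasonal cycle equal to the month number, plus a +5 step from 1995.
demo = pd.Series(idx.month.to_numpy(dtype=float) + 5.0 * (idx.year >= 1995), index=idx)
anom = monthly_anomalies(demo)
print(anom.nlargest(3))
```

`monthly_anomalies(temp['LandAverageTemperature']).nlargest(10)` then gives the months that were warmest relative to their own season.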

# Average Temperature in all Seasons

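 season_of = dict(zip(range(1, 13), ['winter'] * 2 + ['spring'] * 3 + ['summer'] * 3 + ['autumn'] * 3 + ['winter']))  # month -> northern-hemisphere season
 seasonal = temp.groupby([temp.index.year, temp.index.month.map(season_of)])['LandAverageTemperature'].mean().unstack()
 ax = seasonal.plot(figsize=(12, 6))
 ax.set_ylabel('Average temperature')
 ax.set_title('Average temperature in each season')
 legend = plt.legend(loc='center left', bbox_to_anchor=(1, 0.5), frameon=True, borderpad=1, borderaxespad=1)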
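A simple way to build the seasonal series is to group each calendar year's months into seasons, but that puts December in the same winter as the January and February eleven months earlier. Meteorological seasons (DJF, MAM, JJA, SON) instead count December towards the following year's winter. A sketch of that variant, assuming northern-hemisphere season names; `seasonal_means` and `SEASONS` are hypothetical names.

```python
import pandas as pd

# Month number -> northern-hemisphere meteorological season (assumed naming).
SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "autumn", 10: "autumn", 11: "autumn"}


def seasonal_means(series: pd.Series) -> pd.DataFrame:
    """Mean of a monthly series per season-year, with December counted in the next year's winter."""
    months = series.index.month
    # Shift December forward one year so Dec, Jan, Feb form a single winter.
    season_years = series.index.year + (months == 12).astype(int)
    seasons = months.map(SEASONS)
    table = series.groupby([season_years, seasons]).mean().unstack()
    # Fixed column order; seasons absent from the data become NaN columns.
    return table.reindex(columns=["winter", "spring", "summer", "autumn"])


idx = pd.date_range("2000-01-01", "2001-12-01", freq="MS")
demo = pd.Series(idx.month.astype(float), index=idx)
print(seasonal_means(demo))
```

Note that the first and last winters are built from fewer than three months, so you may want to drop them before plotting `seasonal_means(temp['LandAverageTemperature'])`.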

# Countries with Highest temperature Differences

 temp_country = pd.read_csv('../input/climate-change-earth-surface-temperature-data/GlobalLandTemperaturesByCountry.csv')
 # hottest and coldest monthly average recorded for each country
 stats = temp_country.groupby('Country')['AverageTemperature'].agg(['max', 'min'])
 diff = (stats['max'] - stats['min']).dropna().sort_values(ascending=False)
 f, ax = plt.subplots(figsize=(8, 8))
 sns.barplot(x=diff.values[:15], y=diff.index[:15], palette=sns.color_palette("coolwarm", 25), ax=ax)
 ax.set_xlabel('Max - min average temperature (°C)')
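Taking the maximum minus the minimum over a country's entire record mixes the seasonal swing with long-term warming and with how far back each country's record goes. One alternative is to average each calendar month first and take the range of those twelve means. The sketch below assumes the columns `dt`, `Country` and `AverageTemperature`, as in GlobalLandTemperaturesByCountry.csv; `seasonal_range_by_country` is a hypothetical helper.

```python
import pandas as pd


def seasonal_range_by_country(df: pd.DataFrame) -> pd.Series:
    """Range between each country's warmest and coldest calendar-month mean, largest first."""
    d = df.dropna(subset=["AverageTemperature"]).copy()
    d["month"] = pd.to_datetime(d["dt"]).dt.month
    # Rows: countries, columns: months 1..12, values: mean temperature for that month.
    climatology = d.groupby(["Country", "month"])["AverageTemperature"].mean().unstack()
    return (climatology.max(axis=1) - climatology.min(axis=1)).sort_values(ascending=False)


# Synthetic data: country "A" has a seasonal cycle, country "B" is constant.
dates = pd.date_range("2000-01-01", periods=24, freq="MS")
demo = pd.DataFrame({
    "dt": list(dates.strftime("%Y-%m-%d")) * 2,
    "Country": ["A"] * 24 + ["B"] * 24,
    "AverageTemperature": list(dates.month.astype(float)) + [5.0] * 24,
})
print(seasonal_range_by_country(demo))
```

`seasonal_range_by_country(temp_country).head(15)` can then be plotted with the same kind of horizontal bar chart.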

For the complete notebook, visit the link here.




Jayita Bhattacharyya
Machine learning and data science enthusiast, eager to learn about new advances in technology. A self-taught techie who loves doing cool and worthwhile things with technology, often just for fun.
