Analyzing Climate Change Using the Earth Surface Temperature Dataset

This article walks through some time series analysis of the climate change dataset covering 265 years (1750-2015). The insights drawn from it can be cross-checked against other, similar kinds of data.

With each passing day, the threat of climate change becomes a more pressing concern. Emissions of greenhouse gases, chiefly carbon dioxide and methane, give rise to global warming and drastic changes in weather. The main sources are the burning of fossil fuels, deforestation and industrial effluents. In recent years, Earth’s surface temperature has risen markedly and heat waves have intensified. At the same time, glaciers are melting, which reduces land area. Not only humans but also plants and animals are being severely affected.

Scientists warn that the damage will continue unless action is taken soon. Large organisations are now joining hands on decisions to limit climate change for the sake of future generations. Bodies such as WHO and NASA monitor and report on climate change indicators across countries.


About the Dataset

The Berkeley Earth Surface Temperature Study contains 1.6 billion temperature records. It is well packaged and offers interesting subsets (by country, by city, and so on). Berkeley Earth has published the source data for the transformations it applied, and its methods allow weather observations from shorter time spans to be included. The dataset consists of several files. The main one holds the global land and land-and-ocean temperature records from 1750 to 2015.

The other files are: Global Average Land Temperature by Country, Global Average Land Temperature by State, Global Land Temperatures by Major City, and Global Land Temperatures by City.

Time Series

The raw data collected from Berkeley Earth has been processed and cleaned by many developers into a usable dataset, so that researchers can work on it and draw further insights. Dataset Used – Link. We demonstrate time series analysis on this dataset.

# importing libraries

 import pandas as pd
 import seaborn as sns
 import numpy as np
 import matplotlib.pyplot as plt
 %matplotlib inline
 from plotly.offline import download_plotlyjs, init_notebook_mode, iplot 

# read dataset

 temp = pd.read_csv('../input/climate-change-earth-surface-temperature-data/GlobalTemperatures.csv', parse_dates=["dt"], index_col="dt")
 temp.index
 DatetimeIndex(['1750-01-01', '1750-02-01', '1750-03-01', '1750-04-01',
                '1750-05-01', '1750-06-01', '1750-07-01', '1750-08-01',
                '1750-09-01', '1750-10-01',
                ...
                '2015-03-01', '2015-04-01', '2015-05-01', '2015-06-01',
                '2015-07-01', '2015-08-01', '2015-09-01', '2015-10-01',
                '2015-11-01', '2015-12-01'],
               dtype='datetime64[ns]', name='dt', length=3192, freq=None)
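The index above reports `freq=None` even though the timestamps look evenly spaced by month. As a minimal sketch, assuming a synthetic index that stands in for the real `dt` column, the spacing can be checked with `pd.infer_freq` and attached with `asfreq`:

```python
import pandas as pd

# Synthetic stand-in for the parsed "dt" index: month-start timestamps, 1750-2015.
# Building the index from raw values leaves freq unset, as in the output above.
full_range = pd.date_range("1750-01-01", "2015-12-01", freq="MS")
idx = pd.DatetimeIndex(full_range.values, name="dt")

# infer_freq inspects the spacing of the timestamps and returns a frequency alias.
inferred = pd.infer_freq(idx)

# asfreq attaches that frequency; a missing month would show up as a NaN row.
monthly = pd.Series(0.0, index=idx).asfreq(inferred)

print(idx.freq, inferred, monthly.index.freqstr, len(monthly))
```

On the real data the same two calls could be applied to `temp.index` and `temp`. Whether the real index is gap-free is not confirmed here.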


 temp.shape, temp.info()
 <class 'pandas.core.frame.DataFrame'>
 DatetimeIndex: 3192 entries, 1750-01-01 to 2015-12-01
 Data columns (total 8 columns):
  #   Column                                     Non-Null Count  Dtype  
 ---  ------                                     --------------  -----  
  0   LandAverageTemperature                     3180 non-null   float64
  1   LandAverageTemperatureUncertainty          3180 non-null   float64
  2   LandMaxTemperature                         1992 non-null   float64
  3   LandMaxTemperatureUncertainty              1992 non-null   float64
  4   LandMinTemperature                         1992 non-null   float64
  5   LandMinTemperatureUncertainty              1992 non-null   float64
  6   LandAndOceanAverageTemperature             1992 non-null   float64
  7   LandAndOceanAverageTemperatureUncertainty  1992 non-null   float64
 dtypes: float64(8)
 memory usage: 224.4 KB
 ((3192, 8), None) 
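The non-null counts above differ by column: 3192 rows is 266 years of monthly values, while 1992 is 166 years, which would be consistent with those columns starting in 1850. A minimal sketch for checking where each column starts and ends, using a synthetic frame built to mirror those counts (the values and the 1850 cut-off are assumptions made for illustration):

```python
import numpy as np
import pandas as pd


def coverage_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column count of non-null values plus first and last valid index."""
    return pd.DataFrame({
        "non_null": df.notna().sum(),
        "first_valid": df.apply(lambda s: s.first_valid_index()),
        "last_valid": df.apply(lambda s: s.last_valid_index()),
    })


# Synthetic stand-in: monthly rows for 1750-2015, with the second column
# left empty before 1850 (an assumption, not read from the real file).
idx = pd.date_range("1750-01-01", "2015-12-01", freq="MS", name="dt")
demo = pd.DataFrame(
    {"LandAverageTemperature": 8.0, "LandAndOceanAverageTemperature": 15.0},
    index=idx,
)
demo.loc[:"1849-12-01", "LandAndOceanAverageTemperature"] = np.nan

summary = coverage_summary(demo)
print(summary)
```

Calling `coverage_summary(temp)` on the real frame would show the actual start date of each column.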

# generating heatmap
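The heatmap code is not shown here. One common choice, and only an assumption about what the notebook did, is a heatmap of pairwise correlations between the columns. The sketch below computes the correlation matrix on synthetic data and defines a hypothetical helper, `plot_correlation_heatmap`, that draws it with seaborn:

```python
import numpy as np
import pandas as pd


def plot_correlation_heatmap(corr: pd.DataFrame):
    """Draw an annotated heatmap of a correlation matrix."""
    # Imported here so the correlation step can run without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(8, 6))
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
    ax.set_title("Correlation between temperature attributes")
    return ax


# Synthetic stand-in for three of the dataset's columns (values are made up).
rng = np.random.default_rng(0)
base = rng.normal(size=120)
demo = pd.DataFrame({
    "LandAverageTemperature": base,
    "LandMaxTemperature": base + 5 + rng.normal(scale=0.1, size=120),
    "LandMinTemperature": base - 5 + rng.normal(scale=0.1, size=120),
})

# Pairwise Pearson correlation between the numeric columns.
corr = demo.corr()
print(corr.round(2))
```

In a notebook, `plot_correlation_heatmap(temp.corr())` would draw the figure for the real data.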



# visualisation for all the attributes
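The plotting code for the attributes is not shown here. Since every measured column in the `info()` output has a matching `...Uncertainty` column, one way to visualise them is a value line with an uncertainty band. This is a sketch under the assumption that each uncertainty value is a half-width around the measurement; `uncertainty_bands` and `plot_bands` are hypothetical helper names:

```python
import pandas as pd


def uncertainty_bands(df: pd.DataFrame) -> dict:
    """Map each measured column to a frame with value, lower and upper bounds."""
    bands = {}
    for col in df.columns:
        unc = col + "Uncertainty"
        if unc in df.columns:
            bands[col] = pd.DataFrame({
                "value": df[col],
                "lower": df[col] - df[unc],
                "upper": df[col] + df[unc],
            })
    return bands


def plot_bands(bands: dict):
    """Draw one panel per attribute: a line for the value, a shaded band around it."""
    import matplotlib.pyplot as plt  # imported here so the band step runs without it

    if not bands:
        raise ValueError("no column has a matching Uncertainty column")
    fig, axes = plt.subplots(len(bands), 1, sharex=True,
                             figsize=(10, 3 * len(bands)), squeeze=False)
    for ax, (name, band) in zip(axes[:, 0], bands.items()):
        ax.plot(band.index, band["value"], linewidth=0.8)
        ax.fill_between(band.index, band["lower"], band["upper"], alpha=0.3)
        ax.set_title(name)
    return fig


# Tiny synthetic example (made-up values).
demo = pd.DataFrame(
    {"LandAverageTemperature": [8.0, 9.0],
     "LandAverageTemperatureUncertainty": [0.5, 1.0]},
    index=pd.to_datetime(["1750-01-01", "1750-02-01"]),
)
bands = uncertainty_bands(demo)
print(bands["LandAverageTemperature"])
```

In a notebook, `plot_bands(uncertainty_bands(temp))` would produce one panel for each of the four measured columns.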

# Yearly Average Land Temperature

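 new_df = pd.read_csv('../input/climate-change-earth-surface-temperature-data/GlobalTemperatures.csv')
 # extract the calendar year from each date
 new_df['year'] = pd.to_datetime(new_df['dt']).dt.year
 # mean land temperature per year
 by_new = new_df.groupby(['year'])['LandAverageTemperature'].mean().reset_index()
 new_pivot = by_new.pivot_table(values='LandAverageTemperature', index='year')
 # simple line plot of the yearly means (the original figure is not shown here)
 new_pivot.plot(figsize=(10, 5), title='Yearly Average Land Temperature')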

After 1900, the temperature rises steeply.
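A claim like this can be checked numerically as well as visually. The sketch below smooths the yearly means with a 10-year rolling average and compares the least-squares trend before and after 1900. The data is synthetic (flat until 1900, then rising by 0.01 degrees per year, with a seasonal cycle), and `yearly_mean` and `trend_per_decade` are hypothetical helper names:

```python
import numpy as np
import pandas as pd


def yearly_mean(monthly: pd.Series) -> pd.Series:
    """Average a monthly series into one value per calendar year."""
    return monthly.groupby(monthly.index.year).mean()


def trend_per_decade(yearly: pd.Series, start: int, end: int) -> float:
    """Least-squares slope over the years start..end (inclusive), per decade."""
    window = yearly.loc[start:end].dropna()
    years = window.index.to_numpy(dtype=float)
    # Centre the years so the fit is numerically well conditioned.
    slope_per_year = np.polyfit(years - years.mean(), window.to_numpy(), 1)[0]
    return float(slope_per_year * 10)


# Synthetic monthly series: seasonal cycle plus a warming trend after 1900.
idx = pd.date_range("1750-01-01", "2015-12-01", freq="MS", name="dt")
years = idx.year.to_numpy()
months = idx.month.to_numpy()
values = 8.0 + 0.01 * np.clip(years - 1900, 0, None) + 5 * np.sin(2 * np.pi * months / 12)
monthly = pd.Series(values, index=idx, name="LandAverageTemperature")

yearly = yearly_mean(monthly)
smooth = yearly.rolling(window=10, center=True).mean()  # 10-year moving average

before = trend_per_decade(yearly, 1750, 1899)
after = trend_per_decade(yearly, 1900, 2015)
print(f"trend before 1900: {before:.3f} per decade, after 1900: {after:.3f} per decade")
```

With the real data, `yearly_mean(temp['LandAverageTemperature'])` would replace the synthetic series. The two slopes then quantify how much steeper the later period is.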

# highest temperature dates

 # ten dates with the highest land average temperature, plotted lowest to highest
 ax = temp['LandAverageTemperature'].nlargest(10).sort_values().plot(kind='barh')
 ax.set_xlabel("avg temp")
 plt.title("Date Wise Highest Average Temperature")

# Average Temperature in all Seasons

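 # ax is assumed to hold the seasonal plot created earlier in the notebook (that step is not shown here)
 ax.set_ylabel('Average temperature')
 ax.set_title('Average temperature in each season')
 legend = plt.legend(loc='center left', bbox_to_anchor=(1, 0.5), frameon=True, borderpad=1, borderaxespad=1)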
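The lines above only label an existing plot. The step that computes the seasonal averages is not shown. A minimal sketch of one way to do it, assuming Northern Hemisphere meteorological seasons (December to February as winter, and so on); `seasonal_means` and `plot_seasons` are hypothetical helper names:

```python
import numpy as np
import pandas as pd

# Assumption: Northern Hemisphere meteorological seasons.
SEASONS = np.array(["winter", "spring", "summer", "autumn"])


def seasonal_means(monthly: pd.Series) -> pd.DataFrame:
    """Return a year-by-season table of mean values."""
    # month % 12 // 3 maps Dec, Jan, Feb -> 0; Mar-May -> 1; Jun-Aug -> 2; Sep-Nov -> 3.
    # Simplification: December is grouped with January and February of the
    # same calendar year, not with the following year's winter.
    season = SEASONS[(monthly.index.month % 12 // 3).to_numpy()]
    table = monthly.groupby([monthly.index.year, season]).mean().unstack()
    table.index.name = "year"
    return table[list(SEASONS)]


def plot_seasons(table: pd.DataFrame):
    """Line plot with one line per season, labelled as in the snippet above."""
    import matplotlib.pyplot as plt  # imported here so the grouping runs without it

    ax = table.plot(figsize=(10, 6))
    ax.set_ylabel("Average temperature")
    ax.set_title("Average temperature in each season")
    plt.legend(loc="center left", bbox_to_anchor=(1, 0.5), frameon=True)
    return ax


# Synthetic check: each month's "temperature" is simply its month number.
idx = pd.date_range("2000-01-01", "2001-12-01", freq="MS", name="dt")
demo = pd.Series(idx.month.to_numpy(dtype=float), index=idx)
table = seasonal_means(demo)
print(table)
```

In a notebook, `plot_seasons(seasonal_means(temp['LandAverageTemperature']))` would draw the four seasonal lines for the real data.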

# Countries with Highest temperature Differences

 temp_country = pd.read_csv('../input/climate-change-earth-surface-temperature-data/GlobalLandTemperaturesByCountry.csv')
 countries = temp_country['Country'].unique()
 # collect (max, min) of the average temperature for every country
 max_min_list = []
 for country in countries:
     curr_temps = temp_country[temp_country['Country'] == country]['AverageTemperature']
     max_min_list.append((curr_temps.max(), curr_temps.min()))
 # temperature range per country, skipping countries with no valid readings
 pairs = [(mx - mn, c) for (mx, mn), c in zip(max_min_list, countries) if not np.isnan(mx)]
 diff, countries = (list(x) for x in zip(*sorted(pairs, key=lambda pair: pair[0], reverse=True)))
 f, ax = plt.subplots(figsize=(8, 8))
 sns.barplot(x=diff[:15], y=countries[:15], palette=sns.color_palette("coolwarm", 25), ax=ax)
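The per-country loop can also be written as a single pandas `groupby`. This is an alternative sketch, not the notebook's own code, shown on a small synthetic frame with the same `Country` and `AverageTemperature` column names; `temperature_range_by_country` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd


def temperature_range_by_country(df: pd.DataFrame) -> pd.Series:
    """Max minus min of AverageTemperature per country, largest range first."""
    stats = df.groupby("Country")["AverageTemperature"].agg(["max", "min"])
    # Countries with no valid readings yield NaN and are dropped.
    return (stats["max"] - stats["min"]).dropna().sort_values(ascending=False)


# Synthetic stand-in for GlobalLandTemperaturesByCountry.csv (made-up values).
demo = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "C", "C"],
    "AverageTemperature": [-10.0, 20.0, 5.0, 10.0, np.nan, np.nan],
})

ranges = temperature_range_by_country(demo)
print(ranges.head(15))
```

The resulting series can be passed to the bar plot directly, for example with `x=ranges.head(15).values` and `y=ranges.head(15).index`.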

For the complete notebook, visit the link here.




Jayita Bhattacharyya
Machine learning and data science enthusiast. Eager to learn new technology advances. A self-taught techie who loves to do cool stuff using technology for fun and worthwhile.
