Analyzing Climate Change Using the Earth Surface Temperature Dataset

In this article, we walk through some time series analysis of a climate change dataset covering 265 years (1750-2015). Many insights can be drawn from it, and the results can be cross-checked against other, similar datasets.

With each passing day, the threat of climate change becomes a more pressing concern. Greenhouse gas emissions, chiefly carbon dioxide and methane, are giving rise to global warming and drastic changes in weather. The main sources are the burning of fossil fuels, deforestation and industrial effluents. In recent years, Earth’s surface temperature has risen sharply and heat waves are on the rise. At the same time, glaciers are melting, thereby reducing land area. Not only humans but also plants and animals are being severely affected.

Scientists say this will continue to damage the Earth unless action is taken soon. Large organisations are now joining hands to make decisions on climate change for the sake of future generations. WHO and NASA are among the bodies that track climate change indicators across countries.

Source: Wikipedia

About the Dataset

The Berkeley Earth Surface Temperature Study combines 1.6 billion temperature records. It is well packaged and can be sliced into interesting subsets (by country, by city, etc.). Berkeley Earth has published the source data along with the transformations applied to it, and its methods allow weather observations from short time spans to be included. The dataset contains several files. The main one holds the global land and land-and-ocean temperature records from 1750 to 2015.

The other files hold the global average land temperature records by country and by state, and the global land temperature records by major city and by city.

Time Series

The raw data collected from Berkeley Earth has been processed and cleaned by many developers into a proper dataset that researchers can work with to draw further insights. Dataset Used – Link. We will demonstrate time series analysis on this dataset.

# importing libraries

 import pandas as pd
 import seaborn as sns
 import numpy as np
 import matplotlib.pyplot as plt
 %matplotlib inline
 from plotly.offline import download_plotlyjs, init_notebook_mode, iplot 

# read dataset

 temp = pd.read_csv('../input/climate-change-earth-surface-temperature-data/GlobalTemperatures.csv', parse_dates=["dt"], index_col="dt")
 temp.index

 DatetimeIndex(['1750-01-01', '1750-02-01', '1750-03-01', '1750-04-01',
                '1750-05-01', '1750-06-01', '1750-07-01', '1750-08-01',
                '1750-09-01', '1750-10-01',
                ...
                '2015-03-01', '2015-04-01', '2015-05-01', '2015-06-01',
                '2015-07-01', '2015-08-01', '2015-09-01', '2015-10-01',
                '2015-11-01', '2015-12-01'],
               dtype='datetime64[ns]', name='dt', length=3192, freq=None)

 temp.shape, temp.info()

 <class 'pandas.core.frame.DataFrame'>
 DatetimeIndex: 3192 entries, 1750-01-01 to 2015-12-01
 Data columns (total 8 columns):
  #   Column                                     Non-Null Count  Dtype  
 ---  ------                                     --------------  -----  
  0   LandAverageTemperature                     3180 non-null   float64
  1   LandAverageTemperatureUncertainty          3180 non-null   float64
  2   LandMaxTemperature                         1992 non-null   float64
  3   LandMaxTemperatureUncertainty              1992 non-null   float64
  4   LandMinTemperature                         1992 non-null   float64
  5   LandMinTemperatureUncertainty              1992 non-null   float64
  6   LandAndOceanAverageTemperature             1992 non-null   float64
  7   LandAndOceanAverageTemperatureUncertainty  1992 non-null   float64
 dtypes: float64(8)
 memory usage: 224.4 KB
 ((3192, 8), None)
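
The non-null counts above show that the land average columns are nearly complete (3180 of 3192 rows), while the max, min and land-and-ocean columns have only 1992 values, i.e. 1200 missing rows each. A minimal sketch of how such gaps could be summarised per column is given below. It runs on a tiny made-up frame rather than the real CSV, and the helper name `missing_summary` is an assumption introduced for illustration.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing count, share missing, and first non-null index label."""
    return pd.DataFrame({
        "missing": df.isna().sum(),
        "share_missing": df.isna().mean().round(3),
        "first_valid": df.apply(lambda col: col.first_valid_index()),
    })


# Tiny synthetic stand-in for GlobalTemperatures.csv (all values are made up).
idx = pd.date_range("1750-01-01", periods=6, freq="MS", name="dt")
demo = pd.DataFrame(
    {
        "LandAverageTemperature": [3.0, 3.1, np.nan, 8.5, 11.6, 12.9],
        "LandMaxTemperature": [np.nan, np.nan, np.nan, 14.2, 17.0, 18.4],
    },
    index=idx,
)

summary = missing_summary(demo)
print(summary)
```

On the real data, `missing_summary(temp)` would show from which date each column starts to be populated, which helps decide whether to drop the early rows or restrict the analysis to the land average columns.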

# generating heatmap
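
The heatmap code is not reproduced in this article. One common choice for this kind of dataset is a correlation heatmap of the numeric attributes, and the sketch below shows that option under this assumption. The helper names `correlation_matrix` and `plot_heatmap` are hypothetical, and the data is synthetic so that the block runs without the CSV.

```python
import numpy as np
import pandas as pd


def correlation_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Pairwise Pearson correlation of the numeric columns (NaNs are skipped pairwise)."""
    return df.select_dtypes("number").corr()


def plot_heatmap(corr: pd.DataFrame):
    """Draw an annotated heatmap of a correlation matrix with seaborn."""
    # Imported lazily so the correlation step works without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(8, 6))
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
    ax.set_title("Correlation between temperature attributes")
    return ax


# Synthetic stand-in data: two columns that move together and one independent column.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
demo = pd.DataFrame({
    "LandAverageTemperature": base,
    "LandMaxTemperature": base + 5.0,  # shifted copy, so correlation with base is 1
    "LandAverageTemperatureUncertainty": rng.normal(size=200),
})

corr = correlation_matrix(demo)
print(corr.round(2))
```

With the real data, `plot_heatmap(correlation_matrix(temp))` would draw the figure in a notebook. Whether it matches the heatmap originally shown here is not something the text confirms.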



# visualisation for all the attributes

# Yearly Average Land Temperature

 new_df = pd.read_csv('../input/climate-change-earth-surface-temperature-data/GlobalTemperatures.csv')
 new_df['year'] = pd.to_datetime( new_df['dt']).dt.year 
 by_new = new_df.groupby(['year'] )['LandAverageTemperature'].mean().reset_index()
 new_pivot = by_new.pivot_table(values='LandAverageTemperature', index='year')

After 1900, the yearly average land temperature rises steeply.
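
One way to put a number on this observation is to compare least-squares trend slopes before and after 1900. The sketch below does so on a synthetic yearly series, not on the real data, so it makes no claim about the actual warming rate. `trend_per_century` is a hypothetical helper, and the choice of a straight-line fit is an assumption.

```python
import numpy as np
import pandas as pd


def trend_per_century(yearly: pd.Series, start: int, end: int) -> float:
    """Least-squares slope of temperature against year over [start, end], per 100 years."""
    window = yearly.loc[start:end].dropna()  # label-based slice, both ends inclusive
    slope, _intercept = np.polyfit(
        window.index.to_numpy(dtype=float), window.to_numpy(), deg=1
    )
    return slope * 100.0


# Synthetic yearly series (made up): flat until 1900, then +1 degree per century.
years = np.arange(1750, 2016)
values = 8.0 + np.where(years > 1900, (years - 1900) * 0.01, 0.0)
yearly = pd.Series(values, index=pd.Index(years, name="year"))

before = trend_per_century(yearly, 1750, 1900)
after = trend_per_century(yearly, 1900, 2015)
print(f"trend 1750-1900: {before:.2f} per century, 1900-2015: {after:.2f} per century")
```

To apply it to the frame built above, the series could be taken as `by_new.set_index('year')['LandAverageTemperature']`.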

# highest temperature dates

 # GlobalTemperatures.csv has no 'AverageTemperature' column; the land average is used here
 ax = temp.groupby(['dt'])['LandAverageTemperature'].last().sort_values(ascending=False).head(10).sort_values().plot(kind='barh');
 ax.set_xlabel("avg temp");
 plt.title("Date Wise Highest Average Temperature");

# Average Temperature in all Seasons

 # `ax` is the axes of the per-season line plot (the code that builds it is not shown here)
 ax.set_ylabel('Average temperature')
 ax.set_title('Average temperature in each season')
 legend = plt.legend(loc='center left', bbox_to_anchor=(1, 0.5), frameon=True, borderpad=1, borderaxespad=1)
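
The labelling lines above expect an axes object with one line per season, but the step that derives the seasons is not shown. A minimal sketch of that step follows. It assumes the meteorological seasons of the Northern Hemisphere and, as a simplification, counts December with the winter of the same calendar year. The mapping `SEASON_BY_MONTH` and the helper `seasonal_means` are hypothetical names, and the demo data is synthetic.

```python
import pandas as pd

# Assumed convention: meteorological seasons of the Northern Hemisphere.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def seasonal_means(series: pd.Series) -> pd.DataFrame:
    """Average a monthly, datetime-indexed series per calendar year and season.

    Returns a frame indexed by year with one column per season.
    """
    frame = pd.DataFrame({
        "year": series.index.year,
        "season": series.index.month.map(SEASON_BY_MONTH),
        "value": series.to_numpy(),
    })
    return frame.pivot_table(values="value", index="year", columns="season", aggfunc="mean")


# Synthetic monthly series: each value equals its month number (1..12) for two years.
idx = pd.date_range("2000-01-01", periods=24, freq="MS", name="dt")
demo = pd.Series(idx.month.to_numpy(dtype=float), index=idx)

seasons = seasonal_means(demo)
print(seasons)
```

With the real data, `ax = seasonal_means(temp['LandAverageTemperature']).plot()` would produce an axes object that the labelling lines can then decorate.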

# Countries with Highest temperature Differences

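 # drop rows without a temperature so that countries with no data do not produce NaN differences
 temp_country = pd.read_csv('../input/climate-change-earth-surface-temperature-data/GlobalLandTemperaturesByCountry.csv').dropna(subset=['AverageTemperature'])
 countries = temp_country['Country'].unique()
 # (max, min) of the recorded average temperature for each country
 max_min_list = []
 for country in countries:
     curr_temps = temp_country[temp_country['Country'] == country]['AverageTemperature']
     max_min_list.append((curr_temps.max(), curr_temps.min()))
 # difference between the highest and lowest recorded average temperature
 diff = [hi - lo for hi, lo in max_min_list]
 # sort countries by difference, largest first
 diff, countries = (list(x) for x in zip(*sorted(zip(diff, countries), key=lambda pair: pair[0], reverse=True)))
 f, ax = plt.subplots(figsize=(8, 8))
 sns.barplot(x=diff[:15], y=countries[:15], palette=sns.color_palette("coolwarm", 25), ax=ax)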
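
The per-country loop above can also be written as a single pandas `groupby`, which avoids filtering the frame once per country. The sketch below is an alternative formulation, not the notebook's own code. `temperature_ranges` is a hypothetical helper, and the demo frame is made up.

```python
import pandas as pd


def temperature_ranges(df: pd.DataFrame, top: int = 15) -> pd.Series:
    """Largest max-minus-min spread of AverageTemperature per country, sorted descending."""
    grouped = (
        df.dropna(subset=["AverageTemperature"])  # countries with no data drop out entirely
        .groupby("Country")["AverageTemperature"]
    )
    spread = grouped.max() - grouped.min()
    return spread.sort_values(ascending=False).head(top)


# Made-up rows: country C has no recorded temperatures at all.
demo = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "C", "C"],
    "AverageTemperature": [-10.0, 20.0, 24.0, 27.0, None, None],
})

ranges = temperature_ranges(demo)
print(ranges)
```

The result is a Series indexed by country, so `sns.barplot(x=ranges.values, y=ranges.index)` could replace the two parallel lists used above.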

For the complete notebook, visit the link here.




Jayita Bhattacharyya
Machine learning and data science enthusiast. Eager to learn new technology advances. A self-taught techie who loves to do cool stuff using technology for fun and worthwhile.
