In data science, we usually begin any analysis or modelling by visualizing the data to gain some insight into the problem domain. This can be done with data visualization tools such as Matplotlib, Pandas, Plotly, Seaborn and Bokeh, all of which are widely used with tabular data. Similarly, to visualize geospatial data, or to plot a map of a geographical location and highlight interesting facts about it, we can use a Python library named GeoPandas, which is designed for manipulating and plotting geospatial data. Focusing on GeoPandas, we are going to discuss the following points in this article:
Table of Contents
- What is Spatial Data?
- Geometric Data
- Geographic Data
- Working with GeoPandas
- Visualizing Geographical Data using GeoPandas
Let’s proceed with the discussions one by one.
What is Spatial Data?
Spatial data, often known as geospatial data or geographic information, is any data that directly or indirectly refers to a specific geographical area or location. It is a numerical representation of a physical object in a geographic coordinate system. Geospatial data is considerably more than just the geographic component of a map: besides a location, it can carry many attributes describing that location, which is why it is stored in a number of different formats. By analyzing this data, we can learn how each variable affects people, communities, and populations.
There are various types of spatial data, but the two most common are geometric data and geographic data. Let us try to understand these two kinds of data.
Geometric data is a type of spatial data that is represented on a two-dimensional flat surface. Floor plans, for example, use geometric data. Google Maps, a navigation application, uses geometric data to generate precise directions; it is, in fact, one of the most basic examples of geometric data in action.
Geographic data is information that has been mapped onto a sphere, which is most often the Earth. It ties a given object or area to its latitude and longitude. Data from the Global Positioning System (GPS) is a well-known example of geographic data.
Geospatial data often consists of large collections of spatial data gathered from many sources in various forms, and it may include census data, satellite imagery, meteorological data, mobile phone data, drawn images, and social media data. It is most useful when it can be discovered, shared, analyzed, and combined with traditional business data.
Geospatial analytics adds time and location granularity to standard data sets. Maps, graphs, statistics, and cartograms can all be used to depict historical and current events in various ways. This additional information helps to paint a clearer picture of what transpired. Visual patterns and images that are easy to recognize reveal insights that could otherwise be lost in a huge spreadsheet. Forecasts can be generated more quickly, conveniently, and precisely with this technique.
Working with GeoPandas
The GeoDataFrame, which extends the Pandas DataFrame, is the main data structure of GeoPandas.
A GeoDataFrame supports all of the usual DataFrame operations. It contains one or more GeoSeries columns (GeoSeries extends the pandas Series), and each GeoSeries carries its own coordinate reference system in GeoSeries.crs. Although a GeoDataFrame can have several GeoSeries columns, only one of them is the active geometry column, returned by GeoDataFrame.geometry; geometric methods and attributes called on the GeoDataFrame, such as area or plot, operate on that column.
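To make the idea of an active geometry concrete, here is a minimal sketch on toy data: a GeoDataFrame with two geometry columns, hypothetically named "boundary" and "office", where switching the active column with set_geometry changes what area computes. The shapes and column names are made up for illustration; EPSG:3857 is used only so that area is measured in a projected CRS.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Hypothetical toy data: two district boundaries and one office location per district
boundaries = gpd.GeoSeries(
    [
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),  # 1 x 1 square
        Polygon([(1, 0), (3, 0), (3, 2), (1, 2)]),  # 2 x 2 square
    ],
    crs="EPSG:3857",  # projected CRS, so .area is in square metres
)
offices = gpd.GeoSeries([Point(0.5, 0.5), Point(2, 1)], crs="EPSG:3857")

gdf = gpd.GeoDataFrame(
    {"name": ["A", "B"], "boundary": boundaries, "office": offices},
    geometry="boundary",  # "boundary" becomes the active geometry column
)

print(gdf.geometry.name)   # boundary
print(gdf.area.tolist())   # [1.0, 4.0] -> computed on the active column

# Switch the active geometry; geometric methods now use the "office" points
gdf_offices = gdf.set_geometry("office")
print(gdf_offices.geometry.name)   # office
print(gdf_offices.area.tolist())   # [0.0, 0.0] -> points have no area
```

Both columns stay in the frame; set_geometry returns a new GeoDataFrame and only changes which one geometric operations use.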
Visualizing Geographical Data using GeoPandas
Using the geopandas.read_file function, which detects the file format automatically and builds a GeoDataFrame, we can quickly read a file that contains both attribute data and geometry (e.g., GeoPackage, GeoJSON, Shapefile).
In our case, we are using shapefiles of Indian districts. A shapefile is not a single file but a set of files that share one base name: the .shp file holds the geometries, the .shx file indexes them, the .dbf file stores the attribute table, and an optional .prj file records the coordinate reference system. The data set we are using is taken from this Kaggle Repository; make sure to keep all of its files together in the directory you read from.
As with a pandas Series, each entry in a GeoSeries holds the shape, or set of shapes, for a single observation. An entry can be a single shape (such as one polygon), or several shapes can be combined into a single observation (like the many polygons that make up the State of Maharashtra or the country of India). GeoPandas works with three basic classes of geometric objects, which are in fact Shapely objects:
- Points / Multi-Points (Point, MultiPoint)
- Lines / Multi-Lines (LineString, MultiLineString)
- Polygons / Multi-Polygons (Polygon, MultiPolygon)
Each type suits a different kind of physical feature: a Point for a single structure, a LineString for a road or river, a Polygon for a city boundary, and a MultiPolygon for a region made of several separate parts, such as a country that includes islands.
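The sketch below builds one hypothetical feature of each geometry class with Shapely, puts them all in a single GeoSeries, and prints each entry's type, length and area. The feature names and coordinates are invented and use arbitrary planar units, so no CRS is set.

```python
import geopandas as gpd
from shapely.geometry import (
    LineString, MultiLineString, MultiPoint, MultiPolygon, Point, Polygon,
)

# Hypothetical features in arbitrary planar units, one per geometry class
features = {
    "building": Point(2, 3),
    "bus_stops": MultiPoint([(0, 0), (1, 1)]),
    "road": LineString([(0, 0), (3, 4)]),
    "river_branches": MultiLineString([[(0, 0), (1, 0)], [(1, 0), (1, 2)]]),
    "city": Polygon([(0, 0), (4, 0), (4, 3), (0, 3)]),
    "mainland_and_island": MultiPolygon([
        Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
        Polygon([(5, 5), (6, 5), (6, 6), (5, 6)]),
    ]),
}

# A GeoSeries can mix geometry types, one shape (or multi-shape) per entry
gs = gpd.GeoSeries(list(features.values()), index=list(features.keys()))

summary = gs.geom_type.to_frame("type")
summary["length"] = gs.length  # line length, or perimeter for polygons; 0 for points
summary["area"] = gs.area      # 0 for points and lines
print(summary)
```

The MultiPolygon row shows why a country with islands is one observation: its area (5.0) is the sum of both parts.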
In this implementation, we are going to plot a map of India, then the state of Maharashtra on its own, and finally Maharashtra over a satellite view.
```python
# in a Jupyter/Colab notebook; the module name is lowercase "geopandas"
!pip install geopandas contextily

import geopandas as gpd
import contextily as ctx
```
Load the .shp file using the read_file function of GeoPandas and inspect the first eight rows.
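```python
# read the shapefile; its sidecar files (.shx, .dbf, .prj) must be in the same folder
geo_data = gpd.read_file('/content/drive/MyDrive/data/Geo/output.shp')
geo_data.head(8)
```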
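To find out how much area each polygon (or MultiPolygon) covers, we can use the GeoDataFrame.area attribute, which returns a pandas Series. GeoDataFrame.area is simply GeoSeries.area applied to the active geometry column. The values are in the units of the layer's CRS, so if the shapefile stores longitude and latitude they are in square degrees: good enough for comparing districts by shading, but not true areas.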
```python
# area of each district, in the units of the layer's CRS
geo_data['area'] = geo_data.area
```
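If real areas are needed, one option is to reproject to an equal-area CRS before calling area. The sketch below assumes data stored in longitude/latitude (EPSG:4326) and uses a hypothetical 1° × 1° cell near Maharashtra; EPSG:6933 (a global cylindrical equal-area projection in metres) is one reasonable choice, not the only one.

```python
import warnings

import geopandas as gpd
from shapely.geometry import box

# Hypothetical 1 degree x 1 degree cell (73-74 E, 18-19 N) stored in lon/lat
cell = gpd.GeoSeries([box(73, 18, 74, 19)], crs="EPSG:4326")

with warnings.catch_warnings():
    # GeoPandas warns that area in a geographic CRS is likely incorrect; silence it here
    warnings.simplefilter("ignore", UserWarning)
    deg_area = cell.area.iloc[0]
print(deg_area)  # 1.0 (square degrees)

# Reproject to an equal-area CRS whose units are metres, then convert m² to km²
km2_area = cell.to_crs(epsg=6933).area.iloc[0] / 1e6
print(round(km2_area))  # roughly 11,700 km²
```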
Now we simply call geo_data.plot('area'), which draws every district of India shaded by its area and returns a matplotlib Axes.
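A slightly fuller version of that call might add a colour bar and save the figure to disk. This sketch runs on four hypothetical rectangular "districts" instead of the Kaggle shapefile; column, cmap, legend, edgecolor and linewidth are standard GeoDataFrame.plot keyword arguments, while the output file name is made up.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed in a notebook
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely.geometry import box

# Hypothetical stand-in for geo_data: four rectangular "districts"
districts = gpd.GeoDataFrame(
    {"district": ["P", "Q", "R", "S"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 3, 1), box(0, 1, 2, 3), box(2, 1, 3, 3)],
    crs="EPSG:3857",
)
districts["area"] = districts.area

ax = districts.plot(
    column="area",      # same as the positional 'area' argument used above
    cmap="viridis",
    legend=True,        # adds a colour bar for the area values
    edgecolor="black",
    linewidth=0.5,
    figsize=(6, 6),
)
ax.set_title("District area (CRS units squared)")
ax.set_axis_off()

out_path = os.path.join(tempfile.mkdtemp(), "districts_by_area.png")
ax.figure.savefig(out_path, dpi=100, bbox_inches="tight")
plt.close(ax.figure)
```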
Similarly, to plot a single state, we keep only that state's rows in geo_data and drop the rest. Here I'm going to plot a map of Maharashtra:
```python
# keep only the districts of Maharashtra, then plot them shaded by area
geo_data = geo_data[geo_data['statename'] == 'Maharashtra']
geo_data.plot('area', figsize=(12, 10))
```
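The same filtering extends to several states with isin, and GeoDataFrame.dissolve can merge the district polygons of each state into a single outline. This is a sketch on a hypothetical three-district frame that mimics the statename column; the district names and box shapes are invented.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical stand-in for geo_data with the same 'statename' column
geo = gpd.GeoDataFrame(
    {
        "distname": ["D1", "D2", "D3"],
        "statename": ["Maharashtra", "Maharashtra", "Goa"],
    },
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(0, -1, 1, 0)],
    crs="EPSG:3857",
)
geo["area"] = geo.area

# Boolean mask: keep the rows of one state, as in the article
maha = geo[geo["statename"] == "Maharashtra"]

# isin: keep several states at once
west = geo[geo["statename"].isin(["Maharashtra", "Goa"])]

# dissolve: union the district polygons of each state into one outline,
# summing the per-district 'area' column along the way
states = geo[["statename", "area", "geometry"]].dissolve(by="statename", aggfunc="sum")

print(len(maha), len(west))                              # 2 3
print(states.loc["Maharashtra", "area"])                 # 2.0
print(states.loc["Maharashtra", "geometry"].geom_type)   # Polygon (shared edge merged)
```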
Now we are going to plot a satellite view of the map. For that, we use contextily, a Python library that downloads map tiles from the internet (so an internet connection is required) and draws them behind a matplotlib plot. The tiles are placed using the coordinates of our geometries, which GeoPandas provides once the data is in the same coordinate reference system as the tiles.
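The Coordinate Reference System (CRS) of each GeoSeries is available as GeoSeries.crs. The CRS tells GeoPandas how the coordinates of the geometries correspond to locations on the globe. A geographic CRS expresses coordinates as latitude and longitude in degrees; the most common one is WGS84, whose code is EPSG:4326. Web map tiles, including the ones contextily downloads, use the projected Web Mercator CRS, EPSG:3857, whose coordinates are in metres, which is why the data is reprojected with to_crs before the basemap is added.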
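To see what to_crs(epsg=3857) actually does to coordinates, the sketch below reprojects one point (approximate coordinates of Mumbai, assumed here for illustration) from EPSG:4326 to Web Mercator and cross-checks the result against the spherical Web Mercator formulas x = Rλ and y = R ln tan(π/4 + φ/2) with R = 6378137 m.

```python
import math

import geopandas as gpd
from shapely.geometry import Point

# Approximate location of Mumbai as (longitude, latitude) in WGS84 / EPSG:4326
mumbai = gpd.GeoSeries([Point(72.8777, 19.0760)], crs="EPSG:4326")

# Reproject to Web Mercator (EPSG:3857), the CRS used by web tile providers
mumbai_3857 = mumbai.to_crs(epsg=3857)
x, y = mumbai_3857.iloc[0].x, mumbai_3857.iloc[0].y
print(round(x), round(y))  # metres east/north of (0°, 0°)

# Cross-check with the spherical Web Mercator formulas (sphere radius 6378137 m)
R = 6378137.0
x_check = R * math.radians(72.8777)
y_check = R * math.log(math.tan(math.pi / 4 + math.radians(19.0760) / 2))
print(abs(x - x_check) < 0.01, abs(y - y_check) < 0.01)  # True True

print(mumbai.crs.is_geographic, mumbai_3857.crs.is_projected)  # True True
```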
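```python
# contextily tiles are in Web Mercator, so reproject the geometries first
geo_data = geo_data.to_crs(epsg=3857)

# plot semi-transparent district outlines, then add satellite imagery as the background
ax = geo_data.plot(figsize=(10, 10), alpha=0.5, edgecolor='k')
ctx.add_basemap(ax, source=ctx.providers.Esri.WorldImagery)
```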
In this article, we have seen what spatial data is and the main forms it takes. We then used GeoPandas, which builds on pandas to handle geospatial data, to read a shapefile, compute district areas, and plot maps of India and Maharashtra. Finally, by fetching map tiles from the internet with contextily and reprojecting our geometries with GeoPandas, we plotted the map over a satellite view.