Data Scientist @ RMOTR

# Which UK Town Feels the Most Earthquakes?

Last updated: May 21st, 2019

# Which UK town feels the most earthquakes?¶

The short answer is Manchester tied with Fort William. To see the top 20, skip on to the results section. Otherwise, read on to learn about Pandas, GeoPandas and how you can do GIS with Python without needing a desktop GIS software (e.g. QGIS, ESRI ArcMap).

This Jupyter notebook demonstrates how the GeoPandas library can be used to perform GIS analysis using just Python code. In this case, we calculate the UK town that feels the most earthquakes. It is a simple example using real-world data from the British Geological Survey's earthquake catalogue. To run this for yourself and experiment with changing variables, follow with instructions in the README.md file.

The notebook has three parts:

1. Introduction to Pandas.
2. Introduction to GeoPandas.
3. Example analysis.
In [1]:
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd
import pyproj
import os
from shapely.geometry import Point, Polygon
%matplotlib inline


# 1. Introduction to Pandas¶

Pandas adds dataframe structures to Python and has been central to the rise of Python in the field of data science. This section shows some the features of Pandas for handling tables of data.

In [2]:
# View the structure of the input data

yyyy-mm-dd, hh:mm:ss.ss, lat, lon, depth, ML, Nsta, RMS, intensity, locality
1703-12-28, 17:03:00.0, 53.620, -0.110, , 4.2, , , 5-6 ," HULL "
1726-10-26, 07:00:00.0, 50.660, -2.470, , 3.3, , , 5 ," DORCHESTER "
1727-07-19, 04:00:00.0, 51.570, -3.760, 25.0, 5.2, , , 7 ," SWANSEA "
1728-03-01, 04:30:00.0, 55.580, -2.810, 21.0, 4.2, , , 4 ," GALASHIELS "
1734-10-25, 03:50:00.0, 50.200, -0.700, 14.0, 4.5, , , 5 ," PORTSMOUTH "
1736-04-30, 11:00:00.0, 56.170, -3.690, , 2.7, , , 6 ," OCHIL HILLS "
1744-02-05, 00:00:00.0, 52.950, -3.510, , 2.9, , , 5 ," MERIONETH "
1747-07-01, 22:30:00.0, 50.960, -3.110, , 3.5, , , 5 ," TAUNTON "
1749-02-14, 08:30:00.0, 55.420, -3.770, , 2.7, , , 5-6 ," LEADHILLS "


In [3]:
# Load into Pandas dataframe, specifying that two columns represent date and time
skipinitialspace=True,
usecols=['yyyy-mm-dd', 'hh:mm:ss.ss', 'lat', 'lon', 'depth', 'ML', 'locality'],
parse_dates=[['yyyy-mm-dd', 'hh:mm:ss.ss']], infer_datetime_format=True)

# Fix datetime column name and typo in locality name
earthquakes.rename(columns={'yyyy-mm-dd_hh:mm:ss.ss': 'datetime'}, inplace=True)
earthquakes.set_index('datetime', inplace=True)

# Check data within data frame

Out[3]:
lat lon depth ML locality
datetime
1703-12-28 17:03:00 53.62 -0.11 NaN 4.2 HULL
1726-10-26 07:00:00 50.66 -2.47 NaN 3.3 DORCHESTER
1727-07-19 04:00:00 51.57 -3.76 25.0 5.2 SWANSEA
1728-03-01 04:30:00 55.58 -2.81 21.0 4.2 GALASHIELS
1734-10-25 03:50:00 50.20 -0.70 14.0 4.5 PORTSMOUTH

## Explore the data¶

The following analysis uses plain Pandas, without real geospatial analysis.

#### Range of magnitudes¶

describe function gives summary statistics on columns

In [4]:
print(earthquakes[['ML', 'depth', 'lat']].describe())

                 ML        depth           lat
count  10638.000000  9365.000000  10916.000000
mean       1.281303     6.765884     54.968459
std        0.938884     6.116607      2.831757
min       -1.900000     0.000000     49.009000
25%        0.700000     2.100000     53.040000
50%        1.200000     5.100000     55.507500
75%        1.800000     9.300000     56.240000
max        6.100000    42.300000     63.000000


hist function plots histograms

In [5]:
ax = earthquakes['depth'].hist(bins=25, edgecolor='white')
txt = ax.set_title('Histogram of earthquake depth')


All of the earthquakes are less than 30 km deep, which is about the thickness of the Earth's crust. Shallower rocks are more brittle and more likely to produce earthquakes.

#### Ten largest quakes since 1700¶

sort_values sorts values. The clue is in the name.

In [6]:
earthquakes.sort_values('ML', ascending=False).head(10)

Out[6]:
lat lon depth ML locality
datetime
1931-06-07 00:25:00.000 54.08 1.50 23.0 6.1 NORTH SEA
1938-06-11 10:57:00.000 50.78 3.58 NaN 5.8 BELGIUM
1927-01-24 05:18:00.000 59.68 2.70 25.0 5.7 NORTH SEA
1977-04-06 19:32:04.000 61.50 3.00 NaN 5.6 NORTH SEA
1926-07-30 13:19:00.000 49.17 -1.62 18.0 5.5 CHANNEL ISLANDS
1984-07-19 06:56:13.600 52.96 -4.38 20.7 5.4 LLEYN PENIN,NW WALES
1927-02-17 23:17:00.000 49.17 -1.62 22.0 5.4 CHANNEL ISLANDS
1896-12-17 05:32:00.000 52.02 -2.55 20.0 5.3 HEREFORD
1852-11-09 04:25:00.000 53.02 -4.30 24.0 5.3 CAERNARVON
1957-02-11 15:43:00.000 52.82 -1.33 13.0 5.3 DERBY

The UK is not a very seismically active area. These earthquakes are certainly large enough to be felt and to cause damage, but across the whole planet there are at least 1,000 earthquakes of this size each year. The British Geological Survey has a page on Earthquake magnitude calculations.

#### Variation in rate of earthquakes with time¶

Pandas has excellent time-series processing capabilities. The resample function groups data into annual bins.

The number of earthquakes measured each year doesn't just depend on how often they occur; it also depends on how well they can be detected.

In [7]:
quakes_per_year = earthquakes.resample('1Y').size()
big_quakes_per_year = earthquakes[earthquakes['ML'] >= 3].resample('10Y').size()

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
quakes_per_year.plot(ax=ax1, label='Earthquakes per year', marker='x')
ax1.legend()
ax1.set_ylabel('count')
big_quakes_per_year.plot(ax=ax2, label='Earthquakes >= M3 per year', marker='x')
ax2.legend()
txt = ax2.set_ylabel('count')


The big rise in recorded earthquakes around 1970 reflects the establishment a modern seismic monitoring network. Older earthquake data come from isolated seismometers or have been inferred from reports of shaking and damage.

#### Plot x, y data¶

The plot.scatter function is good for x,y data. In this case, there is no awareness that the data are spatial. Depth vs Magnitude can be plotted in the same way as Longitude vs Latitude.

In [8]:
ax = earthquakes.plot.scatter('depth', 'ML', alpha=0.5)

In [9]:
ax = earthquakes.plot.scatter('lon', 'lat', c='ML', alpha=0.5)


## 2. Introduction to Geopandas¶

GeoPandas extends Pandas to allow spatial operations on geometry objects. It uses the same spatial libraries as other open source GIS tools e.g. QGIS and PostGIS, so the results are the same as the analysis had been done in desktop GIS. Common GIS tasks are shown here.

### Load UK regions from GIS file¶

Geopandas uses Fiona, which in turn uses GDAL/OGR, which means that it can read and write most GIS formats. Here, data are read from a GeoPackage file. These are the new standard in open-source GIS and contain multiple layers.

First load UK administrative regions and populated places based on NaturalEarth data. This is a public domain dataset of physical and political GIS data. It is useful for adding context to maps. The attribute tables are accessible as Pandas-style dataframes.

In [10]:
# Load data form GIS file

# View attribute data for first 5 rows

Out[10]:
adm1_code diss_me iso_3166_2 wikipedia iso_a2 adm0_sr name name_alt name_local type ... sub_code gns_level gns_lang gns_adm1 gns_region min_label max_label min_zoom orig_ogc_fid geometry
0 GBR-2001 2001 GB-EAY GB 1 East Ayrshire Unitary District ... 1 ind UK35 10.0 11.0 10.0 1209 (POLYGON ((-4.253688319760101 55.6855418974406...
1 GBR-2002 2002 GB-EDU GB 1 East Dunbartonshire Unitary District ... 1 ind UK35 10.0 11.0 10.0 1210 (POLYGON ((-4.132532925354212 56.0041271018842...
2 GBR-2003 2003 GB-ERW GB 1 East Renfrewshire Unitary District ... 1 ind UK35 10.0 11.0 10.0 1211 (POLYGON ((-4.337869228582008 55.7929255227946...
3 GBR-2004 2004 GB-GLG GB 1 Glasgow Glasgow City Unitary District (city) ... 1 ind UK35 10.0 11.0 10.0 1212 (POLYGON ((-4.190513882309745 55.8887077912315...
4 GBR-2005 2005 GB-IVC GB 1 Inverclyde Unitary District ... 1 eng UK01 10.0 11.0 10.0 1213 (POLYGON ((-4.626825232208859 55.9297534239652...

5 rows × 62 columns

Spatial data are stored in the geometry column.

In [11]:
regions.geometry.head()

Out[11]:
0    (POLYGON ((-4.253688319760101 55.6855418974406...
1    (POLYGON ((-4.132532925354212 56.0041271018842...
2    (POLYGON ((-4.337869228582008 55.7929255227946...
3    (POLYGON ((-4.190513882309745 55.8887077912315...
4    (POLYGON ((-4.626825232208859 55.9297534239652...
Name: geometry, dtype: object

The plot function plots the map. Features can be coloured based on column data.

In [12]:
ax = regions.plot(figsize=(8, 8), column='name')
towns.plot(ax=ax, color='black')
txt = ax.set_title('UK regions and town')

# Draw arrow to highlight Rockall, which is too small to
# have been drawn on the map (see next section).
plt.annotate('Rockall', xy=(-13.687, 57.596), xytext=(-13, 58),

## Note that you can easily save plots to file in a variety of formats e.g.
# plt.savefig('map.pdf')
# plt.savefig('map.png', dpi=300)

Out[12]:
Text(-13,58,'Rockall')

### Perform geometric operations e.g. intersections¶

The map above is shifted to the east because of the tiny island of Rockall. We can trim this off using the intersection method.

In [13]:
area_of_interest = Polygon([(-9, 49), (-9, 62), (2, 62), (2, 49)])
regions['geometry'] = regions.intersection(area_of_interest)
ax = regions.plot(figsize=(8, 8), column='name')


### Perform spatial operations based on attribute data¶

We can select individual polygons based on their attributes, then manipulate their geometries. Vector data can be dissolved, which is the spatial equivalent of group_by. The geonunit attribute column contains country name. Here we combine all region polygons belonging to the same country into a single polygon.

In [14]:
countries = regions[['geonunit', 'geometry']].dissolve(by='geonunit')
countries['country'] = countries.index.values
ax = countries.plot(column='country')

# GeoDataframes have extra attributes based on their geometry
# e.g. centroids
print(countries.centroid)
countries.centroid.plot(ax=ax, marker='*', color='yellow')

geonunit
England             POINT (-1.458673232659381 52.59503047508206)
Northern Ireland    POINT (-6.693662132752302 54.60866474679423)
Scotland                POINT (-4.17031190751054 56.82895699654)
Wales                POINT (-3.75694970616271 52.33927041946923)
dtype: object

Out[14]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f1e8aac98d0>

### Re-project data to Ordnance Survey coordinate system¶

Coordinate reference systems can be specified using EPSG codes. epsg:4326 is longitude and latitude as returned by GPS systems (using WGS84 datum). epsg:27700 is the Ordnance Survey for Great Britain. GeoPandas provides the to_crs method to set this.

In [15]:
countries_osgb = countries.to_crs({'init': 'epsg:27700'})
towns_osgb = towns.to_crs({'init': 'epsg:27700'})


Working in projected coordinates allows spatial calculations e.g. compare areas of countries of the UK in square kilometres...

In [16]:
1e-6 * countries_osgb.area

Out[16]:
geonunit
England             130372.965304
Northern Ireland     14273.430345
Scotland             78288.149889
Wales                20787.610406
dtype: float64

... or in the UK media's favourite measure of area comparison, the sizeofwales.

In [17]:
countries_osgb.area / countries_osgb.loc['Wales', 'geometry'].area

Out[17]:
geonunit
England             6.271667
Northern Ireland    0.686632
Scotland            3.766097
Wales               1.000000
dtype: float64

### Export to file¶

to_file writes data to any vector file format supported by OGR. Which is most of them. Here we write our new countries dataframe as a Shapefile.

In [18]:
output_file_name = os.path.join('data', 'countries.shp')
countries.to_file(output_file_name)


## 3. Example analysis¶

Here we find the most earthquake-prone town in the UK (or, to be precise, the regionally-important UK town with most earthquakes with magnitude > 3 within a 40 km radius since 1970).

### Convert existing DataFrame into GeoDataFrame¶

To do spatial operations, it is necessary to convert the lon and lat columns of our earthquakes dataframe into geometry objects. GeoPandas uses the Shapely library for this. Our coordinates will become a Point. We also have to set the coordinate reference system.

In [19]:
# Add geometry and convert to spatial dataframe in source CRS
earthquakes['coords'] = list(zip(earthquakes['lon'], earthquakes['lat']))
earthquakes['coords'] = earthquakes['coords'].apply(Point)
earthquakes = gpd.GeoDataFrame(earthquakes, geometry='coords', crs={'init': 'epsg:4326'})

# Reproject data in to Ordnance Survey GB coordinates
earthquakes_osgb = earthquakes.to_crs({'init': 'epsg:27700'})


Earlier we saw that the earthquake catalogue was only complete from 1970 onwards so we only want records from then onwards. We also take only earthquakes above magnitude 3 as these are the most likely to be felt.

In [20]:
# Extract larger quakes from when monitoring network was established
big_quakes_osgb = earthquakes_osgb[(earthquakes_osgb['ML'] >= 3)]['1970-01-01':]


### Plot earthquakes against UK outline¶

Data can now be plotted against the reference maps loaded from the Geopackage file. The largest earthquakes since 1970 have been highlighted on this map.

In [21]:
# Create the axis first
fig, ax = plt.subplots(figsize=(8, 8))

# Plot all earthquakes
earthquakes_osgb.sort_values('ML', ascending=True).plot(
ax=ax, column='ML', legend=True, alpha=0.5)
countries_osgb.plot(ax=ax, edgecolor='black', facecolor='none')

# Highlight the biggest earthquakes
big_quakes_osgb.plot(
ax=ax, color='red', marker='.', markersize=10)

# Make figure aspect ratio ensure that 10x10 km square appears square
ax.set_aspect('equal', 'box')


The earthquakes are not evenly spread. They are mainly associated with recent movement on ancient faults. Many are in the North Sea. The crust there was stretched during the Jurassic-Triassic period and produced the basins where oil-bearing rocks are found today. Earthquakes in the NW Highlands of Scotland are near the Great Glen Fault and Moine Thrust fault lines. These were formed during even older continental collisions.

### Find earthquakes within 40 km of towns¶

It is reasonable that a magnitude 3 earthquake will be felt 40 km away. We can define 40 km circles around each town using the buffer function.

In [22]:
towns_buffer = towns_osgb[['NAME', 'geometry']].copy()
towns_buffer['geometry'] = towns_osgb.buffer(40000)


To find which earthquakes could be felt in each town, we do a spatial join.

In [23]:
# Note that spatial joins requires rtree python package and libspatialindex-c4v5 Debian/Ubuntu package
quakes_by_town = gpd.sjoin(big_quakes_osgb, towns_buffer, op='within', how='left')
quakes_by_town.dropna(subset=['NAME'], inplace=True)


Plotting the earthquakes coloured by the NAME of the towns that they are close to shows which ones fall within the buffers around the towns.

In [24]:
fig, ax = plt.subplots(figsize=(8, 8))
countries_osgb.plot(ax=ax, edgecolor='brown', facecolor='none', alpha=0.2)
towns_buffer.plot(ax=ax, color='green', alpha=0.2)
quakes_by_town.plot(ax=ax, column='NAME', figsize=(12, 12), alpha=0.8, label='Near towns')
big_quakes_osgb.plot(ax=ax, color='black', edgecolor='none', alpha=0.8, marker='.',
label='Big quakes')
ax.set_aspect('equal', 'box')
leg = ax.legend()


### Top 20 UK towns for earthquake activity¶

The map above shows which earthquakes are close to towns. Applying the Pandas value_counts function to the quakes by town shows the Top 20.

And the winner is: a draw! Fort William ties with Manchester with 9 earthquakes each within a 40 km radius with magnitude greater than 3.

In [25]:
quakes_by_town['NAME'].value_counts().head(20)

Out[25]:
Fort William          9
Manchester            9
Stoke                 7
Sheffield             7
Cardiff               7
Nottingham            6
Leicester             5
Carlisle              4
Peterborough          4
Greenock              3
Birmingham            3
Dover                 3
Kingston upon Hull    3
Penzance              2
Swansea               2
Bristol               2
Coventry              2
Leeds                 2
Blackpool             2
Name: NAME, dtype: int64

### Conclusion¶

This is a very simplistic analysis, but it demonstrates how easily such workflows can be carried out using Python. Reading, analysis and plotting take very few lines of code, and the code is self-documenting, which is important for transparency and reproducibility.

It is also very easy to repeat and adjust the analysis. For example, 1970 to the present is a very short time period. It is probably not representative of longer term patterns. As an exercise, try running the notebook again using all earthquakes since 1700. The results may surprise you! See here for explanation.

## 4. Notes¶

#### Geospatial analysis as a web service¶

Automated analyses can be incorporated into a functions. The example below lists earthquakes within a certain radius of a given point. This could be configured to return output in JSON format output and wrapped in a Python web framework such as Falcon. The result would be a geospatial web service that performs automatic GIS-type analysis in response to HTTP requests.

In [26]:
def get_nearby_earthquakes(lon, lat, radius=20000):
"""
Return records from the earthquake catalogue within a given
distance of a point location.

:param lon: Longitude in decimal degrees
:param lat: Latitude in decimal degress
:param buffer: Search radius in metres
:return quakes: Pipe-separated representation of earthquake data
"""
# Create dataframe with buffered point
osgb = pyproj.Proj('+init=epsg:27700')
x, y = osgb(lon, lat)
crs={'init': 'epsg:27700'})

# Do spatial join
quakes = gpd.sjoin(big_quakes_osgb, search_zone, how='inner')

# Tidy output
quakes.index.name = 'timestamp'
quakes.reset_index(inplace=True)
quakes = quakes[['timestamp', 'lat', 'lon', 'depth', 'ML', 'locality']]

return quakes.to_csv(index=False, sep='|', float_format='%.4f')

In [27]:
# The coordinates are for BGS HQ in Keyworth
result = get_nearby_earthquakes(-1.089, 52.871)
print(result)

timestamp|lat|lon|depth|ML|locality
1984-05-30 02:53:59.400|52.8710|-1.1250|17.8000|3.0000| NOTTINGHAM
2001-10-28 16:25:24.900|52.8350|-0.8490|14.4000|4.1000| MELTON MOWBRAY,LEICS



#### Data cleaning¶

The CSV file downloaded from the website needed the following tweaks before it was ready for use:

• Add quotes around locality field cat raw.csv | sed 's/,/,"/9;s/\$/"/' > uk_earthquakes.csv
• Manually remove space from locality field name
• Manually remove data before 1700 because Pandas can't parse those timestamps automatically
• Manually fix typo in timestamp (1971-11-21, 07:68:36.2)

#### Acknowledgements¶

This notebook was inspired by Joris Van den Bossche's geopandas-tutorial, which he gave at the GeoPython 2018 conference.