Automating the Creation of Georeferenced Sample Bounding Box Areas with Python
Georeferenced AOI grid-offset creation of bounding box sample areas.
Generating sample bounding boxes (Polygons) can be useful for many different applications. A common one would be a Classification task, possibly performed by a Machine Learning model like a Random Forest. Generating bounding boxes over a large area could lead to a more general classification model. More effectively, such a task using remotely sensed data such as Earth Observation data from any of the available sensors in orbit (such as Sentinel and Landsat Families), would benefit from a more localised model trained on the nuances of the particular terrain of an area, however, there are many more applicable scenarios to generate georeferenced sample bounding boxes too.
Shapefiles of Canada
In order to follow along with this article, you will need the shapefile of Canada. I would highly recommend converting it into a parquet.
Python Imports
In order to generate some georeferenced sample boxes you will need to folllowing Python libraries to follow along with this article.
import pyproj
import math
import folium
import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon, shape, box
from shapely.ops import transform
We will be using pyproj
and shapely.ops.transform
to perform reprojections on the shapely.Polygon
objects.
Define the Area of Interest (AOI)
Our AOI is an enveloped Polygon containing British Columbia, Canada and parts outside of the Province introduced from the enveloping.
AOI = shape({
"coordinates": [
[
[
-135.0728327,
47.9408248
],
[
-110.1118952,
47.8819136
],
[
-109.804278,
60.3341577
],
[
-134.9409968,
60.5509373
],
[
-135.0728327,
47.9408248
]
]
],
"type": "Polygon"
}).envelope
Enveloping a Polygon gives us an extent of the extremes of the AOI; minimum and maximum latitude and longitude.
Create Our Projected AOI’s
We are going to project the AOI into our destination projection — EPSG:3005 — BC Albers. This will put our projection into UTM projection to match coordinates to representable distances on a map, such as meters. We use the International Assocciation of Oil and Gas Produceres (IOGP) Coordinate Reference System (CRS) projection schema.
wgs84 = pyproj.CRS('EPSG:4326')
utm = pyproj.CRS('EPSG:3005')
project = pyproj.Transformer.from_crs(wgs84, utm, always_xy=True).transform
utm_aoi = transform(project, AOI)
We will also generate a “map” projected object that we will use later to visualise in Folium maps.
wgs84 = pyproj.CRS('EPSG:4326')
map_utm = pyproj.CRS('EPSG:3857')
project = pyproj.Transformer.from_crs(wgs84, map_utm, always_xy=True).transform
map_aoi = transform(project, AOI)
Read In Provinces/Territories, Cartographic Boundary File — 2016 Census
# canada = gpd.read_file('canada/lpr_000b16a_e.shp')
canada = gpd.read_parquet('canada.parquet')
Isolate British Columbia
bc = canada[canada['PREABBR'] == 'B.C.']
bc = bc.to_crs('EPSG:3005')
Pull Bounds From AOI
minx, miny, maxx, maxy = tuple(*list(bc.bounds.to_records(index=False)))
Create Bounding Boxes
Using the AOI bound extends, we can calculate equally distanced georeferenced sample Polygons (boxes) throughout the AOI where offset
, box_width
, and box_height
are respective to the UTM coordinates and represent metres.
This step is fairly quick with the given sample box configuration, however, creating more boxes by decreasing the distance between them or the size of the boxes soon becomes a bottleneck for sequential systems. Python’s Dask is an excellent candidate to lazy load the sample box generation and utilise the systems resources effectively either locally or by disseminating the workload across Workers.
offset = 50000 # Distance between sample boxes in metres
box_width = 10000 # Width of sample box in metres
box_height = 10000 # Height of sample box in metres
dfs = []
for x in range(math.ceil(minx), math.floor(maxx), box_width+offset):
for y in range(math.ceil(miny), math.floor(maxy), box_height+offset):
# Convert Bounding Box to Polygon
bbox = box(x,y,x+box_width,y+box_height, ccw=True)
# project to dest CRS
this_df = gpd.GeoSeries([bbox], crs='EPSG:3005')
# Append to sample box collection
dfs.append(this_df)
# Squeeze the sample boxes into a single GeoDataFrame
df = gpd.GeoDataFrame(pd.concat(dfs), columns=['geometry']).set_crs('EPSG:3005')
df
Retain Sample Boxes In British Columbia
Since we used an enveloped AOI, there may be sample boxes outside of a more refined AOI, such as the British Columbia Provincial Census border. If we were performing a land classification, we would also want to ignore sample boxes over water/sea.
retain_df = df.loc[df['geometry'].intersects(bc.iloc[0]['geometry'])]
retain_df = retain_df.set_crs('EPSG:3005')
Visualising the sample boxes that fall within the British Columbia boundary, we can see that any boxes that would have fell outside this region, such as in Washington, US which latitude is inline with Victoria, British Columbia, have been excluded.
Save Sample Boxes
In order to use these georeferenced sample bounding boxes in an application, write out to a Cloud-friendly format, such as Parquet.
retain_df.to_parquet('training_locs.parquet')
Visualise Sample Boxes
To demonstrate the retention of the sample boxes within our refined AOI, the following code snippet builds a Folium map and plots the Polygons.
m = folium.Map(tiles='openstreetmap', maxbounds=bc_bounds)
folium.GeoJson(retain_df["geometry"].to_crs('EPSG:3857')).add_to(m)
m
And below is the output of that code.