Automating the Creation of Georeferenced Sample Bounding Box Areas with Python

Benjamin
4 min readApr 9, 2023

--

Georeferenced AOI grid-offset creation of bounding box sample areas.

Sample boxes covering British Columbia

Generating sample bounding boxes (Polygons) can be useful for many different applications. A common one would be a Classification task, possibly performed by a Machine Learning model like a Random Forest. Generating bounding boxes over a large area could lead to a more general classification model. More effectively, such a task using remotely sensed data such as Earth Observation data from any of the available sensors in orbit (such as Sentinel and Landsat Families), would benefit from a more localised model trained on the nuances of the particular terrain of an area, however, there are many more applicable scenarios to generate georeferenced sample bounding boxes too.

Shapefiles of Canada

In order to follow along with this article, you will need the shapefile of Canada. I would highly recommend converting it into a parquet.

Python Imports

In order to generate some georeferenced sample boxes you will need to folllowing Python libraries to follow along with this article.

import pyproj
import math
import folium

import pandas as pd
import geopandas as gpd

from shapely.geometry import Polygon, shape, box
from shapely.ops import transform

We will be using pyproj and shapely.ops.transform to perform reprojections on the shapely.Polygon objects.

Define the Area of Interest (AOI)

Our AOI is an enveloped Polygon containing British Columbia, Canada and parts outside of the Province introduced from the enveloping.

AOI = shape({
"coordinates": [
[
[
-135.0728327,
47.9408248
],
[
-110.1118952,
47.8819136
],
[
-109.804278,
60.3341577
],
[
-134.9409968,
60.5509373
],
[
-135.0728327,
47.9408248
]
]
],
"type": "Polygon"
}).envelope

Enveloping a Polygon gives us an extent of the extremes of the AOI; minimum and maximum latitude and longitude.

Create Our Projected AOI’s

We are going to project the AOI into our destination projection — EPSG:3005 — BC Albers. This will put our projection into UTM projection to match coordinates to representable distances on a map, such as meters. We use the International Assocciation of Oil and Gas Produceres (IOGP) Coordinate Reference System (CRS) projection schema.

wgs84 = pyproj.CRS('EPSG:4326')
utm = pyproj.CRS('EPSG:3005')

project = pyproj.Transformer.from_crs(wgs84, utm, always_xy=True).transform
utm_aoi = transform(project, AOI)

We will also generate a “map” projected object that we will use later to visualise in Folium maps.

wgs84 = pyproj.CRS('EPSG:4326')
map_utm = pyproj.CRS('EPSG:3857')

project = pyproj.Transformer.from_crs(wgs84, map_utm, always_xy=True).transform
map_aoi = transform(project, AOI)

Read In Provinces/Territories, Cartographic Boundary File — 2016 Census

# canada = gpd.read_file('canada/lpr_000b16a_e.shp')
canada = gpd.read_parquet('canada.parquet')

Isolate British Columbia

bc = canada[canada['PREABBR'] == 'B.C.']
bc = bc.to_crs('EPSG:3005')

Pull Bounds From AOI

minx, miny, maxx, maxy = tuple(*list(bc.bounds.to_records(index=False)))

Create Bounding Boxes

Using the AOI bound extends, we can calculate equally distanced georeferenced sample Polygons (boxes) throughout the AOI where offset , box_width , and box_height are respective to the UTM coordinates and represent metres.

This step is fairly quick with the given sample box configuration, however, creating more boxes by decreasing the distance between them or the size of the boxes soon becomes a bottleneck for sequential systems. Python’s Dask is an excellent candidate to lazy load the sample box generation and utilise the systems resources effectively either locally or by disseminating the workload across Workers.

offset = 50000 # Distance between sample boxes in metres
box_width = 10000 # Width of sample box in metres
box_height = 10000 # Height of sample box in metres

dfs = []
for x in range(math.ceil(minx), math.floor(maxx), box_width+offset):
for y in range(math.ceil(miny), math.floor(maxy), box_height+offset):
# Convert Bounding Box to Polygon
bbox = box(x,y,x+box_width,y+box_height, ccw=True)
# project to dest CRS
this_df = gpd.GeoSeries([bbox], crs='EPSG:3005')
# Append to sample box collection
dfs.append(this_df)

# Squeeze the sample boxes into a single GeoDataFrame
df = gpd.GeoDataFrame(pd.concat(dfs), columns=['geometry']).set_crs('EPSG:3005')
df

Retain Sample Boxes In British Columbia

Since we used an enveloped AOI, there may be sample boxes outside of a more refined AOI, such as the British Columbia Provincial Census border. If we were performing a land classification, we would also want to ignore sample boxes over water/sea.

retain_df = df.loc[df['geometry'].intersects(bc.iloc[0]['geometry'])]
retain_df = retain_df.set_crs('EPSG:3005')

Visualising the sample boxes that fall within the British Columbia boundary, we can see that any boxes that would have fell outside this region, such as in Washington, US which latitude is inline with Victoria, British Columbia, have been excluded.

Closer Look at Sample boxes covering British Columbia — Vancouver Island and Lower Mainland shown in map

Save Sample Boxes

In order to use these georeferenced sample bounding boxes in an application, write out to a Cloud-friendly format, such as Parquet.

retain_df.to_parquet('training_locs.parquet')

Visualise Sample Boxes

To demonstrate the retention of the sample boxes within our refined AOI, the following code snippet builds a Folium map and plots the Polygons.

m = folium.Map(tiles='openstreetmap', maxbounds=bc_bounds)
folium.GeoJson(retain_df["geometry"].to_crs('EPSG:3857')).add_to(m)
m

And below is the output of that code.

Read More…

--

--

Benjamin
Benjamin

Written by Benjamin

Open Science | Cloud Platform Developer | EO | Remote Sensing | Software Developer

No responses yet