Project 4: Road Finding¶
by Lorn
Hypothesis¶
It is possible to extract roads from satellite imagery with enough clarity to distinguish them from non-road features.
Data¶
Massachusetts Roads. A dataset of RGB images and rasterized masks of the road segments in each image.
https://www.kaggle.com/datasets/balraj98/massachusetts-roads-dataset
Microsoft Building Footprints. Building footprints in vector data format.
https://planetarycomputer.microsoft.com/dataset/ms-buildings
0.5m DEM of Massachusetts from MassGIS.
https://www.mass.gov/orgs/massgis-bureau-of-geographic-information
import rasterio
import numpy as np
import pandas as pd
from pathlib import Path
import geopandas as gpd
from shapely.geometry import box
import matplotlib.pyplot as plt
import folium
import ipywidgets as widgets
from IPython.display import display
import joblib
from scipy.ndimage import sobel, gaussian_filter
from sklearn.ensemble import RandomForestClassifier
from skimage.exposure import rescale_intensity
from skimage.feature import canny
from skimage.filters import frangi
from skimage.morphology import closing, dilation, rectangle as footprint_rectangle
import warnings
warnings.filterwarnings("ignore")
View #1¶
The first view is a display of the raw data. We'll be working with the MassRoads dataset primarily and joining in additional data when needed. The images in MassRoads are distributed over a variety of small and large cities in the state, mostly urban and suburban areas.
DATA_DIR = Path("roads")
records = []
for tif in DATA_DIR.rglob("*.tiff"):
    with rasterio.open(tif) as src:
        if not src.crs:
            continue
        left, bottom, right, top = src.bounds
        geom = box(left, bottom, right, top)
        records.append({
            "path": str(tif),
            "geometry": geom
        })
gdf = gpd.GeoDataFrame(records, crs="EPSG:26986")
gdf_ll = gdf.to_crs("EPSG:4326")
center = [42.25, -71.8]
m = folium.Map(location=center, zoom_start=9, tiles="cartodb positron")
for _, row in gdf_ll.iterrows():
    folium.GeoJson(row["geometry"]).add_to(m)
m
The raw MassRoads dataset: RGB image tiffs paired with separate single-band road-mask tiffs.
img = rasterio.open("roads/test/10378780_15.tiff").read()
label = rasterio.open("roads/test_labels/10378780_15.tif").read(1)
plt.figure(figsize=(10,5))
plt.subplot(1,2,1)
plt.imshow(img.transpose(1,2,0))
plt.title("Image")
plt.axis("off")
plt.subplot(1,2,2)
plt.imshow(label, cmap="gray")
plt.title("Label")
plt.axis("off")
plt.show()
Data joined with our 0.5m DEM, terrain features derived from the DEM, and a housing mask.
We used these specific terrain features because we thought they'd have the most contrast between roads and non-roads. A road's slope should be low. Its roughness (the variation in slope among the pixels around each pixel) should also be low. The hillshade was primarily for visualization.
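A minimal sketch of how such features could be derived from the DEM (this assumes `np.gradient`-based slope, a 3x3 local standard deviation for roughness, and the standard azimuth/altitude hillshade formula; our actual preprocessing may differ in its details):
# Sketch only: one plausible way to derive slope, roughness, and hillshade
# from a DEM array. Assumes a 0.5 m pixel size and a default sun position.
from scipy.ndimage import uniform_filter

def terrain_features(dem, pixel_size=0.5, azimuth=315.0, altitude=45.0):
    dzdy, dzdx = np.gradient(dem, pixel_size)        # elevation change per meter
    slope = np.arctan(np.hypot(dzdx, dzdy))          # slope angle in radians
    # Roughness: local standard deviation of slope in a 3x3 window.
    mean = uniform_filter(slope, size=3)
    mean_sq = uniform_filter(slope**2, size=3)
    roughness = np.sqrt(np.maximum(mean_sq - mean**2, 0))
    # Hillshade: illumination from the given sun azimuth/altitude (degrees).
    az, alt = np.deg2rad(azimuth), np.deg2rad(altitude)
    aspect = np.arctan2(-dzdx, dzdy)
    hillshade = (np.sin(alt) * np.cos(slope)
                 + np.cos(alt) * np.sin(slope) * np.cos(az - aspect))
    return slope, roughness, np.clip(hillshade, 0, 1)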
We won't use the housing masks much for extracting roads, as our whole hypothesis was about doing so to the exclusion of other features. However, the building footprints will help us measure how wrong we are. We pulled the footprint data from Microsoft Planetary Computer.
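For reference, a minimal sketch of burning vector footprints into a housing-mask raster aligned with one of our tiffs (the `buildings.geojson` file name is illustrative, not our actual path):
# Sketch only: rasterize building footprints onto a tiff's grid.
from rasterio import features

with rasterio.open("tiffs/17428960_15.tiff") as src:
    out_shape = (src.height, src.width)
    transform = src.transform
    crs = src.crs

buildings = gpd.read_file("buildings.geojson").to_crs(crs)  # illustrative path
housing_mask = features.rasterize(
    ((geom, 1) for geom in buildings.geometry),
    out_shape=out_shape,
    transform=transform,
    fill=0,
    dtype="uint8",
)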
band_names = [
    "Red",
    "Green",
    "Blue",
    "Elevation",
    "Housing Mask",
    "Slope",
    "Hillshade",
    "Roughness",
    "Road Mask"
]
path = Path("tiffs/17428960_15.tiff")
with rasterio.open(path) as src:
    img = src.read()
    transform = src.transform
    width = src.width
    height = src.height
cx = width / 2
cy = height / 2
center_x, center_y = transform * (cx, cy)
bands, H, W = img.shape
fig, axes = plt.subplots(3, 3, figsize=(12, 12))
fig.suptitle(f"Bands for {path.name}", fontsize=16)
axes = axes.ravel()
for i in range(9):
    ax = axes[i]
    if i < bands:
        ax.imshow(img[i], cmap="gray")
        ax.set_title(band_names[i], fontsize=12)
    else:
        ax.axis("off")
    ax.set_xticks([])
    ax.set_yticks([])
plt.tight_layout()
plt.show()
I think we have sufficient data to test our hypothesis. I can't really think of anything else available statewide at 1m resolution that would give us much of an advantage in finding roads. As always, just having the data does not prove our hypothesis, so more analysis is needed.
View #2¶
In the previous view we showed that we had the data. But can you see the roads? In the RGB imagery the roads are clear. They are all greyish and of relatively similar width (with the exception of the interstate).
path = Path("tiffs/17428960_15.tiff")
with rasterio.open(path) as src:
    img = src.read()
R = img[0]
G = img[1]
B = img[2]
rgb = np.dstack([R, G, B]).astype(np.float32)
rgb -= rgb.min()
rgb /= rgb.max()
plt.figure(figsize=(20, 20))
plt.imshow(rgb)
plt.title(f"True Color RGB View of {path.name}", fontsize=16)
plt.axis("off")
plt.show()
It is much more difficult to see roads in the terrain data. In the hillshade band you can make out the highway and a few roads. However the neighborhood roads are not visible.
path = Path("17428960_15.tiff")
with rasterio.open(path) as src:
img = src.read()
hillshade = img[6].astype(np.float32)
hillshade = (hillshade - hillshade.min()) / (hillshade.max() + 1e-6)
hs_rgb = np.dstack([hillshade, hillshade, hillshade])
plt.figure(figsize=(20, 20))
plt.imshow(hs_rgb)
plt.title("Hillshade")
plt.axis("off")
plt.show()
Visual intuition for where the roads are is nice, but we want something that can be measured.
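Concretely, the number we'll report in the following views is the intersection-over-union (IoU) between a predicted road mask $P$ and the true mask $T$, alongside the confusion-matrix cells as fractions of all pixels:
$$ \mathrm{IoU}(P, T) = \frac{|P \cap T|}{|P \cup T|} = \frac{TP}{TP + FP + FN} $$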
View #3¶
Roads are generally long, smooth, and evenly colored. To extract roads from our images we need something that takes those features and uses them to differentiate the roads from the surroundings. Here we'll focus on the long and smooth aspects.
Canny edge detector
Finds the magnitude of the intensity gradient between pixels. Sharp changes are edges; medium-sharp changes are kept as long as they are connected to sharp edges (hysteresis thresholding). A road should be uniform in the middle and change abruptly at its edges, so this should pick roads up.
path = Path("17428960_15.tiff")
with rasterio.open(path) as src:
data = src.read()
rgb = data[:3]
true_mask = data[-1] > 0
rgb_norm = (rgb - rgb.min()) / (rgb.max() - rgb.min() + 1e-6)
gray = 0.2989*rgb_norm[0] + 0.5870*rgb_norm[1] + 0.1140*rgb_norm[2]
gray = rescale_intensity(gray, out_range=(0, 1))
edges = canny(gray, sigma=2.0, low_threshold=0.1, high_threshold=0.2)
fig, axs = plt.subplots(1, 3, figsize=(18, 6))
axs[0].imshow(np.moveaxis(rgb_norm, 0, -1))
axs[0].set_title("RGB")
axs[0].axis("off")
axs[1].imshow(edges, cmap="gray")
axs[1].set_title("Predicted Mask (Canny)")
axs[1].axis("off")
axs[2].imshow(true_mask, cmap="gray")
axs[2].set_title("True Mask")
axs[2].axis("off")
plt.tight_layout()
plt.show()
def test_pred(pred, true, img):
    # Binarize both masks, then compare pixel by pixel (img is accepted but unused).
    pred = (pred > 0).astype(np.uint8)
    true = (true > 0).astype(np.uint8)
    p = pred.reshape(-1)
    t = true.reshape(-1)
    total = len(p)
    TP = np.logical_and(p == 1, t == 1).sum()
    TN = np.logical_and(p == 0, t == 0).sum()
    FP = np.logical_and(p == 1, t == 0).sum()
    FN = np.logical_and(p == 0, t == 1).sum()
    inter = np.logical_and(p == 1, t == 1).sum()
    union = np.logical_or(p == 1, t == 1).sum()
    IoU = inter / union if union > 0 else 1.0
    # Each confusion-matrix cell as a fraction of all pixels.
    TPp = TP / total
    FPp = FP / total
    TNp = TN / total
    FNp = FN / total
    print(f"IoU: {IoU:.4f}")
    print(f"TP: {TPp:.4f}, FP: {FPp:.4f}, TN: {TNp:.4f}, FN: {FNp:.4f}")
test_pred(edges, true_mask, img)
IoU: 0.0192
TP: 0.0023, FP: 0.0777, TN: 0.8815, FN: 0.0385
Unfortunately this method picks up all edges, including those of unwanted features. You can kind of see the roads in there, interspersed with all the building edge detections. The solution to this is a vesselness filter.
Frangi vesselness filter
Canny edges use the first derivative of the brightness $I(x, y)$ of each pixel ($I$ stands for intensity). This filter instead uses the second derivatives (the Hessian matrix) and the eigenvalues of that matrix.
$$ H = \begin{bmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{bmatrix}, $$
$$ H v = \lambda v $$
The Hessian gives you the rate of change in the slope of the brightness at every point; its eigenvectors give the directions of minimum and maximum curvature, and its eigenvalues give the corresponding curvature magnitudes. A road has low curvature in one direction (along the road) and high curvature in the other (across the road, toward its sides). This should do a slightly better job of picking roads up.
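To build intuition for what the filter computes, here is a toy sketch at a single scale on a synthetic bright ridge, using scikit-image's Hessian helpers (this illustrates the eigenvalue behavior only, not the full Frangi measure):
# Toy sketch: Hessian eigenvalues on a synthetic vertical bright ridge.
from skimage.feature import hessian_matrix, hessian_matrix_eigvals

yy, xx = np.mgrid[0:100, 0:100]
ridge = np.exp(-((xx - 50) ** 2) / (2 * 4.0**2))   # bright vertical line

H_elems = hessian_matrix(ridge, sigma=4, use_gaussian_derivatives=True)
lam1, lam2 = hessian_matrix_eigvals(H_elems)        # sorted so lam1 >= lam2

# Along the ridge center: lam1 ~ 0 (flat along the road),
# lam2 strongly negative (curved across the road).
print(lam1[:, 50].mean(), lam2[:, 50].mean())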
path = Path("17428960_15.tiff")
with rasterio.open(path) as src:
data = src.read()
rgb = data[:3]
true_mask = data[-1] > 0
rgb_norm = (rgb - rgb.min()) / (rgb.max() - rgb.min() + 1e-6)
gray = rgb_norm[1]
gray = rescale_intensity(gray, out_range=(0, 1))
gray = gaussian_filter(gray, sigma=2)
vessel = frangi(
gray,
scale_range=(5, 10),
scale_step=1,
beta=0.3,
gamma=10,
black_ridges=False
)
vessel_norm = (vessel - vessel.min()) / (vessel.max() - vessel.min() + 1e-6)
thresh = np.percentile(vessel_norm, 92)
pred_mask = vessel_norm > thresh
fig, axs = plt.subplots(1, 3, figsize=(18, 6))
axs[0].imshow(np.moveaxis(rgb_norm, 0, -1))
axs[0].set_title("RGB")
axs[0].axis("off")
axs[1].imshow(pred_mask, cmap="gray")
axs[1].set_title("Predicted Mask")
axs[1].axis("off")
axs[2].imshow(true_mask, cmap="gray")
axs[2].set_title("True Mask")
axs[2].axis("off")
plt.tight_layout()
plt.show()
test_pred(pred_mask, true_mask, img)
IoU: 0.2004
TP: 0.0262, FP: 0.0538, TN: 0.8691, FN: 0.0509
You can see the roads! Many are missing and there are still many buildings showing up, but the roads in the neighborhood are visible. We are still not achieving the goal of distinguishing roads from non-road features.
It might not be apparent, but part of what is wrong with this view is the amount of effort it took. Manual filtering is quite time consuming: it takes careful tweaking of parameters and is difficult to generalize to multiple images. For the next view we'll use a model that can do that tuning on its own.
View #4¶
For this view we'll be training an asphalt detector. One of our first attempts at manually extracting roads was to try and find a min and max threshold for each of the bands and see if there was some combination of ranges that only pulled out roads. This was largely unsuccessful, not to mention tedious. This model will do that automatically.
To train the asphalt detector we'll randomly sample pixels from our tiffs and try to predict whether or not each one is road. Once the model is trained we'll be able to classify every pixel in an image and get a mask of road vs. non-road. Several test images have been excluded from the training set.
TIFF_DIR = Path("tiffs")
N_SAMPLES = 50_000
ROAD_WEIGHT = 0.30
tifs = sorted(list(TIFF_DIR.glob("*.tif")) + list(TIFF_DIR.glob("*.tiff")))
X_list = []
y_list = []
for tif in tifs:
    with rasterio.open(tif) as src:
        img = src.read().astype(np.float32)
    bands, H, W = img.shape
    features = img[:-1]
    mask = img[-1]
    F = features.reshape(features.shape[0], -1).T
    M = mask.reshape(-1).astype(np.uint8)
    R = img[0].reshape(-1)
    G = img[1].reshape(-1)
    B = img[2].reshape(-1)
    # Drop near-white pixels (blank fill in some MassRoads images).
    white = (R > 250) & (G > 250) & (B > 250)
    not_white = ~white
    F = F[not_white]
    M = M[not_white]
    road_idx = np.where(M > 0)[0]
    bg_idx = np.where(M == 0)[0]
    # Per-file sample counts, keeping roughly ROAD_WEIGHT road pixels overall.
    n_road = int(N_SAMPLES * ROAD_WEIGHT / len(tifs))
    n_bg = int(N_SAMPLES * (1 - ROAD_WEIGHT) / len(tifs))
    if len(road_idx) > 0:
        road_sample = np.random.choice(road_idx, min(n_road, len(road_idx)), replace=False)
    else:
        road_sample = np.array([], dtype=int)
    if len(bg_idx) > 0:
        bg_sample = np.random.choice(bg_idx, min(n_bg, len(bg_idx)), replace=False)
    else:
        bg_sample = np.array([], dtype=int)
    sampled = np.concatenate([road_sample, bg_sample])
    X_list.append(F[sampled])
    y_list.append(M[sampled])
X = np.concatenate(X_list)
y = np.concatenate(y_list)
print("Training dataset shape:", X.shape)
print("Road pixels %:", y.mean())
clf = RandomForestClassifier(
    n_estimators=200,
    n_jobs=-1,
    class_weight="balanced"
)
clf.fit(X, y)
band_names = [
    "Red",
    "Green",
    "Blue",
    "Elevation",
    "HousingMask",
    "Slope",
    "Hillshade",
    "Roughness"
]
print("\n=== FEATURE IMPORTANCE ===")
for name, importance in zip(band_names, clf.feature_importances_):
    print(f"{name:15s} {importance:.4f}")
Training dataset shape: (47814, 8)
Mean of 0/255 road labels: 74.04567699836868

=== FEATURE IMPORTANCE ===
Red             0.1948
Green           0.1887
Blue            0.3073
Elevation       0.1782
HousingMask     0.0406
Slope           0.0484
Hillshade       0.0385
Roughness       0.0034
joblib.dump(clf, "rf_model.joblib")
clf = joblib.load("rf_model.joblib")
tif_path = Path("tiffs/17428960_15.tiff")
with rasterio.open(tif_path) as src:
    img = src.read().astype(np.float32)
bands, H, W = img.shape
features = img[:-1]
F = features.reshape(features.shape[0], -1).T
R = img[0].reshape(-1)
G = img[1].reshape(-1)
B = img[2].reshape(-1)
white = (R > 250) & (G > 250) & (B > 250)
F_valid = F[~white]
print("Running RF inference on", F_valid.shape[0], "pixels...")
pred_valid = clf.predict(F_valid)
pred_full = np.zeros(F.shape[0], dtype=np.uint8)
pred_full[~white] = pred_valid
pred_full = pred_full.reshape(H, W)
true_mask = img[-1]
rgb = np.moveaxis(img[:3], 0, -1).astype(np.uint8)
fig, ax = plt.subplots(1, 3, figsize=(18, 6))
ax[0].imshow(rgb)
ax[0].set_title("RGB")
ax[0].axis("off")
ax[1].imshow(pred_full, cmap="gray")
ax[1].set_title("Predicted Mask")
ax[1].axis("off")
ax[2].imshow(true_mask, cmap="gray")
ax[2].set_title("True Mask")
ax[2].axis("off")
plt.tight_layout()
plt.show()
Running RF inference on 2249909 pixels...
test_pred(pred_full, true_mask, img)
IoU: 0.2870
TP: 0.0657, FP: 0.1517, TN: 0.7711, FN: 0.0115
This does quite well. We are now at the point where we have the roads and have excluded other structures like buildings. At this point I feel we've won. However, we are not just finding roads; we are finding the specific road mask in our dataset, which does not cover the entire road. So this will be somewhat inaccurate, as it classifies pixel by pixel only and cannot tell the difference between a pixel at the edge of the road (not in the mask) and a center pixel (in the mask). It also classifies driveways, concrete pads, and parking lots as roads.
View #5¶
Credit to https://www.kaggle.com/code/balraj98/unet-resnet50-frontend-road-segmentation-pytorch
This model is based on the core of their code for this dataset, with some minor changes in libraries (the original notebook is five years old).
Our previous model learned a combination of input values for a single pixel that were likely to be a road. This model does the same, but also takes into account the values surrounding each pixel: information about a few hundred surrounding pixels goes into each classification. The U-Net does this by learning several convolutional kernels that combine values at each layer of the network.
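As a toy illustration of that difference (not part of the model itself): even a single 3x3 convolution mixes each pixel with its eight neighbors, and stacking convolutions with downsampling, as the U-Net encoder does, grows that neighborhood to hundreds of pixels.
# Toy sketch: a single conv layer already sees a 3x3 neighborhood per pixel.
import torch
import torch.nn as nn

x = torch.rand(1, 3, 64, 64)                 # a fake 3-band image patch
conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
y = conv(x)                                  # each output pixel saw a 3x3 window
print(y.shape)                               # torch.Size([1, 8, 64, 64])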
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
import albumentations as A
import cv2
import segmentation_models_pytorch as smp
from segmentation_models_pytorch.losses import DiceLoss
class SegmentationDataset(Dataset):
    def __init__(self, df, augmentations=None):
        self.df = df
        self.augment = augmentations

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        with rasterio.open(row.images) as src:
            img = src.read([1, 2, 3])
            img = np.moveaxis(img, 0, -1).astype(np.uint8)
        with rasterio.open(row.masks) as src:
            mask = src.read(1)
            mask = (mask > 0).astype(np.uint8)
            mask = np.expand_dims(mask, -1)
        if self.augment:
            data = self.augment(image=img, mask=mask)
            img = data["image"]
            mask = data["mask"]
        img = np.transpose(img, (2, 0, 1)).astype(np.float32) / 255.0
        mask = np.transpose(mask, (2, 0, 1)).astype(np.float32)
        return torch.tensor(img), torch.tensor(mask)
def get_train_augs():
    return A.Compose([
        A.Resize(512, 512),
        A.HorizontalFlip(p=0.5),
        A.VerticalFlip(p=0.5),
    ])

def get_val_augs():
    return A.Compose([
        A.Resize(512, 512)
    ])
class SegmentationModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = smp.Unet(
            encoder_name="timm-efficientnet-b0",
            encoder_weights="imagenet",
            in_channels=3,
            classes=1,
            activation=None
        )

    def forward(self, images, masks=None):
        logits = self.model(images)
        if masks is not None:
            loss = DiceLoss(mode="binary")(logits, masks)
            loss += nn.BCEWithLogitsLoss()(logits, masks)
            return logits, loss
        return logits
def iou_score(preds, masks):
    preds = torch.sigmoid(preds)
    preds = (preds > 0.5).float()
    intersection = (preds * masks).sum(dim=(1, 2, 3))
    union = preds.sum(dim=(1, 2, 3)) + masks.sum(dim=(1, 2, 3)) - intersection
    return ((intersection + 1e-6) / (union + 1e-6)).mean().item()
def train_fn(loader, model, optim):
    model.train()
    total_loss = 0
    total_iou = 0
    for imgs, masks in loader:
        imgs, masks = imgs.cuda(), masks.cuda()
        optim.zero_grad()
        logits, loss = model(imgs, masks)
        loss.backward()
        optim.step()
        total_loss += loss.item()
        total_iou += iou_score(logits.detach(), masks)
    return total_loss / len(loader), total_iou / len(loader)

def eval_fn(loader, model):
    model.eval()
    total_loss = 0
    total_iou = 0
    with torch.no_grad():
        for imgs, masks in loader:
            imgs, masks = imgs.cuda(), masks.cuda()
            logits, loss = model(imgs, masks)
            total_loss += loss.item()
            total_iou += iou_score(logits, masks)
    return total_loss / len(loader), total_iou / len(loader)
train_df = pd.read_csv("train.csv")
val_df = pd.read_csv("val.csv")
trainset = SegmentationDataset(train_df, get_train_augs())
valset = SegmentationDataset(val_df, get_val_augs())
trainloader = DataLoader(trainset, batch_size=4, shuffle=True, num_workers=4)
valloader = DataLoader(valset, batch_size=4, shuffle=False, num_workers=4)
print(f"[OK] Train images: {len(trainset)}, Val images: {len(valset)}")
model = SegmentationModel().cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0003)
best_loss = float("inf")
for epoch in range(1, 26):
    train_loss, train_iou = train_fn(trainloader, model, optimizer)
    val_loss, val_iou = eval_fn(valloader, model)
    if val_loss < best_loss:
        best_loss = val_loss
        torch.save(model.state_dict(), "best_model.pt")
        print(f"[EPOCH {epoch}] Saved best model")
    print(
        f"[Epoch {epoch}] "
        f"train_loss={train_loss:.4f} val_loss={val_loss:.4f} "
        f"train_iou={train_iou:.4f} val_iou={val_iou:.4f}"
    )
print("Training complete.")
[OK] Train images: 1108, Val images: 14
[EPOCH 1] Saved best model
[Epoch 1] train_loss=1.0094 val_loss=0.7449 train_iou=0.2531 val_iou=0.3526
[EPOCH 2] Saved best model
[Epoch 2] train_loss=0.6773 val_loss=0.6945 train_iou=0.3431 val_iou=0.3710
[EPOCH 3] Saved best model
[Epoch 3] train_loss=0.6311 val_loss=0.6581 train_iou=0.3654 val_iou=0.3882
[EPOCH 4] Saved best model
[Epoch 4] train_loss=0.6022 val_loss=0.6279 train_iou=0.3796 val_iou=0.4179
[Epoch 5] train_loss=0.5885 val_loss=0.6430 train_iou=0.3913 val_iou=0.3968
[Epoch 6] train_loss=0.5778 val_loss=0.6317 train_iou=0.4002 val_iou=0.4120
[Epoch 7] train_loss=0.5673 val_loss=0.6406 train_iou=0.4056 val_iou=0.4011
[Epoch 8] train_loss=0.5605 val_loss=0.6291 train_iou=0.4102 val_iou=0.4116
[EPOCH 9] Saved best model
[Epoch 9] train_loss=0.5554 val_loss=0.5958 train_iou=0.4147 val_iou=0.4420
[EPOCH 10] Saved best model
[Epoch 10] train_loss=0.5494 val_loss=0.5830 train_iou=0.4176 val_iou=0.4440
[Epoch 11] train_loss=0.5441 val_loss=0.6182 train_iou=0.4230 val_iou=0.4141
[Epoch 12] train_loss=0.5404 val_loss=0.5969 train_iou=0.4260 val_iou=0.4307
class RoadModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Must match the trained SegmentationModel above exactly,
        # otherwise load_state_dict will not find matching keys.
        self.model = smp.Unet(
            encoder_name="timm-efficientnet-b0",
            encoder_weights=None,
            in_channels=3,
            classes=1,
            activation=None
        )

    def forward(self, x):
        return self.model(x)

device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using:", device)
model = RoadModel().to(device)
model.load_state_dict(torch.load("best_model.pt", map_location=device))
model.eval()
inf_aug = A.Compose([
    A.Resize(1024, 1024),
])
tif_path = Path("tiffs/17428960_15.tiff")
with rasterio.open(tif_path) as src:
    img = src.read([1, 2, 3])
img = np.moveaxis(img, 0, -1).astype(np.uint8)
H0, W0 = img.shape[:2]
out = inf_aug(image=img)
img_resized = out["image"]
inp = img_resized.astype(np.float32) / 255.0
inp = np.transpose(inp, (2, 0, 1))
inp = torch.tensor(inp).unsqueeze(0).to(device)
with torch.no_grad():
    with torch.cuda.amp.autocast():
        pred = model(inp)
    pred = torch.sigmoid(pred)[0, 0].cpu().numpy()
pred = pred.astype(np.float32)
pred_up = cv2.resize(pred, (W0, H0), interpolation=cv2.INTER_LINEAR)
mask = (pred_up > 0.5).astype(np.uint8)
plt.figure(figsize=(18,6))
plt.subplot(1,3,1)
plt.imshow(img)
plt.title("RGB")
plt.axis("off")
plt.subplot(1,3,2)
plt.imshow(mask, cmap="gray")
plt.title("Predicted Mask")
plt.axis("off")
plt.subplot(1,3,3)
plt.imshow(true_mask, cmap="gray")
plt.title("True Mask")
plt.axis("off")
plt.tight_layout()
plt.show()
Using: cuda
test_pred(mask, true_mask, img)
IoU: 0.5723
TP: 0.0544, FP: 0.0179, TN: 0.9049, FN: 0.0228
Fewer false positive pixels than the previous methods. There are a few detections that are not present in the mask, but here the model, not the source data, seems correct: a few connecting roads are missing from the dataset's mask. No houses or parking lots are being detected (although things do get more confused in the parking lot to the right).
Overall I think we've proven our hypothesis, at least for this specific image and this dataset. If we had more time I'd like to do more testing over more images in the dataset, as well as on roads outside Massachusetts.