The Pipeline
1
AIS Data Collection
Automatic Identification System transponders broadcast vessel position, heading, speed, and identity every few seconds.
We ingest this data across a 350nm radius around each port, capturing vessels from far approach through berthing.
High-priority zones (<50nm) are polled every 15 minutes; distant zones (50-350nm) every hour.
350nm radius
15min near / 60min far
21 ports
500+ vessels/day
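The two-tier polling schedule above can be sketched as a small lookup. This is a minimal illustration, not the production scheduler: the function name and the treatment of the exact 50nm boundary (assigned here to the hourly tier) are assumptions, while the 50nm and 350nm limits and the 15-minute and 60-minute intervals come from the text.

```python
from datetime import timedelta
from typing import Optional

NEAR_ZONE_NM = 50     # from the text: high-priority zone is <50nm
MAX_RADIUS_NM = 350   # from the text: ingestion radius around each port


def poll_interval(distance_nm: float) -> Optional[timedelta]:
    """Return the polling interval for a vessel, or None if it is out of range.

    Hypothetical helper: the boundary at exactly 50nm is assumed to belong
    to the distant (hourly) tier.
    """
    if distance_nm < 0:
        raise ValueError("distance must be non-negative")
    if distance_nm < NEAR_ZONE_NM:
        return timedelta(minutes=15)
    if distance_nm <= MAX_RADIUS_NM:
        return timedelta(hours=1)
    return None  # outside the 350nm collection radius
```

A vessel 30nm out would be polled every 15 minutes, one at 200nm every hour, and one at 400nm not at all.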
2
Vessel Filtering & Classification
Raw AIS contains everything: tugs, pilot boats, fishing vessels, pleasure craft, and navigation buoys.
We filter to commercial vessels only (length >100m, recognized cargo types, berth stays >4h) and classify them into 6 categories:
bulk, tanker, container, vehicle, passenger, and general cargo. Vessels are tracked from first appearance through departure.
6 vessel categories
>100m filter
Tugs excluded
MMSI 99x filtered
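As a rough sketch of the filter described above, the check below is applied to a completed berth call. The record layout and field names are hypothetical, and it assumes the vessel category has already been derived from the AIS ship type. The "99" MMSI prefix is the block reserved for aids to navigation, which is presumably what the "MMSI 99x" rule targets.

```python
from dataclasses import dataclass

# The six categories named in the text.
CATEGORIES = {"bulk", "tanker", "container", "vehicle", "passenger", "general cargo"}


@dataclass
class VesselRecord:
    """Hypothetical record shape for one completed berth call."""
    mmsi: str
    length_m: float
    category: str       # assumed to be mapped from the AIS ship type upstream
    berth_hours: float  # duration of the berth stay


def is_commercial(v: VesselRecord) -> bool:
    """Apply the three filters from the text plus the MMSI 99x exclusion."""
    if v.mmsi.startswith("99"):  # aids to navigation (buoys, beacons)
        return False
    return v.length_m > 100 and v.category in CATEGORIES and v.berth_hours > 4
```

Tugs and pilot boats fall out through the length and category checks, so no separate rule is needed for them in this sketch.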
3
Zone Classification
Each vessel is placed into a behavioral zone based on distance from port and proximity to anchorage/berth polygons.
Zones drive prediction checkpoints: vessels get new Time-To-Berth (TTB) predictions at 350nm, 300nm, 200nm, 100nm, and 50nm, and on every zone transition.
PostGIS handles the spatial queries using ST_DWithin for distance and ST_Within for polygon containment.
FAR_APPROACH >200nm
APPROACH 50-200nm
CLOSE <50nm
QUEUE/ANCHOR
BERTH
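The zone rules and distance checkpoints can be illustrated in plain Python. This is a sketch under assumptions: the polygon tests are passed in as booleans (in the pipeline they would come from the PostGIS ST_Within queries), polygon membership is assumed to take precedence over distance, and a vessel at exactly 50nm or 200nm is assigned to APPROACH. The function names are hypothetical.

```python
# Distance checkpoints from the text, in nautical miles.
CHECKPOINTS_NM = (350, 300, 200, 100, 50)


def classify_zone(distance_nm, in_berth_polygon=False, in_anchorage_polygon=False):
    """Map a vessel to one of the five zones listed above."""
    if in_berth_polygon:
        return "BERTH"
    if in_anchorage_polygon:
        return "QUEUE/ANCHOR"
    if distance_nm > 200:
        return "FAR_APPROACH"
    if distance_nm >= 50:
        return "APPROACH"
    return "CLOSE"


def crossed_checkpoints(prev_nm, curr_nm):
    """Checkpoints passed while the vessel closed from prev_nm to curr_nm."""
    return [c for c in CHECKPOINTS_NM if curr_nm <= c < prev_nm]
```

A vessel that moves from 320nm to 190nm between two snapshots crosses the 300nm and 200nm checkpoints and also changes zone from FAR_APPROACH to APPROACH, so either condition would trigger a fresh prediction.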
4
Feature Engineering
Every 15 minutes, we build a feature vector combining the vessel's current state with the port's operational state.
This includes vessel-specific features (distance, speed, deadweight tonnage), port congestion features
(queue depth, berth utilization, same-type competition), and temporal features (hour, day of week, month), for 11 features in total.
11 features
15min snapshots
Port + vessel state
Temporal encoding
5
LightGBM Prediction
Six separate LightGBM models, one per vessel type, predict Time-To-Berth (TTB). Each model is trained on historical
berth calls with time-based splits (never random, because this is time-series data). Models run every 30 minutes for all
active inbound vessels, generating point predictions and quantile estimates (P10/P25/P50/P75/P90) for just-in-time (JIT) arrival planning.
6 type-specific models
LightGBM gradient boosting
30min prediction cycle
5 quantile bands
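To make the time-based split and the five quantile bands concrete, here is a minimal sketch. The text does not say how the quantile estimates are produced; LightGBM's built-in "quantile" objective with its "alpha" parameter is one common way, and that is what is assumed here. The helper names and the "arrived_at" field are hypothetical, and the block only builds parameter dictionaries, so it runs without LightGBM installed.

```python
from datetime import datetime

# P10/P25/P50/P75/P90 from the text.
QUANTILES = (0.10, 0.25, 0.50, 0.75, 0.90)


def time_based_split(calls, cutoff):
    """Split berth calls so every training row precedes every validation row.

    `calls` is a list of dicts with a hypothetical "arrived_at" datetime.
    """
    train = [c for c in calls if c["arrived_at"] < cutoff]
    valid = [c for c in calls if c["arrived_at"] >= cutoff]
    return train, valid


def quantile_params(base=None):
    """One LightGBM parameter dict per quantile band (assumed approach)."""
    base = base or {}
    return {q: {**base, "objective": "quantile", "alpha": q} for q in QUANTILES}
```

Splitting on a cutoff date keeps future berth calls out of training, which a random split would not guarantee. Each dictionary from quantile_params could then be passed to a separate training run per vessel type.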
6
Autonomous ML Agents
Four autonomous agents monitor, validate, and improve the prediction pipeline around the clock.
The ML Agent validates predictions against outcomes every 15 minutes. The Alongside Agent predicts berth duration.
Auto-Improve retrains underperforming models every 6 hours using sample-weighted training.
The Manager oversees all agents and escalates anomalies.
4 autonomous agents
Auto-retrain every 6h
Shadow model testing
Automatic promotion
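The validation step can be pictured as computing mean absolute error (MAE) per vessel type from predictions whose outcomes are now known, then flagging types that exceed a limit. This is an illustrative sketch only: the record format, function names, and thresholds are hypothetical, and the text does not describe how the agents actually decide that a model is underperforming.

```python
from collections import defaultdict


def mae_by_type(records):
    """Mean absolute error in hours, grouped by vessel type.

    `records` is an iterable of (vessel_type, predicted_ttb_h, actual_ttb_h).
    """
    errors = defaultdict(list)
    for vessel_type, predicted, actual in records:
        errors[vessel_type].append(abs(predicted - actual))
    return {t: sum(e) / len(e) for t, e in errors.items()}


def needs_retrain(maes, thresholds):
    """Vessel types whose MAE exceeds their (hypothetical) threshold."""
    return sorted(t for t, m in maes.items() if m > thresholds.get(t, float("inf")))
```

With per-type thresholds, a type that has no configured limit is never flagged, which keeps the check conservative by default.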
Feature Set (V4)
distance_nm
Distance from port center in nautical miles (0-350nm)
sog_clamped
Speed over ground, clamped to training range (min 1.0 kn)
queue_port
Total vessels waiting at port (anchorage + queue zones)
berths_occupied
Number of berths currently occupied
berth_utilization
Ratio of occupied berths to total berths
hour_of_day
Current hour (0-23) — captures operational patterns
day_of_week
Day of week (0-6) — captures weekly cycles
deadweight
Vessel deadweight tonnage — proxy for cargo volume/size
queue_same_type
Vessels of same type already queuing (competition)
at_berth_same_type
Same-type vessels currently at berth (capacity signal)
month_of_year
Month (1-12) — captures seasonal patterns (soy season, etc.)
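Assembling the 11 features for one snapshot might look like the sketch below. The input dictionaries and their keys are hypothetical. Only the lower speed bound (1.0 kn) is applied because the upper end of the training range is not stated, and day_of_week uses Python's convention of Monday = 0, which may differ from the production encoding.

```python
from datetime import datetime

# Feature names in the order listed above.
FEATURE_ORDER = (
    "distance_nm", "sog_clamped", "queue_port", "berths_occupied",
    "berth_utilization", "hour_of_day", "day_of_week", "deadweight",
    "queue_same_type", "at_berth_same_type", "month_of_year",
)


def build_features(vessel, port, now):
    """Combine vessel state and port state into one feature row (a dict)."""
    total = port["berths_total"]
    vtype = vessel["type"]
    return {
        "distance_nm": vessel["distance_nm"],
        "sog_clamped": max(vessel["sog_kn"], 1.0),  # lower clamp only
        "queue_port": port["queue_total"],
        "berths_occupied": port["berths_occupied"],
        "berth_utilization": port["berths_occupied"] / total if total else 0.0,
        "hour_of_day": now.hour,
        "day_of_week": now.weekday(),  # assumption: Monday = 0
        "deadweight": vessel["deadweight"],
        "queue_same_type": port["queue_by_type"].get(vtype, 0),
        "at_berth_same_type": port["at_berth_by_type"].get(vtype, 0),
        "month_of_year": now.month,
    }
```

A model input vector in a fixed column order can then be taken as `[row[name] for name in FEATURE_ORDER]`.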
Model Accuracy (Production V4)
Passenger: 2.2 hours MAE
Bulk: 16.5 hours MAE
Vehicle: 16.8 hours MAE
Hazard: 17.5 hours MAE
Tanker: 25.1 hours MAE
General: 26.0 hours MAE
Congestion Forecasting
4-Horizon Binary Classification
Separate LightGBM classifiers predict port congestion probability at 8h, 24h, 48h, and 72h horizons.
Trained on historical port snapshots where congestion is defined as queue exceeding the 75th percentile.
Features include current queue, berth utilization, trend indicators (delta_queue_1h, delta_queue_6h),
and temporal encoding. Models retrain automatically when their AUC falls below a set threshold.
8h / 24h / 48h / 72h
AUC 0.76-0.77
Binary classification
Auto-retrain on drift
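The labeling scheme and trend features can be sketched as follows, assuming a hypothetical hourly series of queue depth for one port. The 75th-percentile definition, the four horizons, and the feature names delta_queue_1h and delta_queue_6h come from the text; the hourly sampling, the function names, and the use of Python's default percentile interpolation are assumptions.

```python
from statistics import quantiles

# Forecast horizons from the text, in hours.
HORIZONS_H = (8, 24, 48, 72)


def congestion_threshold(queue_history):
    """75th percentile of historical queue depth (third quartile cut point)."""
    return quantiles(queue_history, n=4)[2]


def build_example(hourly_queue, t, threshold, horizon_h):
    """Features at hour t plus the binary label at hour t + horizon_h.

    Returns None when there is not enough history (6h) or future data.
    """
    if t < 6 or t + horizon_h >= len(hourly_queue):
        return None
    features = {
        "queue": hourly_queue[t],
        "delta_queue_1h": hourly_queue[t] - hourly_queue[t - 1],
        "delta_queue_6h": hourly_queue[t] - hourly_queue[t - 6],
    }
    label = int(hourly_queue[t + horizon_h] > threshold)
    return features, label
```

Calling build_example once per horizon in HORIZONS_H yields four labeled datasets from the same snapshots, one for each classifier.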