Technical Methodology

FROM AIS SIGNAL TO BERTH PREDICTION

We collect AIS signals within a 350nm (nautical mile) radius of each port, classify vessel intent through zone analysis, engineer 11 predictive features, and run one LightGBM model per vessel type to predict time to berth.

The Pipeline
1. AIS Data Collection
Automatic Identification System transponders broadcast vessel position, heading, speed, and identity every few seconds. We ingest this data across a 350nm radius around each port, capturing vessels from far approach through berthing. High-priority zones (<50nm) are polled every 15 minutes; distant zones (50-350nm) every hour.
350nm radius | 15min near / 60min far | 21 ports | 500+ vessels/day
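The two-tier polling schedule can be sketched as a small lookup. This is a minimal illustration, not the production scheduler: the function name and the handling of the exact 50nm boundary (treated here as the distant tier) are assumptions.

```python
from typing import Optional

NEAR_ZONE_NM = 50.0     # high-priority boundary stated in the text
MAX_RADIUS_NM = 350.0   # collection radius stated in the text


def poll_interval_minutes(distance_nm: float) -> Optional[int]:
    """Return the polling interval in minutes, or None outside the radius.

    Hypothetical helper: a distance of exactly 50nm is assigned to the
    hourly tier, which is an assumption about the boundary.
    """
    if distance_nm < 0:
        raise ValueError("distance_nm must be non-negative")
    if distance_nm < NEAR_ZONE_NM:
        return 15   # high-priority zone
    if distance_nm <= MAX_RADIUS_NM:
        return 60   # distant zone
    return None     # outside the collection radius, not polled


print(poll_interval_minutes(12.0), poll_interval_minutes(180.0))
```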
2. Vessel Filtering & Classification
Raw AIS contains everything: tugs, pilot boats, fishing vessels, pleasure craft, and navigation buoys. We filter to commercial vessels only (over 100m, recognized cargo types, berth stays longer than 4h) and classify them into 6 categories: bulk, tanker, container, vehicle, passenger, and general cargo. Vessels are tracked from first appearance through departure.
6 vessel categories | >100m filter | Tugs excluded | MMSI 99x filtered
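A minimal sketch of the filter, assuming each vessel record already carries a category label and, for completed calls, a berth stay duration. The `VesselRecord` fields and the category strings are hypothetical. The text gives the thresholds but not the record schema, and reading ">100m" as vessel length is also an assumption.

```python
from dataclasses import dataclass

MIN_LENGTH_M = 100.0       # ">100m" filter from the text
MIN_BERTH_STAY_H = 4.0     # ">4h berth stays" from the text
COMMERCIAL_CATEGORIES = {
    "bulk", "tanker", "container", "vehicle", "passenger", "general_cargo",
}


@dataclass
class VesselRecord:
    """Hypothetical record layout used only for this illustration."""
    mmsi: str
    length_m: float
    category: str
    berth_stay_h: float


def is_tracked_commercial(v: VesselRecord) -> bool:
    """Apply the filters described above to one vessel record."""
    if v.mmsi.startswith("99"):        # MMSI 99x: aids to navigation (buoys)
        return False
    if v.length_m <= MIN_LENGTH_M:     # drops tugs, pilot boats, small craft
        return False
    if v.category not in COMMERCIAL_CATEGORIES:
        return False
    return v.berth_stay_h > MIN_BERTH_STAY_H


bulker = VesselRecord("538001234", 229.0, "bulk", 36.5)
buoy = VesselRecord("992471001", 0.0, "other", 0.0)
print(is_tracked_commercial(bulker), is_tracked_commercial(buoy))
```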
3. Zone Classification
Each vessel is placed into a behavioral zone based on its distance from port and its proximity to anchorage and berth polygons. Zones drive prediction checkpoints: vessels get new time-to-berth (TTB) predictions at 350nm, 300nm, 200nm, 100nm, 50nm, and on every zone transition. PostGIS handles the spatial queries, using ST_DWithin for distance and ST_Within for polygon containment.
FAR_APPROACH >200nm | APPROACH 50-200nm | CLOSE <50nm | QUEUE/ANCHOR | BERTH
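The zone and checkpoint logic can be sketched in plain Python once the spatial queries have been answered. The sketch assumes the polygon tests (done with ST_Within in PostGIS) arrive as booleans; the function names and the rule that berth and anchorage polygons take precedence over distance bands are illustrative assumptions.

```python
from typing import List

CHECKPOINTS_NM = (350, 300, 200, 100, 50)   # checkpoint distances from the text


def classify_zone(distance_nm: float, in_berth: bool = False,
                  in_anchorage: bool = False) -> str:
    """Map a vessel to a zone; polygon hits win over distance (assumed)."""
    if in_berth:
        return "BERTH"
    if in_anchorage:
        return "QUEUE/ANCHOR"
    if distance_nm > 200:
        return "FAR_APPROACH"
    if distance_nm >= 50:
        return "APPROACH"
    return "CLOSE"


def checkpoints_crossed(prev_nm: float, curr_nm: float) -> List[int]:
    """Checkpoints passed while moving inbound from prev_nm to curr_nm."""
    return [c for c in CHECKPOINTS_NM if curr_nm <= c < prev_nm]


def needs_prediction(prev_zone: str, prev_nm: float,
                     curr_zone: str, curr_nm: float) -> bool:
    """A new TTB prediction is due on a zone change or a crossed checkpoint."""
    return curr_zone != prev_zone or bool(checkpoints_crossed(prev_nm, curr_nm))


print(classify_zone(260.0), checkpoints_crossed(320.0, 190.0))
```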
4. Feature Engineering
Every 15 minutes, we build a feature vector that combines the vessel's current state with the port's operational state. It includes vessel-specific features (distance, speed, deadweight tonnage), port congestion features (queue depth, berth utilization, same-type competition), and temporal features (hour, day of week, month), for 11 features in total.
11 features | 15min snapshots | Port + vessel state | Temporal encoding
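As an illustration, one snapshot could be assembled as below. The feature names come from the Feature Set (V4) list; the function signature, the upper speed clamp of 25 kn, and the use of Python's Monday=0 weekday convention are assumptions, since the source states only the 1.0 kn lower bound and a 0-6 day range.

```python
from datetime import datetime
from typing import Dict

MIN_SOG_KN = 1.0    # lower clamp stated in the feature list
MAX_SOG_KN = 25.0   # hypothetical upper clamp, not given in the source


def build_features(distance_nm: float, sog_kn: float, deadweight: float,
                   queue_port: int, berths_occupied: int, total_berths: int,
                   queue_same_type: int, at_berth_same_type: int,
                   ts: datetime) -> Dict[str, float]:
    """Build one 11-feature snapshot from vessel and port state."""
    utilization = berths_occupied / total_berths if total_berths else 0.0
    return {
        "distance_nm": distance_nm,
        "sog_clamped": min(max(sog_kn, MIN_SOG_KN), MAX_SOG_KN),
        "queue_port": queue_port,
        "berths_occupied": berths_occupied,
        "berth_utilization": utilization,
        "hour_of_day": ts.hour,           # 0-23
        "day_of_week": ts.weekday(),      # 0-6, Monday=0 (assumed convention)
        "deadweight": deadweight,
        "queue_same_type": queue_same_type,
        "at_berth_same_type": at_berth_same_type,
        "month_of_year": ts.month,        # 1-12
    }


snapshot = build_features(142.0, 0.3, 82000.0, 7, 6, 12, 3, 2,
                          datetime(2024, 3, 15, 14, 30))
print(snapshot["sog_clamped"], snapshot["berth_utilization"])
```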
5. LightGBM Prediction
Six separate LightGBM models, one per vessel type, predict Time-To-Berth (TTB). Each model is trained on historical berth calls using time-based splits (never random splits, because the data is a time series and a random split would leak future information into training). Models run every 30 minutes for all active inbound vessels, generating point predictions and quantile estimates (P10/P25/P50/P75/P90) for JIT arrival planning.
6 type-specific models | LightGBM gradient boosting | 30min prediction cycle | 5 quantile bands
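The time-based split and the five quantile bands can be sketched without any training data. This is an assumption-laden outline: the `berthed_at` key, the 80/20 cut, and the helper names are hypothetical, and the source does not say how its quantile estimates are produced. `objective="quantile"` with `alpha` is one standard LightGBM way to fit a model per quantile, and the pinball loss is the metric that objective minimizes.

```python
from typing import Dict, List, Optional, Sequence, Tuple

QUANTILES = (0.10, 0.25, 0.50, 0.75, 0.90)   # P10..P90 bands from the text


def time_based_split(rows: Sequence[dict],
                     train_fraction: float = 0.8) -> Tuple[List[dict], List[dict]]:
    """Sort by berth time and cut once, so training never sees the future."""
    ordered = sorted(rows, key=lambda r: r["berthed_at"])   # hypothetical key
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def quantile_params(base: Optional[dict] = None) -> Dict[str, dict]:
    """One LightGBM parameter dict per quantile band."""
    base = dict(base or {})
    return {
        f"P{int(round(q * 100))}": {**base, "objective": "quantile", "alpha": q}
        for q in QUANTILES
    }


def pinball_loss(y_true: Sequence[float], y_pred: Sequence[float], q: float) -> float:
    """Mean pinball (quantile) loss for quantile level q."""
    losses = [max(q * (y - p), (q - 1) * (y - p)) for y, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


params = quantile_params({"learning_rate": 0.05})
print(sorted(params))
```

Each parameter dict would then be passed to a separate training run per vessel type, giving 6 x 5 quantile models alongside the point-prediction models under this scheme.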
6. Autonomous ML Agents
Four autonomous agents monitor, validate, and improve the prediction pipeline around the clock. The ML Agent validates predictions against observed outcomes every 15 minutes. The Alongside Agent predicts berth duration. The Auto-Improve agent retrains underperforming models every 6 hours using sample-weighted training. The Manager oversees all agents and escalates anomalies.
4 autonomous agents | Auto-retrain every 6h | Shadow model testing | Automatic promotion
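The validation and promotion decisions could look roughly like the following. The source does not give the agents' actual rules, so the threshold arguments, the minimum-improvement margin, and the function names are all hypothetical. The sketch only shows the general pattern of scoring predictions by MAE, flagging an underperforming model, and promoting a shadow model that beats production.

```python
from typing import Sequence


def mean_absolute_error(predicted_h: Sequence[float],
                        actual_h: Sequence[float]) -> float:
    """MAE in hours between predicted and observed time to berth."""
    if len(predicted_h) != len(actual_h) or not predicted_h:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(p - a) for p, a in zip(predicted_h, actual_h)) / len(actual_h)


def needs_retrain(recent_mae_h: float, threshold_h: float) -> bool:
    """Flag a model whose recent MAE exceeds its (hypothetical) threshold."""
    return recent_mae_h > threshold_h


def should_promote(production_mae_h: float, shadow_mae_h: float,
                   min_improvement_h: float = 0.5) -> bool:
    """Promote the shadow model only if it wins by a margin (assumed rule)."""
    return production_mae_h - shadow_mae_h >= min_improvement_h


mae = mean_absolute_error([10.0, 20.0], [12.0, 17.0])
print(mae, needs_retrain(mae, threshold_h=2.0), should_promote(mae, 1.5))
```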
Feature Set (V4)
distance_nm: Distance from port center in nautical miles (0-350nm)
sog_clamped: Speed over ground, clamped to training range (min 1.0 kn)
queue_port: Total vessels waiting at port (anchorage + queue zones)
berths_occupied: Number of berths currently occupied
berth_utilization: Ratio of occupied berths to total berths
hour_of_day: Current hour (0-23); captures operational patterns
day_of_week: Day of week (0-6); captures weekly cycles
deadweight: Vessel deadweight tonnage; proxy for cargo volume/size
queue_same_type: Vessels of same type already queuing (competition)
at_berth_same_type: Same-type vessels currently at berth (capacity signal)
month_of_year: Month (1-12); captures seasonal patterns (soy season, etc.)
Model Accuracy (Production V4)
Vessel type   MAE (hours)
Passenger     2.2
Bulk          16.5
Vehicle       16.8
Hazard        17.5
Tanker        25.1
General       26.0
Congestion Forecasting
4-Horizon Binary Classification
Separate LightGBM classifiers predict the probability of port congestion at 8h, 24h, 48h, and 72h horizons. They are trained on historical port snapshots, where congestion is defined as the queue exceeding its 75th percentile. Features include current queue, berth utilization, trend indicators (delta_queue_1h, delta_queue_6h), and temporal encoding. Models retrain automatically when their AUC falls below a set threshold.
8h / 24h / 48h / 72h | AUC 0.76-0.77 | Binary classification | Auto-retrain on drift
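A sketch of how the binary labels and trend features might be derived from a queue history. It assumes hourly snapshots of a single port's queue depth and a linearly interpolated percentile. The helper names are hypothetical, and the source does not state the snapshot cadence used for training or the population over which the 75th percentile is computed.

```python
from typing import List, Optional, Sequence

HORIZONS_H = (8, 24, 48, 72)   # forecast horizons from the text


def percentile(values: Sequence[float], pct: float) -> float:
    """Linearly interpolated percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = pct / 100.0 * (len(ordered) - 1)
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def congestion_labels(queue_by_hour: Sequence[float], horizon_h: int,
                      threshold: float) -> List[int]:
    """Label hour i as 1 if the queue horizon_h hours later exceeds threshold."""
    return [1 if queue_by_hour[i + horizon_h] > threshold else 0
            for i in range(len(queue_by_hour) - horizon_h)]


def delta_queue(queue_by_hour: Sequence[float], i: int,
                lag_h: int) -> Optional[float]:
    """Trend feature: change in queue over the last lag_h hours, if available."""
    return queue_by_hour[i] - queue_by_hour[i - lag_h] if i >= lag_h else None


queue = [1, 2, 3, 4, 5, 6, 7, 8, 9]
p75 = percentile(queue, 75)
print(p75, congestion_labels(queue, 2, p75), delta_queue(queue, 6, 6))
```

In practice the threshold would be computed on the training window only, so that the label definition does not use information from the evaluation period.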