AIS Data Handler
The AIS (Automatic Identification System) module provides functionality to download, filter, and process AIS data from Hugging Face datasets.
Installation
The AIS functionality requires additional dependencies:
```bash
# Install with AIS support (quote the extra so shells like zsh don't expand the brackets)
pip install "phidown[ais]"

# Or install the dependencies manually
pip install huggingface_hub shapely
```
Quick Start
Basic Usage
```python
from phidown.ais import download_ais_data

# Download AIS data for a single day
df = download_ais_data("2025-08-25")
print(f"Downloaded {len(df)} AIS records")
```
Time Filtering
```python
from phidown.ais import download_ais_data

# Download with a time window (10:00 to 12:00 UTC)
df = download_ais_data(
    start_date="2025-08-25",
    start_time="10:00:00",
    end_time="12:00:00",
)
```
Spatial Filtering with AOI
```python
from phidown.ais import download_ais_data

# Define the Area of Interest (AOI) as a WKT polygon (longitude latitude pairs)
aoi_wkt = """POLYGON((4.2100 51.3700,4.4800 51.3700,4.5100 51.2900,
4.4650 51.1700,4.2500 51.1700,4.1900 51.2500,4.2100 51.3700))"""

df = download_ais_data(
    start_date="2025-08-25",
    aoi_wkt=aoi_wkt,
)
```
Advanced Usage
Using the AISDataHandler Class
```python
from datetime import date, time

from phidown.ais import AISDataHandler

# Create a handler instance
handler = AISDataHandler()

# Download data with all filters applied
df = handler.get_ais_data(
    start_date=date(2025, 8, 25),
    end_date=date(2025, 8, 26),
    start_time=time(9, 0, 0),
    end_time=time(15, 0, 0),
    aoi_wkt="POLYGON((4.0 51.0,5.0 51.0,5.0 52.0,4.0 52.0,4.0 51.0))",
)

# Check for errors collected during processing
errors = handler.get_errors()
if errors:
    print("Encountered errors:")
    for error in errors:
        print(f" - {error}")
```
Custom Hugging Face Repository
```python
from phidown.ais import AISDataHandler

# Use a custom Hugging Face repository and file naming scheme
handler = AISDataHandler(
    hf_repo_id="your-org/custom-ais-repo",
    file_template="ais_data_{date}.parquet",
)
df = handler.get_ais_data("2025-08-25")
```
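This page does not document exactly how `file_template` is expanded. The sketch below assumes that `{date}` is replaced with the ISO date (YYYY-MM-DD), the same format used for `start_date`, and that a date range includes its end date. Under those assumptions it lists the file names a multi-day request would resolve to. `expand_file_names` is a hypothetical helper, not a phidown function.

```python
from datetime import date, timedelta


def expand_file_names(template: str, start: date, end: date) -> list[str]:
    """Hypothetical helper: one file name per day from start to end (inclusive).

    Assumes the template contains a {date} placeholder filled with YYYY-MM-DD.
    """
    names = []
    day = start
    while day <= end:
        names.append(template.format(date=day.isoformat()))
        day += timedelta(days=1)
    return names


print(expand_file_names("ais_data_{date}.parquet", date(2025, 8, 25), date(2025, 8, 26)))
# ['ais_data_2025-08-25.parquet', 'ais_data_2025-08-26.parquet']
```

If your repository names files differently (for example `YYYYMMDD`), check how the handler formats the date before relying on this pattern.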
Data Format
The returned DataFrame contains the following standardized columns:
name: Vessel name (string)
lat: Latitude (float)
lon: Longitude (float)
source_date: Date of the data source (YYYY-MM-DD string)
timestamp: Timestamp in YYYY-MM-DD HH:MM:SS format (string)
mmsi: Maritime Mobile Service Identity (string)
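Because `timestamp` is returned as a string, converting it to a real datetime makes sorting and time-based grouping easier. Below is a minimal pandas sketch. It uses a small hand-made frame with the columns listed above in place of data returned by `download_ais_data()`, and the values are illustrative, not real AIS records.

```python
import pandas as pd

# Illustrative stand-in for a DataFrame returned by download_ais_data()
df = pd.DataFrame(
    {
        "name": ["VESSEL A", "VESSEL A", "VESSEL B"],
        "lat": [51.30, 51.31, 51.22],
        "lon": [4.30, 4.32, 4.40],
        "source_date": ["2025-08-25", "2025-08-25", "2025-08-25"],
        "timestamp": ["2025-08-25 10:15:00", "2025-08-25 10:20:00", "2025-08-25 11:45:30"],
        # MMSI stays a string, so identifiers with leading zeros are preserved
        "mmsi": ["123456789", "123456789", "987654321"],
    }
)

# Parse the string timestamps; an explicit format is stricter and faster than inference
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%Y-%m-%d %H:%M:%S")

# Latest reported position per vessel: sort by time, then take the last row of each MMSI group
latest = df.sort_values("timestamp").groupby("mmsi").tail(1)
print(latest[["mmsi", "name", "lat", "lon", "timestamp"]])
```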
Error Handling
The AIS handler copes gracefully with the following error conditions:
Missing data files for requested dates
Corrupted or unreadable parquet files
Missing coordinate or timestamp columns
Invalid WKT geometry for AOI filtering
Errors are collected by the handler and can be retrieved with its get_errors() method.
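The handler's internals are not shown on this page. The sketch below only illustrates the general collect-and-continue pattern that a `get_errors()` method suggests: each date is processed independently, and a failure is recorded instead of aborting the whole request. `ErrorCollector` and `fake_loader` are hypothetical names, not part of phidown.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class ErrorCollector:
    """Hypothetical illustration: record per-date failures instead of raising."""

    errors: List[str] = field(default_factory=list)

    def process_dates(self, dates: Iterable[str], loader: Callable[[str], int]) -> Dict[str, int]:
        results = {}
        for day in dates:
            try:
                results[day] = loader(day)
            except Exception as exc:  # keep going; remember what failed and why
                self.errors.append(f"{day}: {exc}")
        return results

    def get_errors(self) -> List[str]:
        # Return a copy so callers cannot mutate the internal list
        return list(self.errors)


def fake_loader(day: str) -> int:
    # Stand-in for reading one day's parquet file; fails for a "missing" date
    if day == "2025-08-26":
        raise FileNotFoundError("no file for this date")
    return 42


collector = ErrorCollector()
results = collector.process_dates(["2025-08-25", "2025-08-26"], fake_loader)
print(results)                 # {'2025-08-25': 42}
print(collector.get_errors())  # ['2025-08-26: no file for this date']
```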
Time Window Filtering
Time filtering supports:
Normal time windows: e.g., 10:00 to 14:00
Overnight windows: e.g., 22:00 to 06:00 (crosses midnight)
Start time only: filters from start time to end of day
End time only: filters from start of day to end time
Time formats supported:
HH:MM:SS (e.g., “14:30:45”)
HH:MM (e.g., “14:30”)
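Here is a minimal sketch of the window rules listed above, including windows that cross midnight. It assumes both bounds are inclusive, which this page does not state. `parse_time` and `in_time_window` are hypothetical helpers, not phidown functions.

```python
from datetime import datetime, time
from typing import Optional


def parse_time(value: str) -> time:
    """Parse 'HH:MM:SS' or 'HH:MM' into a datetime.time."""
    for fmt in ("%H:%M:%S", "%H:%M"):
        try:
            return datetime.strptime(value, fmt).time()
        except ValueError:
            continue
    raise ValueError(f"Unsupported time format: {value!r}")


def in_time_window(t: time, start: Optional[time], end: Optional[time]) -> bool:
    """Return True if t falls inside [start, end] (bounds assumed inclusive)."""
    if start is None and end is None:
        return True
    if start is None:  # end time only: start of day .. end
        return t <= end
    if end is None:  # start time only: start .. end of day
        return t >= start
    if start <= end:  # normal window, e.g. 10:00-14:00
        return start <= t <= end
    return t >= start or t <= end  # overnight window, e.g. 22:00-06:00


print(in_time_window(parse_time("23:30"), parse_time("22:00"), parse_time("06:00")))  # True
print(in_time_window(parse_time("12:00"), parse_time("22:00"), parse_time("06:00")))  # False
```

The overnight case is the one that needs care. A window is treated as crossing midnight when its start is later than its end, and in that case a time qualifies if it is after the start *or* before the end.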
Spatial Filtering
AOI filtering requires the shapely library. It accepts:
WKT (Well-Known Text) polygon strings, with coordinates given as longitude latitude pairs
Any valid polygon geometry
Points lying exactly on the polygon boundary are included in the filtered results.
Example WKT polygons:
```python
# Simple rectangle
aoi = "POLYGON((4.0 51.0,5.0 51.0,5.0 52.0,4.0 52.0,4.0 51.0))"

# More complex polygon around the Port of Antwerp (Scheldt estuary)
aoi = """POLYGON((4.2100 51.3700,4.4800 51.3700,4.5100 51.2900,
4.4650 51.1700,4.2500 51.1700,4.1900 51.2500,4.2100 51.3700))"""
```
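With shapely, one way to get boundary-inclusive matching is `covers()` rather than `contains()`, because `contains()` rejects points that lie exactly on the boundary. The sketch below assumes this approach and also validates the AOI before use. It is not necessarily how phidown implements the check internally.

```python
from shapely import wkt
from shapely.geometry import Point

# Parse the AOI and reject self-intersecting or otherwise invalid polygons early
aoi = wkt.loads("POLYGON((4.0 51.0,5.0 51.0,5.0 52.0,4.0 52.0,4.0 51.0))")
if not aoi.is_valid:
    raise ValueError("AOI polygon is not valid")

# WKT coordinates are (x, y) = (longitude, latitude)
inside = Point(4.5, 51.5)
on_edge = Point(4.0, 51.5)  # lies exactly on the western edge
outside = Point(5.5, 51.5)

# covers() includes boundary points; contains() excludes them
print(aoi.covers(inside), aoi.covers(on_edge), aoi.covers(outside))  # True True False
print(aoi.contains(on_edge))  # False
```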
Dependencies
Required: pandas, huggingface_hub
Optional: shapely (for AOI filtering)
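Before calling the AIS functions, you can check whether these packages are importable in the current environment. This is a generic Python check using the standard library's `importlib.util.find_spec`, not a phidown feature. `missing` is a hypothetical helper name.

```python
import importlib.util

REQUIRED = ["pandas", "huggingface_hub"]
OPTIONAL = {"shapely": "AOI filtering"}


def missing(modules):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing_required = missing(REQUIRED)
if missing_required:
    print("Missing required packages:", ", ".join(missing_required))

for name, feature in OPTIONAL.items():
    if importlib.util.find_spec(name) is None:
        print(f"{name} is not installed: {feature} unavailable")
```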
Performance Notes
Data for each date is downloaded once and cached locally by huggingface_hub, so later requests for the same date reuse the cached file
Large datasets are processed incrementally to manage memory usage
Spatial filtering is performed in-memory and may be slow for large datasets
Consider using time filtering to reduce data volume before spatial filtering
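The ordering recommended above can be sketched with pandas alone. First narrow by time, then by the AOI's bounding box, and only then run the exact (slower) polygon test on what remains. This is an illustrative sketch with made-up records. This page does not say whether phidown itself applies a bounding-box prefilter.

```python
import pandas as pd

# Illustrative records in the documented column layout (not real data)
df = pd.DataFrame(
    {
        "mmsi": ["111111111", "222222222", "333333333", "444444444"],
        "lat": [51.25, 51.30, 53.00, 51.20],
        "lon": [4.30, 4.35, 3.00, 4.40],
        "timestamp": [
            "2025-08-25 09:30:00",
            "2025-08-25 10:30:00",
            "2025-08-25 11:00:00",
            "2025-08-25 13:00:00",
        ],
    }
)

# 1) Cheap time filter first. Zero-padded "YYYY-MM-DD HH:MM:SS" strings sort
#    in chronological order, so plain (inclusive) string comparison works.
in_window = df["timestamp"].between("2025-08-25 10:00:00", "2025-08-25 12:00:00")
subset = df[in_window]

# 2) Vectorised bounding-box prefilter. These are the bounds of the Antwerp
#    example AOI (what shapely's .bounds returns: min_lon, min_lat, max_lon, max_lat).
min_lon, min_lat, max_lon, max_lat = 4.19, 51.17, 4.51, 51.37
in_bbox = subset["lon"].between(min_lon, max_lon) & subset["lat"].between(min_lat, max_lat)
candidates = subset[in_bbox]

# 3) Only `candidates` would then go through the exact polygon test
print(len(df), len(subset), len(candidates))  # 4 2 1
```

The bounding-box step is vectorised and discards most far-away points cheaply. The exact point-in-polygon test then runs only on rows that could possibly match.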