phidown.ais
AIS (Automatic Identification System) data downloader and filtering utilities.
This module provides functionality to download, filter, and process AIS data from Hugging Face datasets based on date ranges, time windows, and Areas of Interest (AOI).
Attributes
Classes
AISDataHandler – Handler for downloading and filtering AIS data from Hugging Face datasets.
Functions
download_ais_data – Convenience function to download and filter AIS data.
Module Contents
- class phidown.ais.AISDataHandler(hf_repo_id: str = 'Lore0123/AISPortal', file_template: str = '{date}_ais.parquet', date_format: str = '%Y-%m-%d', verbose: bool = True)
Handler for downloading and filtering AIS data from Hugging Face datasets.
This class provides methods to download AIS data for specified date ranges, filter by time windows, and apply spatial filtering using Areas of Interest (AOI).
- get_ais_data(start_date: str | datetime.date, end_date: str | datetime.date | None = None, start_time: str | datetime.time | None = None, end_time: str | datetime.time | None = None, aoi_wkt: str | None = None, verbose: bool | None = None) -> pandas.DataFrame
Download and filter AIS data based on date range, time window, and AOI.
- Parameters:
start_date – Start date for data retrieval (YYYY-MM-DD string or date object).
end_date – End date for data retrieval. If None, uses start_date.
start_time – Start time for daily filtering (HH:MM:SS string or time object).
end_time – End time for daily filtering (HH:MM:SS string or time object).
aoi_wkt – Area of Interest as WKT polygon string for spatial filtering.
verbose – Whether to print progress messages. If None, uses instance default.
- Returns:
Filtered pandas DataFrame containing AIS data with standardized columns:
name: Vessel name
lat: Latitude
lon: Longitude
source_date: Date of data source
timestamp: Timestamp in YYYY-MM-DD HH:MM:SS format
mmsi: Maritime Mobile Service Identity
Plus all additional columns from the original AIS dataset
(COG, SOG, HEADING, NAVSTAT, IMO, CALLSIGN, TYPE, etc.)
- Return type:
pandas.DataFrame
- Raises:
ValueError – If date parsing fails or no valid data is found.
- get_errors() -> List[str]
Get list of errors encountered during data processing.
- Returns:
List of error messages from the last data retrieval operation.
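As a usage illustration of the two public methods above, the following minimal sketch wraps one retrieval in a function. It uses only the constructor and method signatures documented on this page; the function name `fetch_window` and the constant `AOI_WKT` are hypothetical, and the call needs network access plus an installed `phidown`, so the import is deferred until the function runs.

```python
# Hypothetical AOI reused from the module's own example (WKT uses lon lat order).
AOI_WKT = (
    "POLYGON((4.21 51.37,4.48 51.37,4.51 51.29,"
    "4.47 51.17,4.25 51.17,4.19 51.25,4.21 51.37))"
)


def fetch_window(day="2025-08-25"):
    """Fetch one day of AIS data, 10:00-12:00, inside AOI_WKT (illustrative only)."""
    # Imported lazily so this sketch can be defined without phidown installed.
    from phidown.ais import AISDataHandler

    handler = AISDataHandler(verbose=False)
    # Per the documentation, this raises ValueError if date parsing fails
    # or no valid data is found.
    df = handler.get_ais_data(
        start_date=day,
        start_time="10:00:00",
        end_time="12:00:00",
        aoi_wkt=AOI_WKT,
    )
    # Surface any errors recorded during the retrieval that just ran.
    for message in handler.get_errors():
        print("warning:", message)
    return df
```

Checking `get_errors()` after each call is a reasonable habit here, since the documentation describes the list as belonging to the last retrieval operation.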
- _parse_date(value: str | datetime.date | None) -> datetime.date | None
Parse various date formats into date object.
- Parameters:
value – Date as string, date object, or None.
- Returns:
Parsed date object or None if parsing fails.
- _parse_time(value: str | datetime.time | None) -> datetime.time | None
Parse time string into time object.
- Parameters:
value – Time as string, time object, or None.
- Returns:
Parsed time object or None if parsing fails.
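The two parsing helpers above share one contract: accept a string, an already-parsed object, or None, and return None instead of raising when parsing fails. A minimal standard-library sketch of that contract follows. It is not the library's implementation; the function names and the lists of accepted formats are assumptions (only `%Y-%m-%d` and `HH:MM:SS` appear in the documentation).

```python
from datetime import date, datetime, time


def parse_date(value, formats=("%Y-%m-%d", "%Y%m%d", "%d/%m/%Y")):
    """Return a date, or None if value is None or cannot be parsed."""
    if value is None:
        return None
    if isinstance(value, datetime):  # datetime is a subclass of date, check it first
        return value.date()
    if isinstance(value, date):
        return value
    for fmt in formats:  # formats after the first are hypothetical extras
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    return None


def parse_time(value, formats=("%H:%M:%S", "%H:%M")):
    """Return a time, or None if value is None or cannot be parsed."""
    if value is None:
        return None
    if isinstance(value, time):
        return value
    for fmt in formats:  # "%H:%M" is an assumed convenience format
        try:
            return datetime.strptime(value.strip(), fmt).time()
        except ValueError:
            continue
    return None
```

Returning None rather than raising lets the caller decide how to report the problem, which matches `get_ais_data` raising ValueError itself when date parsing fails.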
- _iterate_dates(start: datetime.date, end: datetime.date) -> List[datetime.date]
Generate list of dates between start and end (inclusive).
- Parameters:
start – Start date.
end – End date.
- Returns:
List of date objects in the range.
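An inclusive date range like the one described above can be sketched with `datetime.timedelta`. The function name is hypothetical, and returning an empty list when `end` precedes `start` is an assumption; the documentation does not say how the library treats a reversed range.

```python
from datetime import date, timedelta


def iterate_dates(start, end):
    """List every date from start to end, both endpoints included."""
    if end < start:
        return []  # assumed behavior for a reversed range
    span = (end - start).days
    return [start + timedelta(days=offset) for offset in range(span + 1)]
```

For example, `iterate_dates(date(2025, 8, 30), date(2025, 9, 1))` yields three dates and crosses the month boundary without special handling.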
- _normalize_column_key(value: str) -> str
Normalize column name for flexible matching.
- Parameters:
value – Column name to normalize.
- Returns:
Normalized column name (lowercase, alphanumeric only).
- _find_column(df: pandas.DataFrame, candidates: List[str]) -> str | None
Find column in DataFrame using flexible name matching.
- Parameters:
df – DataFrame to search.
candidates – List of possible column names.
- Returns:
Actual column name if found, None otherwise.
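The flexible matching described by the two helpers above can be sketched as follows: normalize every name to lowercase alphanumerics, then return the first actual column whose key equals a candidate's key. This is an illustrative sketch, not the library's code. It takes a plain list of column names (for a DataFrame, `list(df.columns)`), and the function names are hypothetical.

```python
import re


def normalize_column_key(value):
    """Lowercase the name and drop everything except a-z and 0-9."""
    return re.sub(r"[^a-z0-9]", "", value.lower())


def find_column(columns, candidates):
    """Return the actual column name matching any candidate, else None."""
    lookup = {}
    for column in columns:
        # Keep the first column seen for each normalized key.
        lookup.setdefault(normalize_column_key(column), column)
    for candidate in candidates:  # earlier candidates take priority
        match = lookup.get(normalize_column_key(candidate))
        if match is not None:
            return match
    return None
```

With this scheme `"Ship_Name"`, `"ship name"`, and `"SHIPNAME"` all map to the key `shipname`, so differently cased or punctuated source columns resolve to one standardized column.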
- _build_time_mask(datetimes: pandas.Series, start_time_obj: datetime.time | None, end_time_obj: datetime.time | None) -> pandas.Series | None
Build boolean mask for time filtering.
- Parameters:
datetimes – Series of datetime values.
start_time_obj – Start time for filtering.
end_time_obj – End time for filtering.
- Returns:
Boolean mask Series or None if no time filtering needed.
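The mask logic above can be illustrated without pandas by working on a plain list of `datetime` values; the real method operates on a pandas Series. In this sketch the bounds are inclusive, a missing bound is open-ended, and a window whose start is later than its end is treated as crossing midnight. All three choices are assumptions, since the documentation does not specify them, and the function name is hypothetical.

```python
from datetime import datetime, time


def build_time_mask(datetimes, start_time, end_time):
    """Return a list of booleans, or None when no time filter is requested."""
    if start_time is None and end_time is None:
        return None
    low = time.min if start_time is None else start_time
    high = time.max if end_time is None else end_time
    mask = []
    for stamp in datetimes:
        clock = stamp.time()
        if low <= high:
            mask.append(low <= clock <= high)  # ordinary same-day window
        else:
            mask.append(clock >= low or clock <= high)  # window wraps past midnight
    return mask
```

Returning None when both bounds are absent lets the caller skip the filtering step entirely instead of applying an all-True mask.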
- _load_ais_points(dates: List[datetime.date], start_time_obj: datetime.time | None, end_time_obj: datetime.time | None, verbose: bool = True) -> pandas.DataFrame
Load AIS data for multiple dates and apply time filtering.
- Parameters:
dates – List of dates to load data for.
start_time_obj – Start time for daily filtering.
end_time_obj – End time for daily filtering.
verbose – Whether to print progress messages.
- Returns:
Concatenated and filtered DataFrame.
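Loading data for multiple dates implies one file per date. Given the constructor defaults `file_template='{date}_ais.parquet'` and `date_format='%Y-%m-%d'`, the per-date file names can plausibly be derived as in the sketch below. That the template is filled with `str.format` is an assumption, the function name is hypothetical, and how each file is then fetched from the repository is not covered.

```python
from datetime import date


def build_filenames(dates, file_template="{date}_ais.parquet", date_format="%Y-%m-%d"):
    """Map each date to the file name expected in the dataset repository."""
    return [file_template.format(date=day.strftime(date_format)) for day in dates]
```

Under these assumptions, `date(2025, 8, 25)` maps to `2025-08-25_ais.parquet`.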
- _filter_by_aoi(df: pandas.DataFrame, wkt_text: str, verbose: bool = True) -> pandas.DataFrame
Filter DataFrame points by Area of Interest polygon.
- Parameters:
df – DataFrame with lat/lon columns.
wkt_text – WKT polygon string defining the AOI.
verbose – Whether to print progress messages.
- Returns:
Filtered DataFrame containing only points within the AOI.
- Raises:
ValueError – If shapely is not available or WKT parsing fails.
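To make the AOI filtering step concrete, here is a dependency-free sketch of point-in-polygon filtering. The documented method relies on shapely; this sketch deliberately swaps in a plain ray-casting test so it runs with the standard library only. It handles a single outer ring without holes, works on a list of dicts with `lat` and `lon` keys instead of a DataFrame, and all function names are hypothetical. WKT coordinates are ordered x y, that is, longitude then latitude.

```python
import re


def parse_polygon_wkt(wkt_text):
    """Parse 'POLYGON((x y, ...))' into a list of (lon, lat) tuples."""
    match = re.fullmatch(r"\s*POLYGON\s*\(\(\s*(.*?)\s*\)\)\s*", wkt_text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("not a simple WKT polygon")
    ring = []
    for pair in match.group(1).split(","):
        x_text, y_text = pair.split()  # raises ValueError on malformed pairs
        ring.append((float(x_text), float(y_text)))
    if len(ring) < 4 or ring[0] != ring[-1]:
        raise ValueError("polygon ring must be closed with at least 4 points")
    return ring


def point_in_ring(lon, lat, ring):
    """Ray casting: count edge crossings of a ray going east from the point."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > lat) != (y2 > lat):  # edge straddles the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def filter_by_aoi(rows, wkt_text):
    """Keep only rows whose (lon, lat) falls inside the polygon."""
    ring = parse_polygon_wkt(wkt_text)
    return [row for row in rows if point_in_ring(row["lon"], row["lat"], ring)]
```

Points lying exactly on the boundary may be classified either way by ray casting, so a library such as shapely remains the better choice when boundary behavior matters.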
- phidown.ais.download_ais_data(start_date: str | datetime.date, end_date: str | datetime.date | None = None, start_time: str | datetime.time | None = None, end_time: str | datetime.time | None = None, aoi_wkt: str | None = None, hf_repo_id: str = 'Lore0123/AISPortal', verbose: bool = True) -> pandas.DataFrame
Convenience function to download and filter AIS data.
- Parameters:
start_date – Start date for data retrieval (YYYY-MM-DD string or date object).
end_date – End date for data retrieval. If None, uses start_date.
start_time – Start time for daily filtering (HH:MM:SS string or time object).
end_time – End time for daily filtering (HH:MM:SS string or time object).
aoi_wkt – Area of Interest as WKT polygon string for spatial filtering.
hf_repo_id – Hugging Face repository ID containing AIS data.
verbose – Whether to print progress and error messages.
- Returns:
Filtered pandas DataFrame containing AIS data with all available columns. Standardized columns (name, lat, lon, source_date, timestamp, mmsi) are placed first, followed by all original AIS dataset columns.
Example
>>> from phidown.ais import download_ais_data

>>> # Download data for a single day
>>> df = download_ais_data("2025-08-25")

>>> # Download with time window (silent)
>>> df = download_ais_data(
...     "2025-08-25",
...     start_time="10:00:00",
...     end_time="12:00:00",
...     verbose=False
... )

>>> # Download with AOI filtering
>>> aoi = "POLYGON((4.21 51.37,4.48 51.37,4.51 51.29,4.47 51.17,4.25 51.17,4.19 51.25,4.21 51.37))"
>>> df = download_ais_data("2025-08-25", aoi_wkt=aoi)