phidown.ais

AIS (Automatic Identification System) data downloader and filtering utilities.

This module provides functionality to download, filter, and process AIS data from Hugging Face datasets based on date ranges, time windows, and Areas of Interest (AOI).

Attributes

SHAPELY_AVAILABLE

Classes

AISDataHandler

Handler for downloading and filtering AIS data from Hugging Face datasets.

Functions

download_ais_data(→ pandas.DataFrame)

Convenience function to download and filter AIS data.

Module Contents

phidown.ais.SHAPELY_AVAILABLE = True
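The source does not show how this flag is set. Since AOI filtering raises a ValueError when shapely is not available, it presumably records whether the optional shapely import succeeded. A minimal sketch of that common import-guard pattern, where `require_shapely` is a hypothetical helper and not part of the module:

```python
# Assumed pattern for an optional dependency; not confirmed by the source.
try:
    import shapely  # noqa: F401  (imported only to probe availability)
    SHAPELY_AVAILABLE = True
except ImportError:
    SHAPELY_AVAILABLE = False


def require_shapely() -> None:
    """Hypothetical helper: fail early if spatial filtering is requested without shapely."""
    if not SHAPELY_AVAILABLE:
        raise ValueError("shapely is required for AOI filtering")
```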
class phidown.ais.AISDataHandler(hf_repo_id: str = 'Lore0123/AISPortal', file_template: str = '{date}_ais.parquet', date_format: str = '%Y-%m-%d', verbose: bool = True)

Handler for downloading and filtering AIS data from Hugging Face datasets.

This class provides methods to download AIS data for specified date ranges, filter by time windows, and apply spatial filtering using Areas of Interest (AOI).

hf_repo_id

Hugging Face repository ID for AIS data.

Type:

str

file_template

Template for AIS data filenames.

Type:

str

date_format

Date format string.

Type:

str

verbose

Whether to print progress messages.

Type:

bool

hf_repo_id = 'Lore0123/AISPortal'
file_template = '{date}_ais.parquet'
date_format = '%Y-%m-%d'
verbose = True
_errors: List[str] = []
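The defaults above suggest that each day's file name is obtained by rendering the date with date_format and substituting it into the `{date}` placeholder of file_template. A small sketch under that assumption, where `build_filename` is a hypothetical name:

```python
from datetime import date

# Defaults copied from the constructor signature.
FILE_TEMPLATE = "{date}_ais.parquet"
DATE_FORMAT = "%Y-%m-%d"


def build_filename(day: date, file_template: str = FILE_TEMPLATE,
                   date_format: str = DATE_FORMAT) -> str:
    """Hypothetical helper: render the date, then fill the template's {date} slot."""
    return file_template.format(date=day.strftime(date_format))


print(build_filename(date(2025, 8, 25)))  # 2025-08-25_ais.parquet
```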
get_ais_data(start_date: str | datetime.date, end_date: str | datetime.date | None = None, start_time: str | datetime.time | None = None, end_time: str | datetime.time | None = None, aoi_wkt: str | None = None, verbose: bool | None = None) → pandas.DataFrame

Download and filter AIS data based on date range, time window, and AOI.

Parameters:
  • start_date – Start date for data retrieval (YYYY-MM-DD string or date object).

  • end_date – End date for data retrieval. If None, uses start_date.

  • start_time – Start time for daily filtering (HH:MM:SS string or time object).

  • end_time – End time for daily filtering (HH:MM:SS string or time object).

  • aoi_wkt – Area of Interest as WKT polygon string for spatial filtering.

  • verbose – Whether to print progress messages. If None, uses instance default.

Returns:

Filtered pandas DataFrame containing AIS data with standardized columns:

  • name: Vessel name

  • lat: Latitude

  • lon: Longitude

  • source_date: Date of data source

  • timestamp: Timestamp in YYYY-MM-DD HH:MM:SS format

  • mmsi: Maritime Mobile Service Identity

  • Plus all additional columns from the original AIS dataset (COG, SOG, HEADING, NAVSTAT, IMO, CALLSIGN, TYPE, etc.)

Return type:

pandas.DataFrame

Raises:

ValueError – If date parsing fails or no valid data is found.

get_errors() → List[str]

Get list of errors encountered during data processing.

Returns:

List of error messages from the last data retrieval operation.
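The two public methods can be combined as sketched below: run one retrieval, then read back whatever errors were recorded for it. The wrapper `fetch_window` is a hypothetical name; only the constructor, get_ais_data, and get_errors come from this reference. Running it needs phidown installed and network access to the Hugging Face repository.

```python
from typing import List, Tuple


def fetch_window(day: str, aoi_wkt: str) -> Tuple[object, List[str]]:
    """Fetch a 10:00-12:00 window inside an AOI; return (DataFrame, error messages)."""
    # Imported lazily so the sketch can be defined without phidown installed.
    from phidown.ais import AISDataHandler

    handler = AISDataHandler(verbose=False)
    df = handler.get_ais_data(
        start_date=day,          # end_date omitted, so only this day is used
        start_time="10:00:00",
        end_time="12:00:00",
        aoi_wkt=aoi_wkt,
    )
    # Errors recorded during the retrieval that just ran.
    return df, handler.get_errors()
```

Because get_ais_data raises ValueError when date parsing fails or no valid data is found, callers may want to wrap the call in a try/except block.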

_parse_date(value: str | datetime.date | None) → datetime.date | None

Parse various date formats into date object.

Parameters:

value – Date as string, date object, or None.

Returns:

Parsed date object or None if parsing fails.
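The parsing contract described above (accept a string, a date, or None; return None instead of raising) can be sketched as follows. The list of accepted string layouts is an assumption; the source only confirms YYYY-MM-DD.

```python
from datetime import date, datetime
from typing import Optional, Union

# Assumed layouts; only the first is confirmed by the documentation.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d-%m-%Y")


def parse_date(value: Union[str, date, None]) -> Optional[date]:
    """Return a date, or None when the value is missing or cannot be parsed."""
    if value is None:
        return None
    if isinstance(value, datetime):  # datetime subclasses date, so check it first
        return value.date()
    if isinstance(value, date):
        return value
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue  # try the next layout
    return None
```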

_parse_time(value: str | datetime.time | None) → datetime.time | None

Parse time string into time object.

Parameters:

value – Time as string, time object, or None.

Returns:

Parsed time object or None if parsing fails.
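A matching sketch for time values. HH:MM:SS is the documented layout; accepting HH:MM as well is an assumption made here for illustration.

```python
from datetime import datetime, time
from typing import Optional, Union


def parse_time(value: Union[str, time, None]) -> Optional[time]:
    """Return a time, or None when the value is missing or cannot be parsed."""
    if value is None:
        return None
    if isinstance(value, time):
        return value
    for fmt in ("%H:%M:%S", "%H:%M"):  # second layout is an assumption
        try:
            return datetime.strptime(value.strip(), fmt).time()
        except ValueError:
            continue
    return None
```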

_iterate_dates(start: datetime.date, end: datetime.date) → List[datetime.date]

Generate list of dates between start and end (inclusive).

Parameters:
  • start – Start date.

  • end – End date.

Returns:

List of date objects in the range.
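An inclusive date range can be built with timedelta, as in this sketch. Returning an empty list when end precedes start is an assumption; the source does not say how that case is handled.

```python
from datetime import date, timedelta
from typing import List


def iterate_dates(start: date, end: date) -> List[date]:
    """List every date from start to end, both endpoints included."""
    span = (end - start).days
    # A negative span yields an empty range, hence an empty list (assumed behavior).
    return [start + timedelta(days=offset) for offset in range(span + 1)]
```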

_normalize_column_key(value: str) → str

Normalize column name for flexible matching.

Parameters:

value – Column name to normalize.

Returns:

Normalized column name (lowercase, alphanumeric only).
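One way to obtain a lowercase, alphanumeric-only key is a single regular-expression substitution. The sketch below assumes ASCII letters and digits are what "alphanumeric" means here.

```python
import re


def normalize_column_key(value: str) -> str:
    """Lowercase the name and drop everything except a-z and 0-9 (ASCII assumed)."""
    return re.sub(r"[^a-z0-9]", "", value.lower())


print(normalize_column_key("Vessel_Name"))  # vesselname
```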

_find_column(df: pandas.DataFrame, candidates: List[str]) → str | None

Find column in DataFrame using flexible name matching.

Parameters:
  • df – DataFrame to search.

  • candidates – List of possible column names.

Returns:

Actual column name if found, None otherwise.
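Flexible matching can be sketched by comparing normalized keys on both sides. To stay free of dependencies, this version takes an iterable of column names rather than a DataFrame (passing `df.columns` would work the same way); the candidate priority order and first-match-wins rule are assumptions.

```python
import re
from typing import Dict, Iterable, List, Optional


def _norm(value: str) -> str:
    """Lowercase and keep only ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", str(value).lower())


def find_column(columns: Iterable[str], candidates: List[str]) -> Optional[str]:
    """Return the actual column name matching the first candidate that is present."""
    lookup: Dict[str, str] = {}
    for col in columns:
        lookup.setdefault(_norm(col), col)  # keep the first column per key
    for candidate in candidates:
        actual = lookup.get(_norm(candidate))
        if actual is not None:
            return actual
    return None
```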

_build_time_mask(datetimes: pandas.Series, start_time_obj: datetime.time | None, end_time_obj: datetime.time | None) → pandas.Series | None

Build boolean mask for time filtering.

Parameters:
  • datetimes – Series of datetime values.

  • start_time_obj – Start time for filtering.

  • end_time_obj – End time for filtering.

Returns:

Boolean mask Series or None if no time filtering needed.
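The masking logic can be illustrated in plain Python on a list of datetimes; the real method works on a pandas Series. Inclusive bounds, open-ended windows when only one bound is given, and support for windows that cross midnight are all assumptions of this sketch.

```python
from datetime import datetime, time
from typing import List, Optional, Sequence


def build_time_mask(datetimes: Sequence[datetime],
                    start_time: Optional[time],
                    end_time: Optional[time]) -> Optional[List[bool]]:
    """Return one boolean per timestamp, or None when no time filter is requested."""
    if start_time is None and end_time is None:
        return None
    lo = start_time if start_time is not None else time.min
    hi = end_time if end_time is not None else time.max
    times = [dt.time() for dt in datetimes]
    if lo <= hi:
        # Ordinary window within a single day (bounds inclusive).
        return [lo <= t <= hi for t in times]
    # Window crossing midnight, e.g. 22:00 to 02:00 (assumed behavior).
    return [t >= lo or t <= hi for t in times]
```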

_load_ais_points(dates: List[datetime.date], start_time_obj: datetime.time | None, end_time_obj: datetime.time | None, verbose: bool = True) → pandas.DataFrame

Load AIS data for multiple dates and apply time filtering.

Parameters:
  • dates – List of dates to load data for.

  • start_time_obj – Start time for daily filtering.

  • end_time_obj – End time for daily filtering.

  • verbose – Whether to print progress messages.

Returns:

Concatenated and filtered DataFrame.

_filter_by_aoi(df: pandas.DataFrame, wkt_text: str, verbose: bool = True) → pandas.DataFrame

Filter DataFrame points by Area of Interest polygon.

Parameters:
  • df – DataFrame with lat/lon columns.

  • wkt_text – WKT polygon string defining the AOI.

  • verbose – Whether to print progress messages.

Returns:

Filtered DataFrame containing only points within the AOI.

Raises:

ValueError – If shapely is not available or WKT parsing fails.
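The module relies on shapely for this step. To show what "within the AOI" means without that dependency, the sketch below swaps in a plain-Python ray-casting test and a deliberately minimal WKT parser. Both helpers are hypothetical, handle only a single-ring polygon, and are not the module's implementation.

```python
import re
from typing import List, Tuple

Point = Tuple[float, float]  # (lon, lat), matching WKT's "x y" order


def parse_simple_polygon(wkt_text: str) -> List[Point]:
    """Parse a single-ring 'POLYGON((x y, ...))'; raise ValueError otherwise."""
    match = re.fullmatch(r"\s*POLYGON\s*\(\(([^()]+)\)\)\s*", wkt_text, re.IGNORECASE)
    if match is None:
        raise ValueError("unsupported or malformed WKT polygon")
    ring = []
    for pair in match.group(1).split(","):
        x, y = pair.split()  # raises ValueError unless exactly two tokens
        ring.append((float(x), float(y)))
    if len(ring) < 4 or ring[0] != ring[-1]:
        raise ValueError("polygon ring must be closed")
    return ring


def point_in_ring(lon: float, lat: float, ring: List[Point]) -> bool:
    """Ray casting: count edge crossings of a ray running east from the point."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


aoi = "POLYGON((4.21 51.37,4.48 51.37,4.51 51.29,4.47 51.17,4.25 51.17,4.19 51.25,4.21 51.37))"
ring = parse_simple_polygon(aoi)
rows = [{"lon": 4.40, "lat": 51.30}, {"lon": 3.00, "lat": 51.30}]
kept = [row for row in rows if point_in_ring(row["lon"], row["lat"], ring)]
```

Points exactly on the boundary are not handled specially here, whereas shapely distinguishes `contains`, `covers`, and `intersects`; which predicate the module uses is not stated in this reference.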

phidown.ais.download_ais_data(start_date: str | datetime.date, end_date: str | datetime.date | None = None, start_time: str | datetime.time | None = None, end_time: str | datetime.time | None = None, aoi_wkt: str | None = None, hf_repo_id: str = 'Lore0123/AISPortal', verbose: bool = True) → pandas.DataFrame

Convenience function to download and filter AIS data.

Parameters:
  • start_date – Start date for data retrieval (YYYY-MM-DD string or date object).

  • end_date – End date for data retrieval. If None, uses start_date.

  • start_time – Start time for daily filtering (HH:MM:SS string or time object).

  • end_time – End time for daily filtering (HH:MM:SS string or time object).

  • aoi_wkt – Area of Interest as WKT polygon string for spatial filtering.

  • hf_repo_id – Hugging Face repository ID containing AIS data.

  • verbose – Whether to print progress and error messages.

Returns:

Filtered pandas DataFrame containing AIS data with all available columns. Standardized columns (name, lat, lon, source_date, timestamp, mmsi) are placed first, followed by all original AIS dataset columns.

Example

>>> from phidown.ais import download_ais_data
>>> # Download data for a single day
>>> df = download_ais_data("2025-08-25")
>>> # Download with time window (silent)
>>> df = download_ais_data(
...     "2025-08-25",
...     start_time="10:00:00",
...     end_time="12:00:00",
...     verbose=False
... )
>>> # Download with AOI filtering
>>> aoi = "POLYGON((4.21 51.37,4.48 51.37,4.51 51.29,4.47 51.17,4.25 51.17,4.19 51.25,4.21 51.37))"
>>> df = download_ais_data("2025-08-25", aoi_wkt=aoi)