phidown.search

Classes

CopernicusDataSearcher

Module Contents

class phidown.search.CopernicusDataSearcher
base_url: str = 'https://catalogue.dataspace.copernicus.eu/odata/v1/Products'
config: dict | None
collection_name: str | None = None
product_type: str | None = None
orbit_direction: str | None = None
cloud_cover_threshold: float | None = None
attributes: Dict[str, str | int | float] | None = None
aoi_wkt: str | None = None
start_date: str | None = None
end_date: str | None = None
burst_mode: bool = False
burst_id: int | None = None
absolute_burst_id: int | None = None
swath_identifier: str | None = None
parent_product_name: str | None = None
parent_product_type: str | None = None
parent_product_id: str | None = None
datatake_id: int | None = None
relative_orbit_number: int | None = None
operational_mode: str | None = None
polarisation_channels: str | None = None
platform_serial_identifier: str | None = None
top: int = 1000
count: bool = False
order_by: str = 'ContentDate/Start desc'
query_by_filter(base_url: str = 'https://catalogue.dataspace.copernicus.eu/odata/v1/Products', collection_name: str | None = 'SENTINEL-1', product_type: str | None = None, orbit_direction: str | None = None, cloud_cover_threshold: float | None = None, attributes: Dict[str, str | int | float] | None = None, aoi_wkt: str | None = None, start_date: str | None = None, end_date: str | None = None, top: int = 1000, count: bool = False, order_by: str = 'ContentDate/Start desc', burst_mode: bool = False, burst_id: int | None = None, absolute_burst_id: int | None = None, swath_identifier: str | None = None, parent_product_name: str | None = None, parent_product_type: str | None = None, parent_product_id: str | None = None, datatake_id: int | None = None, relative_orbit_number: int | None = None, operational_mode: str | None = None, polarisation_channels: str | None = None, platform_serial_identifier: str | None = None) -> None

Set and validate search parameters for the Copernicus data query.

Parameters:
  • base_url (str) – The base URL for the OData API.

  • collection_name (str, optional) – Name of the collection to search. Defaults to ‘SENTINEL-1’.

  • product_type (str, optional) – Type of product to filter. Defaults to None.

  • orbit_direction (str, optional) – Orbit direction to filter (e.g., ‘ASCENDING’, ‘DESCENDING’). Defaults to None.

  • cloud_cover_threshold (float, optional) – Maximum cloud cover percentage to filter. Defaults to None.

  • attributes (Dict[str, Union[str, int, float]], optional) – Additional attributes for filtering. Defaults to None.

  • aoi_wkt (str, optional) – Area of Interest in WKT format. Defaults to None.

  • start_date (str, optional) – Start date for filtering (ISO 8601 format). Defaults to None.

  • end_date (str, optional) – End date for filtering (ISO 8601 format). Defaults to None.

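  • top (int, optional) – Maximum number of results to retrieve. Defaults to 1000.

  • count (bool, optional) – Whether to request the total number of matching results. When True, execute_query() paginates past the ‘top’ limit. Defaults to False.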

  • order_by (str, optional) – Field and direction to order results by. Defaults to “ContentDate/Start desc”.

  • burst_mode (bool, optional) – Enable Sentinel-1 SLC Burst mode searching. Defaults to False.

  • burst_id (int, optional) – Burst ID to filter (burst mode only). Defaults to None.

  • absolute_burst_id (int, optional) – Absolute Burst ID to filter (burst mode only). Defaults to None.

  • swath_identifier (str, optional) – Swath identifier (e.g., ‘IW1’, ‘IW2’) (burst mode only). Defaults to None.

  • parent_product_name (str, optional) – Parent product name (burst mode only). Defaults to None.

  • parent_product_type (str, optional) – Parent product type (burst mode only). Defaults to None.

  • parent_product_id (str, optional) – Parent product ID (burst mode only). Defaults to None.

  • datatake_id (int, optional) – Datatake ID (burst mode only). Defaults to None.

  • relative_orbit_number (int, optional) – Relative orbit number (burst mode only). Defaults to None.

  • operational_mode (str, optional) – Operational mode (e.g., ‘IW’, ‘EW’) (burst mode only). Defaults to None.

  • polarisation_channels (str, optional) – Polarisation channels (e.g., ‘VV’, ‘VH’) (burst mode only). Defaults to None.

  • platform_serial_identifier (str, optional) – Platform serial identifier (e.g., ‘A’, ‘B’) (burst mode only). Defaults to None.
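A minimal usage sketch of the parameters above. It assumes the `phidown` package is installed and that `CopernicusDataSearcher` can be constructed without arguments. The polygon and dates are illustrative values only.

```python
# Illustrative search parameters; every value here is an assumption chosen
# for the example.
search_params = {
    "collection_name": "SENTINEL-1",
    "orbit_direction": "DESCENDING",
    "aoi_wkt": "POLYGON((12.4 41.8, 12.6 41.8, 12.6 42.0, 12.4 42.0, 12.4 41.8))",
    "start_date": "2024-01-01T00:00:00",
    "end_date": "2024-01-31T23:59:59",
    "top": 100,
    "order_by": "ContentDate/Start desc",
}


def run_search(params):
    """Set the filters, run the query, and return the resulting DataFrame."""
    # Imported lazily so the parameter dictionary can be inspected without
    # phidown installed.
    from phidown.search import CopernicusDataSearcher

    searcher = CopernicusDataSearcher()  # assumed: no required arguments
    searcher.query_by_filter(**params)
    return searcher.execute_query()
```

Calling `run_search(search_params)` issues a network request against the catalogue, so it is left uncalled here.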

_load_config(config_path=None)

Load the configuration file.

Parameters:

config_path (str, optional) – Path to the configuration file. Defaults to None.

Raises:

_validate_collection(collection_name)

Validate the collection name against the available collections in the configuration.

Parameters:

collection_name (str) – The name of the collection to validate.

Returns:

True if the collection name is valid, False otherwise.

Return type:

bool

_get_valid_product_types(collection_name)

Extracts and filters valid product types from a configuration dictionary based on the given collection name.

Parameters:

collection_name (str) – The name of the collection whose product types are returned (e.g., SENTINEL-1, SENTINEL-2).

Returns:

A list of valid product types for the given collection name.

Return type:

list

_validate_product_type()

Validates the provided product type against a list of valid product types. If the product type is None, the validation is skipped.

Raises:
  • ValueError – If the product type is not in the list of valid product types.

  • TypeError – If the product type is not a string.

_validate_order_by()

Validate the ‘order_by’ parameter against valid fields and directions.

Raises:

ValueError – If the ‘order_by’ parameter is invalid.
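A check of this kind might look like the sketch below. The allow-list of fields is hypothetical, and the function name `check_order_by` is introduced only for illustration; the library's actual valid fields are not listed on this page.

```python
# Hypothetical allow-lists, chosen for illustration only.
VALID_ORDER_FIELDS = {
    "ContentDate/Start",
    "ContentDate/End",
    "PublicationDate",
    "ModificationDate",
}
VALID_ORDER_DIRECTIONS = {"asc", "desc"}


def check_order_by(order_by: str) -> None:
    """Raise ValueError unless order_by looks like '<field> <direction>'."""
    parts = order_by.split()
    if len(parts) != 2:
        raise ValueError("order_by must be '<field> <direction>'")
    field, direction = parts
    if field not in VALID_ORDER_FIELDS:
        raise ValueError(f"unknown order_by field: {field!r}")
    if direction not in VALID_ORDER_DIRECTIONS:
        raise ValueError(f"unknown order_by direction: {direction!r}")
```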

_validate_top()

Validate the ‘top’ parameter to ensure it is within the allowed range.

Raises:

ValueError – If the ‘top’ parameter is not between 1 and 1000.

_validate_cloud_cover_threshold()

Validate the ‘cloud_cover_threshold’ parameter to ensure it is between 0 and 100.

Raises:

ValueError – If the ‘cloud_cover_threshold’ parameter is not between 0 and 100.
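The two range checks documented above (‘top’ and ‘cloud_cover_threshold’) can be sketched as plain bounds tests. The function names are hypothetical, and treating a None threshold as "no filter" is an assumption based on the parameter's default.

```python
def check_top(top: int) -> None:
    """Raise ValueError unless 1 <= top <= 1000."""
    if not 1 <= top <= 1000:
        raise ValueError("top must be between 1 and 1000")


def check_cloud_cover(threshold) -> None:
    """Raise ValueError unless threshold is None or within 0..100."""
    if threshold is None:
        return  # assumed: None means the filter is not applied
    if not 0 <= threshold <= 100:
        raise ValueError("cloud_cover_threshold must be between 0 and 100")
```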

_validate_orbit_direction()

Validate the ‘orbit_direction’ parameter to ensure it is one of the allowed values.

Raises:

ValueError – If the ‘orbit_direction’ parameter is not ‘ASCENDING’, ‘DESCENDING’, or None.

_validate_aoi_wkt() -> None

Validate and normalize the ‘aoi_wkt’ parameter to ensure it is a valid WKT polygon. Automatically fixes common issues like extra whitespace and missing closing coordinates.

Raises:
  • ValueError – If the ‘aoi_wkt’ parameter is not a valid WKT polygon.

  • TypeError – If the ‘aoi_wkt’ parameter is not a string.
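The normalization described above (collapsing whitespace and closing an open ring) could be approximated as follows. This is a sketch for single-ring 2D polygons only, using a regular expression rather than a geometry library; `normalize_polygon_wkt` is a hypothetical name and not the library's implementation.

```python
import re


def normalize_polygon_wkt(aoi_wkt: str) -> str:
    """Return a tidied single-ring WKT polygon, closing the ring if needed."""
    if not isinstance(aoi_wkt, str):
        raise TypeError("aoi_wkt must be a string")
    # Collapse runs of whitespace and trim the ends.
    text = re.sub(r"\s+", " ", aoi_wkt.strip())
    match = re.fullmatch(r"POLYGON ?\(\((.*)\)\)", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("aoi_wkt must be a single-ring WKT POLYGON")
    points = [" ".join(p.split()) for p in match.group(1).split(",")]
    for point in points:
        x, y = point.split()  # ValueError unless exactly two values
        float(x)              # ValueError unless numeric
        float(y)
    # Close the ring when the last coordinate does not repeat the first.
    if points[0] != points[-1]:
        points.append(points[0])
    if len(points) < 4:
        raise ValueError("a polygon ring needs at least three distinct points")
    return "POLYGON((" + ", ".join(points) + "))"
```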

_validate_time()

Validate the ‘start_date’ and ‘end_date’ parameters to ensure they are in ISO 8601 format and that the start date is earlier than the end date.

Raises:

ValueError – If the dates are not in ISO 8601 format or if the start date is not earlier than the end date.
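One way to express this check with the standard library is sketched below. The helper names are hypothetical, and treating timestamps without an offset as UTC is an assumption made so that mixed inputs can be compared.

```python
from datetime import datetime, timezone


def parse_iso8601(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; raises ValueError if it is malformed."""
    # datetime.fromisoformat() rejects a trailing 'Z' before Python 3.11.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumed: naive means UTC
    return parsed


def check_time_range(start_date, end_date) -> None:
    """Raise ValueError if either date is malformed or start is not before end."""
    start = parse_iso8601(start_date) if start_date else None
    end = parse_iso8601(end_date) if end_date else None
    if start is not None and end is not None and start >= end:
        raise ValueError("start_date must be earlier than end_date")
```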

_validate_attributes()

Validate the ‘attributes’ parameter to ensure it is a dictionary with valid key-value pairs.

Raises:

TypeError – If ‘attributes’ is not a dictionary, or if its keys are not strings, or if its values are not strings, integers, or floats.
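The type rules above translate fairly directly into `isinstance` checks. The sketch below uses a hypothetical function name and skips validation when the argument is None, which is an assumption based on the parameter's default.

```python
def check_attributes(attributes) -> None:
    """Raise TypeError unless attributes maps str keys to str/int/float values."""
    if attributes is None:
        return  # assumed: None means no attribute filters
    if not isinstance(attributes, dict):
        raise TypeError("attributes must be a dictionary")
    for key, value in attributes.items():
        if not isinstance(key, str):
            raise TypeError("attribute keys must be strings")
        if not isinstance(value, (str, int, float)):
            raise TypeError("attribute values must be str, int, or float")
```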

_validate_burst_parameters()

Validate burst-specific parameters.

Raises:
  • ValueError – If any burst parameter is invalid.

  • TypeError – If any burst parameter has the wrong type.

_initialize_placeholders()

Initializes placeholder attributes for the class instance.

This method sets up several attributes with default values of None to serve as placeholders. These attributes include:

  • filter_condition (Optional[str]): A string representing a filter condition.

  • query (Optional[str]): A string representing the query.

  • url (Optional[str]): A string representing the URL.

  • response (Optional[requests.Response]): A requests.Response object for HTTP responses.

  • json_data (Optional[dict]): A dictionary to store JSON data from the response.

  • df (Optional[pd.DataFrame]): A pandas DataFrame to store tabular data.

_add_collection_filter(filters)
_add_product_type_filter(filters)
_add_orbit_direction_filter(filters)
_add_cloud_cover_filter(filters)
_add_aoi_filter(filters)
_add_date_filters(filters)
_add_attribute_filters(filters)
_add_burst_filters(filters)

Add burst-specific filters when in burst mode.

_build_filter()

Build the OData filter condition based on the provided parameters.
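The `_add_*_filter(filters)` signatures suggest that each helper appends a clause to a shared list, which is then joined into one condition. The sketch below illustrates that pattern for three parameters. The function name is hypothetical, the clause syntax follows the Copernicus Data Space OData conventions as an assumption, and joining with `and` is likewise assumed rather than taken from the library.

```python
def build_filter(collection_name=None, aoi_wkt=None,
                 start_date=None, end_date=None) -> str:
    """Collect one OData clause per supplied parameter and join them."""
    filters = []
    if collection_name:
        filters.append(f"Collection/Name eq '{collection_name}'")
    if aoi_wkt:
        # Assumed geometry syntax: SRID-prefixed WKT inside a geography literal.
        filters.append(
            f"OData.CSC.Intersects(area=geography'SRID=4326;{aoi_wkt}')"
        )
    if start_date:
        filters.append(f"ContentDate/Start gt {start_date}")
    if end_date:
        filters.append(f"ContentDate/Start lt {end_date}")
    return " and ".join(filters)
```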

_build_query()

Build the full OData query URL.
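As a rough illustration, a query URL can be assembled from the filter condition and the standard OData system query options `$filter`, `$orderby`, `$top`, and `$count`. Which options the library actually emits, and in what order, is an assumption here; `build_query_url` is a hypothetical name.

```python
from urllib.parse import quote

BASE_URL = "https://catalogue.dataspace.copernicus.eu/odata/v1/Products"


def build_query_url(filter_condition, top=1000,
                    order_by="ContentDate/Start desc", count=False,
                    base_url=BASE_URL):
    """Join OData system query options onto the base URL."""
    options = []
    if filter_condition:
        options.append(("$filter", filter_condition))
    options.append(("$orderby", order_by))
    options.append(("$top", str(top)))
    if count:
        options.append(("$count", "true"))
    # Percent-encode each value; option names keep their literal '$'.
    encoded = [name + "=" + quote(value) for name, value in options]
    return base_url + "?" + "&".join(encoded)
```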

execute_query()

Execute the query and retrieve data.

If count=True and the total number of results exceeds the ‘top’ limit, this method will automatically paginate through all results using multiple requests with the $skip parameter, combining all results into a single DataFrame.

Returns:

DataFrame containing all retrieved products.

Return type:

pd.DataFrame
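The `$skip`-based pagination described above amounts to requesting consecutive windows of `top` rows until the reported total is covered. The sketch below shows that arithmetic sequentially, with the HTTP call replaced by an injected `fetch_page` callable. Both function names are hypothetical, and the sketch does not reproduce the library's own request handling.

```python
def page_offsets(total: int, top: int) -> list:
    """Return the $skip value for each page needed to cover `total` rows."""
    return list(range(0, total, top))


def fetch_all(fetch_page, total: int, top: int) -> list:
    """Call fetch_page(skip, top) once per page and concatenate the rows."""
    rows = []
    for skip in page_offsets(total, top):
        rows.extend(fetch_page(skip, top))
    return rows
```

With 2500 matching products and `top=1000`, the offsets are 0, 1000, and 2000, so three requests are needed.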

_execute_paginated_query()

Execute paginated queries with asyncio when the results exceed the ‘top’ limit.

query_by_name(product_name: str) -> pandas.DataFrame

Query Copernicus data by a specific product name. The results (DataFrame) are stored in self.df.

Parameters:

product_name (str) – The exact name of the product to search for.

Returns:

A DataFrame containing the product details.

Returns an empty DataFrame if the product is not found or an error occurs.

Return type:

pd.DataFrame

Raises:

ValueError – If product_name is empty or not a string.

search_products_by_name_pattern(name_pattern: str, match_type: str, collection_name_filter: str | None = None, top: int | None = None, order_by: str | None = None) -> pandas.DataFrame

Searches for Copernicus products by a name pattern using ‘exact’, ‘contains’, ‘startswith’, or ‘endswith’. Optionally filters by a specific collection name or uses the instance’s current collection if set. The results (DataFrame) are stored in self.df.

Parameters:
  • name_pattern (str) – The pattern to search for in the product name.

  • match_type (str) – The type of match. Must be one of ‘exact’, ‘contains’, ‘startswith’, ‘endswith’.

  • collection_name_filter (str, optional) – Specific collection to filter this search by. If None and self.collection_name (instance attribute) is set, self.collection_name is used. If both are None, no collection filter is applied to this search.

  • top (int, optional) – Maximum number of results. If None, uses self.top (instance default). Must be between 1 and 1000.

  • order_by (str, optional) – Field and direction to order results (e.g., ‘ContentDate/Start desc’). If None, uses self.order_by (instance default).

Returns:

DataFrame with product details. Empty if no match or error.

Return type:

pd.DataFrame

Raises:

ValueError – If name_pattern is empty, match_type is invalid, or effective ‘top’ is out of range. Also if ‘collection_name_filter’ is provided and is invalid.
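The four match types map naturally onto OData's `eq` operator and its `contains`, `startswith`, and `endswith` string functions. The sketch below shows one plausible mapping on the `Name` field; `name_filter` is a hypothetical helper, and whether the library builds its clauses this way is an assumption.

```python
def name_filter(name_pattern: str, match_type: str) -> str:
    """Return an OData clause matching product names against a pattern."""
    if not name_pattern:
        raise ValueError("name_pattern must be a non-empty string")
    templates = {
        "exact": "Name eq '{0}'",
        "contains": "contains(Name,'{0}')",
        "startswith": "startswith(Name,'{0}')",
        "endswith": "endswith(Name,'{0}')",
    }
    if match_type not in templates:
        raise ValueError(f"match_type must be one of {sorted(templates)}")
    # OData string literals escape a single quote by doubling it.
    escaped = name_pattern.replace("'", "''")
    return templates[match_type].format(escaped)
```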

display_results(columns=None, top_n=10)

Display the query results with the selected columns.

download_product(eo_product_name: str, output_dir: str, config_file='.s5cfg', verbose=True, show_progress=True)

Download the EO product using the downloader module.

Parameters:
  • eo_product_name – Name of the EO product to download.

  • output_dir – Local output directory for downloaded files.

  • config_file – Path to the s5cmd configuration file.

  • verbose – Whether to print download information.

  • show_progress – Whether to show a tqdm progress bar during download.

Returns:

True if download was successful, False otherwise

Return type:

bool