pycsa.core.tile_cache

Topography tile caching system for efficient parallel processing.

This module provides a caching layer for MERIT/ETOPO topography tiles so that NetCDF files are not repeatedly opened and closed during parallel cell processing.

Functions

close_worker_cache()

Close NetCDF handles and drop the worker cache.

compute_split_EW(lon_verts)

Determine whether a cell's longitude extent truly crosses the dateline.

create_tile_cache_from_grid(grid, params[, ...])

Create a tile cache containing all tiles needed for a given grid.

get_worker_cache()

Return this worker's tile cache; raise if init_worker_cache wasn't called.

init_worker_cache(data_dir[, dataset_type])

Initialize a lazy tile cache in the current worker process.

Classes

TopographyTileCache(data_dir, tile_filenames)

Cache for topography data tiles.

pycsa.core.tile_cache.compute_split_EW(lon_verts: ndarray) bool

Determine whether a cell’s longitude extent truly crosses the dateline.

Uses a span comparison: a crossing is reported only when the span of the vertices in the [0, 360) representation is smaller than their span in [-180, 180) and the original span exceeds 180°. This avoids false positives for western-hemisphere cells near the dateline (e.g. Aleutian cells).

Parameters:

lon_verts (array-like) – Cell longitude vertices (1-D), in [-180, 180).

Returns:

True if the cell truly crosses the dateline, False otherwise.

Return type:

bool
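The span comparison described above can be illustrated with a short NumPy sketch. This is not the library's source; the function name split_ew_sketch is hypothetical, and the sketch only follows the two conditions stated in the description.

```python
import numpy as np

def split_ew_sketch(lon_verts):
    """Hypothetical re-statement of the documented span comparison."""
    lon = np.asarray(lon_verts, dtype=float)
    # Span of the vertices as given, in [-180, 180)
    span = lon.max() - lon.min()
    # Span of the same vertices after mapping into [0, 360)
    lon360 = np.mod(lon, 360.0)
    span360 = lon360.max() - lon360.min()
    # Crossing only if the shifted span is smaller AND the original exceeds 180 degrees
    return bool(span360 < span and span > 180.0)

crossing = split_ew_sketch([179.0, -179.0, -179.5])   # straddles the dateline
aleutian = split_ew_sketch([-179.5, -178.0, -179.0])  # west of the dateline only
greenwich = split_ew_sketch([-1.0, 1.0, 0.0])         # straddles the prime meridian
```

Under these assumptions only the first cell is flagged: the other two have a span below 180° in the original representation.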

class pycsa.core.tile_cache.TopographyTileCache(data_dir: str, tile_filenames: List[str], dataset_type: str = 'MERIT', verbose: bool = False)

Cache for topography data tiles.

Pre-loads all required MERIT/ETOPO/REMA tiles into memory and provides fast access to subsets for individual grid cells.

Serving cells from the cache speeds up parallel processing by avoiding repeated file I/O.

Parameters:
  • data_dir (str or Path) – Base directory containing topography data tiles

  • tile_filenames (list of str) – List of tile filenames to pre-load

  • dataset_type (str, optional) – Type of dataset, one of ‘MERIT’, ‘ETOPO’ or ‘REMA’; by default ‘MERIT’

  • verbose (bool, optional) – Enable verbose logging, by default False

tiles

Dictionary mapping filenames to opened netCDF4.Dataset objects

Type:

dict

tile_bounds

Dictionary mapping filenames to (lat_min, lat_max, lon_min, lon_max) bounds

Type:

dict
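As an illustration of how a mapping shaped like tile_bounds can be used, the sketch below selects the tiles whose bounds overlap a query region. The helper name tiles_overlapping and the file names are hypothetical, and treating touching edges as an overlap is an assumption, not documented behaviour.

```python
def tiles_overlapping(tile_bounds, lat_min, lat_max, lon_min, lon_max):
    """Return the filenames whose bounding boxes intersect the query box."""
    hits = []
    for name, (t_lat_min, t_lat_max, t_lon_min, t_lon_max) in tile_bounds.items():
        # Two boxes overlap when they overlap along both axes (edges inclusive)
        lat_ok = t_lat_min <= lat_max and t_lat_max >= lat_min
        lon_ok = t_lon_min <= lon_max and t_lon_max >= lon_min
        if lat_ok and lon_ok:
            hits.append(name)
    return sorted(hits)

# Hypothetical bounds in the documented (lat_min, lat_max, lon_min, lon_max) order
example_bounds = {
    "a.nc": (0.0, 30.0, 0.0, 30.0),
    "b.nc": (0.0, 30.0, 30.0, 60.0),
    "c.nc": (-30.0, 0.0, 0.0, 30.0),
}
needed = tiles_overlapping(example_bounds, 10.0, 20.0, 25.0, 35.0)
```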

__init__(data_dir: str, tile_filenames: List[str], dataset_type: str = 'MERIT', verbose: bool = False)

get_data_for_region(lat_extent: ndarray, lon_extent: ndarray, merit_cg: int = 1) Tuple[ndarray, ndarray, ndarray]

Extract topography data for a given lat/lon region.

Designed as a drop-in replacement for the existing read_merit_topo().get_topo() workflow.

Parameters:
  • lat_extent (array-like) – Latitude extent [lat_min, lat_max, …]

  • lon_extent (array-like) – Longitude extent [lon_min, lon_max, …]

  • merit_cg (int, optional) – Coarse-graining factor, by default 1

Returns:

  • lat (ndarray) – Latitude coordinates. When merit_cg > 1 these are the windowed means of the sorted source coordinates.

  • lon (ndarray) – Longitude coordinates. When the cell crosses the dateline the extent is shifted into [0, 360); when merit_cg > 1 these are the windowed means of the sorted source coordinates.

  • topo (ndarray) – Topography data (2D array).

Notes

For high-southern-latitude cells (lat_max < -85.0) the effective coarse-graining stride is iint = merit_cg * 5 (a 5× multiplier) to compensate for the convergence of meridians near the pole.
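The coarse-graining described above can be sketched as follows. The function names are hypothetical, and the windowing details (non-overlapping blocks, with an incomplete trailing block dropped) are assumptions; only the 5× stride rule for lat_max < -85.0 and the use of windowed means of sorted coordinates come from the description.

```python
import numpy as np

def effective_stride(lat_max, merit_cg=1):
    """Stride used for coarse-graining; 5x larger for cells south of -85 degrees."""
    return merit_cg * 5 if lat_max < -85.0 else merit_cg

def windowed_means(coords, stride):
    """Mean of each non-overlapping block of `stride` sorted coordinates."""
    coords = np.sort(np.asarray(coords, dtype=float))
    # Assumption: an incomplete trailing block is dropped
    n = (coords.size // stride) * stride
    return coords[:n].reshape(-1, stride).mean(axis=1)

polar_stride = effective_stride(-86.0, merit_cg=2)
midlat_stride = effective_stride(-80.0, merit_cg=2)
coarse_lat = windowed_means([3.0, 0.0, 1.0, 2.0, 5.0, 4.0], midlat_stride)
```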

get_etopo_data(lat_extent: ndarray, lon_extent: ndarray, etopo_cg: int = 1) Tuple[ndarray, ndarray, ndarray]

Load ETOPO topography for a cell’s lat/lon vertex extent.

Byte-equivalent to pycsa.core.io.read_etopo_topo.get_topo + __load_topo, but uses this cache’s persistent file handles so the same tile isn’t re-opened across cells within a worker.

Parameters:
  • lat_extent (array-like) – Cell latitude vertices (1-D).

  • lon_extent (array-like) – Cell longitude vertices (1-D), in [-180, 180).

  • etopo_cg (int, optional) – Coarse-graining factor (stride).

Returns:

1-D coordinate arrays and the 2-D topography slab, sorted in ascending lat/lon. lon is in [0, 360) when the cell crosses the dateline; otherwise it stays in [-180, 180).

Return type:

tuple of ndarray (lat, lon, topo)
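The return convention (ascending coordinates, longitudes shifted into [0, 360) for dateline cells, stride-based coarse-graining) can be illustrated with a small sketch. The helper shift_sort_stride is hypothetical and operates on in-memory arrays rather than on NetCDF handles; it is not the method's implementation.

```python
import numpy as np

def shift_sort_stride(lat, lon, topo, crosses_dateline, cg=1):
    """Sort a (lat, lon, topo) slab ascending and subsample it with stride cg."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    topo = np.asarray(topo)
    if crosses_dateline:
        # Map [-180, 180) longitudes into [0, 360) so the cell is contiguous
        lon = np.mod(lon, 360.0)
    i = np.argsort(lat)
    j = np.argsort(lon)
    # Reorder rows (lat) and columns (lon) consistently, then apply the stride
    topo_sorted = topo[np.ix_(i, j)]
    return lat[i][::cg], lon[j][::cg], topo_sorted[::cg, ::cg]

lat_out, lon_out, topo_out = shift_sort_stride(
    [1.0, 0.0], [179.0, -179.0], [[1, 2], [3, 4]], crosses_dateline=True
)
```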

close_all()

Close all opened NetCDF files.

pycsa.core.tile_cache.create_tile_cache_from_grid(grid, params, padding: float = 0.5) TopographyTileCache

Create a tile cache containing all tiles needed for a given grid.

This analyzes the grid to determine which tiles are needed, then pre-loads them all at once.

Parameters:
  • grid (pycsa.core.var.grid) – ICON grid object with cell vertices

  • params (pycsa.core.var.params) – Parameters object with path_merit, path_etopo, etc.

  • padding (float, optional) – Extra padding in degrees applied when selecting tiles, so that the required tiles are loaded; by default 0.5

Returns:

Initialized cache with all required tiles loaded

Return type:

TopographyTileCache

pycsa.core.tile_cache.init_worker_cache(data_dir: str, dataset_type: str = 'ETOPO') bool

Initialize a lazy tile cache in the current worker process.

Intended to be called via client.run(init_worker_cache, path_etopo) at the start of each memory batch. Idempotent: a second call with the same arguments is a no-op so reinitialisation across batches is cheap.

Returns True so client.run reports {worker_addr: True, …} on success.

pycsa.core.tile_cache.get_worker_cache() TopographyTileCache

Return this worker’s tile cache; raise if init_worker_cache wasn’t called.

pycsa.core.tile_cache.close_worker_cache() bool

Close NetCDF handles and drop the worker cache. Returns True.
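The lifecycle of the three worker-cache functions above (idempotent initialisation, access that fails before initialisation, and an explicit close) can be sketched with a per-process module-level variable. Everything here is a hypothetical stand-in: _StubCache replaces TopographyTileCache, the _sketch function names are invented, and both the RuntimeError type and the behaviour when the arguments change are assumptions.

```python
from typing import Optional, Tuple

class _StubCache:
    """Hypothetical stand-in for TopographyTileCache (no files are opened)."""
    def __init__(self, data_dir: str, dataset_type: str):
        self.data_dir = data_dir
        self.dataset_type = dataset_type
        self.closed = False

    def close_all(self) -> None:
        self.closed = True

# One cache per worker process, keyed by the arguments it was created with
_cache: Optional[_StubCache] = None
_key: Optional[Tuple[str, str]] = None

def init_worker_cache_sketch(data_dir: str, dataset_type: str = "ETOPO") -> bool:
    global _cache, _key
    key = (data_dir, dataset_type)
    if _cache is not None and _key == key:
        return True  # same arguments: no-op
    if _cache is not None:
        _cache.close_all()  # assumption: different arguments replace the cache
    _cache, _key = _StubCache(data_dir, dataset_type), key
    return True

def get_worker_cache_sketch() -> _StubCache:
    if _cache is None:
        # Assumption: the real function may raise a different exception type
        raise RuntimeError("worker cache not initialised in this process")
    return _cache

def close_worker_cache_sketch() -> bool:
    global _cache, _key
    if _cache is not None:
        _cache.close_all()
    _cache, _key = None, None
    return True
```

Returning True from the init and close functions mirrors the documented convention, which lets a per-worker broadcast report success for each worker address.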