pycsa.scheduling
HPC scheduling helpers: per-cell memory estimation and memory-aware batching.
Pure-NumPy functions used by runs/icon_etopo_global.py to size Dask
workers based on each ICON cell’s latitude (polar cells cover more
longitudinal range in degree-space, so they need more topographic data
loaded). The module lives in pycsa.* rather than runs/ so that tests can import it
without going through a run script (or pulling Dask in at collection time).
Functions
| estimate_cell_memory_gb(lat_deg) | Estimate memory requirements (in GB) for processing a cell based on its latitude. |
| group_cells_by_memory(clat_rad[, ...]) | Group cells into batches with similar memory requirements. |
- pycsa.scheduling.estimate_cell_memory_gb(lat_deg: float) → float
Estimate memory requirements (in GB) for processing a cell based on its latitude.
At polar latitudes, cells cover a larger longitudinal range in degree-space, so more topographic data points must be loaded for coarse-graining.
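The geometry behind this is meridian convergence: a parallel at latitude φ has radius R·cos φ, so a fixed east-west distance spans about 1/cos φ times as many degrees of longitude as it does at the equator. A minimal sketch, assuming a spherical Earth of radius 6371 km and a hypothetical 100 km cell width (neither value comes from pycsa); longitude_span_deg is an illustrative helper, not part of this module:

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def longitude_span_deg(width_km: float, lat_deg: float) -> float:
    """Degrees of longitude spanned by an east-west distance of width_km at
    latitude lat_deg on a sphere (hypothetical helper, not part of pycsa)."""
    cos_lat = np.cos(np.deg2rad(lat_deg))
    # The parallel at latitude phi has radius R*cos(phi), so a fixed arc length
    # covers proportionally more longitude as cos(phi) shrinks toward the poles.
    return float(np.rad2deg(width_km / (EARTH_RADIUS_KM * cos_lat)))


for lat in (0.0, 45.0, 70.0, 80.0, 85.0):
    print(f"{lat:5.1f} deg -> {longitude_span_deg(100.0, lat):6.2f} deg of longitude")
```

At 60° the same 100 km spans twice the longitude it does at the equator, and the ratio grows without bound toward the poles, which is why the memory estimate needs a cap.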
- Parameters:
lat_deg (float) – Cell center latitude in degrees (-90 to 90)
- Returns:
Estimated memory requirement in GB
- Return type:
float
Notes
Equatorial cells (~0°): ~10 GB sufficient
Mid-latitude cells (~45°): ~10 GB
High-latitude cells (~70°): ~25 GB
High-latitude cells (~80°): ~43 GB
Polar cells (~85-89°): ~60 GB required
Memory scales approximately with (1/cos(lat))^0.7 due to meridian convergence, capped at 60 GB at the poles.
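A minimal sketch of a capped power-law model of this form. The constants are assumptions: a 10 GB equatorial baseline and a scale cap of 6.0 (giving the 60 GB ceiling, consistent with the "scale 6.0" in the tuning history that follows). sketch_cell_memory_gb is a hypothetical name, not the pycsa function. With these constants the cap engages near 85.6°, matching the "~85-89°" polar band, but the model gives roughly 13, 21 and 34 GB at 45°, 70° and 80°, which differ from the tabulated figures, so the real implementation presumably uses different constants or additional terms.

```python
import numpy as np

# Assumed constants for illustration; pycsa's actual values may differ.
BASE_MEMORY_GB = 10.0  # assumed equatorial baseline (~10 GB in the Notes)
MAX_SCALE = 6.0        # assumed cap on the scale factor, giving a 60 GB ceiling
EXPONENT = 0.7         # exponent quoted in the Notes


def sketch_cell_memory_gb(lat_deg: float) -> float:
    """Hypothetical capped (1/cos(lat))**0.7 model; not pycsa's implementation."""
    # Clamp to the valid range and fold the southern hemisphere onto the northern.
    lat = abs(float(np.clip(lat_deg, -90.0, 90.0)))
    # Floor cos(lat) so 1/cos stays finite even if it rounds to zero at the poles.
    cos_lat = max(float(np.cos(np.deg2rad(lat))), 1e-12)
    scale = min((1.0 / cos_lat) ** EXPONENT, MAX_SCALE)
    return BASE_MEMORY_GB * scale


for lat in (0.0, 45.0, 70.0, 80.0, 86.0):
    print(f"{lat:4.0f} deg -> {sketch_cell_memory_gb(lat):5.1f} GB")
```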
Tuning history. The original cap was 60 GB (scale 6.0), but it worked only nominally because the planner’s interior path used safety_factor=1.0 (a bug; the final-batch path used 1.5). The bug placed 8 workers × 60 GB = 480 GB on the 510 GB node, causing OOMs. The cap was briefly retuned 60 → 30 → 45 GB to chase the OOM symptom; the real fix was the safety_factor=1.5 update in 2026-05. With safety_factor=1.5 honored consistently, the original 60 GB cap gives 3 workers × 90 GB on a 256 GB budget, well within the per-worker memory_limit that Dask actually enforces.
- pycsa.scheduling.group_cells_by_memory(clat_rad: ndarray, max_memory_per_batch_gb: float = 240.0) → list[dict]
Group cells into batches with similar memory requirements.
- Parameters:
clat_rad (ndarray) – Cell center latitudes in radians
max_memory_per_batch_gb (float) – Maximum total memory available for a batch (default: 240 GB for 6 workers × 40 GB)
- Returns:
List of batch configurations, each a dict containing:
  - 'cell_indices': list of cell indices in this batch
  - 'memory_per_cell_gb': average memory per cell in GB
  - 'n_workers': recommended number of workers
  - 'memory_per_worker_gb': recommended memory per worker in GB
- Return type:
list[dict]
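The Returns description suggests a tier-then-size scheme. A minimal sketch of one way to produce batches with those keys, not the pycsa implementation. It assumes hypothetical fixed memory tiers (15/30/45 GB edges), an illustrative per-cell model standing in for estimate_cell_memory_gb, and a safety_factor of 1.5 applied to each tier's average per-cell memory. sketch_group_cells_by_memory and _sketch_memory_gb are hypothetical names.

```python
import math

import numpy as np


def _sketch_memory_gb(lat_rad: np.ndarray) -> np.ndarray:
    """Hypothetical vectorised stand-in for estimate_cell_memory_gb:
    10 GB * min((1/cos(lat))**0.7, 6.0). Not pycsa's implementation."""
    cos_lat = np.maximum(np.abs(np.cos(lat_rad)), 1e-12)
    return 10.0 * np.minimum((1.0 / cos_lat) ** 0.7, 6.0)


def sketch_group_cells_by_memory(
    clat_rad: np.ndarray,
    max_memory_per_batch_gb: float = 240.0,
    tier_edges_gb: tuple[float, ...] = (15.0, 30.0, 45.0),  # hypothetical tiers
    safety_factor: float = 1.5,  # assumed headroom on the per-cell estimate
) -> list[dict]:
    mem = _sketch_memory_gb(np.asarray(clat_rad, dtype=float))
    # Tier 0 holds cells below 15 GB; tier 3 holds cells at or above 45 GB.
    tiers = np.digitize(mem, tier_edges_gb)
    batches = []
    for tier in np.unique(tiers):  # ascending order: lightest cells first
        idx = np.flatnonzero(tiers == tier)
        per_cell = float(mem[idx].mean())
        per_worker = per_cell * safety_factor
        # Fit as many workers as the batch budget allows, but never zero.
        n_workers = max(1, math.floor(max_memory_per_batch_gb / per_worker))
        batches.append(
            {
                "cell_indices": idx.tolist(),
                "memory_per_cell_gb": per_cell,
                "n_workers": n_workers,
                "memory_per_worker_gb": per_worker,
            }
        )
    return batches


if __name__ == "__main__":
    lats = np.deg2rad([0.0, 10.0, 45.0, 75.0, 89.0])
    for batch in sketch_group_cells_by_memory(lats):
        print(batch["cell_indices"], batch["n_workers"], round(batch["memory_per_worker_gb"], 1))
```

For the five example latitudes, the 0°, 10° and 45° cells share one light batch of 14 workers, while the 75° and 89° cells get their own batches of 6 and 2 workers. Sizing from the tier mean rather than the tier maximum is a design choice here: the safety factor is what absorbs the spread within a tier.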