pycsa.core.validation
Held-out validation utilities for the structured prior.
Two callables here. spatial_cv_score() is the workhorse —
it runs k-fold spatial cross-validation for an arbitrary
pycsa.core.priors.Prior at a fixed lmbda, returning
the per-fold and mean held-out MSE. SpatialCVSelector in
hyperparams.py uses it internally over a lmbda grid; users
can call it directly to validate any prior choice without going
through a selector.
Patch geometry, made concrete. Phase 1’s plan flagged this as the most under-specified piece. The implementation here:
- Takes per-row coordinates as a (n_points, 2) array (any metric: local Cartesian, (lon, lat), whatever the caller’s cell uses). When coords is None we fall back to row-index (np.arange) ordering; that is, we treat the points as already in scan-line order and split by index. That’s the only setting where the fallback is correct; the caller is responsible for providing real coordinates when the data is on a Delaunay grid or any other non-scan-line layout.
- Computes a 2D bounding box from the supplied coords, partitions it into a near-square r × c grid where r·c ≥ n_folds, and assigns each fold to one tile. Excess tiles are unused. Tiles are contiguous in coordinate space, which is what spatial cross-validation actually means; per-point random shuffling leaks long-wavelength modes across folds and would silently overstate held-out accuracy.
- Each held-out tile has a buffer zone of width buffer_fraction · tile_side around it. Points inside the buffer are excluded from both the training set and the evaluation set for that fold.
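The tiling and buffer steps can be sketched in a few lines of NumPy. This is a minimal illustration, not the code behind _build_spatial_folds: the helper names grid_shape and fold_masks are hypothetical, the buffer is assumed to be a band just outside each tile with a per-axis width of buffer_fraction times the tile side, and tiles are taken in row-major order rather than shuffled by an RNG.

```python
import math
import numpy as np

def grid_shape(n_folds):
    """Near-square (rows, cols) with rows * cols >= n_folds (hypothetical helper)."""
    cols = math.isqrt(n_folds - 1) + 1   # ceil(sqrt(n_folds)) for n_folds >= 1
    rows = -(-n_folds // cols)           # ceil(n_folds / cols)
    return rows, cols

def fold_masks(coords, n_folds=5, buffer_fraction=0.1):
    """Return one (train_mask, eval_mask) pair per fold (hypothetical helper)."""
    coords = np.asarray(coords, dtype=float)
    rows, cols = grid_shape(n_folds)
    lo, hi = coords.min(axis=0), coords.max(axis=0)
    if np.any(hi <= lo):
        raise ValueError("coords have zero extent along an axis")
    side = (hi - lo) / np.array([cols, rows])   # tile width (x) and height (y)
    frac = (coords - lo) / side                 # point position in tile units
    # Integer tile index per point; points on the far bbox edge go to the last tile.
    idx = np.minimum(frac.astype(int), [cols - 1, rows - 1])
    masks = []
    for k in range(n_folds):                    # ASSUMPTION: row-major tile order
        r, c = divmod(k, cols)
        in_tile = (idx[:, 0] == c) & (idx[:, 1] == r)
        # ASSUMPTION: the buffer is the band of width buffer_fraction (in tile
        # units) surrounding the tile; points in it are neither train nor eval.
        in_padded = np.all(
            (frac >= [c - buffer_fraction, r - buffer_fraction])
            & (frac <= [c + 1 + buffer_fraction, r + 1 + buffer_fraction]),
            axis=1,
        )
        masks.append((~in_padded, in_tile))
    return masks
```

With n_folds=5 this sketch produces a 2 × 3 grid and leaves one tile unused, matching the "excess tiles are unused" behaviour described above.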
Documented limitation: works for cells whose points roughly fill a 2D region (MERIT regional cells, ETOPO regional cells). For ICON Delaunay-triangle cells with sparse coverage near a cell vertex the bounding-box partition may produce empty tiles — the function raises in that case so the failure is visible.
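A fail-loudly check of this kind might look like the sketch below. The helper name check_fold is hypothetical; the two thresholds (at least one eval point, at least two train points) are the conditions listed under Raises for spatial_cv_score, but the error messages and the exact place the library performs the check are assumptions.

```python
import numpy as np

def check_fold(train_mask, eval_mask, fold_index):
    """Raise ValueError for an unusable fold; return (n_train, n_eval) otherwise."""
    n_train = int(np.count_nonzero(train_mask))
    n_eval = int(np.count_nonzero(eval_mask))
    if n_eval == 0:
        # Typical for sparse coverage: the bounding-box tile holds no points.
        raise ValueError(f"fold {fold_index}: held-out tile has no eval points")
    if n_train < 2:
        raise ValueError(f"fold {fold_index}: fewer than two train points")
    return n_train, n_eval
```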
Functions
spatial_cv_score: K-fold spatial cross-validation for any Prior.
- pycsa.core.validation.spatial_cv_score(prior: Prior, lmbda: float, design_matrix: ndarray, data: ndarray, *, coords: ndarray | None = None, n_folds: int = 5, buffer_fraction: float = 0.1, rng_seed: int | None = None) -> dict
K-fold spatial cross-validation for any Prior.
Solves the regularized normal equations on each fold’s training rows, predicts the held-out rows, and returns the per-fold and mean reconstruction MSE.
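For a single fold, the solve-predict-score step could be sketched as follows. How a Prior actually turns the fold's normal-equations matrix into a penalty is not specified here, so this sketch substitutes a plain identity (ridge) penalty scaled by lmbda; fold_mse is a hypothetical helper, not part of pycsa.

```python
import numpy as np

def fold_mse(M, d, train_mask, eval_mask, lmbda):
    """Held-out MSE for one fold with a ridge penalty standing in for the prior."""
    M_tr, d_tr = M[train_mask], d[train_mask]
    A = M_tr.T @ M_tr                        # the fold's normal-equations matrix
    b = M_tr.T @ d_tr
    # ASSUMPTION: identity penalty; a real Prior would supply its own matrix.
    penalty = lmbda * np.eye(A.shape[0])
    coeffs = np.linalg.solve(A + penalty, b)
    resid = M[eval_mask] @ coeffs - d[eval_mask]   # predict the held-out rows
    return float(np.mean(resid ** 2))
```

Collecting this value over all folds and averaging gives quantities with the same meaning as per_fold_mse and mean_heldout_mse.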
- Parameters:
prior – Any pycsa.core.priors.Prior. Called per-fold with the fold’s normal-equations matrix.
lmbda – Regularization scale passed to the prior.
design_matrix – Dense M matrix, shape (n_points, n_modes).
data – Target vector, shape (n_points,).
coords – Per-row 2D coordinates for spatial fold construction. If None, falls back to a strided index split, which is only appropriate when rows are already in scan-line order.
n_folds – See module docstring.
buffer_fraction – See module docstring.
rng_seed – Seed for the RNG that shuffles the tile assignment order so the chosen folds are spatially spread rather than packed in one corner. None leaves the order unseeded.
- Returns:
dict with keys:
per_fold_mse – ndarray of length n_folds.
mean_heldout_mse – Mean of per_fold_mse.
fold_sizes – ndarray of shape (n_folds, 2): (n_train, n_eval) per fold.
- Return type:
dict
- Raises:
ValueError – If coords rows do not match design_matrix rows, or if _build_spatial_folds cannot tile the points (fewer than 2 * n_folds points, zero extent in an axis, or a fold tile ending up with no eval points / fewer than two train points).