pycsa.core.buffer_pool

Dynamic buffer pool for reusing NumPy arrays across multiple computations.

This module provides memory-efficient buffer management for spectral approximation computations where array sizes may vary between cells (e.g., different amounts of topography data per cell).

Classes

BufferPool()

Dynamic buffer pool that auto-grows to handle variable array sizes.

class pycsa.core.buffer_pool.BufferPool

Dynamic buffer pool that auto-grows to handle variable array sizes.

Strategy:

  • Keeps the largest buffer seen for each key.

  • Returns views (slices) for smaller requests, so no copy is made.

  • Grows the buffer automatically when a larger size is requested.

  • Tracks usage statistics for performance analysis.

This is particularly effective for workflows that process many cells with varying data sizes: once the buffer for a key has grown to the largest size needed, later requests reuse it instead of allocating new memory.
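The grow-and-view strategy can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the pycsa implementation: the class name GrowOnlyPool is hypothetical, each key is assumed to keep a single flat 1-D backing array, a dtype change is assumed to force a reallocation (counted as a grow), and smaller requests are served as a reshaped leading slice of the backing array.

```python
import numpy as np


class GrowOnlyPool:
    """Hypothetical sketch: one flat backing array per key, grown on demand."""

    def __init__(self):
        self._buffers = {}  # key -> 1-D backing ndarray
        self._stats = {}    # key -> {'hits': int, 'misses': int, 'grows': int}

    def get_or_create(self, key, shape, dtype=np.float64):
        dtype = np.dtype(dtype)
        n = int(np.prod(shape))  # number of elements requested
        stats = self._stats.setdefault(key, {'hits': 0, 'misses': 0, 'grows': 0})
        buf = self._buffers.get(key)
        if buf is None:
            # First request for this key: allocate exactly what is needed.
            stats['misses'] += 1
            buf = self._buffers[key] = np.empty(n, dtype=dtype)
        elif buf.size < n or buf.dtype != dtype:
            # Too small (or a different dtype, by assumption): replace the backing array.
            stats['grows'] += 1
            buf = self._buffers[key] = np.empty(max(n, buf.size), dtype=dtype)
        else:
            # Large enough: reuse the existing memory.
            stats['hits'] += 1
        # A leading slice of a contiguous 1-D array, reshaped, is a view (no copy).
        return buf[:n].reshape(shape)

    def get_stats(self):
        return {key: dict(counts) for key, counts in self._stats.items()}


pool = GrowOnlyPool()
a = pool.get_or_create('coeff', (1000, 100))  # miss: allocates
b = pool.get_or_create('coeff', (500, 100))   # hit: view into the same memory as a
c = pool.get_or_create('coeff', (2000, 100))  # grow: new, larger backing array
print(pool.get_stats())  # prints: {'coeff': {'hits': 1, 'misses': 1, 'grows': 1}}
```

Serving every request from a flat buffer means any shape whose element count fits can be returned as a contiguous view; slicing an N-D buffer along several axes (for example buf[:500, :50]) would instead yield a non-contiguous view.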

Examples

>>> import numpy as np
>>> from pycsa.core.buffer_pool import BufferPool
>>> pool = BufferPool()
>>> # First call allocates
>>> arr1 = pool.get_or_create('coeff', (1000, 100), np.float64)
>>> # Second call with the same size reuses the buffer
>>> arr2 = pool.get_or_create('coeff', (1000, 100), np.float64)
>>> # A smaller size returns a view of the existing buffer
>>> arr3 = pool.get_or_create('coeff', (500, 100), np.float64)
>>> # A larger size triggers reallocation
>>> arr4 = pool.get_or_create('coeff', (2000, 100), np.float64)

__init__()

Initialize empty buffer pool.

get_or_create(key, shape, dtype=numpy.float64)

Get buffer from pool, creating or growing as needed.

Parameters:
  • key (str) – Identifier for this buffer (e.g., ‘coeff’, ‘E_tilda_lm’)

  • shape (tuple of int) – Requested shape for the array

  • dtype (numpy dtype, optional) – Data type for the array (default: np.float64)

Returns:

Array of requested shape and dtype. May be a view into a larger buffer.

Return type:

numpy.ndarray

Notes

The returned array is writable and may contain data left over from earlier requests. Because a later call to get_or_create with the same key can hand out the same memory, make a copy if you need the data to persist beyond that call.
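The aliasing this note warns about can be reproduced with plain NumPy, without calling BufferPool, assuming (as the class description states) that the pool hands out views of one backing buffer per key. The names backing, first, kept and second are illustrative only.

```python
import numpy as np

# Stand-in for one pooled buffer that is reused across requests.
backing = np.empty(1000, dtype=np.float64)

first = backing[:500].reshape(50, 10)   # view handed out by one request
first[:] = 1.0
kept = first.copy()                     # independent copy survives reuse

second = backing[:200].reshape(20, 10)  # next request for the same key
second[:] = 2.0                         # overwrites memory that `first` shares

print(first[0, 0], kept[0, 0])  # prints: 2.0 1.0
```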

clear()

Free all buffers and reset statistics.

Use this when done processing a batch of cells to release memory. In Dask workflows, buffers are automatically released when the worker process terminates, so calling clear() is optional.
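One way to apply this per batch is a small driver that clears the pool in a finally clause, so memory is released even if a cell raises. The names process_batches and process_cell are hypothetical; any object with a clear() method, such as a BufferPool, can be passed as pool.

```python
def process_batches(pool, batches, process_cell):
    """Hypothetical driver: run process_cell(pool, cell) for every cell,
    clearing the pool after each batch."""
    results = []
    for batch in batches:
        try:
            results.append([process_cell(pool, cell) for cell in batch])
        finally:
            # Release pooled buffers even if a cell in this batch fails.
            pool.clear()
    return results
```

Clearing after every batch gives up buffer reuse across batches, so the pool then saves allocations only within a batch; if consecutive batches have similar sizes, clearing less often may be preferable.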

get_stats()

Get buffer usage statistics for performance analysis.

Returns:

Dictionary mapping buffer keys to statistics:

  • ‘hits’ – Number of times the buffer was reused

  • ‘misses’ – Number of times the buffer was allocated

  • ‘grows’ – Number of times the buffer was grown

Return type:

dict
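A per-key hit rate can be derived from this dictionary as hits / (hits + misses). The helper below is a hypothetical sketch that assumes only the documented layout; it leaves 'grows' out of the ratio and returns NaN for a key with no hits or misses. The input counts are made up for illustration.

```python
def hit_rates(stats):
    """Hypothetical helper: {key: hits / (hits + misses)} for a get_stats()-style dict."""
    rates = {}
    for key, counts in stats.items():
        total = counts['hits'] + counts['misses']  # 'grows' is not included
        # Avoid ZeroDivisionError for keys with no recorded hits or misses.
        rates[key] = counts['hits'] / total if total else float('nan')
    return rates


# Made-up counts, for illustration only.
example_stats = {'coeff': {'hits': 9, 'misses': 1, 'grows': 2}}
for key, rate in hit_rates(example_stats).items():
    print(f"{key}: {rate:.1%}")  # prints: coeff: 90.0%
```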

Examples

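>>> import numpy as np
>>> from pycsa.core.buffer_pool import BufferPool
>>> pool = BufferPool()
>>> arr = pool.get_or_create('coeff', (1000, 100), np.float64)
>>> # ... use pool ...
>>> stats = pool.get_stats()
>>> coeff = stats['coeff']
>>> hit_rate = coeff['hits'] / (coeff['hits'] + coeff['misses'])
>>> print(f"Coefficient buffer hit rate: {hit_rate:.1%}")

get_memory_usage()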

Get current memory usage of all buffers.

Returns:

Dictionary with:

  • ‘total_mb’ – Total memory used by all buffers, in MB

  • ‘buffers’ – Dict mapping keys to individual buffer sizes, in MB

Return type:

dict
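Figures in this layout can be approximated from the nbytes of each backing array. The sketch below is hypothetical: memory_report is not part of pycsa, and it assumes 1 MB = 1024**2 bytes, which the actual method may define differently (for example as 10**6 bytes).

```python
import numpy as np


def memory_report(buffers):
    """Hypothetical: summarize {key: backing ndarray} in the layout documented above."""
    bytes_per_mb = 1024 ** 2  # assumption: binary megabytes
    per_key = {key: arr.nbytes / bytes_per_mb for key, arr in buffers.items()}
    return {'total_mb': sum(per_key.values()), 'buffers': per_key}


report = memory_report({'coeff': np.empty((2000, 100), dtype=np.float64)})
print(f"{report['total_mb']:.2f} MB")  # prints: 1.53 MB
```

Under the strategy described for BufferPool (the largest buffer is kept per key), such a report reflects the peak request for each key rather than the most recent one. Measure the backing arrays rather than the views handed out; summing the views would under-count.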