multiscale #

multiscale(array, reduction, scale_factors, preserve_dtype=True, chunks='preserve', chained=True, namer=_default_namer, **kwargs)

Generate a coordinate-aware multiscale representation of an array.

Parameters:

array : Array-like, e.g. NumPy array, Dask array (required)
    The array to be downscaled.

reduction : callable (required)
    A function that aggregates chunks of data over windows. See
    xarray_multiscale.reducers.WindowedReducer for the expected signature of
    this callable.

scale_factors : int or sequence of ints (required)
    The desired downscaling factors, one for each axis, or a single value
    for all axes.

preserve_dtype : bool, default=True
    If True, output arrays are all cast to the same data type as the input
    array. If False, output arrays will have the data type determined by the
    output of the reduction function.

chunks : sequence or dict of ints, or the string "preserve", default="preserve"
    Set the chunking of the output arrays. Applies only to dask arrays. If
    chunks is set to "preserve" (the default), then chunk sizes will decrease
    with each level of downsampling. Otherwise, this argument is passed to
    xarray_multiscale.chunks.normalize_chunks and the result is applied to
    each output array via the xarray.DataArray.chunk method, producing a list
    of arrays with the same chunk size. Note that rechunking can be
    computationally expensive for arrays with many chunks.

chained : bool, default=True
    If True (the default), the nth downscaled array is generated by applying
    the reduction function to the (n-1)th downscaled array with the
    user-supplied scale_factors, so the nth downscaled array directly depends
    on the (n-1)th downscaled array. Note that nonlinear reductions like the
    windowed mode may give inaccurate results with chained set to True.

    If False, the nth downscaled array is generated by applying the reduction
    function to the 0th downscaled array (i.e., the input array) with the
    scale_factors raised to the nth power, so the nth downscaled array
    directly depends on the input array. A sketch contrasting the two modes
    follows the Examples section below.

namer : callable, default=_default_namer
    A function for naming the output arrays. This function should take an
    integer index and return a string. The default function prefixes the
    string representation of the integer with the character "s", producing
    names like "s0" and "s1".

**kwargs : Any
    Additional keyword arguments that will be passed to the reduction
    function.

Returns:

result : list[xarray.DataArray]
    The first element of this list is the input array, converted to an
    xarray.DataArray. Each subsequent element of the list is the result of
    downsampling the previous element of the list.

    The coords attributes of these DataArrays track the changing offset and
    scale induced by the downsampling operation.

Examples:

>>> import numpy as np
>>> from xarray_multiscale import multiscale
>>> from xarray_multiscale.reducers import windowed_mean
>>> multiscale(np.arange(4), windowed_mean, 2)
[<xarray.DataArray (dim_0: 4)>
array([0, 1, 2, 3])
Coordinates:
  * dim_0    (dim_0) float64 0.0 1.0 2.0 3.0, <xarray.DataArray (dim_0: 2)>
array([0, 2])
Coordinates:
  * dim_0    (dim_0) float64 0.5 2.5]
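
As a further illustration (a minimal sketch, not part of the original documentation), the calls below apply multiscale to a dask array and contrast the two chained modes; they assume windowed_mean from xarray_multiscale.reducers and a shape that divides evenly by the scale factors:

>>> import dask.array as da
>>> from xarray_multiscale import multiscale
>>> from xarray_multiscale.reducers import windowed_mean
>>> data = da.random.random((64, 64), chunks=(16, 16))
>>> # chained=True (default): each level is reduced from the previous level
>>> chained_pyramid = multiscale(data, windowed_mean, 2)
>>> # chained=False: each level is reduced directly from the input array,
>>> # using scale factors raised to the level's power
>>> direct_pyramid = multiscale(data, windowed_mean, 2, chained=False)
>>> [level.shape for level in chained_pyramid]
[(64, 64), (32, 32), (16, 16), (8, 8), (4, 4), (2, 2)]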
Source code in src/xarray_multiscale/multiscale.py
def multiscale(
    array: npt.NDArray[Any],
    reduction: WindowedReducer,
    scale_factors: Sequence[int] | int,
    preserve_dtype: bool = True,
    chunks: ChunkOption | Sequence[int] | dict[Hashable, int] = "preserve",
    chained: bool = True,
    namer: Callable[[int], str] = _default_namer,
    **kwargs: Any,
) -> list[xarray.DataArray]:
    """
    Generate a coordinate-aware multiscale representation of an array.

    Parameters
    ----------
    array : Array-like, e.g. Numpy array, Dask array
        The array to be downscaled.

    reduction : callable
        A function that aggregates chunks of data over windows.
        See `xarray_multiscale.reducers.WindowedReducer` for the expected
        signature of this callable.

    scale_factors : int or sequence of ints
        The desired downscaling factors, one for each axis, or a single
        value for all axes.

    preserve_dtype : bool, default=True
        If True, output arrays are all cast to the same data type as the
        input array. If False, output arrays will have data type determined
        by the output of the reduction function.

    chunks : sequence or dict of ints, or the string "preserve" (default)
        Set the chunking of the output arrays. Applies only to dask arrays.
        If `chunks` is set to "preserve" (the default), then chunk sizes will
        decrease with each level of downsampling. Otherwise, this argument is
        passed to `xarray_multiscale.chunks.normalize_chunks` and the result
        is applied to each output array via the `xarray.DataArray.chunk`
        method, producing a list of arrays with the same chunk size.
        Note that rechunking can be computationally expensive
        for arrays with many chunks.

    chained : bool, default=True
        If True (default), the nth downscaled array is generated by
        applying the reduction function on the n-1th downscaled array with
        the user-supplied `scale_factors`. This means that the nth
        downscaled array directly depends on the n-1th downscaled array.
        Note that nonlinear reductions like the windowed mode may give
        inaccurate results with `chained` set to True.

        If False, the nth downscaled array is generated by applying the
        reduction function on the 0th downscaled array
        (i.e., the input array) with the `scale_factors` raised to the nth
        power. This means that the nth downscaled array directly depends
        on the input array.

    namer : callable, defaults to `_default_namer`
        A function for naming the output arrays. This function should take an integer
        index and return a string. The default function prefixes the string
        representation of the integer with the character "s".

    **kwargs: Any
        Additional keyword arguments that will be passed to the reduction function.

    Returns
    -------
    result : list[xarray.DataArray]
        The first element of this list is the input array, converted to an
        `xarray.DataArray`. Each subsequent element of the list is
        the result of downsampling the previous element of the list.

        The `coords` attributes of these DataArrays track the changing
        offset and scale induced by the downsampling operation.

    Examples
    --------
    >>> import numpy as np
    >>> from xarray_multiscale import multiscale
    >>> from xarray_multiscale.reducers import windowed_mean
    >>> multiscale(np.arange(4), windowed_mean, 2)
    [<xarray.DataArray (dim_0: 4)>
    array([0, 1, 2, 3])
    Coordinates:
      * dim_0    (dim_0) float64 0.0 1.0 2.0 3.0, <xarray.DataArray (dim_0: 2)>
    array([0, 2])
    Coordinates:
      * dim_0    (dim_0) float64 0.5 2.5]
    """
    scale_factors = broadcast_to_rank(scale_factors, array.ndim)
    darray = to_dataarray(array, name=namer(0))

    levels = range(1, downsampling_depth(darray.shape, scale_factors))

    result: list[xarray.DataArray] = [darray]
    for level in levels:
        if chained:
            scale = scale_factors
            source = result[-1]
        else:
            scale = tuple(s**level for s in scale_factors)
            source = result[0]
        downscaled = downscale(source, reduction, scale, preserve_dtype, **kwargs)
        downscaled.name = namer(level)
        result.append(downscaled)

    if darray.chunks is not None and chunks != "preserve":
        new_chunks = [normalize_chunks(r, chunks) for r in result]
        result = [r.chunk(ch) for r, ch in zip(result, new_chunks)]

    return result

to_dataarray #

to_dataarray(array, name=None)

Convert the input to an xarray.DataArray if it is not already one.

Source code in src/xarray_multiscale/multiscale.py
def to_dataarray(array: Any, name: str | None = None) -> xarray.DataArray:
    """
    Convert the input to an `xarray.DataArray` if it is not already one.
    """
    if isinstance(array, xarray.DataArray):
        data = array.data
        dims = array.dims
        # ensure that key order matches dimension order
        coords = {d: array.coords[d] for d in dims}
        attrs = array.attrs
    else:
        data = array
        dims = tuple(f"dim_{d}" for d in range(data.ndim))
        coords = {
            dim: xarray.DataArray(np.arange(shape, dtype="float"), dims=dim)
            for dim, shape in zip(dims, array.shape)
        }
        attrs = {}

    result = xarray.DataArray(data=data, coords=coords, dims=dims, attrs=attrs, name=name)
    return result
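
For illustration (a minimal sketch, not part of the original documentation), converting a bare NumPy array produces synthetic dimension names and float coordinates; this assumes to_dataarray is importable from xarray_multiscale.multiscale, per the source path above:

>>> import numpy as np
>>> from xarray_multiscale.multiscale import to_dataarray
>>> arr = to_dataarray(np.zeros((2, 3)), name="s0")
>>> arr.dims
('dim_0', 'dim_1')
>>> arr.coords["dim_0"].values
array([0., 1.])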

downscale_dask #

downscale_dask(array, reduction, scale_factors, **kwargs)

Downscale a dask array.

Source code in src/xarray_multiscale/multiscale.py
def downscale_dask(
    array: Any,
    reduction: WindowedReducer,
    scale_factors: Sequence[int],
    **kwargs: Any,
) -> Any:
    """
    Downscale a dask array.
    """
    if not np.all((np.array(array.shape) % np.array(scale_factors)) == 0):
        msg = f"Coarsening factors {scale_factors} do not align with array shape {array.shape}."
        raise ValueError(msg)

    array = align_chunks(array, scale_factors)
    name: str = "downscale-" + tokenize(reduction, array, scale_factors)
    dsk = {
        (name,) + key[1:]: (apply, reduction, [key, scale_factors], kwargs)
        for key in flatten(array.__dask_keys__())
    }
    chunks = tuple(
        tuple(int(size // scale_factors[axis]) for size in sizes)
        for axis, sizes in enumerate(array.chunks)
    )

    meta = reduction(np.empty(scale_factors, dtype=array.dtype), scale_factors, **kwargs)
    graph = HighLevelGraph.from_collections(name, dsk, dependencies=[array])
    return Array(graph, name, chunks, meta=meta)
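
A minimal usage sketch (not part of the original documentation), assuming downscale_dask is importable from xarray_multiscale.multiscale and using windowed_mean as the reducer; note that the array shape must divide evenly by the scale factors:

>>> import dask.array as da
>>> import numpy as np
>>> from xarray_multiscale.multiscale import downscale_dask
>>> from xarray_multiscale.reducers import windowed_mean
>>> arr = da.from_array(np.arange(16.0).reshape(4, 4), chunks=(2, 2))
>>> out = downscale_dask(arr, windowed_mean, (2, 2))
>>> out.shape
(2, 2)
>>> out.compute()
array([[ 2.5,  4.5],
       [10.5, 12.5]])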

downscale_coords #

downscale_coords(array, scale_factors)

Downscale coordinates by taking the windowed mean of each coordinate array.

Source code in src/xarray_multiscale/multiscale.py
def downscale_coords(array: xarray.DataArray, scale_factors: Sequence[int]) -> dict[Hashable, Any]:
    """
    Downscale coordinates by taking the windowed mean of each coordinate array.
    """
    new_coords = {}
    for coord_name, coord in array.coords.items():
        coarsening_dims = {
            d: scale_factors[idx] for idx, d in enumerate(array.dims) if d in coord.dims
        }
        new_coords[coord_name] = coord.coarsen(coarsening_dims).mean()
    return new_coords
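
For illustration (a minimal sketch, not part of the original documentation), coarsening a 1-D coordinate by a factor of 2 averages adjacent coordinate values; this assumes downscale_coords is importable from xarray_multiscale.multiscale:

>>> import numpy as np
>>> import xarray
>>> from xarray_multiscale.multiscale import downscale_coords
>>> data = xarray.DataArray(
...     np.zeros(4), dims=("x",), coords={"x": np.arange(4, dtype="float")}
... )
>>> downscale_coords(data, (2,))["x"].values
array([0.5, 2.5])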

downsampling_depth #

downsampling_depth(shape, scale_factors)

For a shape and a sequence of scale factors, calculate the number of downsampling operations that must be performed to produce a downsampled shape with at least one singleton value.

If any element of scale_factors is greater than the corresponding shape, this function returns 0.

If all scale_factors are 1, this function returns 0.

Parameters:

shape : Sequence[int] (required)
    An array shape.

scale_factors : Sequence[int] (required)
    Downsampling factors.

Examples:

>>> downsampling_depth((8,), (2,))
3
>>> downsampling_depth((8,2), (2,2))
1
>>> downsampling_depth((7,), (2,))
2
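
Equivalently (a pure-Python sketch, not part of the library; the function name is hypothetical), the depth for each axis with a factor greater than 1 is floor(log_s(n)), i.e. the number of times the factor divides into the size, and the result is the minimum over those axes. Repeated integer division avoids floating-point log:

>>> def downsampling_depth_sketch(shape, scale_factors):
...     valid = [(n, s) for n, s in zip(shape, scale_factors) if s > 1]
...     if not valid:
...         return 0
...     depths = []
...     for n, s in valid:
...         depth = 0
...         while n >= s:
...             n //= s
...             depth += 1
...         depths.append(depth)
...     return min(depths)
>>> downsampling_depth_sketch((8, 2), (2, 2))
1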
Source code in src/xarray_multiscale/multiscale.py
def downsampling_depth(shape: Sequence[int], scale_factors: Sequence[int]) -> int:
    """
    For a shape and a sequence of scale factors, calculate the
    number of downsampling operations that must be performed to produce
    a downsampled shape with at least one singleton value.

    If any element of `scale_factors` is greater than the
    corresponding shape, this function returns 0.

    If all `scale_factors` are 1, this function returns 0.

    Parameters
    ----------
    shape : Sequence[int]
        An array shape.

    scale_factors : Sequence[int]
        Downsampling factors.

    Examples
    --------
    >>> downsampling_depth((8,), (2,))
    3
    >>> downsampling_depth((8,2), (2,2))
    1
    >>> downsampling_depth((7,), (2,))
    2
    """
    if len(shape) != len(scale_factors):
        msg = (
            "The shape and scale_factors parameters do not have the same length."
            f"Shape={shape} has length {len(shape)}, "
            f"but scale_factors={scale_factors} has length {len(scale_factors)}"
        )
        raise ValueError(msg)

    _scale_factors = np.array(scale_factors).astype("int")
    _shape = np.array(shape).astype("int")
    valid = _scale_factors > 1
    if not valid.any():
        result = 0
    else:
        depths = np.floor(logn(_shape[valid], _scale_factors[valid]))
        result = min(depths.astype("int"))
    return result