
class InputMixIn(target, shape, raster_file=None, raster_index=None, temporal_slice=slice(None, None, 1), res_kwargs=None)[source]

Bases: CacheHandlingMixIn

MixIn class with properties and methods for handling the spatiotemporal data domain to extract from source data.

Provide properties of the spatiotemporal data domain

  • target (tuple) – (lat, lon) lower left corner of raster. Either need target+shape or raster_file.

  • shape (tuple) – (rows, cols) grid size. Either need target+shape or raster_file.

  • raster_file (str | None) – File for raster_index array for the corresponding target and shape. If specified, the raster_index will be loaded from the file if it exists, or written to the file if it does not yet exist. If None and raster_index is not provided, raster_index will be calculated directly. Either need target+shape, raster_file, or raster_index input.

  • raster_index (list) – List of tuples or slices. Used as an alternative to computing the raster index from target+shape or loading the raster index from file.

  • temporal_slice (slice) – Slice specifying extent and step of temporal extraction. e.g. slice(start, stop, time_pruning). If equal to slice(None, None, 1) the full time dimension is selected.

  • res_kwargs (dict | None) – Dictionary of kwargs to pass to xarray.open_mfdataset.



Attributes

  • cache_files – Cache files for storing extracted data

  • cache_pattern – Get correct cache file pattern for formatting.

  • cached_features – List of features which have been requested but have been determined not to need extraction.

  • file_paths – Get file paths for input data

  • full_raw_lat_lon – Get the full lat/lon grid without doing any latitude inversion

  • grid_shape – Get shape of raster

  • input_file_info – Method to provide info about files in log output.

  • invert_lat – Whether to invert the latitude axis during data extraction.

  • lat_lon – Lat lon grid for data in format (spatial_1, spatial_2, 2) Lat/Lon array with same ordering in last dimension.

  • latitude – Flattened list of latitudes

  • longitude – Flattened list of longitudes

  • meta – Meta dataframe with coordinates.

  • need_full_domain – Check whether we need to get the full lat/lon grid to determine target and shape values

  • noncached_features – Get list of features needing extraction or derivation

  • raw_lat_lon – Lat lon grid for data in format (spatial_1, spatial_2, 2) Lat/Lon array with same ordering in last dimension.

  • raw_time_index – Time index for input data without time pruning.

  • raw_tsteps – Get number of time steps for all input files

  • single_ts_files – Check if there is a file for each time step, in which case we can send a subset of files to the data handler according to ti_pad_slice

  • source_type – Get data type for source files.

  • target – Get lower left corner of raster

  • temporal_slice – Get temporal range to extract from full dataset

  • ti_workers – Get max number of workers for computing time index

  • time_freq_hours – Get the time frequency in hours as a float

  • time_index – Time index for input data with time pruning.

  • time_index_file – Get time index file path

  • try_load – Check if we should try to load cache

property raw_tsteps

Get number of time steps for all input files

property single_ts_files

Check if there is a file for each time step, in which case we can send a subset of files to the data handler according to ti_pad_slice

static get_capped_workers(max_workers_cap, max_workers)[source]

Get max number of workers for a given job. Capped to the global max workers if specified.

  • max_workers_cap (int | None) – Cap for job specific max_workers

  • max_workers (int | None) – Job specific max_workers


max_workers (int | None) – job specific max_workers capped by max_workers_cap if provided
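The capping rule described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name `capped_workers` is hypothetical, and the handling of `None` inputs (an unset value defers to the other argument) is an assumption.

```python
def capped_workers(max_workers_cap, max_workers):
    """Hypothetical stand-in for the capping rule described above."""
    if max_workers_cap is None:
        # No cap given: the job-specific value passes through unchanged.
        return max_workers
    if max_workers is None:
        # Assumption: with no job-specific value, fall back to the cap.
        return max_workers_cap
    # Both given: the job-specific value may not exceed the cap.
    return min(max_workers_cap, max_workers)


print(capped_workers(4, 10))     # 4
print(capped_workers(None, 10))  # 10
print(capped_workers(4, None))   # 4
```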


Cap all workers args by max_workers

abstract classmethod get_full_domain(file_paths)[source]

Get full lat/lon grid for when target + shape are not specified

abstract classmethod get_lat_lon(file_paths, raster_index, invert_lat=False)[source]

Get lat/lon grid for requested target and shape

abstract get_time_index(file_paths, max_workers=None, **kwargs)[source]

Get raw time index for source data

property input_file_info

Method to provide info about files in log output. Since NetCDF files have single time slices, printing out all the file paths is just a text dump without much info.


str – message to append to log output that does not include a huge info dump of file paths

property temporal_slice

Get temporal range to extract from full dataset

property file_paths

Get file paths for input data

property ti_workers

Get max number of workers for computing time index

property need_full_domain

Check whether we need to get the full lat/lon grid to determine target and shape values

property full_raw_lat_lon

Get the full lat/lon grid without doing any latitude inversion

property raw_lat_lon

Lat lon grid for data in format (spatial_1, spatial_2, 2) Lat/Lon array with same ordering in last dimension. This returns the grid without any lat inversion.



property latitude

Flattened list of latitudes

property longitude

Flattened list of longitudes

property meta

Meta dataframe with coordinates.
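As a rough illustration of how the latitude, longitude, and meta properties relate to a (spatial_1, spatial_2, 2) grid, the sketch below builds a small grid by hand and flattens it. The column names `latitude` and `longitude` and the row-major flattening order are assumptions made for this example only.

```python
import numpy as np
import pandas as pd

# Small illustrative grid: 2 rows x 3 columns, latitude descending by row.
lats = np.array([41.0, 40.0])
lons = np.array([-105.0, -104.0, -103.0])
lon_grid, lat_grid = np.meshgrid(lons, lats)

# Last dimension holds (lat, lon), giving shape (2, 3, 2).
lat_lon = np.dstack([lat_grid, lon_grid])

# Flattened coordinate lists, one entry per grid cell.
latitude = lat_lon[..., 0].flatten()
longitude = lat_lon[..., 1].flatten()

# Assumed layout: one row per grid cell with its coordinates.
meta = pd.DataFrame({"latitude": latitude, "longitude": longitude})
```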

property lat_lon

Lat lon grid for data in format (spatial_1, spatial_2, 2) Lat/Lon array with same ordering in last dimension. This ensures that the lower left-hand corner of the domain is given by lat_lon[-1, 0].



property invert_lat

Whether to invert the latitude axis during data extraction. This is to enforce a descending latitude ordering so that the lower left corner of the grid is at idx=(-1, 0) instead of idx=(0, 0).

property target

Get lower left corner of raster


_target (tuple) – (lat, lon) lower left corner of raster.

classmethod lats_are_descending(lat_lon)[source]

Check if latitudes are in descending order (i.e. the target coordinate is already at the bottom left corner)


lat_lon (np.ndarray) – Lat/Lon array with shape (n_lats, n_lons, 2)
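The descending-latitude convention can be illustrated with a short sketch. The helper `lats_descending` is a hypothetical stand-in that compares the first and last rows of the grid; the actual check in the library may differ.

```python
import numpy as np


def lats_descending(lat_lon):
    """Hypothetical check: is the first row north of the last row?"""
    return lat_lon[0, 0, 0] > lat_lon[-1, 0, 0]


# Grid stored south-to-north, so row 0 holds the lowest latitude.
lats = np.array([40.0, 41.0, 42.0])
lons = np.array([-105.0, -104.0])
lon_grid, lat_grid = np.meshgrid(lons, lats)
raw_lat_lon = np.dstack([lat_grid, lon_grid])  # shape (3, 2, 2)

# Flip the latitude axis only when the raw ordering is ascending.
invert_lat = not lats_descending(raw_lat_lon)
lat_lon = raw_lat_lon[::-1] if invert_lat else raw_lat_lon

# The lower left corner (lat, lon) = (40.0, -105.0) is now at [-1, 0].
print(lat_lon[-1, 0])
```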



property grid_shape

Get shape of raster


_grid_shape (tuple) – (rows, cols) grid size.

property source_type

Get data type for source files. Either nc or h5

property raw_time_index

Time index for input data without time pruning. This is the base time index for the raw input data.


Check if the number of input files and the length of the time index are the same

property time_index

Time index for input data with time pruning. This is the raw time index with a cropped range and time step applied.
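The relationship between the raw and pruned time index can be sketched with pandas. This assumes the temporal slice is applied by plain positional slicing of a DatetimeIndex; the dates and slice values are illustrative.

```python
import pandas as pd

# Hourly raw time index covering one day (24 steps).
raw_time_index = pd.date_range(
    "2020-01-01", periods=24, freq=pd.Timedelta(hours=1)
)

# slice(start, stop, time_pruning): keep every 3rd step from index 6 to 17.
temporal_slice = slice(6, 18, 3)
time_index = raw_time_index[temporal_slice]

# The default slice(None, None, 1) keeps the full time dimension.
full_time_index = raw_time_index[slice(None, None, 1)]
```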

property time_freq_hours

Get the time frequency in hours as a float
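One plausible way to obtain a time frequency in hours is to take the spacing between consecutive entries of the time index. The sketch below assumes a regularly spaced index and uses only the first step; how the library handles irregular spacing is not stated here.

```python
import pandas as pd

# Illustrative time index with a 3-hour step.
time_index = pd.date_range(
    "2020-01-01", periods=8, freq=pd.Timedelta(hours=3)
)

# Difference between the first two entries, converted to hours.
step = time_index[1] - time_index[0]
time_freq_hours = step.total_seconds() / 3600

print(time_freq_hours)  # 3.0
```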

property time_index_file

Get time index file path

property cache_files

Cache files for storing extracted data

property cache_pattern

Get correct cache file pattern for formatting.


_cache_pattern (str) – The cache file pattern with formatting keys included.

property cached_features

List of features which have been requested but have been determined not to need extraction. Thus they have been cached already.

static check_cached_features(features, cache_files=None, overwrite_cache=False, load_cached=False)

Check which features have been cached and check flags to determine whether to load or extract these features again

  • features (list) – list of features to extract

  • cache_files (list | None) – Path to files with saved feature data

  • overwrite_cache (bool) – Whether to overwrite cached files

  • load_cached (bool) – Whether to load data from cache files


list – List of features to extract. Might not include features which have cache files.
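A simplified sketch of this kind of check is shown below. It is an assumption-laden illustration rather than the library's logic: the helper `features_to_extract` is hypothetical, it treats a feature as cached when its file exists on disk, and it ignores the load_cached flag entirely.

```python
import os
import tempfile


def features_to_extract(features, cache_files=None, overwrite_cache=False):
    """Hypothetical sketch: keep features with no usable cache file."""
    if cache_files is None or overwrite_cache:
        # Nothing to reuse, or the cache is being rewritten: extract all.
        return list(features)
    # Otherwise extract only features whose cache file is missing.
    return [f for f, fp in zip(features, cache_files)
            if not os.path.exists(fp)]


with tempfile.TemporaryDirectory() as tmp:
    cached = os.path.join(tmp, "u_100m.pkl")
    missing = os.path.join(tmp, "v_100m.pkl")
    open(cached, "w").close()  # pretend U_100m was cached earlier

    names = ["U_100m", "V_100m"]
    todo = features_to_extract(names, [cached, missing])
    redo = features_to_extract(names, [cached, missing],
                               overwrite_cache=True)
```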

get_cache_file_names(cache_pattern, grid_shape=None, time_index=None, target=None, features=None)

Get names of cache files from cache_pattern and feature names

  • cache_pattern (str) – Pattern to use for cache file names

  • grid_shape (tuple) – Shape of grid to use for cache file naming

  • time_index (list | pd.DatetimeIndex) – Time index to use for cache file naming

  • target (tuple) – Target to use for cache file naming

  • features (list) – List of features to use for cache file naming


list – List of cache file names
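The idea of a cache pattern with formatting keys can be illustrated with plain string formatting. The pattern, the `{feature}` key, and the lowercasing of feature names are assumptions for this example; the keys the library actually fills in may differ.

```python
# Hypothetical pattern with a single formatting key.
cache_pattern = "./cache/data_{feature}.pkl"
features = ["U_100m", "V_100m"]

# One cache file name per requested feature.
cache_files = [cache_pattern.format(feature=f.lower()) for f in features]
```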

property noncached_features

Get list of features needing extraction or derivation

parallel_load(data, cache_files, features, max_workers=None)

Load feature data in parallel

  • data (ndarray) – Array to fill with cached data

  • cache_files (list) – List of cache files for each feature

  • features (list) – List of requested features

  • max_workers (int | None) – Max number of workers to use for parallel data loading. If None the max number of available workers will be used.
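Loading one cache file per feature in parallel might look roughly like the sketch below. Everything here is illustrative: the helper `load_into` is hypothetical, the files are written with `np.save` rather than whatever format the library uses, and the feature axis is assumed to be the last dimension of the data array.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed

import numpy as np


def load_into(data, cache_files, features, max_workers=None):
    """Hypothetical sketch: fill data[..., i] from the i-th cache file."""
    if len(cache_files) != len(features):
        raise ValueError("Need one cache file per feature")
    with ThreadPoolExecutor(max_workers=max_workers) as exe:
        # Map each future back to the feature index it will fill.
        futures = {exe.submit(np.load, fp): i
                   for i, fp in enumerate(cache_files)}
        for future in as_completed(futures):
            data[..., futures[future]] = future.result()
    return data


with tempfile.TemporaryDirectory() as tmp:
    features = ["U_100m", "V_100m"]
    cache_files = []
    for i, name in enumerate(features):
        fp = os.path.join(tmp, f"{name}.npy")
        np.save(fp, np.full((2, 3), float(i + 1)))
        cache_files.append(fp)

    data = load_into(np.zeros((2, 3, len(features))), cache_files, features)
```

Threads are used here because file reads are I/O bound; passing max_workers=None leaves the pool size to the executor's default.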

property try_load

Check if we should try to load cache