
class DataHandlerH5WindCC(*args, **kwargs)[source]

Bases: DataHandlerH5

Special data handling and batch sampling for h5 WTK or NSRDB data for climate change applications

  • *args (list) – Same positional args as DataHandlerH5

  • **kwargs (dict) – Same keyword args as DataHandlerH5



Cache feature data to file and delete from memory


Cap all workers args by max_workers

check_cached_features(features[, ...])

Check which features have been cached and check flags to determine whether to load or extract these features again


Check if data is cached and clear data if not load_cached


Free memory used for data arrays

data_fill(shifted_time_chunks[, max_workers])

Fill final data array with extracted / computed chunks

extract_feature(file_paths, raster_index, ...)

Extract single feature from data source

get_cache_file_names(cache_pattern[, ...])

Get names of cache files from cache_pattern and feature names

get_capped_workers(max_workers_cap, max_workers)

Get max number of workers for a given job.

get_closest_lat_lon(lat_lon, target)

Get closest indices to target lat lon


Get target and shape for largest domain possible


Get all available features in input data

get_input_arrays(data, chunk_number, f, ...)

Get only arrays needed for computations

get_inputs_recursive(feature, handle_features)

Lookup inputs needed to compute feature.

get_lat_lon(file_paths, raster_index[, ...])

Get lat/lon grid for requested target and shape

get_lat_lon_df(target[, features])

Get timeseries for given target


Get data for observation using random observation index.


Get a CLI call to initialize DataHandler and cache data.


Randomly gets spatial sample and time sample


Get raster index for file data.

get_raw_feature_list(features, handle_features)

Lookup inputs needed to compute feature

get_time_index(file_paths[, max_workers])

Get time index from data files

has_exact_feature(feature, handle)

Check if exact feature is in handle

has_multilevel_feature(feature, handle)

Check if multilevel data for the feature is in handle

has_surrounding_features(feature, handle)

Check if handle has feature values at surrounding heights.


Check if latitudes are in descending order (i.e. the target coordinate is already at the bottom left corner).

lin_bc(bc_files[, threshold])

Bias correct the data in this DataHandler using linear bias correction factors from files output by MonthlyLinearCorrection or LinearCorrection from sup3r.bias.bias_calc


Load data from cache files and split into training and validation

lookup(feature, attr_name[, handle_features])

Lookup feature in feature registry


Drop timesteps with NaN data

normalize([means, stds, features, max_workers])

Normalize all data features.

parallel_compute(data, file_paths, ...[, ...])

Compute features using parallel subprocesses

parallel_extract(file_paths, raster_index, ...)

Extract features using parallel subprocesses

parallel_load(data, cache_files, features[, ...])

Load feature data in parallel

pop_old_data(data, chunk_number, all_features)

Remove input feature data if no longer needed for requested features


Run some preflight checks and verify that the inputs are valid

qdm_bc(bc_files, reference_feature[, ...])

Bias Correction using Quantile Delta Mapping

recursive_compute(data, feature, ...)

Compute intermediate features recursively


Build base 4D data array.


Calculate daily average data and store as attribute.


Run the data computation / derivation from raw features to desired features.


Run the raw dataset extraction process from disk to raw un-manipulated datasets.


Run nn nan fill on full data array.

serial_compute(data, file_paths, ...)

Compute features in series


Fill final data array in serial

serial_extract(file_paths, raster_index, ...)

Extract features in series

source_handler(file_paths, **kwargs)

Rex data handler

split_data([data, val_split, shuffle_time])

Split time dimension into set of training indices and validation indices.


Check if the number of input files and the length of the time index is the same

valid_handle_features(features, handle_features)

Check if features are in handle

valid_input_features(features, handle_features)

Check if features are in handle or have compute methods




Get attributes of input data


Cache files for storing extracted data


Get correct cache file pattern for formatting.


List of features which have been requested but have been determined not to need extraction.


Get upper bound for compute workers based on memory limits.


List of features which need to be derived from other features


Features to extract directly from the source handler


Get upper bound for extract workers based on memory limits.


Number of bytes for a single feature array.


Get file paths for input data


Get the full lat/lon grid without doing any latitude inversion


Get memory used by a feature at a single time step


Get shape of raster


All features available in raw input


Get a list of exogenous high-resolution features that are only used for training e.g., mid-network high-res topo injection.


Get a list of high-resolution features that are intended to be output by the GAN.


Method to provide info about files in log output.


Whether to invert the latitude axis during data extraction.


Get whether source data files are time independent


Lat lon grid for data in format (spatial_1, spatial_2, 2) Lat/Lon array with same ordering in last dimension.


Flattened list of latitudes


Get upper bound on load workers based on memory limits.


Flattened list of longitudes


Get a list of low-resolution features.


List of feature names or patterns that should only be included in the low-res training set and not the high-res observations.


Get the mean values for each feature.


Meta dataframe with coordinates.


Get number of time steps to extract


Check whether we need to get the full lat/lon grid to determine target and shape values


Get list of features needing extraction or derivation


Get upper bound on workers used for normalization.


Raster index property


Get list of features needed for computations


Lat lon grid for data in format (spatial_1, spatial_2, 2) Lat/Lon array with same ordering in last dimension.


Time index for input data without time pruning.


Get number of time steps for all input files


Get requested shape for cached data


Full data shape


Check if there is a file for each time step, in which case we can send a subset of files to the data handler according to ti_pad_slice


Size of data array


Get data type for source files.


Get the standard deviation values for each feature.


Get lower left corner of raster


Get temporal range to extract from full dataset


Get max number of workers for computing time index


Get upper bound on time chunk size based on memory limits


Get time chunks which will be extracted from source data


Get the time frequency in hours as a float


Time index for input data with time pruning.


Get time index file path


Check if we should try to load cache


alias of MultiFileWindX


Calculate daily average data and store as attribute.


Randomly gets spatial sample and time sample


  • obs_ind_hourly (tuple) – Tuple of sampled spatial grid, time slice, and features indices. Used to get single observation like[observation_index]. This is for hourly high-res data slicing.

  • obs_ind_daily (tuple) – Same as obs_ind_hourly but the temporal index (i=2) is a slice of the daily data (self.daily_data) with day integers.


Get data for observation using random observation index. Loops repeatedly over randomized time index


  • obs_hourly (np.ndarray) – 4D array (spatial_1, spatial_2, temporal_hourly, features)

  • obs_daily_avg (np.ndarray) – 4D array but the temporal axis is temporal_hourly//24 (spatial_1, spatial_2, temporal_daily, features)
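The relationship between the hourly and daily-average arrays described above can be sketched as follows. This is a minimal illustration, assuming 24 time steps per day and a hypothetical helper named `daily_average`; it is not the sup3r implementation.

```python
import numpy as np


def daily_average(obs_hourly, steps_per_day=24):
    """Average a (spatial_1, spatial_2, temporal_hourly, features) array
    over each block of `steps_per_day` time steps (hypothetical helper)."""
    s1, s2, n_time, n_feat = obs_hourly.shape
    n_days = n_time // steps_per_day
    # Drop any trailing partial day so the time axis divides evenly.
    trimmed = obs_hourly[:, :, :n_days * steps_per_day]
    by_day = trimmed.reshape(s1, s2, n_days, steps_per_day, n_feat)
    return by_day.mean(axis=3)


# Two days of hourly data for one grid cell and one feature.
obs_hourly = np.arange(48, dtype=float).reshape(1, 1, 48, 1)
obs_daily_avg = daily_average(obs_hourly)
```

The temporal axis of the result has length temporal_hourly // 24, matching the shape documented for obs_daily_avg.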

split_data(data=None, val_split=0.0, shuffle_time=False)[source]

Split time dimension into set of training indices and validation indices. For NSRDB it makes sure that the splits happen at midnight.

  • data (np.ndarray) – 4D array of high res data (spatial_1, spatial_2, temporal, features)

  • val_split (float) – Fraction of data to separate for validation.

  • shuffle_time (bool) – No effect. Used to fit base class function signature.


  • data (np.ndarray) – (spatial_1, spatial_2, temporal, features) Training data fraction of initial data array. Initial data array is overwritten by this new data array.

  • val_data (np.ndarray) – (spatial_1, spatial_2, temporal, features) Validation data fraction of initial data array.
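A midnight-aligned split can be sketched as below. The helper name `split_at_midnight`, the choice to take validation data from the end of the record, and the assumption that the first time step falls at midnight are all illustrative; the actual method may differ.

```python
import numpy as np


def split_at_midnight(data, val_split=0.0, steps_per_day=24):
    """Split a (spatial_1, spatial_2, temporal, features) array along time
    at a whole number of days (hypothetical helper).

    Assumes the first time step is at midnight, so any index that is a
    multiple of `steps_per_day` is also at midnight.
    """
    n_days = data.shape[2] // steps_per_day
    n_val_days = int(round(val_split * n_days))
    split = (n_days - n_val_days) * steps_per_day
    end = n_days * steps_per_day
    return data[:, :, :split], data[:, :, split:end]


hourly = np.zeros((4, 4, 240, 2))  # 10 days of hourly data
train, val = split_at_midnight(hourly, val_split=0.2)
```

With 10 days and val_split=0.2, eight full days go to training and two full days to validation.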

property attrs

Get attributes of input data


dict – Dictionary of attributes


Cache feature data to file and delete from memory


cache_file_paths (str | None) – Path to file for saving feature data

property cache_files

Cache files for storing extracted data

property cache_pattern

Get correct cache file pattern for formatting.


_cache_pattern (str) – The cache file pattern with formatting keys included.

property cached_features

List of features which have been requested but have been determined not to need extraction. Thus they have been cached already.


Cap all workers args by max_workers

static check_cached_features(features, cache_files=None, overwrite_cache=False, load_cached=False)

Check which features have been cached and check flags to determine whether to load or extract these features again

  • features (list) – list of features to extract

  • cache_files (list | None) – Path to files with saved feature data

  • overwrite_cache (bool) – Whether to overwrite cached files

  • load_cached (bool) – Whether to load data from cache files


list – List of features to extract. Might not include features which have cache files.


Check if data is cached and clear data if not load_cached


Free memory used for data arrays

property compute_workers

Get upper bound for compute workers based on memory limits. Used to compute derived features from source dataset.

data_fill(shifted_time_chunks, max_workers=None)

Fill final data array with extracted / computed chunks

  • shifted_time_chunks (list) – List of time slices corresponding to the appropriate location of extracted / computed chunks in the final data array

  • max_workers (int | None) – Max number of workers to use for building final data array. If None, max available workers will be used. If 1, cached data will be loaded in serial
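As a rough illustration of the fill step, the sketch below copies per-chunk feature arrays into a single 4D array in serial. The function `fill_data`, its arguments, and the `chunks[chunk_number][feature]` layout (borrowed from the extraction methods documented on this page) are assumptions for the example.

```python
import numpy as np


def fill_data(chunks, shifted_time_chunks, features, grid_shape):
    """Place chunked (spatial_1, spatial_2, temporal) arrays into a
    (spatial_1, spatial_2, temporal, features) array (hypothetical helper)."""
    n_time = max(s.stop for s in shifted_time_chunks)
    out = np.zeros((*grid_shape, n_time, len(features)), dtype=np.float32)
    for chunk_number, t_slice in enumerate(shifted_time_chunks):
        for f_idx, feature in enumerate(features):
            out[:, :, t_slice, f_idx] = chunks[chunk_number][feature]
    return out


features = ["U_100m", "V_100m"]
shifted_time_chunks = [slice(0, 3), slice(3, 5)]
# Chunk 0 is filled with 1.0 and chunk 1 with 2.0 so the result is easy to check.
chunks = {
    i: {f: np.full((2, 2, s.stop - s.start), i + 1.0) for f in features}
    for i, s in enumerate(shifted_time_chunks)
}
data = fill_data(chunks, shifted_time_chunks, features, grid_shape=(2, 2))
```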

property derive_features

List of features which need to be derived from other features

classmethod extract_feature(file_paths, raster_index, feature, time_slice=slice(None, None, None), **kwargs)

Extract single feature from data source

  • file_paths (list) – path to data file

  • raster_index (ndarray) – Raster index array

  • feature (str) – Feature to extract from data

  • time_slice (slice) – slice of time to extract

  • kwargs (dict) – keyword arguments passed to source handler


ndarray – Data array for extracted feature (spatial_1, spatial_2, temporal)

property extract_features

Features to extract directly from the source handler

property extract_workers

Get upper bound for extract workers based on memory limits. Used to extract data from source dataset. The max number of extract workers is number of time chunks * number of features

property feature_mem

Number of bytes for a single feature array. Used to estimate max_workers.


int – Number of bytes for a single feature array

property file_paths

Get file paths for input data

property full_raw_lat_lon

Get the full lat/lon grid without doing any latitude inversion

get_cache_file_names(cache_pattern, grid_shape=None, time_index=None, target=None, features=None)

Get names of cache files from cache_pattern and feature names

  • cache_pattern (str) – Pattern to use for cache file names

  • grid_shape (tuple) – Shape of grid to use for cache file naming

  • time_index (list | pd.DatetimeIndex) – Time index to use for cache file naming

  • target (tuple) – Target to use for cache file naming

  • features (list) – List of features to use for cache file naming


list – List of cache file names

static get_capped_workers(max_workers_cap, max_workers)

Get max number of workers for a given job. Capped to global max workers if specified

  • max_workers_cap (int | None) – Cap for job specific max_workers

  • max_workers (int | None) – Job specific max_workers


max_workers (int | None) – job specific max_workers capped by max_workers_cap if provided
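The capping rule might look like the sketch below. How None is handled for either argument is an assumption here, and `capped_workers` is a hypothetical stand-in rather than the library function.

```python
def capped_workers(max_workers_cap, max_workers):
    """Return max_workers limited by max_workers_cap (hypothetical helper).

    Assumption: a None cap means "no cap", and a None job value falls back
    to the cap.
    """
    if max_workers_cap is None:
        return max_workers
    if max_workers is None:
        return max_workers_cap
    return min(max_workers_cap, max_workers)


capped = capped_workers(max_workers_cap=4, max_workers=8)
```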

static get_closest_lat_lon(lat_lon, target)

Get closest indices to target lat lon

  • lat_lon (ndarray) – Array of lat/lon (spatial_1, spatial_2, 2) Last dimension in order of (lat, lon)

  • target (tuple) – (lat, lon) for target coordinate


  • row (int) – row index for closest lat/lon to target lat/lon

  • col (int) – col index for closest lat/lon to target lat/lon
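One way to find the closest grid cell is sketched below with NumPy. It uses a planar distance in degrees, which is an assumption; the helper name `closest_lat_lon` is hypothetical.

```python
import numpy as np


def closest_lat_lon(lat_lon, target):
    """Return (row, col) of the grid cell nearest to target=(lat, lon).

    lat_lon has shape (spatial_1, spatial_2, 2) with (lat, lon) in the last
    dimension. Distance is a simple planar distance in degrees.
    """
    dist = np.hypot(lat_lon[..., 0] - target[0], lat_lon[..., 1] - target[1])
    row, col = np.unravel_index(np.argmin(dist), dist.shape)
    return int(row), int(col)


lats = np.array([42.0, 41.0, 40.0])  # descending latitude
lons = np.array([-106.0, -105.0, -104.0])
lat_grid, lon_grid = np.meshgrid(lats, lons, indexing="ij")
lat_lon = np.dstack([lat_grid, lon_grid])
row, col = closest_lat_lon(lat_lon, target=(40.1, -105.2))
```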

classmethod get_full_domain(file_paths)

Get target and shape for largest domain possible

classmethod get_handle_features(file_paths)

Get all available features in input data


file_paths (list) – List of input file paths


handle_features (list) – List of available input features

classmethod get_input_arrays(data, chunk_number, f, handle_features)

Get only arrays needed for computations

  • data (dict) – Dictionary of feature arrays

  • chunk_number (int) – time chunk for which to get input arrays

  • f (str) – feature to compute using input arrays

  • handle_features (list) – Features available in raw data


dict – Dictionary of arrays with only needed features

classmethod get_inputs_recursive(feature, handle_features)

Lookup inputs needed to compute feature. Walk through inputs methods for each required feature to get all raw features.

  • feature (str) – Feature for which to get needed inputs for derivation

  • handle_features (list) – Features available in raw data


list – List of input features
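The recursive walk can be illustrated with a toy registry. The `INPUTS` mapping, the feature names in it, and the function `inputs_recursive` are hypothetical; the real feature registry is not shown on this page.

```python
# Hypothetical registry: derived feature -> features it is computed from.
INPUTS = {
    "windspeed_100m": ["U_100m", "V_100m"],
    "power_density_100m": ["windspeed_100m", "rho_100m"],
}


def inputs_recursive(feature, handle_features):
    """Walk the registry until only raw features remain (hypothetical)."""
    # A feature that is in the raw data, or has no registered inputs,
    # is treated as a raw input itself.
    if feature in handle_features or feature not in INPUTS:
        return [feature]
    raw = []
    for inp in INPUTS[feature]:
        for f in inputs_recursive(inp, handle_features):
            if f not in raw:
                raw.append(f)
    return raw


handle_features = ["U_100m", "V_100m", "rho_100m"]
raw = inputs_recursive("power_density_100m", handle_features)
```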

classmethod get_lat_lon(file_paths, raster_index, invert_lat=False)

Get lat/lon grid for requested target and shape

  • file_paths (list) – path to data file

  • raster_index (ndarray | list) – Raster index array or list of slices

  • invert_lat (bool) – Flag to invert data along the latitude axis. WRF data tends to use an increasing ordering for latitude while WTK uses a decreasing ordering.


ndarray – (spatial_1, spatial_2, 2) Lat/Lon array with same ordering in last dimension

get_lat_lon_df(target, features=None)

Get timeseries for given target

  • target (tuple) – (lat, lon) for target coordinate

  • features (list | None) – Optional list of features to include in returned data. If None then all available features are returned.


df (pd.DataFrame) – Pandas dataframe with columns for each feature and timeindex for the given target

classmethod get_node_cmd(config)

Get a CLI call to initialize DataHandler and cache data.


config (dict) – sup3r data handler config with all necessary args and kwargs to initialize DataHandler and run data extraction.


Get raster index for file data. Here we assume the list of paths in file_paths all have data with the same spatial domain. We use the first file in the list to compute the raster.


raster_index (np.ndarray) – 2D array of grid indices

classmethod get_raw_feature_list(features, handle_features)

Lookup inputs needed to compute feature

  • features (list) – Features for which to get needed inputs for derivation

  • handle_features (list) – Features available in raw data


list – List of input features

classmethod get_time_index(file_paths, max_workers=None, **kwargs)

Get time index from data files

  • file_paths (list) – path to data file

  • max_workers (int | None) – placeholder to match signature

  • kwargs (dict) – placeholder to match signature


time_index (pd.DateTimeIndex) – Time index from h5 source file(s)

property grid_mem

Get memory used by a feature at a single time step


int – Number of bytes for a single feature array at a single time step

property grid_shape

Get shape of raster


_grid_shape (tuple) – (rows, cols) grid size.

property handle_features

All features available in raw input

classmethod has_exact_feature(feature, handle)

Check if exact feature is in handle

  • feature (str) – Raw feature name e.g. U_100m

  • handle (xarray.Dataset) – netcdf data object


bool – Whether handle contains exact feature or not

classmethod has_multilevel_feature(feature, handle)

Check if multilevel data for the feature is in handle

  • feature (str) – Raw feature name e.g. U_100m

  • handle (xarray.Dataset) – netcdf data object


bool – Whether handle contains multilevel data for given feature

classmethod has_surrounding_features(feature, handle)

Check if handle has feature values at surrounding heights. e.g. if feature=U_40m check if the handler has u at heights below and above 40m

  • feature (str) – Raw feature name e.g. U_100m

  • handle (xarray.Dataset) – netcdf data object


bool – Whether feature has surrounding heights
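The surrounding-heights check can be sketched on a plain list of dataset names. This is only an illustration: the documented method takes a data handle rather than a list, and the `{name}_{height}m` naming pattern and the helper `has_surrounding` are assumptions based on the U_40m example.

```python
import re

# Assumed naming pattern, e.g. "U_40m" -> basename "U", height 40.
PATTERN = re.compile(r"(.+)_(\d+)m")


def has_surrounding(feature, available):
    """Check if `available` has the same variable below and above the
    requested height (hypothetical helper)."""
    match = PATTERN.fullmatch(feature)
    if match is None:
        return False
    basename, height = match.group(1).lower(), int(match.group(2))
    heights = [
        int(m.group(2))
        for m in map(PATTERN.fullmatch, available)
        if m and m.group(1).lower() == basename
    ]
    return any(h < height for h in heights) and any(h > height for h in heights)


found = has_surrounding("U_40m", ["U_10m", "U_100m", "V_10m"])
```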

property hr_exo_features

Get a list of exogenous high-resolution features that are only used for training e.g., mid-network high-res topo injection. These must come at the end of the high-res feature set. These can also be input to the model as low-res features.

property hr_out_features

Get a list of high-resolution features that are intended to be output by the GAN. Does not include high-resolution exogenous features

property input_file_info

Method to provide info about files in log output. Since NETCDF files have single time slices, printing out all the file paths is just a text dump without much info.


str – message to append to log output that does not include a huge info dump of file paths

property invert_lat

Whether to invert the latitude axis during data extraction. This is to enforce a descending latitude ordering so that the lower left corner of the grid is at idx=(-1, 0) instead of idx=(0, 0)

property is_time_independent

Get whether source data files are time independent

property lat_lon

Lat lon grid for data in format (spatial_1, spatial_2, 2) Lat/Lon array with same ordering in last dimension. This ensures that the lower left hand corner of the domain is given by lat_lon[-1, 0]



property latitude

Flattened list of latitudes

classmethod lats_are_descending(lat_lon)

Check if latitudes are in descending order (i.e. the target coordinate is already at the bottom left corner)


lat_lon (np.ndarray) – Lat/Lon array with shape (n_lats, n_lons, 2)
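A minimal sketch of the ordering check and the inversion it motivates is shown below. Comparing only the first and last rows of the first column is an assumption, and `lats_descending` is a hypothetical stand-in for the method.

```python
import numpy as np


def lats_descending(lat_lon):
    """True if latitude decreases from the first row to the last row,
    i.e. the lower left corner is already at lat_lon[-1, 0]."""
    return bool(lat_lon[-1, 0, 0] < lat_lon[0, 0, 0])


lats = np.array([40.0, 41.0, 42.0])  # ascending latitude
lons = np.array([-106.0, -105.0])
lat_grid, lon_grid = np.meshgrid(lats, lons, indexing="ij")
lat_lon = np.dstack([lat_grid, lon_grid])

before = lats_descending(lat_lon)
# Flip the latitude axis so the lower left corner moves to index [-1, 0].
inverted = lat_lon[::-1]
after = lats_descending(inverted)
```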



lin_bc(bc_files, threshold=0.1)

Bias correct the data in this DataHandler using linear bias correction factors from files output by MonthlyLinearCorrection or LinearCorrection from sup3r.bias.bias_calc

  • bc_files (list | tuple | str) – One or more filepaths to .h5 files output by MonthlyLinearCorrection or LinearCorrection. These should contain datasets named “{feature}_scalar” and “{feature}_adder” where {feature} is one of the features contained by this DataHandler and the data is a 3D array of shape (lat, lon, time) where time is length 1 for annual correction or 12 for monthly correction.

  • threshold (float) – Nearest neighbor euclidean distance threshold. If the DataHandler coordinates are more than this value away from the bias correction lat/lon, an error is raised.
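The sketch below shows how scalar and adder factors of the documented shape could be applied. It assumes the correction has the form data * scalar + adder, which is suggested by the dataset names but should be confirmed against sup3r.bias; `linear_bc` is a hypothetical helper and does no file reading or nearest-neighbor matching.

```python
import numpy as np
import pandas as pd


def linear_bc(data, time_index, scalar, adder):
    """Apply linear correction factors to one feature.

    data: (lat, lon, time); scalar and adder: (lat, lon, 1) for an annual
    correction or (lat, lon, 12) for a monthly correction.
    """
    if scalar.shape[-1] == 1:
        return data * scalar + adder
    # Pick the factor for the calendar month of each time step.
    month_idx = np.asarray(time_index.month) - 1
    return data * scalar[..., month_idx] + adder[..., month_idx]


time_index = pd.date_range("2020-01-31", periods=3, freq="D")  # Jan, Feb, Feb
data = np.ones((2, 2, 3))
scalar = np.ones((2, 2, 12))
adder = np.zeros((2, 2, 12))
scalar[..., 1] = 2.0  # February scalar
adder[..., 0] = 0.5   # January adder
corrected = linear_bc(data, time_index, scalar, adder)
```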


Load data from cache files and split into training and validation


with_split (bool) – Whether to split into training and validation data or not.

property load_workers

Get upper bound on load workers based on memory limits. Used to load cached data.

property longitude

Flattened list of longitudes

classmethod lookup(feature, attr_name, handle_features=None)

Lookup feature in feature registry

  • feature (str) – Feature to lookup in registry

  • attr_name (str) – Type of method to lookup. e.g. inputs or compute

  • handle_features (list) – List of feature names (datasets) available in the source file. If feature is found explicitly in this list, height/pressure suffixes will not be appended to the output.


method | None – Feature registry method corresponding to feature

property lr_features

Get a list of low-resolution features. It is assumed that all features are used in the low-resolution observations. If you want to use high-res-only features, use the DualDataHandler class.

property lr_only_features

List of feature names or patterns that should only be included in the low-res training set and not the high-res observations.


Drop timesteps with NaN data

property means

Get the mean values for each feature.



property meta

Meta dataframe with coordinates.

property n_tsteps

Get number of time steps to extract

property need_full_domain

Check whether we need to get the full lat/lon grid to determine target and shape values

property noncached_features

Get list of features needing extraction or derivation

property norm_workers

Get upper bound on workers used for normalization.

normalize(means=None, stds=None, features=None, max_workers=None)

Normalize all data features.

  • means (dict | None) – Dictionary of means for all features with keys: feature names and values: mean values. If this is None, the self.means attribute will be used. If this is not None, this DataHandler object means attribute will be updated.

  • stds (dict | None) – Dictionary of standard deviation values for all features with keys: feature names and values: standard deviations. If this is None, the self.stds attribute will be used. If this is not None, this DataHandler object stds attribute will be updated.

  • features (list | None) – List of features used for indexing data array during normalization.

  • max_workers (None | int) – Max workers to perform normalization. If None, self.norm_workers will be used
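Per-feature normalization with dictionaries of means and standard deviations can be sketched as follows. This assumes the usual (x - mean) / std form, runs in serial, and does not guard against a zero standard deviation; the standalone `normalize` function and feature names are illustrative only.

```python
import numpy as np


def normalize(data, features, means=None, stds=None):
    """Normalize the last axis of a 4D array feature by feature.

    means / stds are dicts keyed by feature name; when None they are
    computed from the data (hypothetical standalone version).
    """
    if means is None:
        means = {f: float(data[..., i].mean()) for i, f in enumerate(features)}
    if stds is None:
        stds = {f: float(data[..., i].std()) for i, f in enumerate(features)}
    out = data.astype(np.float32, copy=True)
    for i, f in enumerate(features):
        out[..., i] = (out[..., i] - means[f]) / stds[f]
    return out, means, stds


rng = np.random.default_rng(0)
features = ["U_100m", "V_100m"]
data = rng.normal(5.0, 2.0, size=(4, 4, 24, 2))
normed, means, stds = normalize(data, features)
```

Passing the returned means and stds to a second call applies the same statistics to another array, which mirrors how the documented arguments let one handler reuse another handler's statistics.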

classmethod parallel_compute(data, file_paths, raster_index, time_chunks, derived_features, all_features, handle_features, max_workers=None)

Compute features using parallel subprocesses

  • data (dict) – dictionary of feature arrays with integer keys for chunks and str keys for features. e.g. data[chunk_number][feature] = array. (spatial_1, spatial_2, temporal)

  • file_paths (list) – Paths to data files. Used if compute method operates directly on source handler instead of input arrays. This is done with features without inputs methods like lat_lon and topography.

  • raster_index (ndarray) – raster index for spatial domain

  • time_chunks (list) – List of slices to chunk data feature extraction along time dimension

  • derived_features (list) – list of feature strings which need to be derived

  • all_features (list) – list of all features including those requiring derivation from input features

  • handle_features (list) – Features available in raw data

  • max_workers (int | None) – Number of max workers to use for computation. If equal to 1 then method is run in serial


data (dict) – dictionary of feature arrays, including computed features, with integer keys for chunks and str keys for features. Includes e.g. data[chunk_number][feature] = array. (spatial_1, spatial_2, temporal)

classmethod parallel_extract(file_paths, raster_index, time_chunks, input_features, max_workers=None, **kwargs)

Extract features using parallel subprocesses

  • file_paths (list) – list of file paths

  • raster_index (ndarray | list) – raster index for spatial domain

  • time_chunks (list) – List of slices to chunk data feature extraction along time dimension

  • input_features (list) – list of input feature strings

  • max_workers (int | None) – Number of max workers to use for extraction. If equal to 1 then method is run in serial

  • kwargs (dict) – kwargs passed to source handler for data extraction. e.g. This could be {‘parallel’: True, ‘chunks’: {‘south_north’: 120, ‘west_east’: 120}} which then gets passed to xr.open_mfdataset(file, **kwargs)


dict – dictionary of feature arrays with integer keys for chunks and str keys for features. e.g. data[chunk_number][feature] = array. (spatial_1, spatial_2, temporal)

parallel_load(data, cache_files, features, max_workers=None)

Load feature data in parallel

  • data (ndarray) – Array to fill with cached data

  • cache_files (list) – List of cache files for each feature

  • features (list) – List of requested features

  • max_workers (int | None) – Max number of workers to use for parallel data loading. If None the max number of available workers will be used.

classmethod pop_old_data(data, chunk_number, all_features)

Remove input feature data if no longer needed for requested features

  • data (dict) – dictionary of feature arrays with integer keys for chunks and str keys for features. e.g. data[chunk_number][feature] = array. (spatial_1, spatial_2, temporal)

  • chunk_number (int) – time chunk index to check

  • all_features (list) – list of all requested features including those requiring derivation from input features


Run some preflight checks and verify that the inputs are valid

qdm_bc(bc_files, reference_feature, relative=True, threshold=0.1, no_trend=False)

Bias Correction using Quantile Delta Mapping

Bias correct this DataHandler’s data with Quantile Delta Mapping. The required statistical distributions should be pre-calculated using sup3r.bias.qdm.QuantileDeltaMappingCorrection.

Warning: There is no guarantee that the coefficients from bc_files match the resource processed here. Be careful choosing bc_files.

  • bc_files (list | tuple | str) – One or more filepaths to .h5 files output by bias_calc.QuantileDeltaMappingCorrection. These should contain datasets named “base_{reference_feature}_params”, “bias_{feature}_params”, and “bias_fut_{feature}_params” where {feature} is one of the features contained by this DataHandler and the data is a 3D array of shape (lat, lon, time) where time.

  • reference_feature (str) – Name of the feature used as (historical) reference. Dataset with name “base_{reference_feature}_params” will be retrieved from bc_files.

  • relative (bool, default=True) – Switcher to apply QDM as a relative (use True) or absolute (use False) correction value.

  • threshold (float, default=0.1) – Nearest neighbor euclidean distance threshold. If the DataHandler coordinates are more than this value away from the bias correction lat/lon, an error is raised.

  • no_trend (bool, default=False) – An option to ignore the trend component of the correction, thus resulting in an ordinary Quantile Mapping, i.e. corrects the bias by comparing the distributions of the biased dataset with a reference dataset. See params_mf of rex.utilities.bc_utils.QuantileDeltaMapping. Note that this assumes that “bias_{feature}_params” (params_mh) is the data distribution representative for the target data.

property raster_index

Raster index property

property raw_features

Get list of features needed for computations

property raw_lat_lon

Lat lon grid for data in format (spatial_1, spatial_2, 2) Lat/Lon array with same ordering in last dimension. This returns the grid without any lat inversion.



property raw_time_index

Time index for input data without time pruning. This is the base time index for the raw input data.

property raw_tsteps

Get number of time steps for all input files

classmethod recursive_compute(data, feature, handle_features, file_paths, raster_index)

Compute intermediate features recursively

  • data (dict) – dictionary of feature arrays. e.g. data[feature] = array. (spatial_1, spatial_2, temporal)

  • feature (str) – Name of feature to compute

  • handle_features (list) – Features available in raw data

  • file_paths (list) – Paths to data files. Used if compute method operates directly on source handler instead of input arrays. This is done with features without inputs methods like lat_lon and topography.

  • raster_index (ndarray) – raster index for spatial domain


ndarray – Array of computed feature data

property requested_shape

Get requested shape for cached data


Build base 4D data array. Can handle multiple files but assumes each file has the same spatial domain


data (np.ndarray) – 4D array of high res data (spatial_1, spatial_2, temporal, features)


Run the data computation / derivation from raw features to desired features.


Run the raw dataset extraction process from disk to raw un-manipulated datasets.


Run nn nan fill on full data array.

classmethod serial_compute(data, file_paths, raster_index, time_chunks, derived_features, all_features, handle_features)

Compute features in series

  • data (dict) – dictionary of feature arrays with integer keys for chunks and str keys for features. e.g. data[chunk_number][feature] = array. (spatial_1, spatial_2, temporal)

  • file_paths (list) – Paths to data files. Used if compute method operates directly on source handler instead of input arrays. This is done with features without inputs methods like lat_lon and topography.

  • raster_index (ndarray) – raster index for spatial domain

  • time_chunks (list) – List of slices to chunk data feature extraction along time dimension

  • derived_features (list) – list of feature strings which need to be derived

  • all_features (list) – list of all features including those requiring derivation from input features

  • handle_features (list) – Features available in raw data


data (dict) – dictionary of feature arrays, including computed features, with integer keys for chunks and str keys for features. e.g. data[chunk_number][feature] = array. (spatial_1, spatial_2, temporal)


Fill final data array in serial


shifted_time_chunks (list) – List of time slices corresponding to the appropriate location of extracted / computed chunks in the final data array

classmethod serial_extract(file_paths, raster_index, time_chunks, input_features, **kwargs)

Extract features in series

  • file_paths (list) – list of file paths

  • raster_index (ndarray) – raster index for spatial domain

  • time_chunks (list) – List of slices to chunk data feature extraction along time dimension

  • input_features (list) – list of input feature strings

  • kwargs (dict) – kwargs passed to source handler for data extraction. e.g. This could be {‘parallel’: True, ‘chunks’: {‘south_north’: 120, ‘west_east’: 120}} which then gets passed to xr.open_mfdataset(file, **kwargs)


dict – dictionary of feature arrays with integer keys for chunks and str keys for features. e.g. data[chunk_number][feature] = array. (spatial_1, spatial_2, temporal)

property shape

Full data shape


shape (tuple) – Full data shape (spatial_1, spatial_2, temporal, features)

property single_ts_files

Check if there is a file for each time step, in which case we can send a subset of files to the data handler according to ti_pad_slice

property size

Size of data array


size (int) – Number of total elements contained in data array

classmethod source_handler(file_paths, **kwargs)

Rex data handler

Note that xarray appears to treat open file handlers as singletons within a threadpool, so it's okay to open this source_handler without a context handler or a .close() statement.

  • file_paths (str | list) – paths to data files

  • kwargs (dict) – keyword arguments passed to source handler


data (ResourceX)

property source_type

Get data type for source files. Either nc or h5

property stds

Get the standard deviation values for each feature.



property target

Get lower left corner of raster


_target (tuple) – (lat, lon) lower left corner of raster.

property temporal_slice

Get temporal range to extract from full dataset

property ti_workers

Get max number of workers for computing time index

property time_chunk_size

Get upper bound on time chunk size based on memory limits

property time_chunks

Get time chunks which will be extracted from source data


_time_chunks (list) – List of time chunks used to split up source data time dimension so that each chunk can be extracted individually
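Splitting the time dimension into slices can be sketched as below. The helper `make_time_chunks` and its arguments are hypothetical; in the handler the chunk size is derived from memory limits, which is not modeled here.

```python
def make_time_chunks(n_tsteps, time_chunk_size):
    """Return a list of slices covering range(n_tsteps) in pieces of at
    most time_chunk_size steps (hypothetical helper)."""
    return [
        slice(start, min(start + time_chunk_size, n_tsteps))
        for start in range(0, n_tsteps, time_chunk_size)
    ]


time_chunks = make_time_chunks(n_tsteps=10, time_chunk_size=4)
```

Each slice can then be extracted independently, and the last chunk is shorter when the number of time steps is not a multiple of the chunk size.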

property time_freq_hours

Get the time frequency in hours as a float

property time_index

Time index for input data with time pruning. This is the raw time index with a cropped range and time step applied.


Check if the number of input files and the length of the time index is the same

property time_index_file

Get time index file path

property try_load

Check if we should try to load cache

classmethod valid_handle_features(features, handle_features)

Check if features are in handle

  • features (str | list) – Raw feature names e.g. U_100m

  • handle_features (list) – Features available in raw data


bool – Whether feature basename is in handle

classmethod valid_input_features(features, handle_features)

Check if features are in handle or have compute methods

  • features (str | list) – Raw feature names e.g. U_100m

  • handle_features (list) – Features available in raw data


bool – Whether feature basename is in handle