sup3r.preprocessing.feature_handling.FeatureHandler

class FeatureHandler

Bases: object

Feature handler with a cache of previously loaded features that are reused in other calculations

Methods

extract_feature(file_paths, raster_index, ...)

Extract single feature from data source

get_input_arrays(data, chunk_number, f, ...)

Get only arrays needed for computations

get_inputs_recursive(feature, handle_features)

Look up the inputs needed to compute a feature.

get_raw_feature_list(features, handle_features)

Look up the raw inputs needed to compute a list of features

has_exact_feature(feature, handle)

Check if exact feature is in handle

has_multilevel_feature(feature, handle)

Check if handle contains multilevel data for a feature

has_surrounding_features(feature, handle)

Check if handle has feature values at surrounding heights.

lookup(feature, attr_name[, handle_features])

Look up a feature in the feature registry

parallel_compute(data, file_paths, ...[, ...])

Compute features using parallel subprocesses

parallel_extract(file_paths, raster_index, ...)

Extract features using parallel subprocesses

pop_old_data(data, chunk_number, all_features)

Remove input feature data if no longer needed for requested features

recursive_compute(data, feature, ...)

Compute intermediate features recursively

serial_compute(data, file_paths, ...)

Compute features in series

serial_extract(file_paths, raster_index, ...)

Extract features in series

valid_handle_features(features, handle_features)

Check if features are in handle

valid_input_features(features, handle_features)

Check if features are in handle or have compute methods

Attributes

FEATURE_REGISTRY

classmethod valid_handle_features(features, handle_features)

Check if features are in handle

Parameters:
  • features (str | list) – Raw feature names e.g. U_100m

  • handle_features (list) – Features available in raw data

Returns:

bool – Whether feature basename is in handle
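A minimal sketch of this check, assuming the "basename" is the feature name with a trailing height or pressure suffix (such as _100m or _1000pa) stripped. The get_basename helper and its suffix pattern are hypothetical illustrations, not sup3r's implementation.

```python
import re

# Hypothetical suffix pattern (assumption): a trailing "_<digits>m" or "_<digits>pa".
_SUFFIX = re.compile(r"_\d+(m|pa)$")


def get_basename(feature: str) -> str:
    """Strip an assumed height/pressure suffix, e.g. 'U_100m' -> 'U'."""
    return _SUFFIX.sub("", feature)


def valid_handle_features(features, handle_features) -> bool:
    """Return True if every feature, or its basename, is in handle_features."""
    if isinstance(features, str):
        features = [features]
    return all(
        f in handle_features or get_basename(f) in handle_features
        for f in features
    )


print(valid_handle_features("U_100m", ["U", "V", "T"]))        # True
print(valid_handle_features(["U_100m", "RH_2m"], ["U", "V"]))  # False
```

A single string is wrapped in a list so the same all(...) check covers both accepted input types (str | list).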

classmethod valid_input_features(features, handle_features)

Check if features are in handle or have compute methods

Parameters:
  • features (str | list) – Raw feature names e.g. U_100m

  • handle_features (list) – Features available in raw data

Returns:

bool – Whether each feature is in handle or has a compute method
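The difference from valid_handle_features is that a feature may also be accepted because it can be derived. A minimal sketch, assuming a plain dict stands in for FEATURE_REGISTRY; the extra registry parameter is added here only to keep the example self-contained, and basename handling is omitted.

```python
def valid_input_features(features, handle_features, registry) -> bool:
    """True if each feature is in the raw data or has a registry entry."""
    if isinstance(features, str):
        features = [features]
    return all(f in handle_features or f in registry for f in features)


# Toy registry standing in for FEATURE_REGISTRY (hypothetical entry).
registry = {"windspeed_100m": lambda u, v: (u**2 + v**2) ** 0.5}

print(valid_input_features(["U_100m", "windspeed_100m"],
                           ["U_100m", "V_100m"], registry))  # True
print(valid_input_features("T_2m", ["U_100m"], registry))     # False
```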

classmethod pop_old_data(data, chunk_number, all_features)

Remove input feature data if no longer needed for requested features

Parameters:
  • data (dict) – dictionary of feature arrays with integer keys for chunks and str keys for features, e.g. data[chunk_number][feature] = array, where each array has shape (spatial_1, spatial_2, temporal)

  • chunk_number (int) – time chunk index to check

  • all_features (list) – list of all requested features including those requiring derivation from input features
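The nested data layout and the cleanup step can be sketched as follows, assuming "no longer needed" simply means the key is not in all_features (the real method may apply additional rules). Returning data is a convenience of this sketch; the documented method lists no return value.

```python
import numpy as np


def pop_old_data(data, chunk_number, all_features):
    """Delete arrays in data[chunk_number] whose keys are not in all_features."""
    stale = [f for f in data[chunk_number] if f not in all_features]
    for feature in stale:
        del data[chunk_number][feature]
    return data


# Nested layout: data[chunk_number][feature] -> (spatial_1, spatial_2, temporal)
shape = (4, 4, 24)
data = {0: {"U_100m": np.zeros(shape), "V_100m": np.zeros(shape),
            "windspeed_100m": np.ones(shape)}}
pop_old_data(data, 0, ["windspeed_100m"])
print(sorted(data[0]))  # ['windspeed_100m']
```

The stale keys are collected first because deleting from a dict while iterating over it raises a RuntimeError.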

classmethod has_surrounding_features(feature, handle)

Check if handle has feature values at surrounding heights. For example, if feature=U_40m, check whether the handle has U at heights both below and above 40m.

Parameters:
  • feature (str) – Raw feature name e.g. U_100m

  • handle (xarray.Dataset) – netcdf data object

Returns:

bool – Whether feature has surrounding heights
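A minimal sketch of the "surrounding heights" test. The documented method inspects an xarray.Dataset; here a plain list of available heights stands in for the handle, and the parse_height helper with its "_<n>m" naming convention is a hypothetical assumption.

```python
import re


def parse_height(feature: str) -> float:
    """Parse the height in meters from a name like 'U_40m' (assumed format)."""
    match = re.search(r"_(\d+)m$", feature)
    if match is None:
        raise ValueError(f"No height suffix in {feature!r}")
    return float(match.group(1))


def has_surrounding_features(feature, available_heights) -> bool:
    """True if data exist at some height below and some height above."""
    height = parse_height(feature)
    below = any(h < height for h in available_heights)
    above = any(h > height for h in available_heights)
    return below and above


print(has_surrounding_features("U_40m", [10, 80, 100]))   # True
print(has_surrounding_features("U_120m", [10, 80, 100]))  # False
```

Having a level on each side is what makes interpolation to the requested height possible rather than extrapolation.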

classmethod has_exact_feature(feature, handle)

Check if exact feature is in handle

Parameters:
  • feature (str) – Raw feature name e.g. U_100m

  • handle (xarray.Dataset) – netcdf data object

Returns:

bool – Whether handle contains exact feature or not

classmethod has_multilevel_feature(feature, handle)

Check if handle contains multilevel data for a feature

Parameters:
  • feature (str) – Raw feature name e.g. U_100m

  • handle (xarray.Dataset) – netcdf data object

Returns:

bool – Whether handle contains multilevel data for given feature
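The two checks above can be contrasted with a plain dict that maps variable names to dimension names, standing in for an xarray.Dataset. The vertical dimension names and the suffix pattern are assumptions for illustration, not sup3r's actual rules.

```python
import re

# Stand-in for an xarray.Dataset: variable name -> dimension names.
handle = {
    "U_100m": ("time", "south_north", "west_east"),
    "T": ("time", "level", "south_north", "west_east"),
}

VERTICAL_DIMS = {"level", "height"}  # assumed names for a vertical axis


def has_exact_feature(feature, handle) -> bool:
    """The requested name itself is a variable in the handle."""
    return feature in handle


def has_multilevel_feature(feature, handle) -> bool:
    """The basename is a variable that carries a vertical dimension."""
    basename = re.sub(r"_\d+(m|pa)$", "", feature)
    dims = handle.get(basename, ())
    return bool(VERTICAL_DIMS.intersection(dims))


print(has_exact_feature("U_100m", handle))        # True
print(has_multilevel_feature("T_850pa", handle))  # True
print(has_multilevel_feature("U_100m", handle))   # False
```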

classmethod serial_extract(file_paths, raster_index, time_chunks, input_features, **kwargs)

Extract features in series

Parameters:
  • file_paths (list) – list of file paths

  • raster_index (ndarray) – raster index for spatial domain

  • time_chunks (list) – List of slices used to chunk feature extraction along the time dimension

  • input_features (list) – list of input feature strings

  • kwargs (dict) – kwargs passed to the source handler for data extraction, e.g. {'parallel': True, 'chunks': {'south_north': 120, 'west_east': 120}}, which are then passed to xr.open_mfdataset(file, **kwargs)

Returns:

dict – dictionary of feature arrays with integer keys for chunks and str keys for features, e.g. data[chunk_number][feature] = array, where each array has shape (spatial_1, spatial_2, temporal)
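A minimal sketch of how a serial extraction loop builds the returned dictionary, one time chunk and one feature at a time. The extract_feature function here is a hypothetical stand-in that ignores file_paths and raster_index and returns zeros of an assumed domain size.

```python
import numpy as np

SPATIAL_SHAPE = (4, 4)  # toy domain size (assumption)
N_TIME = 48             # toy time axis length (assumption)


def extract_feature(file_paths, raster_index, feature, time_slice=slice(None),
                    **kwargs):
    """Toy extractor: ignores the files and returns zeros for the slice."""
    n_time = len(range(N_TIME)[time_slice])
    return np.zeros((*SPATIAL_SHAPE, n_time), dtype=np.float32)


def serial_extract(file_paths, raster_index, time_chunks, input_features,
                   **kwargs):
    """Extract each feature for each time chunk, one call at a time."""
    data = {}
    for chunk_number, time_slice in enumerate(time_chunks):
        data[chunk_number] = {
            feature: extract_feature(file_paths, raster_index, feature,
                                     time_slice, **kwargs)
            for feature in input_features
        }
    return data


time_chunks = [slice(0, 24), slice(24, 48)]
data = serial_extract([], None, time_chunks, ["U_100m", "V_100m"])
print(data[1]["V_100m"].shape)  # (4, 4, 24)
```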

classmethod parallel_extract(file_paths, raster_index, time_chunks, input_features, max_workers=None, **kwargs)

Extract features using parallel subprocesses

Parameters:
  • file_paths (list) – list of file paths

  • raster_index (ndarray | list) – raster index for spatial domain

  • time_chunks (list) – List of slices used to chunk feature extraction along the time dimension

  • input_features (list) – list of input feature strings

  • max_workers (int | None) – Maximum number of workers to use for extraction. If 1, the method runs in serial.

  • kwargs (dict) – kwargs passed to the source handler for data extraction, e.g. {'parallel': True, 'chunks': {'south_north': 120, 'west_east': 120}}, which are then passed to xr.open_mfdataset(file, **kwargs)

Returns:

dict – dictionary of feature arrays with integer keys for chunks and str keys for features, e.g. data[chunk_number][feature] = array, where each array has shape (spatial_1, spatial_2, temporal)
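A sketch of the same fan-out with a serial fallback when max_workers == 1. The documented method uses subprocesses; this sketch swaps in concurrent.futures.ThreadPoolExecutor so it runs anywhere without pickling concerns. ProcessPoolExecutor accepts the same submit calls if the extractor is a picklable top-level function. The toy extract_feature is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def extract_feature(file_paths, raster_index, feature, time_slice=slice(None),
                    **kwargs):
    """Toy extractor: a (4, 4, n_time) array filled with the chunk's start."""
    n_time = len(range(48)[time_slice])
    return np.full((4, 4, n_time), time_slice.start or 0, dtype=np.float32)


def parallel_extract(file_paths, raster_index, time_chunks, input_features,
                     max_workers=None, **kwargs):
    """Extract every (chunk, feature) pair, serially if max_workers == 1."""
    data = {i: {} for i in range(len(time_chunks))}
    jobs = [(i, ts, f) for i, ts in enumerate(time_chunks) for f in input_features]
    if max_workers == 1:
        for i, ts, f in jobs:
            data[i][f] = extract_feature(file_paths, raster_index, f, ts, **kwargs)
        return data
    with ThreadPoolExecutor(max_workers=max_workers) as exe:
        futures = {
            (i, f): exe.submit(extract_feature, file_paths, raster_index, f, ts,
                               **kwargs)
            for i, ts, f in jobs
        }
    # Leaving the with-block waits for all submitted jobs to finish.
    for (i, f), future in futures.items():
        data[i][f] = future.result()
    return data


chunks = [slice(0, 24), slice(24, 48)]
serial = parallel_extract([], None, chunks, ["U_100m"], max_workers=1)
threaded = parallel_extract([], None, chunks, ["U_100m"], max_workers=4)
print(np.array_equal(serial[1]["U_100m"], threaded[1]["U_100m"]))  # True
```

Keying futures by (chunk, feature) lets results be placed into the nested dict regardless of completion order.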

classmethod recursive_compute(data, feature, handle_features, file_paths, raster_index)

Compute intermediate features recursively

Parameters:
  • data (dict) – dictionary of feature arrays, e.g. data[feature] = array, where each array has shape (spatial_1, spatial_2, temporal)

  • feature (str) – Name of feature to compute

  • handle_features (list) – Features available in raw data

  • file_paths (list) – Paths to data files. Used if the compute method operates directly on the source handler instead of on input arrays. This is the case for features without an inputs method, such as lat_lon and topography.

  • raster_index (ndarray) – raster index for spatial domain

Returns:

ndarray – Array of computed feature data
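A minimal sketch of depth-first recursive computation, assuming a toy registry that maps each derived feature to its input names and a compute function. The registry entries are hypothetical, and the file_paths/raster_index path for source-handler features is omitted.

```python
import numpy as np

# Toy registry (assumption): name -> (input feature names, compute function).
REGISTRY = {
    "windspeed_100m": (["U_100m", "V_100m"],
                       lambda d: np.hypot(d["U_100m"], d["V_100m"])),
    "windspeed_sq_100m": (["windspeed_100m"],
                          lambda d: d["windspeed_100m"] ** 2),
}


def recursive_compute(data, feature, handle_features):
    """Return the array for feature, computing missing inputs first."""
    if feature in data:
        return data[feature]
    if feature in handle_features:
        raise KeyError(f"{feature} is raw and should already be extracted")
    inputs, compute = REGISTRY[feature]
    for name in inputs:
        # Cache intermediates so later features can reuse them.
        data[name] = recursive_compute(data, name, handle_features)
    return compute(data)


data = {"U_100m": np.full((2, 2, 3), 3.0), "V_100m": np.full((2, 2, 3), 4.0)}
out = recursive_compute(data, "windspeed_sq_100m", ["U_100m", "V_100m"])
print(out[0, 0, 0])              # 25.0
print("windspeed_100m" in data)  # True
```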

classmethod serial_compute(data, file_paths, raster_index, time_chunks, derived_features, all_features, handle_features)

Compute features in series

Parameters:
  • data (dict) – dictionary of feature arrays with integer keys for chunks and str keys for features, e.g. data[chunk_number][feature] = array, where each array has shape (spatial_1, spatial_2, temporal)

  • file_paths (list) – Paths to data files. Used if the compute method operates directly on the source handler instead of on input arrays. This is the case for features without an inputs method, such as lat_lon and topography.

  • raster_index (ndarray) – raster index for spatial domain

  • time_chunks (list) – List of slices used to chunk feature extraction along the time dimension

  • derived_features (list) – list of feature strings which need to be derived

  • all_features (list) – list of all features including those requiring derivation from input features

  • handle_features (list) – Features available in raw data

Returns:

data (dict) – dictionary of feature arrays, including computed features, with integer keys for chunks and str keys for features, e.g. data[chunk_number][feature] = array, where each array has shape (spatial_1, spatial_2, temporal)
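A simplified sketch of the serial compute pass: for each time chunk, compute every derived feature from its inputs, then drop arrays that are not in all_features. The registry and the trimmed signature (no file_paths, raster_index, or handle_features) are assumptions made to keep the example self-contained.

```python
import numpy as np

# Toy registry (assumption): derived name -> (input names, compute function).
REGISTRY = {
    "windspeed_100m": (("U_100m", "V_100m"), lambda u, v: np.hypot(u, v)),
}


def serial_compute(data, time_chunks, derived_features, all_features):
    """Compute derived features chunk by chunk, then drop unneeded inputs."""
    for chunk_number in range(len(time_chunks)):
        chunk = data[chunk_number]
        for feature in derived_features:
            inputs, compute = REGISTRY[feature]
            chunk[feature] = compute(*(chunk[name] for name in inputs))
        for name in [f for f in chunk if f not in all_features]:
            del chunk[name]
    return data


u = np.full((2, 2, 24), 3.0)
v = np.full((2, 2, 24), 4.0)
data = {0: {"U_100m": u, "V_100m": v}, 1: {"U_100m": u, "V_100m": v}}
data = serial_compute(data, [slice(0, 24), slice(24, 48)],
                      ["windspeed_100m"], ["windspeed_100m"])
print(sorted(data[1]))  # ['windspeed_100m']
```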

classmethod parallel_compute(data, file_paths, raster_index, time_chunks, derived_features, all_features, handle_features, max_workers=None)

Compute features using parallel subprocesses

Parameters:
  • data (dict) – dictionary of feature arrays with integer keys for chunks and str keys for features, e.g. data[chunk_number][feature] = array, where each array has shape (spatial_1, spatial_2, temporal)

  • file_paths (list) – Paths to data files. Used if the compute method operates directly on the source handler instead of on input arrays. This is the case for features without an inputs method, such as lat_lon and topography.

  • raster_index (ndarray) – raster index for spatial domain

  • time_chunks (list) – List of slices used to chunk feature extraction along the time dimension

  • derived_features (list) – list of feature strings which need to be derived

  • all_features (list) – list of all features including those requiring derivation from input features

  • handle_features (list) – Features available in raw data

  • max_workers (int | None) – Maximum number of workers to use for computation. If 1, the method runs in serial.

Returns:

data (dict) – dictionary of feature arrays, including computed features, with integer keys for chunks and str keys for features, e.g. data[chunk_number][feature] = array, where each array has shape (spatial_1, spatial_2, temporal)

classmethod get_input_arrays(data, chunk_number, f, handle_features)

Get only arrays needed for computations

Parameters:
  • data (dict) – Dictionary of feature arrays

  • chunk_number (int) – time chunk for which to get input arrays

  • f (str) – feature to compute using input arrays

  • handle_features (list) – Features available in raw data

Returns:

dict – Dictionary of arrays with only needed features
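A minimal sketch of selecting only the needed arrays for one chunk. The inputs_lookup dict is a hypothetical stand-in for resolving inputs through the registry and handle_features, and plain lists stand in for ndarrays.

```python
def get_input_arrays(data, chunk_number, f, inputs_lookup):
    """Return a dict holding only the arrays that feature f depends on."""
    return {name: data[chunk_number][name] for name in inputs_lookup[f]}


# Hypothetical stand-in for registry inputs: feature -> raw inputs.
inputs_lookup = {"windspeed_100m": ["U_100m", "V_100m"]}
data = {0: {"U_100m": [3.0], "V_100m": [4.0], "T_2m": [290.0]}}
subset = get_input_arrays(data, 0, "windspeed_100m", inputs_lookup)
print(sorted(subset))  # ['U_100m', 'V_100m']
```

The subset references the same array objects rather than copies. Passing only this subset keeps each computation's inputs small, which matters most when arrays must be sent to worker subprocesses.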

classmethod lookup(feature, attr_name, handle_features=None)

Look up a feature in the feature registry

Parameters:
  • feature (str) – Feature to lookup in registry

  • attr_name (str) – Name of the registry method to look up, e.g. inputs or compute

  • handle_features (list) – List of feature names (datasets) available in the source file. If feature is found explicitly in this list, height/pressure suffixes will not be appended to the output.

Returns:

method | None – Feature registry method corresponding to feature
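A minimal sketch of a registry lookup that returns a named method or None. It assumes registry keys are regular-expression patterns matched against the feature name; the Windspeed entry is hypothetical, and the handle_features suffix behavior is not modeled.

```python
import re


class Windspeed:
    """Toy registry entry (hypothetical) with inputs and compute methods."""

    @staticmethod
    def inputs(feature):
        height = feature.rsplit("_", 1)[-1]  # e.g. '100m'
        return [f"U_{height}", f"V_{height}"]

    @staticmethod
    def compute(data, height):
        return (data[f"U_{height}"] ** 2 + data[f"V_{height}"] ** 2) ** 0.5


# Toy registry whose keys are regex patterns (an assumption).
FEATURE_REGISTRY = {r"windspeed_(\d+)m": Windspeed}


def lookup(feature, attr_name):
    """Return attr_name from the first matching entry, or None."""
    for pattern, entry in FEATURE_REGISTRY.items():
        if re.fullmatch(pattern, feature):
            return getattr(entry, attr_name, None)
    return None


print(lookup("windspeed_100m", "inputs")("windspeed_100m"))  # ['U_100m', 'V_100m']
print(lookup("T_2m", "compute"))                             # None
```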

classmethod get_inputs_recursive(feature, handle_features)

Look up the inputs needed to compute a feature. Walks through the inputs method of each required feature to collect all raw features.

Parameters:
  • feature (str) – Feature for which to get needed inputs for derivation

  • handle_features (list) – Features available in raw data

Returns:

list – List of input features
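The walk described above can be sketched as a depth-first traversal that stops at features present in the raw data and de-duplicates the leaves. The inputs_table dict and the uv_product_100m feature are hypothetical stand-ins for the registry's inputs methods.

```python
def get_inputs_recursive(feature, handle_features, inputs_table):
    """Return the raw features needed for feature, in first-seen order."""
    if feature in handle_features:
        return [feature]
    raw = []
    for name in inputs_table[feature]:
        for leaf in get_inputs_recursive(name, handle_features, inputs_table):
            if leaf not in raw:
                raw.append(leaf)
    return raw


# Hypothetical dependency table: derived feature -> direct inputs.
inputs_table = {
    "U_100m": ["windspeed_100m", "winddirection_100m"],
    "V_100m": ["windspeed_100m", "winddirection_100m"],
    "uv_product_100m": ["U_100m", "V_100m"],
}
handle_features = ["windspeed_100m", "winddirection_100m"]
print(get_inputs_recursive("uv_product_100m", handle_features, inputs_table))
# ['windspeed_100m', 'winddirection_100m']
```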

classmethod get_raw_feature_list(features, handle_features)

Look up the raw inputs needed to compute a list of features

Parameters:
  • features (list) – Features for which to get needed inputs for derivation

  • handle_features (list) – Features available in raw data

Returns:

list – List of input features

abstract classmethod extract_feature(file_paths, raster_index, feature, time_slice=slice(None, None, None), **kwargs)

Extract single feature from data source

Parameters:
  • file_paths (list) – paths to data files

  • raster_index (ndarray) – Raster index array

  • feature (str) – Feature to extract from data

  • time_slice (slice) – slice of time to extract

  • kwargs (dict) – Keyword arguments passed to source handler

Returns:

ndarray – Data array for the extracted feature, with shape (spatial_1, spatial_2, temporal)
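Because extract_feature is abstract, each concrete handler supplies its own implementation. One common way to express an abstract classmethod in Python is shown below. BaseHandler and ToyHandler are hypothetical names, and the toy subclass fabricates zeros of an assumed shape instead of reading files.

```python
from abc import ABC, abstractmethod

import numpy as np


class BaseHandler(ABC):
    """Hypothetical base class declaring the abstract extraction hook."""

    @classmethod
    @abstractmethod
    def extract_feature(cls, file_paths, raster_index, feature,
                        time_slice=slice(None), **kwargs):
        """Return a (spatial_1, spatial_2, temporal) array."""


class ToyHandler(BaseHandler):
    """Toy subclass that returns zeros instead of reading files."""

    @classmethod
    def extract_feature(cls, file_paths, raster_index, feature,
                        time_slice=slice(None), **kwargs):
        n_time = len(range(24)[time_slice])  # assumed 24-step time axis
        return np.zeros((3, 5, n_time))


arr = ToyHandler.extract_feature(["a.nc"], None, "U_100m",
                                 time_slice=slice(0, 6))
print(arr.shape)  # (3, 5, 6)
```

The decorator order matters: classmethod must be outermost so that abstractmethod marks the underlying function.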