sup3r.preprocessing.data_handlers.base.DataHandlerH5SolarCC#
- class DataHandlerH5SolarCC(file_paths, *, features='all', load_features='all', res_kwargs=None, chunks='auto', target=None, shape=None, time_slice=slice(None, None, None), threshold=None, time_roll=0, time_shift=None, hr_spatial_coarsen=1, nan_method_kwargs=None, BaseLoader=None, FeatureRegistry=None, interp_kwargs=None, cache_kwargs=None, **kwargs)[source]#
Bases: DailyDataHandler
Extended DailyDataHandler specifically for handling H5 data for SolarCC applications.
- Parameters:
file_paths (str | list | pathlib.Path) – file_paths input to LoaderClass
features (list | str) – Features to derive. If ‘all’ then all available raw features will just be loaded. Specify explicit feature names for derivations.
load_features (list | str) – Features to load and make available for derivations. If ‘all’ then all available raw features will be loaded and made available for derivations. This can be used to restrict features used for derivations. For example, to derive ‘temperature_100m’ from only temperature isobars, from data that includes single level values as well (like temperature_2m), don’t include ‘temperature_2m’ in the load_features list.
res_kwargs (dict) – Additional keyword arguments passed through to the BaseLoader. BaseLoader is usually xr.open_mfdataset for NETCDF files and MultiFileResourceX for H5 files.
chunks (dict | str) – Dictionary of chunk sizes to pass through to dask.array.from_array() or xr.Dataset().chunk(). Will be converted to a tuple when used in from_array(). These are the methods for H5 and NETCDF data, respectively. This argument can be “auto” or None in addition to a dictionary. None will not do any chunking and load data into memory as np.array
target (tuple) – (lat, lon) lower left corner of raster. Either need target+shape or raster_file.
shape (tuple) – (rows, cols) grid size. Either need target+shape or raster_file.
time_slice (slice | list) – Slice specifying extent and step of temporal extraction. e.g. slice(start, stop, step). If equal to slice(None, None, 1) the full time dimension is selected. Can also be a list [start, stop, step]
threshold (float) – Nearest neighbor euclidean distance threshold. If the coordinates are more than this value away from the target lat/lon, an error is raised.
time_roll (int) – Number of steps to roll along the time axis. Passed to xr.Dataset.roll()
time_shift (int | None) – Number of minutes to shift time axis. This can be used, for example, to shift the time index for daily data so that the time stamp for a given day starts at the zeroth minute instead of at noon, as is the case for most GCM data.
hr_spatial_coarsen (int) – Spatial coarsening factor. Passed to xr.Dataset.coarsen()
nan_method_kwargs (str | dict | None) – Keyword arguments for nan handling. If ‘mask’, time steps with nans will be dropped. Otherwise this should be a dict of kwargs which will be passed to sup3r.preprocessing.accessor.Sup3rX.interpolate_na(). e.g. {‘method’: ‘linear’, ‘dim’: ‘time’}
BaseLoader (Callable) – Base level file loader wrapped by Loader. This is usually xr.open_mfdataset for NETCDF files and MultiFileResourceX for H5 files.
FeatureRegistry (dict) – Dictionary of DerivedFeature objects used for derivations
interp_kwargs (dict | None) – Dictionary of kwargs for level interpolation. Can include “method” and “run_level_check” keys. Method specifies how to perform height interpolation. e.g. Deriving u_20m from u_10m and u_100m. Options are “linear” and “log”. See sup3r.preprocessing.derivers.Deriver.do_level_interpolation()
cache_kwargs (dict | None) – Dictionary with kwargs for caching wrangled data. This should at minimum include a cache_pattern key, value pair. This pattern must have a {feature} format key and either an h5 or nc file extension, based on the desired output type. See Cacher for a description of more arguments.
kwargs (dict) – Dictionary of additional keyword args for Rasterizer, used specifically for rasterizing flattened data
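A few of the parameter conventions above can be sketched with the standard library alone. The helper names below (normalize_time_slice, check_cache_pattern, shift_minutes) are hypothetical and not part of sup3r, and the cache path is a placeholder. The sign convention sup3r uses for time_shift is not stated above, so the negative shift shown here is only an assumption about how a noon time stamp could be moved to the start of the day.

```python
from datetime import datetime, timedelta


def normalize_time_slice(time_slice):
    """Accept a slice or a [start, stop, step] list and return a slice."""
    if isinstance(time_slice, slice):
        return time_slice
    return slice(*time_slice)


def check_cache_pattern(cache_pattern):
    """Check the two documented requirements on a cache pattern:
    a {feature} format key and an h5 or nc file extension."""
    has_key = "{feature}" in cache_pattern
    has_ext = cache_pattern.endswith((".h5", ".nc"))
    return has_key and has_ext


def shift_minutes(times, minutes):
    """Shift every time stamp by a number of minutes (may be negative)."""
    return [t + timedelta(minutes=minutes) for t in times]


# Illustrative values only; none of these come from a real dataset.
time_slice = normalize_time_slice([0, 48, 2])
pattern_ok = check_cache_pattern("./cache/{feature}.h5")
shifted = shift_minutes([datetime(2020, 1, 1, 12, 0)], -720)
```

Validating cache_pattern before constructing the handler is a design choice of this sketch; it simply surfaces a malformed pattern before any data is loaded.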
Methods
check_registry(feature) – Get compute method from the registry if available.
derive(feature) – Routine to derive requested features.
do_level_interpolation(feature[, interp_kwargs]) – Interpolate over height or pressure to derive the given feature.
get_inputs(feature) – Get inputs for the given feature and inputs for those inputs.
get_multi_level_data(feature) – Get data stored in multi-level arrays, like u stored on pressure levels.
get_single_level_data(feature) – When doing level interpolation we should include the single level data available.
has_interp_variables(feature) – Check if the given feature can be interpolated from values at nearby heights or from pressure level data.
map_new_name(feature, pattern) – If the search for a derivation method first finds an alternative name for the feature we want to derive, by matching a wildcard pattern, we need to replace the wildcard with the specific height or pressure we want and continue the search for a derivation method with this new name.
no_overlap(feature) – Check if any of the nested inputs for ‘feature’ contain ‘feature’
post_init_log([args_dict]) – Log additional arguments after initialization.
wrap(data) – Return a Sup3rDataset object or tuple of such.
Attributes
- BASE_LOADER#
alias of MultiFileNSRDBX
- check_registry(feature) ndarray | Array | str | None #
Get compute method from the registry if available. Will check for a pattern match between the feature and the feature registry entries, e.g. u_100m matches a feature registry entry of u_(.*)m
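The pattern lookup described here can be illustrated with Python's re module. This is a minimal sketch, not the sup3r implementation: the registry contents and the function name find_registry_match are hypothetical, and the values are plain strings standing in for compute methods.

```python
import re

# Hypothetical registry: keys are regex-style patterns, values stand in
# for compute methods or alternative names.
registry = {"u_(.*)m": "compute_u", "windspeed": "wind_speed"}


def find_registry_match(feature, registry):
    """Return the value of the first pattern that fully matches feature."""
    for pattern, method in registry.items():
        if re.fullmatch(pattern, feature.lower()):
            return method
    return None


match = find_registry_match("u_100m", registry)
missing = find_registry_match("v_100m", registry)
```

Here match is the stand-in for the u compute method and missing is None, since no registry pattern covers v_100m.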
- property data#
Return underlying data.
- derive(feature) ndarray | Array #
Routine to derive requested features. Employs a little recursion to locate differently named features with a name map in the feature registry. i.e. if FEATURE_REGISTRY contains a key, value pair like “windspeed”: “wind_speed” then requesting “windspeed” will ultimately return a compute method (or fetch from raw data) for “wind_speed”.
Note
Features are all saved as lower case names and __contains__ checks will use feature.lower()
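The recursion through a name map can be sketched as follows. This is an assumption-laden toy, not the library code: derive_sketch, the registry layout, and the raw_data mapping are all hypothetical, and scalars stand in for arrays.

```python
import math


def derive_sketch(feature, registry, raw_data):
    """Resolve a feature by following name aliases until raw data or a
    compute method is found. Names are compared in lower case."""
    feature = feature.lower()
    if feature in raw_data:
        return raw_data[feature]
    entry = registry.get(feature)
    if isinstance(entry, str):
        # Alias: continue the search under the alternative name.
        return derive_sketch(entry, registry, raw_data)
    if callable(entry):
        return entry(raw_data)
    raise KeyError(f"No derivation found for {feature!r}")


# Hypothetical inputs for illustration only.
raw_data = {"u_10m": 3.0, "v_10m": 4.0}
registry = {
    "windspeed": "wind_speed",
    "wind_speed": lambda data: math.hypot(data["u_10m"], data["v_10m"]),
}

speed = derive_sketch("WindSpeed", registry, raw_data)
```

Requesting "WindSpeed" is lowered to "windspeed", mapped to "wind_speed", and then computed from the raw components.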
- do_level_interpolation(feature, interp_kwargs=None) DataArray #
Interpolate over height or pressure to derive the given feature.
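As a rough illustration of what “linear” and “log” interpolation between two levels can mean, the sketch below interpolates a scalar value either in height or in the logarithm of height. This is a generic textbook reading and an assumption; it is not confirmed to match how sup3r implements either method, and interpolate_level is a hypothetical name.

```python
import math


def interpolate_level(h, h1, v1, h2, v2, method="linear"):
    """Interpolate a value at height h from values at heights h1 and h2."""
    if method == "log":
        # Interpolate linearly in log(height) instead of height.
        h, h1, h2 = math.log(h), math.log(h1), math.log(h2)
    elif method != "linear":
        raise ValueError(f"Unknown method: {method!r}")
    weight = (h - h1) / (h2 - h1)
    return v1 + weight * (v2 - v1)


# Illustrative numbers: u_20m from u_10m = 5.0 and u_100m = 8.0.
u20_linear = interpolate_level(20, 10, 5.0, 100, 8.0, method="linear")
u20_log = interpolate_level(20, 10, 5.0, 100, 8.0, method="log")
```

With these inputs the log variant gives a larger value at 20 m than the linear one, because 20 m sits further along the interval between 10 m and 100 m on a logarithmic axis than on a linear one.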
- get_inputs(feature)#
Get inputs for the given feature and inputs for those inputs.
- get_multi_level_data(feature)#
Get data stored in multi-level arrays, like u stored on pressure levels.
- get_single_level_data(feature)#
When doing level interpolation we should include the single level data available. e.g. If we have u_100m already and want to interpolate u_40m from multi-level data U we should add u_100m at height 100m before doing interpolation, since 100 could be a closer level to 40m than those available in U.
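The reasoning in this example can be made concrete with a small sketch that merges single-level values into a height-to-value mapping and then picks the closest levels. The function names and the data are hypothetical, and scalars stand in for arrays; the actual sup3r data structures are not shown in this reference.

```python
import re


def add_single_levels(levels, single_level_data, variable):
    """Merge entries such as u_100m into a {height: value} mapping."""
    merged = dict(levels)
    for name, value in single_level_data.items():
        match = re.fullmatch(rf"{re.escape(variable)}_(\d+)m", name)
        if match:
            merged[float(match.group(1))] = value
    return merged


def nearest_levels(levels, height, count=2):
    """Return the count heights closest to the requested height."""
    return sorted(levels, key=lambda h: abs(h - height))[:count]


# Hypothetical multi-level data U at 10 m and 200 m, plus a stored u_100m.
multi_level = {10.0: 4.0, 200.0: 9.0}
single_level = {"u_100m": 7.5, "temperature_2m": 280.0}

merged = add_single_levels(multi_level, single_level, "u")
without_single = nearest_levels(multi_level, 40)
with_single = nearest_levels(merged, 40)
```

Without the single-level value the levels nearest to 40 m are 10 m and 200 m; once u_100m is included they become 10 m and 100 m.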
- has_interp_variables(feature)#
Check if the given feature can be interpolated from values at nearby heights or from pressure level data. e.g. If u_10m and u_50m exist then u_30m can be interpolated from these. If a pressure level array u is available this can also be used, in conjunction with height data.
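A minimal sketch of the height-based half of this check is shown below. It only covers the case of single-level features at other heights; the pressure-level case is left out. The name can_interpolate and the requirement of at least two other heights are assumptions made for illustration.

```python
import re

HEIGHT_PATTERN = re.compile(r"(?P<name>.+)_(?P<height>\d+)m")


def can_interpolate(feature, available):
    """Return True if at least two other heights of the same variable exist."""
    target = HEIGHT_PATTERN.fullmatch(feature.lower())
    if target is None:
        return False
    heights = set()
    for name in available:
        match = HEIGHT_PATTERN.fullmatch(name.lower())
        if match and match["name"] == target["name"]:
            heights.add(int(match["height"]))
    heights.discard(int(target["height"]))
    return len(heights) >= 2


possible = can_interpolate("u_30m", ["u_10m", "u_50m"])
not_possible = can_interpolate("u_30m", ["u_10m", "v_50m"])
```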
- map_new_name(feature, pattern)#
If the search for a derivation method first finds an alternative name for the feature we want to derive, by matching a wildcard pattern, we need to replace the wildcard with the specific height or pressure we want and continue the search for a derivation method with this new name.
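The wildcard substitution can be sketched as below. This is an assumption about the general idea rather than the library's code: map_new_name_sketch and the example names are hypothetical, and the sketch assumes the wildcard is written as (.*) and that levels appear as a trailing _<number>m or _<number>pa suffix.

```python
import re


def map_new_name_sketch(feature, pattern):
    """Fill the (.*) wildcard in pattern with the level found in feature."""
    match = re.search(r"_(\d+)(m|pa)$", feature.lower())
    if match is None:
        # No height or pressure suffix: nothing to substitute.
        return pattern
    return pattern.replace("(.*)", match.group(1))


by_height = map_new_name_sketch("windspeed_100m", "ws_(.*)m")
by_pressure = map_new_name_sketch("u_50000pa", "ua_(.*)pa")
```

The returned name (for example ws_100m) would then be used to continue the search for a derivation method.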
- no_overlap(feature)#
Check if any of the nested inputs for ‘feature’ contain ‘feature’
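One way to picture this check is as a cycle test over a map from each feature to its direct inputs. The sketch below is illustrative only: input_map, nested_inputs, and no_overlap_sketch are hypothetical names, and the map contents are made up.

```python
def nested_inputs(feature, input_map, seen=None):
    """Collect the inputs of feature and, recursively, their inputs."""
    seen = set() if seen is None else seen
    for name in input_map.get(feature, []):
        if name not in seen:
            seen.add(name)
            nested_inputs(name, input_map, seen)
    return seen


def no_overlap_sketch(feature, input_map):
    """Return True if feature does not appear among its own nested inputs."""
    return feature not in nested_inputs(feature, input_map)


# Hypothetical dependency map; "a" and "b" depend on each other.
input_map = {
    "windspeed_10m": ["u_10m", "v_10m"],
    "a": ["b"],
    "b": ["a"],
}

safe = no_overlap_sketch("windspeed_10m", input_map)
circular = no_overlap_sketch("a", input_map)
```

Tracking visited names in seen keeps the traversal from looping forever when two features list each other as inputs.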
- post_init_log(args_dict=None)#
Log additional arguments after initialization.
- property shape#
Get shape of underlying data.
- wrap(data)#
Return a Sup3rDataset object or tuple of such. This is a tuple when the .data attribute belongs to a Collection object like BatchHandler. Otherwise this is a Sup3rDataset object, which is either a wrapped 2-tuple or 1-tuple (e.g. len(data) == 2 or len(data) == 1). This is a 2-tuple when .data belongs to a dual container object like DualSampler and a 1-tuple otherwise.