sup3r.preprocessing.data_handlers.nc_cc.DataHandlerNCforCCwithPowerLaw#
- class DataHandlerNCforCCwithPowerLaw(file_paths, *, features='all', nsrdb_source_fp=None, nsrdb_agg=1, nsrdb_smoothing=0, scale_clearsky_ghi=True, load_features='all', res_kwargs=None, chunks='auto', target=None, shape=None, time_slice=slice(None, None, None), threshold=None, time_roll=0, time_shift=None, hr_spatial_coarsen=1, nan_method_kwargs=None, BaseLoader=None, FeatureRegistry=None, interp_kwargs=None, cache_kwargs=None, **kwargs)[source]#
Bases: DataHandlerNCforCC
Add power law wind methods to feature registry.
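As a rough illustration of what a power law wind method computes, the sketch below extrapolates a wind speed from one height to another. The function name, the reference height, and the shear exponent of 0.2 are all assumptions made for this example; this page does not state the values or signatures that sup3r itself uses.

```python
def power_law_wind(u_ref, z_ref, z, alpha=0.2):
    """Extrapolate wind speed from height z_ref to height z.

    alpha is the shear exponent. The default of 0.2 is only an
    illustrative value, not one taken from sup3r.
    """
    if z_ref <= 0 or z <= 0:
        raise ValueError("heights must be positive")
    return u_ref * (z / z_ref) ** alpha


# Hypothetical input: 5 m/s at 10 m, extrapolated to 100 m
u_100m = power_law_wind(5.0, 10.0, 100.0)
print(round(u_100m, 3))
```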
- Parameters:
file_paths (str | list | pathlib.Path) – file_paths input to LoaderClass
features (list | str) – Features to derive. If ‘all’ then all available raw features will just be loaded. Specify explicit feature names for derivations.
nsrdb_source_fp (str | None) – Optional NSRDB source h5 file to retrieve clearsky_ghi from to calculate CC clearsky_ratio along with rsds (ghi) from the CC netcdf file.
nsrdb_agg (int) – Optional number of NSRDB source pixels to aggregate clearsky_ghi from to a single climate change netcdf pixel. This can be used if the CC.nc data is at a much coarser resolution than the source nsrdb data.
nsrdb_smoothing (float) – Optional gaussian filter smoothing factor to smooth out clearsky_ghi from high-resolution nsrdb source data. This is typically done because spatially aggregated nsrdb data is still usually rougher than CC irradiance data.
scale_clearsky_ghi (bool) – Flag to scale the NSRDB clearsky ghi so that the maximum value matches the GCM rsds maximum value per spatial pixel. This is useful when calculating “clearsky_ratio” so that there are not an unrealistic number of 1 values if the maximum NSRDB clearsky_ghi is much lower than the GCM values
kwargs (dict) – Dictionary of additional keyword args for Rasterizer, used specifically for rasterizing flattened data
load_features (list | str) – Features to load and make available for derivations. If ‘all’ then all available raw features will be loaded and made available for derivations. This can be used to restrict features used for derivations. For example, to derive ‘temperature_100m’ from only temperature isobars, from data that includes single level values as well (like temperature_2m), don’t include ‘temperature_2m’ in the load_features list.
res_kwargs (dict) – Additional keyword arguments passed through to the BaseLoader. BaseLoader is usually xr.open_mfdataset for NETCDF files and MultiFileResourceX for H5 files.
chunks (dict | str) – Dictionary of chunk sizes to pass through to dask.array.from_array() or xr.Dataset().chunk(). Will be converted to a tuple when used in from_array(). These are the methods for H5 and NETCDF data, respectively. This argument can be “auto” or None in addition to a dictionary. None will not do any chunking and load data into memory as np.array
target (tuple) – (lat, lon) lower left corner of raster. Either need target+shape or raster_file.
shape (tuple) – (rows, cols) grid size. Either need target+shape or raster_file.
time_slice (slice | list) – Slice specifying extent and step of temporal extraction. e.g. slice(start, stop, step). If equal to slice(None, None, 1) the full time dimension is selected. Can also be a list [start, stop, step]
threshold (float) – Nearest neighbor euclidean distance threshold. If the coordinates are more than this value away from the target lat/lon, an error is raised.
time_roll (int) – Number of steps to roll along the time axis. Passed to xr.Dataset.roll()
time_shift (int | None) – Number of minutes to shift time axis. This can be used, for example, to shift the time index for daily data so that the time stamp for a given day starts at the zeroth minute instead of at noon, as is the case for most GCM data.
hr_spatial_coarsen (int) – Spatial coarsening factor. Passed to xr.Dataset.coarsen()
nan_method_kwargs (str | dict | None) – Keyword arguments for nan handling. If ‘mask’, time steps with nans will be dropped. Otherwise this should be a dict of kwargs which will be passed to sup3r.preprocessing.accessor.Sup3rX.interpolate_na(). e.g. {‘method’: ‘linear’, ‘dim’: ‘time’}
BaseLoader (Callable) – Base level file loader wrapped by Loader. This is usually xr.open_mfdataset for NETCDF files and MultiFileResourceX for H5 files.
FeatureRegistry (dict) – Dictionary of DerivedFeature objects used for derivations
interp_kwargs (dict | None) – Dictionary of kwargs for level interpolation. Can include “method” and “run_level_check” keys. Method specifies how to perform height interpolation. e.g. Deriving u_20m from u_10m and u_100m. Options are “linear” and “log”. See sup3r.preprocessing.derivers.Deriver.do_level_interpolation()
cache_kwargs (dict | None) – Dictionary with kwargs for caching wrangled data. This should at minimum include a cache_pattern key/value pair. The pattern must have a {feature} format key and either an h5 or nc file extension, based on the desired output type. See Cacher for a description of more arguments.
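The argument conventions above can be checked before a handler is built. The sketch below is not part of sup3r: as_time_slice and check_cache_pattern are hypothetical helpers, and every value in handler_kwargs (features, target, shape, cache path) is made up for illustration. It only shows the list form of time_slice and the documented cache_pattern requirements.

```python
from pathlib import Path


def as_time_slice(time_slice):
    """Accept a slice or a [start, stop, step] list and return a slice."""
    if isinstance(time_slice, slice):
        return time_slice
    return slice(*time_slice)


def check_cache_pattern(pattern):
    """Require a {feature} format key and an h5 or nc file extension."""
    if "{feature}" not in pattern:
        raise ValueError("cache_pattern needs a {feature} format key")
    if Path(pattern).suffix not in (".h5", ".nc"):
        raise ValueError("cache_pattern must end in .h5 or .nc")
    return pattern


# Hypothetical keyword arguments following the parameter list above
handler_kwargs = {
    "features": ["u_100m", "v_100m"],
    "target": (39.0, -105.0),  # (lat, lon) lower left corner
    "shape": (20, 20),         # (rows, cols)
    "time_slice": as_time_slice([0, 365, 1]),
    "cache_kwargs": {
        "cache_pattern": check_cache_pattern("./cache/{feature}.nc"),
    },
}
```

A dictionary like this could then be unpacked into the class constructor together with file_paths, assuming the files and the sup3r package are available.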
Methods
check_registry(feature) – Get compute method from the registry if available.
derive(feature) – Routine to derive requested features.
do_level_interpolation(feature[, interp_kwargs]) – Interpolate over height or pressure to derive the given feature.
get_clearsky_ghi() – Get clearsky ghi from an exogenous NSRDB source h5 file at the target CC meta data and time index.
get_inputs(feature) – Get inputs for the given feature and inputs for those inputs.
get_multi_level_data(feature) – Get data stored in multi-level arrays, like u stored on pressure levels.
get_single_level_data(feature) – When doing level interpolation we should include the single level data available.
get_time_slice(ti_nsrdb) – Get nsrdb data time slice consistent with self.time_index.
has_interp_variables(feature) – Check if the given feature can be interpolated from values at nearby heights or from pressure level data.
map_new_name(feature, pattern) – If the search for a derivation method first finds an alternative name for the feature we want to derive, by matching a wildcard pattern, we need to replace the wildcard with the specific height or pressure we want and continue the search for a derivation method with this new name.
no_overlap(feature) – Check if any of the nested inputs for 'feature' contain 'feature'
post_init_log([args_dict]) – Log additional arguments after initialization.
run_input_checks() – Run checks on the files provided for extracting clearsky_ghi.
run_wrap_checks(cs_ghi) – Run check on rasterized data from clearsky_ghi source.
scale_clearsky_ghi() – Method to scale the NSRDB clearsky ghi so that the maximum value matches the GCM rsds maximum value per spatial pixel.
wrap(data) – Return a Sup3rDataset object or tuple of such.
Attributes
- check_registry(feature) → ndarray | Array | str | None#
Get compute method from the registry if available. Will check for pattern feature match in feature registry. e.g. if u_100m matches a feature registry entry of u_(.*)m
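The pattern match described here can be sketched with the standard re module. The registry contents and the lookup_method helper below are hypothetical stand-ins, not the sup3r implementation; string values take the place of real compute methods.

```python
import re

# Hypothetical registry: keys are regex patterns, values stand in for
# compute methods.
registry = {
    "u_(.*)m": "compute_u_at_height",
    "clearsky_ratio": "compute_clearsky_ratio",
}


def lookup_method(feature, registry):
    """Return the entry whose pattern matches the whole feature name."""
    for pattern, method in registry.items():
        if re.fullmatch(pattern, feature.lower()):
            return method
    return None


print(lookup_method("u_100m", registry))
print(lookup_method("v_100m", registry))
```

Using fullmatch rather than search keeps a pattern like u_(.*)m from matching longer names that merely contain it.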
- property data#
Return underlying data.
- derive(feature) → ndarray | Array#
Routine to derive requested features. Employs a little recursion to locate differently named features with a name map in the feature registry. e.g. if FEATURE_REGISTRY contains a key, value pair like “windspeed”: “wind_speed” then requesting “windspeed” will ultimately return a compute method (or fetch from raw data) for “wind_speed”.
Note
Features are all saved as lower case names and __contains__ checks will use feature.lower()
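The recursion through a name map can be sketched as follows. This is a simplified illustration under stated assumptions: derive_feature is a hypothetical function, raw data is a plain dictionary, and a registry value is treated as an alias when it is a string and as a compute method when it is callable.

```python
# Hypothetical raw data keyed by lower case feature name
raw_data = {"wind_speed": [3.2, 4.1, 5.0]}

# Hypothetical registry: a string value is an alias, a callable is a
# compute method.
registry = {
    "windspeed": "wind_speed",
    "wind_speed_doubled": lambda data: [2 * x for x in data["wind_speed"]],
}


def derive_feature(feature, registry, data):
    """Resolve aliases recursively, then fetch or compute the feature."""
    feature = feature.lower()
    if feature in data:
        return data[feature]
    entry = registry.get(feature)
    if entry is None:
        raise KeyError(f"no raw data or registry entry for {feature!r}")
    if isinstance(entry, str):
        # Alias: continue the search under the new name
        return derive_feature(entry, registry, data)
    return entry(data)


print(derive_feature("WindSpeed", registry, raw_data))
```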
- do_level_interpolation(feature, interp_kwargs=None) → DataArray#
Interpolate over height or pressure to derive the given feature.
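To make the “linear” and “log” options concrete, the sketch below interpolates a value at 20 m from values at 10 m and 100 m. It assumes that “log” means interpolating linearly in the logarithm of height; the actual sup3r implementation may differ, and interp_level and the wind speeds are hypothetical.

```python
import math


def interp_level(z, z_lo, val_lo, z_hi, val_hi, method="linear"):
    """Interpolate a value at height z from two bracketing levels."""
    if method == "log":
        # Assumption: "log" interpolates linearly in ln(height)
        z, z_lo, z_hi = math.log(z), math.log(z_lo), math.log(z_hi)
    elif method != "linear":
        raise ValueError(f"unknown method: {method!r}")
    weight = (z - z_lo) / (z_hi - z_lo)
    return val_lo + weight * (val_hi - val_lo)


# Hypothetical wind speeds: 4 m/s at 10 m and 7 m/s at 100 m
u_20m_linear = interp_level(20, 10, 4.0, 100, 7.0)
u_20m_log = interp_level(20, 10, 4.0, 100, 7.0, method="log")
print(round(u_20m_linear, 3), round(u_20m_log, 3))
```

With these inputs the log variant gives a larger value at 20 m than the linear one, because 20 m sits further along the 10 m to 100 m interval on a logarithmic axis than on a linear one.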
- get_clearsky_ghi()#
Get clearsky ghi from an exogenous NSRDB source h5 file at the target CC meta data and time index.
TODO: Replace some of this with call to Regridder? Perform daily means with self.loader.coarsen?
- Returns:
cs_ghi (Union[np.ndarray, da.core.Array]) – Clearsky ghi (W/m2) from the nsrdb_source_fp h5 source file. Data shape is (lat, lon, time) where time is daily average values.
- get_inputs(feature)#
Get inputs for the given feature and inputs for those inputs.
- get_multi_level_data(feature)#
Get data stored in multi-level arrays, like u stored on pressure levels.
- get_single_level_data(feature)#
When doing level interpolation we should include the single level data available. e.g. If we have u_100m already and want to interpolate u_40m from multi-level data U we should add u_100m at height 100m before doing interpolation, since 100 could be a closer level to 40m than those available in U.
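The effect of adding single level data can be seen by looking at which levels bracket the target height. The heights below are made up, and bracketing_levels is a hypothetical helper rather than a sup3r function.

```python
from bisect import bisect_left


def bracketing_levels(target, levels):
    """Return the two available levels that enclose the target height."""
    levels = sorted(levels)
    if len(levels) < 2 or not levels[0] <= target <= levels[-1]:
        raise ValueError("target must lie within at least two levels")
    i = max(bisect_left(levels, target), 1)
    return levels[i - 1], levels[i]


multi_level_heights = [10.0, 200.0, 500.0]  # hypothetical heights of U
single_level_heights = [100.0]              # from u_100m

without_single = bracketing_levels(40, multi_level_heights)
with_single = bracketing_levels(40, multi_level_heights + single_level_heights)
print(without_single, with_single)
```

Including the 100 m level narrows the interval used for the 40 m interpolation from 10 m to 200 m down to 10 m to 100 m.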
- get_time_slice(ti_nsrdb)#
Get nsrdb data time slice consistent with self.time_index.
- has_interp_variables(feature)#
Check if the given feature can be interpolated from values at nearby heights or from pressure level data. e.g. If u_10m and u_50m exist then u_30m can be interpolated from these. If a pressure level array u is available this can also be used, in conjunction with height data.
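A minimal version of the height-based check might look like the sketch below. It covers only the nearby-heights case, not pressure level data, and it assumes that interpolation needs one available height below and one above the requested height. can_interpolate is a hypothetical name.

```python
import re


def can_interpolate(feature, available):
    """Check whether heights below and above the requested one exist."""
    match = re.fullmatch(r"(.+)_(\d+)m", feature)
    if match is None:
        return False
    name, height = match.group(1), int(match.group(2))
    level_pattern = re.compile(rf"{re.escape(name)}_(\d+)m")
    heights = []
    for candidate in available:
        found = level_pattern.fullmatch(candidate)
        if found:
            heights.append(int(found.group(1)))
    has_lower = any(h < height for h in heights)
    has_upper = any(h > height for h in heights)
    return has_lower and has_upper


print(can_interpolate("u_30m", ["u_10m", "u_50m"]))
print(can_interpolate("u_30m", ["u_10m", "v_50m"]))
```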
- map_new_name(feature, pattern)#
If the search for a derivation method first finds an alternative name for the feature we want to derive, by matching a wildcard pattern, we need to replace the wildcard with the specific height or pressure we want and continue the search for a derivation method with this new name.
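The wildcard replacement can be illustrated with a small regex helper. The sketch assumes the wildcard is written as (.*), as in the check_registry example, and uses a hypothetical three-argument helper, fill_wildcard; the real method takes only feature and pattern, and its internals are not shown on this page.

```python
import re


def fill_wildcard(feature, pattern, alt_pattern):
    """Copy the height or pressure matched in feature into alt_pattern."""
    match = re.fullmatch(pattern, feature)
    if match is None:
        raise ValueError(f"{feature!r} does not match {pattern!r}")
    # Replace the first wildcard with the specific level that was matched
    return alt_pattern.replace("(.*)", match.group(1), 1)


# Hypothetical alias: "windspeed_(.*)m" maps to "wind_speed_(.*)m"
new_name = fill_wildcard("windspeed_100m", "windspeed_(.*)m", "wind_speed_(.*)m")
print(new_name)
```

The search for a derivation method would then continue with the returned name.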
- no_overlap(feature)#
Check if any of the nested inputs for ‘feature’ contain ‘feature’
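One way to picture this check is as a search for a feature among its own nested inputs, which would indicate a circular derivation. The input map and both helper functions below are hypothetical, and the return convention (True when there is no overlap) is an assumption based on the method name.

```python
# Hypothetical map from a feature to the inputs needed to derive it
inputs = {
    "windspeed_100m": ["u_100m", "v_100m"],
    "u_100m": ["u_10m"],
    "bad_feature": ["helper"],
    "helper": ["bad_feature"],
}


def nested_inputs(feature, inputs, seen=None):
    """Collect the inputs of a feature and the inputs of those inputs."""
    seen = set() if seen is None else seen
    for name in inputs.get(feature, []):
        if name not in seen:
            seen.add(name)
            nested_inputs(name, inputs, seen)
    return seen


def has_no_overlap(feature, inputs):
    """True when the feature is not among its own nested inputs."""
    return feature not in nested_inputs(feature, inputs)


print(has_no_overlap("windspeed_100m", inputs))
print(has_no_overlap("bad_feature", inputs))
```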
- post_init_log(args_dict=None)#
Log additional arguments after initialization.
- run_input_checks()#
Run checks on the files provided for extracting clearsky_ghi. Make sure the loaded data is daily data and the step size is one day.
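A daily step check of this kind can be sketched with the standard datetime module. The is_daily helper and the time stamps are hypothetical, and the real method may perform additional checks.

```python
from datetime import datetime, timedelta


def is_daily(time_index):
    """True when every step between consecutive time stamps is one day."""
    steps = {later - earlier for earlier, later in zip(time_index, time_index[1:])}
    return steps == {timedelta(days=1)}


# Hypothetical time stamps at noon, as is common for daily GCM data
daily = [datetime(2015, 1, 1, 12) + timedelta(days=i) for i in range(5)]
hourly = [datetime(2015, 1, 1) + timedelta(hours=i) for i in range(5)]

print(is_daily(daily), is_daily(hourly))
```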
- run_wrap_checks(cs_ghi)#
Run check on rasterized data from clearsky_ghi source.
- scale_clearsky_ghi()#
Method to scale the NSRDB clearsky ghi so that the maximum value matches the GCM rsds maximum value per spatial pixel. This is useful when calculating “clearsky_ratio” so that there are not an unrealistic number of 1 values if the maximum NSRDB clearsky_ghi is much lower than the GCM values
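The per-pixel scaling can be illustrated with plain Python lists, one time series per pixel. The numbers are invented, scale_to_match_max is a hypothetical helper, and the sketch scales in both directions; whether the sup3r method only scales upward is not stated here.

```python
def scale_to_match_max(cs_ghi, rsds):
    """Scale each pixel's clearsky series so its max equals the rsds max."""
    scaled = []
    for cs_series, rsds_series in zip(cs_ghi, rsds):
        cs_max = max(cs_series)
        # Leave the series unchanged if there is no positive clearsky value
        factor = max(rsds_series) / cs_max if cs_max > 0 else 1.0
        scaled.append([value * factor for value in cs_series])
    return scaled


# Two hypothetical pixels with three daily values each (W/m2)
cs_ghi = [[200.0, 250.0, 300.0], [180.0, 240.0, 280.0]]
rsds = [[150.0, 330.0, 290.0], [100.0, 210.0, 200.0]]

scaled = scale_to_match_max(cs_ghi, rsds)
print([round(max(series), 1) for series in scaled])
```

In the first pixel the unscaled clearsky maximum (300) is below the rsds maximum (330), so an unscaled ratio would exceed 1 on that day and would typically be clipped to 1. After scaling, the ratio reaches 1 only at the maximum.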
- property shape#
Get shape of underlying data.
- wrap(data)#
Return a Sup3rDataset object or tuple of such. This is a tuple when the .data attribute belongs to a Collection object like BatchHandler. Otherwise this is a Sup3rDataset object, which is either a wrapped 2-tuple or 1-tuple (e.g. len(data) == 2 or len(data) == 1). This is a 2-tuple when .data belongs to a dual container object like DualSampler and a 1-tuple otherwise.