nsrdb.aggregation.aggregation.Manager
- class Manager(data, data_dir, meta_dir, year=2018, i_chunk=0, n_chunks=1)[source]
Bases: object
Framework for aggregation to a final NSRDB spatiotemporal resolution.
- Parameters:
data (dict) – Nested dictionary containing data on all NSRDB data sources (east, west, conus) and the final aggregated output.
data_dir (str) – Root directory containing sub dirs with all data sources.
meta_dir (str) – Directory containing meta and ckdtree files for each data source and the final aggregated output.
year (int) – Year being analyzed.
i_chunk (int) – Meta data chunk index currently being processed (zero indexed).
n_chunks (int) – Number of chunks to process the meta data in.
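Example (a minimal sketch): the directory paths, file names, and the keys inside the data dictionary below are illustrative assumptions patterned on the fields checked by preflight (data_sub_dir, tree_file, meta_file, spatial, freq), not values taken from the NSRDB source:

    from nsrdb.aggregation.aggregation import Manager

    # Hypothetical source and final dataset definitions; each entry carries the
    # fields required by Manager.preflight().
    data = {
        'east': {'data_sub_dir': 'east', 'tree_file': 'kdtree_east.pkl',
                 'meta_file': 'meta_east.csv', 'spatial': '2km', 'freq': '10min'},
        'conus': {'data_sub_dir': 'conus', 'tree_file': 'kdtree_conus.pkl',
                  'meta_file': 'meta_conus.csv', 'spatial': '2km', 'freq': '5min'},
        'final': {'data_sub_dir': 'final', 'tree_file': 'kdtree_final.pkl',
                  'meta_file': 'meta_final.csv', 'spatial': '4km', 'freq': '30min'},
    }

    # Process the first of 32 meta data chunks for 2018.
    mgr = Manager(data, '/scratch/nsrdb/source_data', '/scratch/nsrdb/meta',
                  year=2018, i_chunk=0, n_chunks=32)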
Methods
- DEFAULT_METHOD(var, data_fpath, nn, w, final_ti) – Run aggregation using a spatial average and a temporal moving-window average.
- Get the temporal window sizes for all data sources.
- get_dset_attrs(h5dir[, ignore_dsets]) – Get output file dataset attributes for a set of datasets.
- knn(meta, tree_fpath, meta_fpath[, k]) – Run KNN between the final meta data and the pickled ckdtree.
- Parse the data input for several useful attributes.
- preflight([reqs]) – Run validity checks on input data.
- run_chunk(data, data_dir, meta_dir, i_chunk, ...) – Run aggregation on a single chunk of the final meta data.
- run_nn() – Run nearest neighbor for all data sources against the final meta.
- write_output(arr, var) – Write aggregated output data to the final output file.
Attributes
- AGG_METHODS
- meta – Get the final meta data with sources.
- meta_chunk – Get the meta data for just this chunk of sites based on n_chunks and i_chunk.
- time_index – Get the final time index.
- classmethod DEFAULT_METHOD(var, data_fpath, nn, w, final_ti)
Run aggregation using a spatial average and a temporal moving-window average.
- Parameters:
var (str) – Variable (dataset) name being aggregated.
data_fpath (str) – Filepath to h5 file containing source var data.
nn (np.ndarray) – 1D array of site (column) indices in data_fpath to aggregate.
w (int) – Window size for temporal aggregation.
final_ti (pd.DatetimeIndex) – Final datetime index (used to ensure the aggregated profile has the correct length).
- Returns:
data (np.ndarray) – (n,) array of unscaled, rounded data aggregated from the nn site indices, with a time series length matching final_ti.
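Example (a hedged sketch): the h5 file path, the nn indices, and the window size below are hypothetical; only the call signature follows the documentation above:

    import numpy as np
    import pandas as pd

    from nsrdb.aggregation.aggregation import Manager

    # Final (aggregated) time index: 30-minute resolution for 2018.
    final_ti = pd.date_range('2018-01-01', periods=17520, freq='30min')

    # Source site (column) indices that map to one final site (hypothetical).
    nn = np.array([104, 105, 231, 232])

    # Spatially average the four source columns and apply a 3-step temporal
    # moving window average, returning a profile matching final_ti.
    ghi = Manager.DEFAULT_METHOD('ghi', '/scratch/nsrdb/east/ghi_2018.h5',
                                 nn, 3, final_ti)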
- preflight(reqs=('data_sub_dir', 'tree_file', 'meta_file', 'spatial', 'freq'))[source]
Run validity checks on input data.
- Parameters:
reqs (list | tuple) – Required fields for each source dataset.
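Example (assuming mgr is an initialized Manager instance as in the constructor sketch above):

    # Validate that every source entry in the data dictionary carries the
    # required fields before running the aggregation.
    mgr.preflight()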
- property time_index
Get the final time index.
- Returns:
ti (pd.DatetimeIndex) – Time index for the intended year at the final (aggregated) time resolution.
- property meta
Get the final meta data with sources.
- Returns:
meta (pd.DataFrame) – Meta data for the final (aggregated) datasets with a data source column.
- property meta_chunk
Get the meta data for just this chunk of sites based on n_chunks and i_chunk.
- Returns:
meta_chunk (pd.DataFrame) – Meta data reduced to a chunk of sites based on n_chunks and i_chunk.
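A short access sketch for the three properties above, assuming mgr is the Manager instance from the constructor example:

    ti = mgr.time_index     # pd.DatetimeIndex at the final (aggregated) resolution
    meta = mgr.meta         # final meta data with a data source column
    chunk = mgr.meta_chunk  # subset of meta for chunk i_chunk of n_chunks

    print(len(ti), len(meta), len(chunk))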
- static get_dset_attrs(h5dir, ignore_dsets=('coordinates', 'time_index', 'meta'))[source]
Get output file dataset attributes for a set of datasets.
- Parameters:
h5dir (str) – Path to directory containing multiple h5 files with all available dsets. Can also be a single h5 filepath.
ignore_dsets (tuple | list) – List of datasets to ignore (will not be aggregated).
- Returns:
dsets (list) – List of datasets.
attrs (dict) – Dictionary of dataset attributes keyed by dset name.
chunks (dict) – Dictionary of chunk tuples keyed by dset name.
dtypes (dict) – Dictionary of numpy datatypes keyed by dset name.
ti (pd.DatetimeIndex) – Time index of the source files in h5dir.
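Example (a sketch; the directory of source h5 files is hypothetical):

    from nsrdb.aggregation.aggregation import Manager

    dsets, attrs, chunks, dtypes, ti = Manager.get_dset_attrs(
        '/scratch/nsrdb/east',
        ignore_dsets=('coordinates', 'time_index', 'meta'))

    # Inspect the dataset attributes that will be carried to the output file.
    for dset in dsets:
        print(dset, dtypes[dset], chunks[dset], attrs[dset])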
- static knn(meta, tree_fpath, meta_fpath, k=1)[source]
Run KNN between the final meta data and the pickled ckdtree.
- Parameters:
meta (pd.DataFrame) – Final meta data.
tree_fpath (str) – Filepath to a pickled ckdtree built from the source meta data.
meta_fpath (str) – Filepath to csv containing source meta data.
k (int) – Number of neighbors to query.
- Returns:
d (np.ndarray) – Distance results. Shape is (len(meta), k).
i (np.ndarray) – Index results. Shape is (len(meta), k).
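Example (a sketch; the meta csv and pickled ckdtree paths are hypothetical):

    import pandas as pd

    from nsrdb.aggregation.aggregation import Manager

    # Final meta data with site coordinates.
    final_meta = pd.read_csv('/scratch/nsrdb/meta/meta_final.csv')

    # Query the 4 nearest source sites for every final site.
    d, i = Manager.knn(final_meta,
                       tree_fpath='/scratch/nsrdb/meta/kdtree_east.pkl',
                       meta_fpath='/scratch/nsrdb/meta/meta_east.csv',
                       k=4)

    assert d.shape == (len(final_meta), 4)
    assert i.shape == (len(final_meta), 4)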
- write_output(arr, var)[source]
Write aggregated output data to the final output file.
- Parameters:
arr (np.ndarray) – Aggregated data with shape (t, n) where t is the final time index length and n is the number of sites in the current meta chunk.
var (str) – Variable (dataset) name to write to.
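Example (a sketch illustrating the required array shape, assuming mgr is an initialized Manager instance; the zeros array stands in for real aggregated values):

    import numpy as np

    # (t, n): final time index length by number of sites in the current chunk.
    arr = np.zeros((len(mgr.time_index), len(mgr.meta_chunk)), dtype=np.float32)

    mgr.write_output(arr, 'ghi')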
- classmethod run_chunk(data, data_dir, meta_dir, i_chunk, n_chunks, year=2018, ignore_dsets=None, max_workers=None, log_file='run_agg_chunk.log', log_level='DEBUG')[source]
Run the aggregation routine on a single chunk of the final meta data.
- Parameters:
data (dict) – Nested dictionary containing data on all NSRDB data sources (east, west, conus) and the final aggregated output.
data_dir (str) – Root directory containing sub dirs with all data sources.
meta_dir (str) – Directory containing meta and ckdtree files for each data source and the final aggregated output.
i_chunk (int) – Single chunk index to process.
n_chunks (int) – Number of chunks to process the meta data in.
year (int) – Year being analyzed.
ignore_dsets (list | None) – Source datasets to ignore (not aggregate). Optional.
max_workers (int | None) – Number of workers to use. Runs serially if max_workers == 1.
log_file (str) – File to use for logging.
log_level (str | bool) – Flag to initialize a log file at a given log level. If False, no logger will be initialized.
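Example (a sketch of the chunked entry point; the paths are hypothetical and data is the illustrative nested dictionary from the constructor example above):

    from nsrdb.aggregation.aggregation import Manager

    # Aggregate the first of 32 meta data chunks for 2018 with 4 workers.
    Manager.run_chunk(data,
                      data_dir='/scratch/nsrdb/source_data',
                      meta_dir='/scratch/nsrdb/meta',
                      i_chunk=0,
                      n_chunks=32,
                      year=2018,
                      max_workers=4,
                      log_file='agg_chunk_000.log',
                      log_level='DEBUG')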