nsrdb.aggregation.aggregation.Manager

class Manager(data, data_dir, meta_dir, year=2018, i_chunk=0, n_chunks=1)[source]

Bases: object

Framework for aggregation to a final NSRDB spatiotemporal resolution.

Parameters:
  • data (dict) – Nested dictionary containing data on all NSRDB data sources (east, west, conus) and the final aggregated output.

  • data_dir (str) – Root directory containing sub dirs with all data sources.

  • meta_dir (str) – Directory containing meta and ckdtree files for each data source and the final aggregated output.

  • year (int) – Year being analyzed.

  • i_chunk (int) – Meta data chunk index currently being processed (zero indexed).

  • n_chunks (int) – Number of chunks to process the meta data in.
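For orientation, the nested data input might look like the following sketch. The per-source fields mirror the defaults checked by preflight() below ('data_sub_dir', 'tree_file', 'meta_file', 'spatial', 'freq'); the source names, file names, and resolution values shown are hypothetical, not actual NSRDB configuration.

```python
# Hypothetical sketch of the nested `data` input. Field names follow the
# preflight() requirements; all values here are illustrative only.
data = {
    'east': {'data_sub_dir': 'east',        # sub dir of data_dir
             'tree_file': 'kdtree_east.pkl',
             'meta_file': 'meta_east.csv',
             'spatial': '2km',              # source spatial resolution
             'freq': '5min'},               # source temporal resolution
    'conus': {'data_sub_dir': 'conus',
              'tree_file': 'kdtree_conus.pkl',
              'meta_file': 'meta_conus.csv',
              'spatial': '2km',
              'freq': '5min'},
    'final': {'data_sub_dir': 'final',      # aggregated output target
              'tree_file': 'kdtree_final.pkl',
              'meta_file': 'meta_final.csv',
              'spatial': '4km',
              'freq': '30min'},
}

# Required fields for each source (the preflight() defaults).
reqs = ('data_sub_dir', 'tree_file', 'meta_file', 'spatial', 'freq')
```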

Methods

DEFAULT_METHOD(var, data_fpath, nn, w, final_ti)

Run agg using a spatial average and temporal moving window average.

add_temporal()

Get the temporal window sizes for all data sources.

get_dset_attrs(h5dir[, ignore_dsets])

Get output file dataset attributes for a set of datasets.

knn(meta, tree_fpath, meta_fpath[, k])

Run KNN between the final meta data and the pickled ckdtree.

parse_data()

Parse the data input for several useful attributes.

preflight([reqs])

Run validity checks on input data.

run_chunk(data, data_dir, meta_dir, i_chunk, ...)

Run aggregation for a single chunk of the meta data.

run_nn()

Run nearest neighbor for all data sources against the final meta.

write_output(arr, var)

Write aggregated output data to the final output file.

Attributes

AGG_METHODS

meta

Get the final meta data with sources.

meta_chunk

Get the meta data for just this chunk of sites based on n_chunks and i_chunk.

time_index

Get the final time index.

classmethod DEFAULT_METHOD(var, data_fpath, nn, w, final_ti)

Run agg using a spatial average and temporal moving window average.

Parameters:
  • var (str) – Variable (dataset) name being aggregated.

  • data_fpath (str) – Filepath to h5 file containing source var data.

  • nn (np.ndarray) – 1D array of site (column) indices in data_fpath to aggregate.

  • w (int) – Window size for temporal aggregation.

  • final_ti (pd.DatetimeIndex) – Final datetime index (used to ensure the aggregated profile has the correct length).

Returns:

data (np.ndarray) – (n,) array of unscaled, rounded data aggregated from the nn sites, with a time series matching final_ti.
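A minimal sketch of this kind of aggregation, assuming the spatial average runs over the nn columns and the moving window average runs along the time axis before downsampling to final_ti. The function name and the downsampling-by-stride step are illustrative, and the unscaling/rounding of the real method is omitted:

```python
import numpy as np
import pandas as pd

def agg_profile(source_data, nn, w, final_ti):
    """Illustrative spatial average + temporal moving window average."""
    # spatial average over the selected nearest-neighbor site columns
    spatial = source_data[:, nn].mean(axis=1)
    # centered temporal moving-window average with window size w
    smoothed = pd.Series(spatial).rolling(w, center=True, min_periods=1).mean()
    # downsample the source time series to the final resolution
    step = len(smoothed) // len(final_ti)
    return smoothed.to_numpy()[::step][:len(final_ti)]

# 15-minute source data aggregated to a 30-minute final time index
source_ti = pd.date_range('2018-01-01', periods=96, freq='15min')
final_ti = pd.date_range('2018-01-01', periods=48, freq='30min')
source_data = np.random.rand(len(source_ti), 10)
out = agg_profile(source_data, nn=np.array([0, 2, 5]), w=3, final_ti=final_ti)
```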

parse_data()[source]

Parse the data input for several useful attributes.

preflight(reqs=('data_sub_dir', 'tree_file', 'meta_file', 'spatial', 'freq'))[source]

Run validity checks on input data.

Parameters:

reqs (list | tuple) – Required fields for each source dataset.

property time_index

Get the final time index.

Returns:

ti (pd.DatetimeIndex) – Time index for the intended year at the final (aggregated) time resolution.
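For illustration, a final time index at a hypothetical 30-minute resolution can be built with pandas (the actual frequency comes from the final data source configuration, not from this snippet):

```python
import pandas as pd

# Illustrative only: a 30-minute time index spanning a non-leap year,
# similar in shape to what the time_index property returns.
year = 2018
ti = pd.date_range(f'{year}-01-01 00:00', f'{year}-12-31 23:30', freq='30min')
# 365 days * 48 half-hour steps = 17520 timesteps
```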

property meta

Get the final meta data with sources.

Returns:

meta (pd.DataFrame) – Meta data for the final (aggregated) datasets with a data source column.

property meta_chunk

Get the meta data for just this chunk of sites based on n_chunks and i_chunk.

Returns:

meta_chunk (pd.DataFrame) – Meta data reduced to a chunk of sites based on n_chunks and i_chunk.
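One plausible way to slice out such a chunk, shown here as a hedged sketch rather than the actual implementation, is to partition the site index into n_chunks near-equal pieces and select the i_chunk-th piece:

```python
import numpy as np
import pandas as pd

def get_meta_chunk(meta, i_chunk, n_chunks):
    """Illustrative chunking: select the i_chunk-th of n_chunks site groups."""
    indices = np.array_split(np.arange(len(meta)), n_chunks)[i_chunk]
    return meta.iloc[indices]

meta = pd.DataFrame({'latitude': np.linspace(20, 50, 10),
                     'longitude': np.linspace(-120, -70, 10)})
chunk = get_meta_chunk(meta, i_chunk=0, n_chunks=3)
# 10 sites split 3 ways gives chunk sizes 4, 3, 3
```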

static get_dset_attrs(h5dir, ignore_dsets=('coordinates', 'time_index', 'meta'))[source]

Get output file dataset attributes for a set of datasets.

Parameters:
  • h5dir (str) – Path to directory containing multiple h5 files with all available dsets. Can also be a single h5 filepath.

  • ignore_dsets (tuple | list) – List of datasets to ignore (will not be aggregated).

Returns:

  • dsets (list) – List of datasets.

  • attrs (dict) – Dictionary of dataset attributes keyed by dset name.

  • chunks (dict) – Dictionary of chunk tuples keyed by dset name.

  • dtypes (dict) – Dictionary of numpy datatypes keyed by dset name.

  • ti (pd.DatetimeIndex) – Time index of the source files in h5dir.

add_temporal()[source]

Get the temporal window sizes for all data sources.

run_nn()[source]

Run nearest neighbor for all data sources against the final meta.

static knn(meta, tree_fpath, meta_fpath, k=1)[source]

Run KNN between the final meta data and the pickled ckdtree.

Parameters:
  • meta (pd.DataFrame) – Final meta data.

  • tree_fpath (str) – Filepath to a pickled ckdtree built from the source meta data.

  • meta_fpath (str) – Filepath to csv containing source meta data.

  • k (int) – Number of neighbors to query.

Returns:

  • d (np.ndarray) – Distance results. Shape is (len(meta), k).

  • i (np.ndarray) – Index results. Shape is (len(meta), k).
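The query can be sketched with scipy's cKDTree. In practice the tree is loaded from the pickle at tree_fpath; here it is built inline from hypothetical source coordinates for a self-contained example:

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

# Hypothetical source meta data and the tree built from its coordinates.
source_meta = pd.DataFrame({'latitude': [40.0, 41.0, 42.0],
                            'longitude': [-105.0, -104.0, -103.0]})
tree = cKDTree(source_meta[['latitude', 'longitude']].values)

# Query with the final meta data coordinates, k neighbors per site.
final_meta = pd.DataFrame({'latitude': [40.1, 41.9],
                           'longitude': [-104.9, -103.1]})
d, i = tree.query(final_meta[['latitude', 'longitude']].values, k=2)
# d and i each have shape (len(final_meta), k)
```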

write_output(arr, var)[source]

Write aggregated output data to the final output file.

Parameters:
  • arr (np.ndarray) – Aggregated data with shape (t, n) where t is the final time index length and n is the number of sites in the current meta chunk.

  • var (str) – Variable (dataset) name to write to.

classmethod run_chunk(data, data_dir, meta_dir, i_chunk, n_chunks, year=2018, ignore_dsets=None, max_workers=None, log_file='run_agg_chunk.log', log_level='DEBUG')[source]

Run aggregation for a single chunk of the meta data.

Parameters:
  • data (dict) – Nested dictionary containing data on all NSRDB data sources (east, west, conus) and the final aggregated output.

  • data_dir (str) – Root directory containing sub dirs with all data sources.

  • meta_dir (str) – Directory containing meta and ckdtree files for each data source and the final aggregated output.

  • i_chunk (int) – Single chunk index to process.

  • n_chunks (int) – Number of chunks to process the meta data in.

  • year (int) – Year being analyzed.

  • ignore_dsets (list | None) – Source datasets to ignore (not aggregate). Optional.

  • max_workers (int | None) – Number of workers to use. Runs serially if max_workers == 1.

  • log_file (str) – File to use for logging.

  • log_level (str | bool) – Flag to initialize a log file at a given log level. False will not initialize a logger.
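The max_workers behavior above suggests a serial-vs-parallel dispatch along these lines. The function names are hypothetical stand-ins for the per-variable aggregation work, not the actual implementation:

```python
from concurrent.futures import ProcessPoolExecutor

def aggregate_var(var):
    """Hypothetical stand-in for aggregating one variable."""
    return f'{var}: aggregated'

def run_vars(dsets, max_workers=None):
    # Runs serially if max_workers == 1, otherwise uses a process pool.
    if max_workers == 1:
        return [aggregate_var(var) for var in dsets]
    with ProcessPoolExecutor(max_workers=max_workers) as exe:
        return list(exe.map(aggregate_var, dsets))

results = run_vars(['ghi', 'dni', 'dhi'], max_workers=1)
```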