nsrdb.aggregation.aggregation.Manager

class Manager(data, data_dir, meta_dir, year=2018, i_chunk=0, n_chunks=1)[source]

Bases: object

Framework for aggregation to a final NSRDB spatiotemporal resolution.

Parameters:
  • data (dict) – Nested dictionary containing data on all NSRDB data sources (east, west, conus) and the final aggregated output.

  • data_dir (str) – Root directory containing sub dirs with all data sources.

  • meta_dir (str) – Directory containing meta and ckdtree files for each data source and the final aggregated output.

  • year (int) – Year being analyzed.

  • i_chunk (int) – Meta data chunk index currently being processed (zero indexed).

  • n_chunks (int) – Number of chunks to process the meta data in.
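For orientation, the nested data input might look like the following sketch. The per-source fields mirror the defaults checked by preflight() below ('data_sub_dir', 'tree_file', 'meta_file', 'spatial', 'freq'); the source names, file names, and resolution values shown are hypothetical, not actual NSRDB configuration.

```python
# Hypothetical sketch of the nested `data` input. Field names follow the
# preflight() requirements; all values here are illustrative only.
data = {
    'east': {'data_sub_dir': 'east',        # sub dir of data_dir
             'tree_file': 'kdtree_east.pkl',
             'meta_file': 'meta_east.csv',
             'spatial': '2km',              # source spatial resolution
             'freq': '5min'},               # source temporal resolution
    'conus': {'data_sub_dir': 'conus',
              'tree_file': 'kdtree_conus.pkl',
              'meta_file': 'meta_conus.csv',
              'spatial': '2km',
              'freq': '5min'},
    'final': {'data_sub_dir': 'final',      # aggregated output target
              'tree_file': 'kdtree_final.pkl',
              'meta_file': 'meta_final.csv',
              'spatial': '4km',
              'freq': '30min'},
}

# Required fields for each source (the preflight() defaults).
reqs = ('data_sub_dir', 'tree_file', 'meta_file', 'spatial', 'freq')
```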

Methods

DEFAULT_METHOD(var, data_fpath, nn, w, final_ti)

Run agg using a spatial average and temporal moving window average.

add_temporal()

Get the temporal window sizes for all data sources.

get_dset_attrs(h5dir[, ignore_dsets])

Get output file dataset attributes for a set of datasets.

knn(meta, tree_fpath, meta_fpath[, k])

Run KNN between the final meta data and the pickled ckdtree.

parse_data()

Parse the data input for several useful attributes.

preflight([reqs])

Run validity checks on input data.

run_chunk(data, data_dir, meta_dir, i_chunk, ...)

Run aggregation for a single chunk of the meta data.

run_nn()

Run nearest neighbor for all data sources against the final meta.

write_output(arr, var)

Write aggregated output data to the final output file.

Attributes

AGG_METHODS

meta

Get the final meta data with sources.

meta_chunk

Get the meta data for just this chunk of sites based on n_chunks and i_chunk.

time_index

Get the final time index.

classmethod DEFAULT_METHOD(var, data_fpath, nn, w, final_ti)

Run agg using a spatial average and temporal moving window average.

Parameters:
  • var (str) – Variable (dataset) name being aggregated.

  • data_fpath (str) – Filepath to h5 file containing source var data.

  • nn (np.ndarray) – 1D array of site (column) indices in data_fpath to aggregate.

  • w (int) – Window size for temporal aggregation.

  • final_ti (pd.DatetimeIndex) – Final datetime index (used to ensure the aggregated profile has the correct length).

Returns:

data (np.ndarray) – (n,) array of unscaled, rounded data aggregated from the nn sites, with a time series matching final_ti.
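A minimal sketch of this kind of aggregation, assuming the spatial average runs over the nn columns and the moving window average runs along the time axis before downsampling to final_ti. The function name and the downsampling-by-stride step are illustrative, and the unscaling/rounding of the real method is omitted:

```python
import numpy as np
import pandas as pd

def agg_profile(source_data, nn, w, final_ti):
    """Illustrative spatial average + temporal moving window average."""
    # spatial average over the selected nearest-neighbor site columns
    spatial = source_data[:, nn].mean(axis=1)
    # centered temporal moving-window average with window size w
    smoothed = pd.Series(spatial).rolling(w, center=True, min_periods=1).mean()
    # downsample the source time series to the final resolution
    step = len(smoothed) // len(final_ti)
    return smoothed.to_numpy()[::step][:len(final_ti)]

# 15-minute source data aggregated to a 30-minute final time index
source_ti = pd.date_range('2018-01-01', periods=96, freq='15min')
final_ti = pd.date_range('2018-01-01', periods=48, freq='30min')
source_data = np.random.rand(len(source_ti), 10)
out = agg_profile(source_data, nn=np.array([0, 2, 5]), w=3, final_ti=final_ti)
```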

parse_data()[source]

Parse the data input for several useful attributes.

preflight(reqs=('data_sub_dir', 'tree_file', 'meta_file', 'spatial', 'freq'))[source]

Run validity checks on input data.

Parameters:

reqs (list | tuple) – Required fields for each source dataset.

property time_index

Get the final time index.

Returns:

ti (pd.DatetimeIndex) – Time index for the intended year at the final (aggregated) time resolution.
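For illustration, a final time index at a hypothetical 30-minute resolution can be built with pandas (the actual frequency comes from the final data source configuration, not from this snippet):

```python
import pandas as pd

# Illustrative only: a 30-minute time index spanning a non-leap year,
# similar in shape to what the time_index property returns.
year = 2018
ti = pd.date_range(f'{year}-01-01 00:00', f'{year}-12-31 23:30', freq='30min')
# 365 days * 48 half-hour steps = 17520 timesteps
```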

property meta

Get the final meta data with sources.

Returns:

meta (pd.DataFrame) – Meta data for the final (aggregated) datasets with a data source column.

property meta_chunk

Get the meta data for just this chunk of sites based on n_chunks and i_chunk.

Returns:

meta_chunk (pd.DataFrame) – Meta data reduced to a chunk of sites based on n_chunks and i_chunk.
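One plausible way to slice out such a chunk, shown here as a hedged sketch rather than the actual implementation, is to partition the site index into n_chunks near-equal pieces and select the i_chunk-th piece:

```python
import numpy as np
import pandas as pd

def get_meta_chunk(meta, i_chunk, n_chunks):
    """Illustrative chunking: select the i_chunk-th of n_chunks site groups."""
    indices = np.array_split(np.arange(len(meta)), n_chunks)[i_chunk]
    return meta.iloc[indices]

meta = pd.DataFrame({'latitude': np.linspace(20, 50, 10),
                     'longitude': np.linspace(-120, -70, 10)})
chunk = get_meta_chunk(meta, i_chunk=0, n_chunks=3)
# 10 sites split 3 ways gives chunk sizes 4, 3, 3
```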

static get_dset_attrs(h5dir, ignore_dsets=('coordinates', 'time_index', 'meta'))[source]

Get output file dataset attributes for a set of datasets.

Parameters:
  • h5dir (str) – Path to directory containing multiple h5 files with all available dsets. Can also be a single h5 filepath.

  • ignore_dsets (tuple | list) – List of datasets to ignore (will not be aggregated).

Returns:

  • dsets (list) – List of datasets.

  • attrs (dict) – Dictionary of dataset attributes keyed by dset name.

  • chunks (dict) – Dictionary of chunk tuples keyed by dset name.

  • dtypes (dict) – Dictionary of numpy datatypes keyed by dset name.

  • ti (pd.DatetimeIndex) – Time index of the source files in h5dir.

add_temporal()[source]

Get the temporal window sizes for all data sources.

run_nn()[source]

Run nearest neighbor for all data sources against the final meta.

static knn(meta, tree_fpath, meta_fpath, k=1)[source]

Run KNN between the final meta data and the pickled ckdtree.

Parameters:
  • meta (pd.DataFrame) – Final meta data.

  • tree_fpath (str) – Filepath to a pickled ckdtree built from the source meta data.

  • meta_fpath (str) – Filepath to csv containing source meta data.

  • k (int) – Number of neighbors to query.

Returns:

  • d (np.ndarray) – Distance results. Shape is (len(meta), k).

  • i (np.ndarray) – Index results. Shape is (len(meta), k).
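The query can be sketched with scipy's cKDTree. In practice the tree is loaded from the pickle at tree_fpath; here it is built inline from hypothetical source coordinates for a self-contained example:

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

# Hypothetical source meta data and the tree built from its coordinates.
source_meta = pd.DataFrame({'latitude': [40.0, 41.0, 42.0],
                            'longitude': [-105.0, -104.0, -103.0]})
tree = cKDTree(source_meta[['latitude', 'longitude']].values)

# Query with the final meta data coordinates, k neighbors per site.
final_meta = pd.DataFrame({'latitude': [40.1, 41.9],
                           'longitude': [-104.9, -103.1]})
d, i = tree.query(final_meta[['latitude', 'longitude']].values, k=2)
# d and i each have shape (len(final_meta), k)
```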

write_output(arr, var)[source]

Write aggregated output data to the final output file.

Parameters:
  • arr (np.ndarray) – Aggregated data with shape (t, n) where t is the final time index length and n is the number of sites in the current meta chunk.

  • var (str) – Variable (dataset) name to write to.

classmethod run_chunk(data, data_dir, meta_dir, i_chunk, n_chunks, year=2018, ignore_dsets=None, max_workers=None, log_file='run_agg_chunk.log', log_level='DEBUG')[source]

Run aggregation for a single chunk of the meta data.

Parameters:
  • data (dict) – Nested dictionary containing data on all NSRDB data sources (east, west, conus) and the final aggregated output.

  • data_dir (str) – Root directory containing sub dirs with all data sources.

  • meta_dir (str) – Directory containing meta and ckdtree files for each data source and the final aggregated output.

  • i_chunk (int) – Single chunk index to process.

  • n_chunks (int) – Number of chunks to process the meta data in.

  • year (int) – Year being analyzed.

  • ignore_dsets (list | None) – Source datasets to ignore (not aggregate). Optional.

  • max_workers (int | None) – Number of workers to use. Runs serially if max_workers == 1.

  • log_file (str) – File to use for logging.

  • log_level (str | bool) – Flag to initialize a log file at a given log level. False will not initialize a logger.
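The max_workers behavior above suggests a serial-vs-parallel dispatch along these lines. The function names are hypothetical stand-ins for the per-variable aggregation work, not the actual implementation:

```python
from concurrent.futures import ProcessPoolExecutor

def aggregate_var(var):
    """Hypothetical stand-in for aggregating one variable."""
    return f'{var}: aggregated'

def run_vars(dsets, max_workers=None):
    # Runs serially if max_workers == 1, otherwise uses a process pool.
    if max_workers == 1:
        return [aggregate_var(var) for var in dsets]
    with ProcessPoolExecutor(max_workers=max_workers) as exe:
        return list(exe.map(aggregate_var, dsets))

results = run_vars(['ghi', 'dni', 'dhi'], max_workers=1)
```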