reVX.hybrid_stats.temporal_agg.TemporalAgg

class TemporalAgg(src_fpath, dst_fpath, freq='1d', dsets=None, year=None, local_time=False, **resample_kwargs)[source]

Bases: object

Class to temporally aggregate time-series datasets from a source .h5 file and save them to a destination .h5 file.

Parameters:
  • src_fpath (str) – Path to source h5 file

  • dst_fpath (str) – Path to destination h5 file to save aggregated datasets to.

  • freq (str, optional) – Aggregation frequency, by default ‘1d’

  • dsets (list, optional) – Datasets to aggregate. If None, aggregate all datasets in src_fpath. By default None

  • year (str | int, optional) – Year to extract the time index and datasets for. Needed when running on a multi-year file. By default None

  • local_time (bool, optional) – Flag to shift data to local time before aggregating. By default False, which keeps the data in UTC.

  • resample_kwargs (dict, optional) – Additional keyword arguments for pandas.DataFrame.resample
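
The parameters above map onto a pandas resample operation. The following is a minimal sketch of that idea using plain pandas on in-memory data, not reVX itself: the site names, the values, and the fixed UTC offset are hypothetical, and the way the local-time shift is done here (moving the index by a constant number of hours) is an assumption about what "shift data to local time" could mean, not a description of the reVX implementation.

```python
import numpy as np
import pandas as pd

# Two days of hourly values for two hypothetical sites on a UTC time index.
time_index = pd.date_range("2012-01-01", periods=48, freq="h", tz="UTC")
data = pd.DataFrame(
    np.arange(96, dtype=float).reshape(48, 2),
    index=time_index,
    columns=["site_0", "site_1"],
)

# Extra keyword arguments forwarded to resample, analogous to resample_kwargs.
resample_kwargs = {"closed": "left", "label": "left"}

# "1D" is the uppercase pandas spelling of the daily frequency ('1d' above).
daily_mean = data.resample("1D", **resample_kwargs).mean()
daily_sum = data.resample("1D", **resample_kwargs).sum()

# Assumed local-time shift: move the clock values by a fixed UTC offset.
# The index keeps its UTC label; only the timestamps themselves are shifted.
utc_offset_hours = -7  # hypothetical offset for site_0
local = data["site_0"].copy()
local.index = local.index + pd.Timedelta(hours=utc_offset_hours)
local_daily_mean = local.resample("1D").mean()

print(daily_mean)
print(local_daily_mean)
```

Note that shifting by a non-zero offset spreads the same 48 hourly values over three calendar days instead of two, so the first and last daily bins are built from partial days.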

Methods

aggregate([method, max_workers, ...])

Aggregate desired datasets and write to disk

run(src_fpath, dst_fpath[, freq, dsets, ...])

Temporally aggregate the desired datasets in the src .h5 file to the given frequency using the given method.

Attributes

dsets

Datasets to aggregate

property dsets

Datasets to aggregate

Returns:

list

aggregate(method='mean', max_workers=None, chunks_per_worker=5)[source]

Aggregate the desired datasets and write them to disk

Parameters:
  • method (str, optional) – Aggregation method, either ‘mean’ or ‘sum’, by default ‘mean’

  • max_workers (None | int, optional) – Number of workers to use. If 1, run in serial. If None, use all available cores. By default None

  • chunks_per_worker (int, optional) – Number of chunks to extract on each worker, by default 5
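
To illustrate how method, max_workers, and chunks_per_worker could interact, here is a rough sketch in plain pandas. It is not the reVX implementation: the helper names aggregate_chunk and aggregate_in_chunks are hypothetical, the chunk count (max_workers * chunks_per_worker, split along the site axis) is an assumption, and a thread pool stands in for whatever worker pool reVX actually uses.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd


def aggregate_chunk(chunk, freq="1D", method="mean"):
    """Resample one block of sites (columns) with the requested method."""
    resampler = chunk.resample(freq)
    if method == "mean":
        return resampler.mean()
    if method == "sum":
        return resampler.sum()
    raise ValueError(f"method must be 'mean' or 'sum', got {method!r}")


def aggregate_in_chunks(data, freq="1D", method="mean",
                        max_workers=2, chunks_per_worker=5):
    """Hypothetical helper: split sites into chunks and aggregate each one."""
    n_sites = data.shape[1]
    # Assumed chunking rule: each worker receives about chunks_per_worker chunks.
    n_chunks = min(n_sites, max_workers * chunks_per_worker)
    col_groups = np.array_split(np.arange(n_sites), n_chunks)
    chunks = [data.iloc[:, idx] for idx in col_groups]

    if max_workers == 1:
        # Serial path, analogous to max_workers=1 in the parameters above.
        results = [aggregate_chunk(c, freq, method) for c in chunks]
    else:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # map preserves chunk order, so the column order is restored below.
            results = list(
                pool.map(lambda c: aggregate_chunk(c, freq, method), chunks)
            )

    return pd.concat(results, axis=1)


# Two days of hourly values for twelve hypothetical sites.
time_index = pd.date_range("2012-01-01", periods=48, freq="h", tz="UTC")
data = pd.DataFrame(
    np.arange(48 * 12, dtype=float).reshape(48, 12),
    index=time_index,
    columns=[f"site_{i}" for i in range(12)],
)

chunked_mean = aggregate_in_chunks(data, method="mean", max_workers=2)
serial_sum = aggregate_in_chunks(data, method="sum", max_workers=1)
```

Because each site is resampled independently, the chunked result should match a single resample over the whole table. Chunking only changes how much data is held by each worker at a time.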

classmethod run(src_fpath, dst_fpath, freq='1d', dsets=None, year=None, method='mean', max_workers=None, chunks_per_worker=5, local_time=False, **resample_kwargs)[source]

Temporally aggregate the desired datasets in the src .h5 file to the given frequency using the given method. Save the aggregated datasets to the dst .h5 file.

Parameters:
  • src_fpath (str) – Path to source h5 file

  • dst_fpath (str) – Path to destination h5 file to save aggregated datasets to.

  • freq (str, optional) – Aggregation frequency, by default ‘1d’

  • dsets (list, optional) – Datasets to aggregate. If None, aggregate all datasets in src_fpath. By default None

  • year (str | int, optional) – Year to extract the time index and datasets for. Needed when running on a multi-year file. By default None

  • method (str, optional) – Aggregation method, either ‘mean’ or ‘sum’, by default ‘mean’

  • max_workers (None | int, optional) – Number of workers to use. If 1, run in serial. If None, use all available cores. By default None

  • chunks_per_worker (int, optional) – Number of chunks to extract on each worker, by default 5

  • local_time (bool, optional) – Flag to shift data to local time before aggregating. By default False, which keeps the data in UTC.

  • resample_kwargs (dict, optional) – Additional keyword arguments for pandas.DataFrame.resample
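
A call to run might look like the sketch below. The keyword names and the import path come from the signature and module name documented above. The file paths, the dataset name windspeed_100m, and the two helper functions are hypothetical. The reVX import is kept inside the function so the snippet can be loaded without reVX installed.

```python
def build_run_kwargs(src_fpath, dst_fpath, year=None):
    """Collect keyword arguments for TemporalAgg.run (hypothetical helper)."""
    return {
        "src_fpath": src_fpath,
        "dst_fpath": dst_fpath,
        "freq": "1d",                   # daily aggregation, the documented default
        "dsets": ["windspeed_100m"],    # hypothetical dataset name
        "year": year,                   # set this for multi-year source files
        "method": "mean",               # or "sum"
        "max_workers": 1,               # 1 runs in serial
        "chunks_per_worker": 5,
        "local_time": False,            # stay in UTC
    }


def run_daily_mean(src_fpath, dst_fpath, year=None):
    """Aggregate the source file to daily means (requires reVX to be installed)."""
    from reVX.hybrid_stats.temporal_agg import TemporalAgg

    TemporalAgg.run(**build_run_kwargs(src_fpath, dst_fpath, year=year))


# Hypothetical paths; uncomment once reVX is installed and the source file exists.
# run_daily_mean("/path/to/source_2012.h5", "/path/to/daily_2012.h5", year=2012)
```

Starting with max_workers=1 keeps the first run serial and easier to debug. Leaving it as None uses all available cores, as described above.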