reVX.hybrid_stats.temporal_agg.DatasetAgg

class DatasetAgg(h5_fpath, dset, time_index=None, year=None, local_time=False)

Bases: object

Temporally aggregate a dataset

Parameters:
  • h5_fpath (str) – Path to the source .h5 file

  • dset (str) – Dataset to aggregate

  • time_index (pandas.DatetimeIndex, optional) – Datetime index of the dataset. If None, it is extracted from h5_fpath. By default None

  • year (str | int, optional) – Year for which to extract the time index when running on a multi-year file, by default None

  • local_time (bool, optional) – Flag to shift data to local time before temporal aggregation. If False, data stays in UTC. By default False
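
The effect of local_time can be illustrated with plain pandas. This is a minimal sketch, not reVX's implementation: it assumes a single hypothetical site with a fixed UTC offset (utc_offset_hours) and shows how shifting the index before resampling moves values across day boundaries.

```python
import numpy as np
import pandas as pd

# Two days of hourly values indexed in UTC (synthetic data for illustration).
utc_index = pd.date_range("2012-01-01", periods=48, freq=pd.Timedelta(hours=1))
values = pd.Series(np.arange(48, dtype=float), index=utc_index)

# Hypothetical fixed offset for one site; how reVX derives and applies
# offsets is not described in this reference and is an assumption here.
utc_offset_hours = -7
local = values.copy()
local.index = utc_index + pd.Timedelta(hours=utc_offset_hours)

# Daily means computed on UTC days versus local days.
utc_daily = values.resample("1D").mean()
local_daily = local.resample("1D").mean()
```

With the shifted index the same 48 hours span three local calendar days instead of two, so the daily bins contain different samples.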

Methods

aggregate([freq, method, max_workers, ...])

Aggregate the dataset to the desired frequency using the desired method

run(h5_fpath, dset[, time_index, year, ...])

Temporally aggregate dataset to given frequency using given method

aggregate(freq='1d', method='mean', max_workers=None, chunks_per_worker=5, **resample_kwargs)

Aggregate the dataset to the desired frequency using the desired method

Parameters:
  • freq (str, optional) – Aggregation frequency, by default ‘1d’

  • method (str, optional) – Aggregation method, either ‘mean’ or ‘sum’, by default ‘mean’

  • max_workers (None | int, optional) – Number of workers to use. If 1, run in serial; if None, use all available cores. By default None

  • chunks_per_worker (int, optional) – Number of chunks to extract on each worker, by default 5

  • resample_kwargs (dict, optional) – Keyword arguments for pandas.DataFrame.resample

Returns:

dset_agg (ndarray) – Aggregated dataset array
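
As a rough illustration of what freq and method mean, the sketch below resamples synthetic hourly data to daily values with pandas.DataFrame.resample. It is an assumption-laden stand-in, not the internals of aggregate: the data, the three site columns, and the variable names are all hypothetical.

```python
import numpy as np
import pandas as pd

# Two days of hourly data for three hypothetical sites (one column per site).
time_index = pd.date_range("2012-01-01", periods=48, freq=pd.Timedelta(hours=1))
data = pd.DataFrame(
    np.arange(48 * 3, dtype=float).reshape(48, 3),
    index=time_index,
)

# Daily bins, comparable to the documented default freq of '1d'.
# method='mean' and method='sum' map onto the matching resampler reductions.
daily_mean = data.resample("1D").mean().to_numpy()
daily_sum = data.resample("1D").sum().to_numpy()
```

Both results are arrays with one row per day and one column per site, which matches the ndarray return type documented above.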

classmethod run(h5_fpath, dset, time_index=None, year=None, freq='1d', method='mean', max_workers=None, chunks_per_worker=5, local_time=False, **resample_kwargs)

Temporally aggregate dataset to given frequency using given method

Parameters:
  • h5_fpath (str) – Path to the source .h5 file

  • dset (str) – Dataset to aggregate

  • time_index (pandas.DatetimeIndex, optional) – Datetime index of the dataset. If None, it is extracted from h5_fpath. By default None

  • year (str | int, optional) – Year for which to extract the time index when running on a multi-year file, by default None

  • freq (str, optional) – Aggregation frequency, by default ‘1d’

  • method (str, optional) – Aggregation method, either ‘mean’ or ‘sum’, by default ‘mean’

  • max_workers (None | int, optional) – Number of workers to use. If 1, run in serial; if None, use all available cores. By default None

  • chunks_per_worker (int, optional) – Number of chunks to extract on each worker, by default 5

  • local_time (bool, optional) – Flag to shift data to local time before temporal aggregation. If False, data stays in UTC. By default False

  • resample_kwargs (dict, optional) – Keyword arguments for pandas.DataFrame.resample

Returns:

agg_data (ndarray) – Dataset aggregated to the given frequency using the given method
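
A call to run might look like the sketch below, which uses only the parameters documented above. The wrapper name daily_mean_from_file is hypothetical, and the file path and dataset name in the trailing comment are placeholders rather than real resources.

```python
def daily_mean_from_file(h5_fpath, dset, year=None):
    """Return daily means of ``dset`` in UTC, computed in serial."""
    # Imported lazily so this sketch can be loaded without reVX installed.
    from reVX.hybrid_stats.temporal_agg import DatasetAgg

    return DatasetAgg.run(
        h5_fpath,
        dset,
        year=year,
        freq="1d",
        method="mean",
        max_workers=1,  # 1 runs in serial, per the parameter description
        local_time=False,
    )


# Example call with placeholder arguments (not executed here):
# agg_data = daily_mean_from_file("/path/to/resource.h5", "windspeed_100m")
```

Setting max_workers=1 keeps the example serial, which is usually easier to debug before scaling out to all available cores.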