rex.temporal_stats.temporal_stats.TemporalStats

class TemporalStats(res_h5, statistics='mean', res_cls=<class 'rex.resource.Resource'>, hsds=False)

Bases: object

Temporal Statistics from Resource Data

Parameters:
  • res_h5 (str) – Path to resource h5 file(s)

  • statistics (str | tuple | dict, optional) – Statistics to extract, either a key or tuple of keys in cls.STATS, or a dictionary of the form {'stat_name': {'func': <callable>, 'kwargs': {...}}}, by default 'mean'

  • res_cls (Class, optional) – Resource class to use to access res_h5, by default Resource

  • hsds (bool, optional) – Boolean flag to use h5pyd to handle .h5 ‘files’ hosted on AWS behind HSDS, by default False
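The dictionary form of statistics pairs each statistic name with a callable and its keyword arguments. A minimal standard-library sketch of that shape follows. The `percentile` and `apply_stats` helpers are hypothetical, and calling `func(values, **kwargs)` is an assumption about how such a dictionary would be consumed, not a description of the internals of TemporalStats.

```python
import math
from statistics import mean


def percentile(values, q=50):
    """Nearest-rank percentile of a sequence (hypothetical helper)."""
    ordered = sorted(values)
    # Nearest-rank position, clamped so that q=0 still returns the minimum
    rank = max(1, math.ceil(len(ordered) * q / 100))
    return ordered[rank - 1]


# Same shape as the documented form: {'stat_name': {'func': ..., 'kwargs': {...}}}
custom_stats = {
    'mean': {'func': mean, 'kwargs': {}},
    'p90': {'func': percentile, 'kwargs': {'q': 90}},
}


def apply_stats(values, stats):
    """Call each statistic as func(values, **kwargs) (assumed convention)."""
    return {name: spec['func'](values, **spec['kwargs'])
            for name, spec in stats.items()}


wind_speed = [3.0, 5.0, 4.0, 8.0, 10.0]  # made-up sample values
result = apply_stats(wind_speed, custom_stats)
print(result)
```

When passing such a dictionary to TemporalStats, the callables would presumably need to accept array-like resource data rather than a plain list.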

Methods

all(res_h5, dataset[, sites, statistics, ...])

Compute annual, monthly, monthly-diurnal, and diurnal stats

all_stats(dataset[, sites, max_workers, ...])

Compute annual, monthly, monthly-diurnal, and diurnal stats

compute_statistics(dataset[, sites, ...])

Compute statistics

diurnal(res_h5, dataset[, sites, ...])

Compute diurnal stats

diurnal_stats(dataset[, sites, max_workers, ...])

Compute diurnal stats

full_stats(dataset[, sites, max_workers, ...])

Compute stats for entire temporal extent of file

monthly(res_h5, dataset[, sites, ...])

Compute monthly stats

monthly_diurnal(res_h5, dataset[, sites, ...])

Compute monthly-diurnal stats

monthly_diurnal_stats(dataset[, sites, ...])

Compute monthly-diurnal stats

monthly_stats(dataset[, sites, max_workers, ...])

Compute monthly stats

run(res_h5, dataset[, sites, statistics, ...])

Compute temporal stats, by default full temporal extent stats

save_stats(res_stats, out_path)

Save statistics to disk

Attributes

STATS

lat_lon

Resource (lat, lon) coordinates

meta

Resource meta-data table

res_cls

Resource class to use to access res_h5

res_h5

Path to resource h5 file(s)

statistics

Dictionary of statistic functions/kwargs to run

time_index

Resource Datetimes

property res_h5

Path to resource h5 file(s)

Returns:

str

property statistics

Dictionary of statistic functions/kwargs to run

Returns:

dict

property res_cls

Resource class to use to access res_h5

Returns:

Class

property time_index

Resource Datetimes

Returns:

pandas.DatetimeIndex

property meta

Resource meta-data table

Returns:

pandas.DataFrame

property lat_lon

Resource (lat, lon) coordinates

Returns:

pandas.DataFrame

compute_statistics(dataset, sites=None, diurnal=False, month=False, combinations=False, max_workers=None, chunks_per_worker=5, lat_lon_only=True, mask_zeros=False)

Compute statistics

Parameters:
  • dataset (str) – Dataset to extract stats for

  • sites (list | slice, optional) – Subset of sites to extract, by default None or all sites

  • diurnal (bool, optional) – Extract diurnal stats, by default False

  • month (bool, optional) – Extract monthly stats, by default False

  • combinations (bool, optional) – Extract all combinations of temporal stats, by default False

  • max_workers (None | int, optional) – Number of workers to use, if 1 run in serial, if None use all available cores, by default None

  • chunks_per_worker (int, optional) – Number of chunks to extract on each worker, by default 5

  • lat_lon_only (bool, optional) – Only append lat, lon coordinates to stats, by default True

  • mask_zeros (bool, optional) – Flag to only calculate stats when all data is > 0 (useful for global horizontal irradiance), by default False

Returns:

res_stats (pandas.DataFrame) – DataFrame of desired statistics at desired time intervals
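The diurnal and month flags select how timesteps are binned before a statistic is computed. The sketch below illustrates one plausible reading using only the standard library: no flags gives a single bin over the whole record, month bins by calendar month, diurnal bins by hour of day, and both together bin by (month, hour). The `grouped_mean` helper is hypothetical, and the layout of the DataFrame actually returned by compute_statistics is not reproduced here.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def grouped_mean(time_index, values, month=False, diurnal=False):
    """Average values within each temporal bin selected by the flags."""
    bins = defaultdict(list)
    for ts, val in zip(time_index, values):
        key = []
        if month:
            key.append(ts.month)  # calendar month, 1-12
        if diurnal:
            key.append(ts.hour)   # hour of day, 0-23
        bins[tuple(key)].append(val)
    return {key: mean(vals) for key, vals in sorted(bins.items())}


# Four 12-hourly samples straddling a month boundary (made-up values)
start = datetime(2020, 1, 31, 0)
time_index = [start + timedelta(hours=12 * i) for i in range(4)]
values = [1.0, 3.0, 5.0, 7.0]

full = grouped_mean(time_index, values)
monthly = grouped_mean(time_index, values, month=True)
diurnal_means = grouped_mean(time_index, values, diurnal=True)
monthly_diurnal = grouped_mean(time_index, values, month=True, diurnal=True)
```

Here the empty tuple key stands for the full temporal extent, which corresponds to the default behaviour when neither flag is set.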

full_stats(dataset, sites=None, max_workers=None, chunks_per_worker=5, lat_lon_only=True, mask_zeros=False)

Compute stats for entire temporal extent of file

Parameters:
  • dataset (str) – Dataset to extract stats for

  • sites (list | slice, optional) – Subset of sites to extract, by default None or all sites

  • max_workers (None | int, optional) – Number of workers to use, if 1 run in serial, if None use all available cores, by default None

  • chunks_per_worker (int, optional) – Number of chunks to extract on each worker, by default 5

  • lat_lon_only (bool, optional) – Only append lat, lon coordinates to stats, by default True

  • mask_zeros (bool, optional) – Flag to only calculate stats when all data is > 0 (useful for global horizontal irradiance), by default False

Returns:

full_stats (pandas.DataFrame) – DataFrame of statistics for the entire temporal extent of file
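max_workers and chunks_per_worker together control how the site dimension is divided for parallel extraction. The following sketch shows one way such a split could be computed, assuming the total number of chunks is max_workers * chunks_per_worker. The `chunk_sites` helper is hypothetical and is not taken from rex; the real chunking logic may differ.

```python
def chunk_sites(n_sites, max_workers, chunks_per_worker=5):
    """Split range(n_sites) into contiguous slices of near-equal size.

    Assumes max_workers * chunks_per_worker chunks in total, capped at
    one site per chunk.
    """
    if n_sites == 0:
        return []
    n_chunks = min(n_sites, max_workers * chunks_per_worker)
    base, extra = divmod(n_sites, n_chunks)
    slices = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional site each
        stop = start + base + (1 if i < extra else 0)
        slices.append(slice(start, stop))
        start = stop
    return slices


site_slices = chunk_sites(n_sites=23, max_workers=2, chunks_per_worker=5)
print(len(site_slices), site_slices[0], site_slices[-1])
```

Smaller chunks reduce the memory held by each worker at a time, at the cost of more read calls against the file.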

monthly_stats(dataset, sites=None, max_workers=None, chunks_per_worker=5, lat_lon_only=True, mask_zeros=False)

Compute monthly stats

Parameters:
  • dataset (str) – Dataset to extract stats for

  • sites (list | slice, optional) – Subset of sites to extract, by default None or all sites

  • max_workers (None | int, optional) – Number of workers to use, if 1 run in serial, if None use all available cores, by default None

  • chunks_per_worker (int, optional) – Number of chunks to extract on each worker, by default 5

  • lat_lon_only (bool, optional) – Only append lat, lon coordinates to stats, by default True

  • mask_zeros (bool, optional) – Flag to only calculate stats when all data is > 0 (useful for global horizontal irradiance), by default False

Returns:

monthly_stats (pandas.DataFrame) – DataFrame of monthly statistics

diurnal_stats(dataset, sites=None, max_workers=None, chunks_per_worker=5, lat_lon_only=True, mask_zeros=False)

Compute diurnal stats

Parameters:
  • dataset (str) – Dataset to extract stats for

  • sites (list | slice, optional) – Subset of sites to extract, by default None or all sites

  • max_workers (None | int, optional) – Number of workers to use, if 1 run in serial, if None use all available cores, by default None

  • chunks_per_worker (int, optional) – Number of chunks to extract on each worker, by default 5

  • lat_lon_only (bool, optional) – Only append lat, lon coordinates to stats, by default True

  • mask_zeros (bool, optional) – Flag to only calculate stats when all data is > 0 (useful for global horizontal irradiance), by default False

Returns:

diurnal_stats (pandas.DataFrame) – DataFrame of diurnal statistics

monthly_diurnal_stats(dataset, sites=None, max_workers=None, chunks_per_worker=5, lat_lon_only=True, mask_zeros=False)

Compute monthly-diurnal stats

Parameters:
  • dataset (str) – Dataset to extract stats for

  • sites (list | slice, optional) – Subset of sites to extract, by default None or all sites

  • max_workers (None | int, optional) – Number of workers to use, if 1 run in serial, if None use all available cores, by default None

  • chunks_per_worker (int, optional) – Number of chunks to extract on each worker, by default 5

  • lat_lon_only (bool, optional) – Only append lat, lon coordinates to stats, by default True

  • mask_zeros (bool, optional) – Flag to only calculate stats when all data is > 0 (useful for global horizontal irradiance), by default False

Returns:

monthly_diurnal_stats (pandas.DataFrame) – DataFrame of monthly-diurnal statistics

all_stats(dataset, sites=None, max_workers=None, chunks_per_worker=5, lat_lon_only=True, mask_zeros=False)

Compute annual, monthly, monthly-diurnal, and diurnal stats

Parameters:
  • dataset (str) – Dataset to extract stats for

  • sites (list | slice, optional) – Subset of sites to extract, by default None or all sites

  • max_workers (None | int, optional) – Number of workers to use, if 1 run in serial, if None use all available cores, by default None

  • chunks_per_worker (int, optional) – Number of chunks to extract on each worker, by default 5

  • lat_lon_only (bool, optional) – Only append lat, lon coordinates to stats, by default True

  • mask_zeros (bool, optional) – Flag to only calculate stats when all data is > 0 (useful for global horizontal irradiance), by default False

Returns:

all_diurnal_stats (pandas.DataFrame) – DataFrame of temporal statistics

save_stats(res_stats, out_path)

Save statistics to disk

Parameters:
  • res_stats (pandas.DataFrame) – Table of statistics to save

  • out_path (str) – Directory, .csv, or .json path to save statistics to
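out_path can name a directory, a .csv file, or a .json file. The sketch below shows how a writer might dispatch on those three cases using only the standard library. The `save_table` helper and the `stats.csv` default file name are hypothetical; the file name that save_stats uses when given a directory is not specified above.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_table(rows, out_path, default_name='stats.csv'):
    """Write a list of row dicts to .json, .csv, or into a directory."""
    out_path = Path(out_path)
    if out_path.suffix == '.json':
        out_path.write_text(json.dumps(rows, indent=2))
        return out_path
    if out_path.suffix != '.csv':
        # No recognised extension: treat the path as a directory
        out_path.mkdir(parents=True, exist_ok=True)
        out_path = out_path / default_name  # hypothetical default name
    with open(out_path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return out_path


rows = [{'gid': 0, 'mean': 6.2}, {'gid': 1, 'mean': 7.1}]  # made-up stats
with tempfile.TemporaryDirectory() as tmp:
    csv_path = save_table(rows, Path(tmp) / 'out')
    json_path = save_table(rows, Path(tmp) / 'stats.json')
    saved_names = (csv_path.name, json_path.name)
    reloaded = json.loads(json_path.read_text())
```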

classmethod run(res_h5, dataset, sites=None, statistics='mean', diurnal=False, month=False, combinations=False, res_cls=<class 'rex.resource.Resource'>, hsds=False, max_workers=None, chunks_per_worker=5, lat_lon_only=True, mask_zeros=False, out_path=None)

Compute temporal stats, by default full temporal extent stats

Parameters:
  • res_h5 (str) – Path to resource h5 file(s)

  • dataset (str) – Dataset to extract stats for

  • sites (list | slice, optional) – Subset of sites to extract, by default None or all sites

  • statistics (str | tuple | dict, optional) – Statistics to extract, either a key or tuple of keys in cls.STATS, or a dictionary of the form {'stat_name': {'func': <callable>, 'kwargs': {...}}}, by default 'mean'

  • diurnal (bool, optional) – Extract diurnal stats, by default False

  • month (bool, optional) – Extract monthly stats, by default False

  • combinations (bool, optional) – Extract all combinations of temporal stats, by default False

  • res_cls (Class, optional) – Resource class to use to access res_h5, by default Resource

  • hsds (bool, optional) – Boolean flag to use h5pyd to handle .h5 ‘files’ hosted on AWS behind HSDS, by default False

  • max_workers (None | int, optional) – Number of workers to use, if 1 run in serial, if None use all available cores, by default None

  • chunks_per_worker (int, optional) – Number of chunks to extract on each worker, by default 5

  • lat_lon_only (bool, optional) – Only append lat, lon coordinates to stats, by default True

  • mask_zeros (bool, optional) – Flag to only calculate stats when all data is > 0 (useful for global horizontal irradiance), by default False

  • out_path (str, optional) – Directory, .csv, or .json path to save statistics to, by default None

Returns:

out_stats (pandas.DataFrame) – DataFrame of resource statistics
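A usage sketch for the run classmethod, built only from the signature documented above. The wrapper name `monthly_mean_table` is hypothetical, and the file path and dataset name in the trailing comment are placeholders rather than real resources. The import is done inside the function so the sketch can be loaded without rex installed.

```python
def monthly_mean_table(res_h5, dataset, out_path=None):
    """Return monthly mean statistics for one dataset (hypothetical wrapper)."""
    # Module path taken from the page title; imported lazily on first call
    from rex.temporal_stats.temporal_stats import TemporalStats

    return TemporalStats.run(
        res_h5,
        dataset,
        statistics='mean',  # the documented default key
        month=True,         # monthly bins instead of the full extent
        max_workers=1,      # run in serial
        out_path=out_path,  # optional directory, .csv, or .json path
    )


# Example call with placeholder arguments:
# stats = monthly_mean_table('/path/to/resource.h5', 'dataset_name')
```

According to the parameter list, leaving max_workers at None would use all available cores instead of running in serial.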

classmethod monthly(res_h5, dataset, sites=None, statistics='mean', res_cls=<class 'rex.resource.Resource'>, hsds=False, max_workers=None, chunks_per_worker=5, lat_lon_only=True, mask_zeros=False, out_path=None)

Compute monthly stats

Parameters:
  • res_h5 (str) – Path to resource h5 file(s)

  • dataset (str) – Dataset to extract stats for

  • sites (list | slice, optional) – Subset of sites to extract, by default None or all sites

  • statistics (str | tuple | dict, optional) – Statistics to extract, either a key or tuple of keys in cls.STATS, or a dictionary of the form {'stat_name': {'func': <callable>, 'kwargs': {...}}}, by default 'mean'

  • res_cls (Class, optional) – Resource class to use to access res_h5, by default Resource

  • hsds (bool, optional) – Boolean flag to use h5pyd to handle .h5 ‘files’ hosted on AWS behind HSDS, by default False

  • max_workers (None | int, optional) – Number of workers to use, if 1 run in serial, if None use all available cores, by default None

  • chunks_per_worker (int, optional) – Number of chunks to extract on each worker, by default 5

  • lat_lon_only (bool, optional) – Only append lat, lon coordinates to stats, by default True

  • mask_zeros (bool, optional) – Flag to only calculate stats when all data is > 0 (useful for global horizontal irradiance), by default False

  • out_path (str, optional) – Directory, .csv, or .json path to save statistics to, by default None

Returns:

monthly_stats (pandas.DataFrame) – DataFrame of monthly statistics

classmethod diurnal(res_h5, dataset, sites=None, statistics='mean', res_cls=<class 'rex.resource.Resource'>, hsds=False, max_workers=None, chunks_per_worker=5, lat_lon_only=True, mask_zeros=False, out_path=None)

Compute diurnal stats

Parameters:
  • res_h5 (str) – Path to resource h5 file(s)

  • dataset (str) – Dataset to extract stats for

  • sites (list | slice, optional) – Subset of sites to extract, by default None or all sites

  • statistics (str | tuple | dict, optional) – Statistics to extract, either a key or tuple of keys in cls.STATS, or a dictionary of the form {'stat_name': {'func': <callable>, 'kwargs': {...}}}, by default 'mean'

  • res_cls (Class, optional) – Resource class to use to access res_h5, by default Resource

  • hsds (bool, optional) – Boolean flag to use h5pyd to handle .h5 ‘files’ hosted on AWS behind HSDS, by default False

  • max_workers (None | int, optional) – Number of workers to use, if 1 run in serial, if None use all available cores, by default None

  • chunks_per_worker (int, optional) – Number of chunks to extract on each worker, by default 5

  • lat_lon_only (bool, optional) – Only append lat, lon coordinates to stats, by default True

  • mask_zeros (bool, optional) – Flag to only calculate stats when all data is > 0 (useful for global horizontal irradiance), by default False

  • out_path (str, optional) – Directory, .csv, or .json path to save statistics to, by default None

Returns:

diurnal_stats (pandas.DataFrame) – DataFrame of diurnal statistics

classmethod monthly_diurnal(res_h5, dataset, sites=None, statistics='mean', res_cls=<class 'rex.resource.Resource'>, hsds=False, max_workers=None, chunks_per_worker=5, lat_lon_only=True, mask_zeros=False, out_path=None)

Compute monthly-diurnal stats

Parameters:
  • res_h5 (str) – Path to resource h5 file(s)

  • dataset (str) – Dataset to extract stats for

  • sites (list | slice, optional) – Subset of sites to extract, by default None or all sites

  • statistics (str | tuple | dict, optional) – Statistics to extract, either a key or tuple of keys in cls.STATS, or a dictionary of the form {'stat_name': {'func': <callable>, 'kwargs': {...}}}, by default 'mean'

  • res_cls (Class, optional) – Resource class to use to access res_h5, by default Resource

  • hsds (bool, optional) – Boolean flag to use h5pyd to handle .h5 ‘files’ hosted on AWS behind HSDS, by default False

  • max_workers (None | int, optional) – Number of workers to use, if 1 run in serial, if None use all available cores, by default None

  • chunks_per_worker (int, optional) – Number of chunks to extract on each worker, by default 5

  • lat_lon_only (bool, optional) – Only append lat, lon coordinates to stats, by default True

  • mask_zeros (bool, optional) – Flag to only calculate stats when all data is > 0 (useful for global horizontal irradiance), by default False

  • out_path (str, optional) – Directory, .csv, or .json path to save statistics to, by default None

Returns:

monthly_diurnal_stats (pandas.DataFrame) – DataFrame of monthly-diurnal statistics

classmethod all(res_h5, dataset, sites=None, statistics='mean', res_cls=<class 'rex.resource.Resource'>, hsds=False, max_workers=None, chunks_per_worker=5, lat_lon_only=True, mask_zeros=False, out_path=None)

Compute annual, monthly, monthly-diurnal, and diurnal stats

Parameters:
  • res_h5 (str) – Path to resource h5 file(s)

  • dataset (str) – Dataset to extract stats for

  • sites (list | slice, optional) – Subset of sites to extract, by default None or all sites

  • statistics (str | tuple | dict, optional) – Statistics to extract, either a key or tuple of keys in cls.STATS, or a dictionary of the form {'stat_name': {'func': <callable>, 'kwargs': {...}}}, by default 'mean'

  • res_cls (Class, optional) – Resource class to use to access res_h5, by default Resource

  • hsds (bool, optional) – Boolean flag to use h5pyd to handle .h5 ‘files’ hosted on AWS behind HSDS, by default False

  • max_workers (None | int, optional) – Number of workers to use, if 1 run in serial, if None use all available cores, by default None

  • chunks_per_worker (int, optional) – Number of chunks to extract on each worker, by default 5

  • lat_lon_only (bool, optional) – Only append lat, lon coordinates to stats, by default True

  • mask_zeros (bool, optional) – Flag to only calculate stats when all data is > 0 (useful for global horizontal irradiance), by default False

  • out_path (str, optional) – Directory, .csv, or .json path to save statistics to, by default None

Returns:

all_stats (pandas.DataFrame) – DataFrame of temporal statistics