nsrdb.utilities.sky_class.SkyClass

class SkyClass(fp_surf, fp_nsrdb, nsrdb_gid, clearsky_ratio=0.9, clear_time_frac=0.8, cloudy_time_frac=0.2, window_minutes=61, min_irradiance=0, sza_lim=89)

Bases: object

Utility class for retrieving SURFRAD validation data alongside NSRDB data, determining the sky class by comparison to predicted clearsky irradiance, and providing data in a ready-to-validate dataframe.

Parameters:

fp_surf (str) – Filepath to SURFRAD h5 file.
fp_nsrdb (str) – Filepath to NSRDB file. Can be a MultiFileResource path with: /dir/prefix*suffix.h5
nsrdb_gid (int) – GID (metadata index) for the site of interest in the fp_nsrdb file that matches the fp_surf file.
clearsky_ratio (float) – Clearsky ratio (ground measurement / clearsky irradiance) above which a timestep is considered clear.
clear_time_frac (float) – Fraction of clear timesteps in an averaging window above which the whole window is considered clear. A window whose fraction falls between cloudy_time_frac and clear_time_frac is considered broken clouds.
cloudy_time_frac (float) – Fraction of clear timesteps in an averaging window below which the whole window is considered cloudy. A window whose fraction falls between cloudy_time_frac and clear_time_frac is considered broken clouds.
window_minutes (int) – Length in minutes of the moving-average window used for the sky classification. The window is calculated while considering the source time resolution of the SURFRAD measurements.
min_irradiance (float | int) – Minimum irradiance value. Timesteps with either ground-measured or NSRDB irradiance less than this value will be classified as missing.
sza_lim (int | float) – Maximum solar zenith angle. Timesteps with sza > sza_lim will be classified as missing.

Methods

`add_validation_data`(df)	Add NSRDB and SURFRAD ghi and dni data to a DataFrame.
`calculate_sky_class`(df)	Calculate the sky class (clear, cloudy, broken, missing) from the comparison df.
`get_comparison_df`()	Get a timeseries dataframe comparing the ground-measured GHI vs. the clearsky (REST2) GHI.
`get_rest_inputs`()	Get a dataframe of NSRDB variables required to run REST2.
`run`(fp_surf, fp_nsrdb, nsrdb_gid[, ...])
`run_rest`(rest_inputs)	Run REST2 using a dataframe of input data and return clearsky GHI.

Attributes

`ALIASES`
`REST_VARS`
`nsrdb`	Get the initialized Resource or MultiFileResource handler
`nsrdb_time_index`	Get the datetimeindex from the nsrdb h5 file
`surf_ghi`	Get the surfrad ghi data with negative values as NaN
`surf_time_index`	Get the datetimeindex from the surfrad h5 file
`surfrad`	Get the initialized Surfrad handler

property surfrad: Get the initialized Surfrad handler

property surf_time_index: Get the datetimeindex from the surfrad h5 file

property nsrdb: Get the initialized Resource or MultiFileResource handler

property nsrdb_time_index: Get the datetimeindex from the nsrdb h5 file

property surf_ghi: Get the surfrad ghi data with negative values as NaN

get_rest_inputs()

Get a dataframe of NSRDB variables required to run REST2.

Returns: rest_inputs (pd.DataFrame) – Timeseries data with time_index from the surfrad data (might be missing time steps) and data columns for each input variable required by REST and some extras (e.g. air_temperature).

run_rest(rest_inputs)

Run REST2 using a dataframe of input data and return clearsky GHI.

Parameters: rest_inputs (pd.DataFrame) – Timeseries data with time_index from the surfrad data (might be missing time steps) and data columns for each input variable required by REST and some extras (e.g. air_temperature).
Returns: ghi (np.ndarray) – 2D (time, 1) array of clearsky GHI values calculated by REST2.

get_comparison_df()

Get a timeseries dataframe comparing the ground-measured GHI vs. the clearsky (REST2) GHI at the ground-measurement time_index.

Returns: df (pd.DataFrame) – Timeseries data with time_index from the surfrad data (might be missing time steps) and data columns for ghi_rest (clearsky), ghi_ground (surfrad), and “clear”, where clear is a boolean (1 for clear) with float dtype so it can have NaN values where ground measurements are missing.
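The “clear” flag described above can be illustrated with a small pure-Python sketch. This is not the nsrdb implementation: the helper name `clear_flag` is hypothetical, and returning NaN when the clearsky value is missing or non-positive is an assumption made so the ratio is always defined. It only shows the documented rule that a timestep is clear when ground GHI divided by clearsky GHI exceeds clearsky_ratio.

```python
import math


def clear_flag(ghi_ground, ghi_rest, clearsky_ratio=0.9):
    """Return 1.0 (clear), 0.0 (not clear), or NaN (no usable data).

    Hypothetical helper for illustration; not part of nsrdb.
    """
    # A missing ground measurement yields NaN, which is why the
    # documented "clear" column has float dtype rather than bool.
    if math.isnan(ghi_ground):
        return math.nan
    # Assumption: a missing or non-positive clearsky value leaves the
    # ratio undefined, so the flag is also NaN.
    if math.isnan(ghi_rest) or ghi_rest <= 0:
        return math.nan
    # Clear when the ratio is above the threshold.
    return 1.0 if ghi_ground / ghi_rest > clearsky_ratio else 0.0


# (ghi_ground, ghi_rest) pairs in W/m2, made-up values
samples = [(950.0, 1000.0), (400.0, 1000.0), (math.nan, 1000.0)]
flags = [clear_flag(g, c) for g, c in samples]
```

With the default clearsky_ratio of 0.9, the first sample (ratio 0.95) is flagged clear, the second (ratio 0.4) is not, and the third has no flag because the ground measurement is missing.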

calculate_sky_class(df)

Calculate the sky class (clear, cloudy, broken, missing) from the comparison df.

Parameters: df (pd.DataFrame) – Timeseries data with time_index from the surfrad data (might be missing time steps) and data columns for ghi_rest (clearsky), ghi_ground (surfrad), and “clear”, where clear is a boolean (1 for clear) with float dtype so it can have NaN values where ground measurements are missing.
Returns: df (pd.DataFrame) – Same as input but with a new column “sky_class” with values (clear, cloudy, broken, missing) calculated from the clear_time_frac and cloudy_time_frac inputs over a time window determined by the window_minutes input. Note that sky_class == missing means that it is night or there is missing ground measurement data, and validation should not be performed with those timesteps.
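The windowed classification can be sketched in pure Python as follows. This is an illustration of the documented thresholds, not the library's code. Several details are assumptions: the helper names, the `resolution_minutes` parameter, the use of a centered window, ignoring NaN flags inside a window, and marking a timestep as missing when its own flag is NaN.

```python
import math


def window_steps(window_minutes=61, resolution_minutes=1):
    """Number of timesteps in the window (hypothetical helper)."""
    return max(1, window_minutes // resolution_minutes)


def classify_window(flags, clear_time_frac=0.8, cloudy_time_frac=0.2):
    """Classify one window of clear flags (1.0, 0.0, or NaN)."""
    valid = [f for f in flags if not math.isnan(f)]
    if not valid:
        return "missing"
    clear_frac = sum(valid) / len(valid)
    if clear_frac > clear_time_frac:
        return "clear"
    if clear_frac < cloudy_time_frac:
        return "cloudy"
    return "broken"


def rolling_sky_class(flags, steps, clear_time_frac=0.8,
                      cloudy_time_frac=0.2):
    """Apply classify_window over a centered window (assumption)."""
    half = steps // 2
    out = []
    for i, flag in enumerate(flags):
        # Assumption: a timestep without its own flag is "missing".
        if math.isnan(flag):
            out.append("missing")
            continue
        lo, hi = max(0, i - half), min(len(flags), i + half + 1)
        out.append(classify_window(flags[lo:hi], clear_time_frac,
                                   cloudy_time_frac))
    return out


# Five clear timesteps followed by five cloudy ones, 3-step window
classes = rolling_sky_class([1.0] * 5 + [0.0] * 5, steps=3)
```

In this example the timesteps next to the clear-to-cloudy transition have a clear fraction between the two thresholds, so they are labeled broken.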

add_validation_data(df): Add NSRDB and SURFRAD ghi and dni data to a DataFrame.

classmethod run(fp_surf, fp_nsrdb, nsrdb_gid, clearsky_ratio=0.9, clear_time_frac=0.8, cloudy_time_frac=0.2, window_minutes=61, min_irradiance=0, sza_lim=89)

Parameters:

fp_surf (str) – Filepath to SURFRAD h5 file.
fp_nsrdb (str) – Filepath to NSRDB file. Can be a MultiFileResource path with: /dir/prefix*suffix.h5
nsrdb_gid (int) – GID (metadata index) for the site of interest in the fp_nsrdb file that matches the fp_surf file.
clearsky_ratio (float) – Clearsky ratio (ground measurement / clearsky irradiance) above which a timestep is considered clear.
clear_time_frac (float) – Fraction of clear timesteps in an averaging window above which the whole window is considered clear. A window whose fraction falls between cloudy_time_frac and clear_time_frac is considered broken clouds.
cloudy_time_frac (float) – Fraction of clear timesteps in an averaging window below which the whole window is considered cloudy. A window whose fraction falls between cloudy_time_frac and clear_time_frac is considered broken clouds.
window_minutes (int) – Length in minutes of the moving-average window used for the sky classification. The window is calculated while considering the source time resolution of the SURFRAD measurements.
min_irradiance (float | int) – Minimum irradiance value. Timesteps with either ground-measured or NSRDB irradiance less than this value will be classified as missing.
sza_lim (int | float) – Maximum solar zenith angle. Timesteps with sza > sza_lim will be classified as missing.

Returns:

df (pd.DataFrame) – Timeseries of validation data from fp_nsrdb and fp_surf including sky classification strings (clear, cloudy, broken, missing) with the same datetimeindex as the nsrdb file. Note that sky_class == missing means that it is night or there is missing ground measurement data, and validation should not be performed with those timesteps.
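A minimal usage sketch for the classmethod, assuming the returned dataframe carries the classification in a column named "sky_class" as the note above suggests. The wrapper `run_sky_class` and the helper `sky_class_counts` are hypothetical names introduced here, and the file paths and GID must be supplied by the caller. The nsrdb import is done inside the function so the sketch can be loaded without the package installed.

```python
from collections import Counter


def run_sky_class(fp_surf, fp_nsrdb, nsrdb_gid):
    """Hypothetical wrapper: classify one site, drop 'missing' rows."""
    # Lazy import: only needed when the function is actually called.
    from nsrdb.utilities.sky_class import SkyClass

    # Keyword values below are the documented defaults, spelled out.
    df = SkyClass.run(fp_surf, fp_nsrdb, nsrdb_gid,
                      clearsky_ratio=0.9, clear_time_frac=0.8,
                      cloudy_time_frac=0.2, window_minutes=61,
                      min_irradiance=0, sza_lim=89)
    # Assumption: the classification lives in a "sky_class" column.
    # Timesteps marked missing should not be used for validation.
    return df[df["sky_class"] != "missing"]


def sky_class_counts(sky_classes):
    """Count classes in any iterable of strings, skipping 'missing'."""
    return Counter(c for c in sky_classes if c != "missing")


# Works on a plain list as well as on a dataframe column
counts = sky_class_counts(["clear", "missing", "cloudy", "clear"])
```

Passing the "sky_class" column of the returned dataframe to `sky_class_counts` gives a quick check of how many timesteps fall into each class before running a validation.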