nsrdb

NSRDB processing CLI.

nsrdb [OPTIONS] COMMAND [ARGS]...

aggregate

NSRDB data aggregation.

nsrdb aggregate [OPTIONS]

Options

-kw, --kwargs <kwargs>

Required Argument dictionary. Must include year, e.g. '{"year": 2019}'.

Available keys: year, basename (file prefix), outdir (parent directory of data directories), metadir (directory with meta file), full_spatial, conus_spatial, final_spatial (spatial resolution for each domain), full_freq, conus_freq, final_freq (temporal resolution for each domain), n_chunks (number of chunks to process the meta data in), alloc (project allocation code), memory (node memory), walltime (time for job).

default_kwargs = {"basename": "nsrdb", "metadir": "/projects/pxs/reference_grids", "full_spatial": "2km", "conus_spatial": "2km", "final_spatial": "4km", "outdir": "./", "full_freq": "10min", "conus_freq": "5min", "final_freq": "30min", "n_chunks": 32, "alloc": "pxs", "memory": 90, "walltime": 40, "stdout": "./"}

--collect

Flag to collect aggregation chunks.

--hpc

Flag to run collection on HPC.
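
As an illustration (the year and output directory below are placeholder values, with all other keys falling back to default_kwargs), a 2019 aggregation could be launched and its chunks then collected on the HPC with:

nsrdb aggregate -kw '{"year": 2019, "outdir": "./agg_2019"}'

nsrdb aggregate -kw '{"year": 2019, "outdir": "./agg_2019"}' --collect --hpc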

blend

NSRDB data blend.

nsrdb blend [OPTIONS]

Options

-kw, --kwargs <kwargs>

Required Argument dictionary. Must include year, e.g. '{"year": 2019, "extent": "full"}'.

Available keys: year, outdir (parent directory of data directories), file_tag ("ancillary_a", "ancillary_b", "clearsky", "clouds", "csp", "irradiance", "pv", "all") - if file_tag is "all" then all other tags will be run, spatial (meta file resolution), extent (full/conus), basename (file prefix), east_dir (directory with east data, auto populated if None), west_dir (directory with west data, auto populated if None), metadir (directory with meta file), meta_file (auto populated if None), alloc (project allocation code), memory (node memory), chunk_size (number of sites to read/write at a time), walltime (time for job).

default_kwargs = {"file_tag": "all", "basename": "nsrdb", "extent": "conus", "outdir": "./", "east_dir": None, "west_dir": None, "metadir": "/projects/pxs/reference_grids", "spatial": "2km", "meta_file": None, "alloc": "pxs", "walltime": 48, "chunk_size": 100000, "memory": 83, "stdout": "./"}

--collect

Flag to collect blended data files.

--hpc

Flag to run collection on HPC.
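
For example (year, extent, and outdir here are placeholders), a full-extent blend for 2019 could be run and its outputs then collected with:

nsrdb blend -kw '{"year": 2019, "extent": "full", "outdir": "./blend_2019"}'

nsrdb blend -kw '{"year": 2019, "extent": "full", "outdir": "./blend_2019"}' --collect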

config

NSRDB processing CLI from config json file.

nsrdb config [OPTIONS]

Options

-c, --config_file <config_file>

Required Filepath to config file.

-cmd, --command <command>

Required NSRDB CLI command string.
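
For example, assuming a config file named config_nsrdb.json exists in the run directory (the file name and the command string below are placeholders), one processing step could be launched with:

nsrdb config -c config_nsrdb.json -cmd data-model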

create-configs

NSRDB config file creation from templates.

nsrdb create-configs [OPTIONS]

Options

-kw, --kwargs <kwargs>

Required Argument dictionary. Must include year and be passed as a string, e.g. '{"year": 2019, "freq": "5min"}'.

Available keys: year, freq, outdir (parent directory for run directory), satellite (east/west), spatial (meta file resolution), extent (full/conus), basename (file prefix), meta_file (auto populated if None), doy_range (all days of year if None).

default_kwargs = {"basename": "nsrdb", "freq": "5min", "satellite": "east", "extent": "conus", "outdir": "./", "spatial": "4km", "meta_file": None, "doy_range": None}

-all_domains, -ad

Flag to generate config files for all domains. If set, config files for east/west and conus/full will be generated (full only if year is < 2018). satellite, extent, spatial, freq, and meta_file will be auto-populated.
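
As a sketch (the kwargs values are placeholders), configs for a single 5-minute CONUS east run, and then for all applicable domains with auto-populated settings, could be generated with:

nsrdb create-configs -kw '{"year": 2019, "freq": "5min", "satellite": "east", "extent": "conus"}'

nsrdb create-configs -kw '{"year": 2019}' -ad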

direct

NSRDB direct processing CLI (no config file).

nsrdb direct [OPTIONS] COMMAND [ARGS]...

Options

-n, --name <name>

Job and node name.

-y, --year <year>

Year of analysis.

-g, --nsrdb_grid <nsrdb_grid>

File path to NSRDB meta data grid.

-f, --nsrdb_freq <nsrdb_freq>

NSRDB frequency (e.g. “5min”, “30min”).

-vm, --var_meta <var_meta>

CSV file or dataframe containing meta data for all NSRDB variables. Defaults to the NSRDB var meta CSV in the git repo.

-od, --out_dir <out_dir>

Required Output directory.

-v, --verbose

Flag to turn on debug logging. Default is not verbose.

all-sky

Run allsky for a single chunked file.

nsrdb direct all-sky [OPTIONS] COMMAND [ARGS]...

Options

-i, --i_chunk <i_chunk>

Required Chunked file index in out_dir to run allsky for.

-ch, --col_chunk <col_chunk>

Required Chunking method to run all sky one column chunk at a time to reduce memory requirements. This is an integer specifying how many columns to work on at one time.

-do, --disc_on <disc_on>

Whether to compute cloudy-sky DNI with the DISC model (True) or the FARMS-DNI model (False).
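
An illustrative invocation, with placeholder paths and indices (the group-level direct options precede the subcommand):

nsrdb direct -n allsky_2019 -y 2019 -g ./meta/nsrdb_meta_4km.csv -f 5min -od ./out all-sky -i 0 -ch 100 -do True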

hpc

HPC submission tool for the NSRDB cli.

nsrdb direct all-sky hpc [OPTIONS]

Options

-a, --alloc <alloc>

Required HPC allocation account name.

-mem, --memory <memory>

HPC node memory request in GB. Default is None.

-wt, --walltime <walltime>

HPC walltime request in hours. Default is 1.0.

-l, --feature <feature>

Additional flags for SLURM job. Format is "--qos=high" or "--depend=[state:job_id]". Default is None.

-sout, --stdout_path <stdout_path>

Subprocess standard output path. Default is in out_dir.
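
To submit the step as an HPC job instead of running it locally, the hpc subcommand is chained onto the command, e.g. (allocation, memory, and walltime values below are placeholders):

nsrdb direct -y 2019 -g ./meta/nsrdb_meta_4km.csv -f 5min -od ./out all-sky -i 0 -ch 100 hpc -a myalloc -mem 90 -wt 4.0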

cloud-fill

Gap fill a cloud data file.

nsrdb direct cloud-fill [OPTIONS] COMMAND [ARGS]...

Options

-i, --i_chunk <i_chunk>

Required Chunked file index in out_dir to run cloud fill for.

-ch, --col_chunk <col_chunk>

Required Optional chunking method to gap fill one column chunk at a time to reduce memory requirements. If provided, this should be an int specifying how many columns to work on at one time.
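
An illustrative local run on the first chunked file, with placeholder values:

nsrdb direct -y 2019 -g ./meta/nsrdb_meta_4km.csv -f 5min -od ./out cloud-fill -i 0 -ch 100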

hpc

HPC submission tool for the NSRDB cli.

nsrdb direct cloud-fill hpc [OPTIONS]

Options

-a, --alloc <alloc>

Required HPC allocation account name.

-mem, --memory <memory>

HPC node memory request in GB. Default is None.

-wt, --walltime <walltime>

HPC walltime request in hours. Default is 1.0.

-l, --feature <feature>

Additional flags for SLURM job. Format is "--qos=high" or "--depend=[state:job_id]". Default is None.

-sout, --stdout_path <stdout_path>

Subprocess standard output path. Default is in out_dir.

collect-daily

Run the NSRDB file collection method on a specific daily directory for specific datasets to a single output file.

nsrdb direct collect-daily [OPTIONS] COMMAND [ARGS]...

Options

-cd, --collect_dir <collect_dir>

Required Directory containing chunked files to collect from.

-fo, --fn_out <fn_out>

Required Output filename to be saved in out_dir.

-ds, --dsets <dsets>

Required List of dataset names to collect.

-nw, --n_writes <n_writes>

Number of file list divisions to write per dataset. For example, if ghi and dni are being collected and n_writes is set to 2, half of the source ghi files will be collected at once and then written, then the second half of ghi files, then dni.

-w, --max_workers <max_workers>

Number of parallel workers to use.

-e, --hpc

Flag indicating that this is being used to pass commands to an HPC call.
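
A sketch with placeholder directories, output filename, and datasets (the bracketed dataset-list syntax is an assumption):

nsrdb direct -od ./out collect-daily -cd ./out/daily -fo nsrdb_2019.h5 -ds '["ghi", "dni"]' -nw 2 -w 4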

collect-data-model

Collect data model results into cohesive timeseries file chunks.

nsrdb direct collect-data-model [OPTIONS] COMMAND [ARGS]...

Options

-n, --n_chunks <n_chunks>

Required Number of chunks to collect into.

-ic, --i_chunk <i_chunk>

Required Chunk index.

-if, --i_fname <i_fname>

Required Filename index: 0: ancillary_a, 1: ancillary_b, 2: clearsky, 3: clouds, 4: csp, 5: irrad, 6: pv.

-nw, --n_writes <n_writes>

Number of file list divisions to write per dataset. For example, if ghi and dni are being collected and n_writes is set to 2, half of the source ghi files will be collected at once and then written, then the second half of ghi files, then dni.

-w, --max_workers <max_workers>

Number of parallel workers to use.

-f, --final

Flag for final collection. Will put collected files in the final directory instead of in the collect directory.

-pn, --final_file_name <final_file_name>

Final file name for filename outputs if this is the terminal job. None will default to the name in ctx which is usually the slurm job name.
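
For example (values are placeholders), collecting the irradiance results (i_fname 5) into the third of 32 chunks:

nsrdb direct -y 2019 -g ./meta/nsrdb_meta_4km.csv -f 5min -od ./out collect-data-model -n 32 -ic 2 -if 5 -nw 2 -w 4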

hpc

HPC submission tool for the NSRDB cli.

nsrdb direct collect-data-model hpc [OPTIONS]

Options

-a, --alloc <alloc>

Required HPC allocation account name.

-mem, --memory <memory>

HPC node memory request in GB. Default is None.

-wt, --walltime <walltime>

HPC walltime request in hours. Default is 1.0.

-l, --feature <feature>

Additional flags for SLURM job. Format is "--qos=high" or "--depend=[state:job_id]". Default is None.

-sout, --stdout_path <stdout_path>

Subprocess standard output path. Default is in out_dir.

collect-final

Collect chunked files with final data into final full files.

nsrdb direct collect-final [OPTIONS] COMMAND [ARGS]...

Options

-d, --collect_dir <collect_dir>

Required Chunked directory to collect to out_dir.

-if, --i_fname <i_fname>

Required Filename index (0: ancillary, 1: clouds, 2: irrad, 3: sam vars).
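
For example (paths are placeholders), collecting the cloud datasets (i_fname 1) from a chunked directory:

nsrdb direct -od ./final collect-final -d ./collect -if 1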

hpc

HPC submission tool for the NSRDB cli.

nsrdb direct collect-final hpc [OPTIONS]

Options

-a, --alloc <alloc>

Required HPC allocation account name.

-mem, --memory <memory>

HPC node memory request in GB. Default is None.

-wt, --walltime <walltime>

HPC walltime request in hours. Default is 1.0.

-l, --feature <feature>

Additional flags for SLURM job. Format is "--qos=high" or "--depend=[state:job_id]". Default is None.

-sout, --stdout_path <stdout_path>

Subprocess standard output path. Default is in out_dir.

collect-flist

Run the NSRDB file collection method with explicitly defined flist.

nsrdb direct collect-flist [OPTIONS] COMMAND [ARGS]...

Options

-fl, --flist <flist>

Required Explicit list of filenames in collect_dir to collect. Using this option will supersede the default behavior of collecting daily data model outputs in collect_dir.

-cd, --collect_dir <collect_dir>

Required Directory containing chunked files to collect from.

-fo, --fn_out <fn_out>

Required Output filename to be saved in out_dir.

-ds, --dsets <dsets>

Required List of dataset names to collect.

-e, --hpc

Flag indicating that this is being used to pass commands to an HPC call.
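
A sketch with placeholder filenames and datasets (the bracketed list syntax is an assumption):

nsrdb direct -od ./out collect-flist -fl '["chunk_000.h5", "chunk_001.h5"]' -cd ./out/chunks -fo nsrdb_2019.h5 -ds '["ghi", "dni"]'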

daily-all-sky

Run allsky for a single day using daily data model output files as source data.

nsrdb direct daily-all-sky [OPTIONS] COMMAND [ARGS]...

Options

-d, --date <date>

Required Single day of data model output to run all-sky on. Must be a string in YYYYMMDD format.

-ch, --col_chunk <col_chunk>

Required Chunking method to run all sky one column chunk at a time to reduce memory requirements. This is an integer specifying how many columns to work on at one time.

-do, --disc_on <disc_on>

Whether to compute cloudy-sky DNI with the DISC model (True) or the FARMS-DNI model (False).
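
An illustrative run for a single placeholder day:

nsrdb direct -y 2019 -g ./meta/nsrdb_meta_4km.csv -f 5min -od ./out daily-all-sky -d 20190115 -ch 100 -do True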

data-model

Run the data model for a single day.

nsrdb direct data-model [OPTIONS] COMMAND [ARGS]...

Options

-d, --doy <doy>

Required Integer day-of-year to run data model for.

-vl, --var_list <var_list>

Variables to process with the data model. None will default to all NSRDB variables.

-dl, --dist_lim <dist_lim>

Required Return only neighbors within this distance during cloud regrid. The distance is in decimal degrees (more efficient than real distance). NSRDB sites farther than this distance from GOES data pixels will trigger a warning and be assigned missing cloud types and properties, resulting in a full clearsky timeseries.

-kw, --factory_kwargs <factory_kwargs>

Optional namespace of kwargs used to initialize variable data handlers from the data model's variable factory. Keyed by variable name. Values can be "source_dir", "handler", etc. source_dir for cloud variables can be a normal directory path or /directory/prefix*suffix where /directory/ can have more sub dirs.

-w, --max_workers <max_workers>

Number of workers to use in parallel.

-mwr, --max_workers_regrid <max_workers_regrid>

Number of workers to use in parallel for the cloud regrid algorithm.

-ml, --mlclouds

Flag to process additional variables if the mlclouds gap fill is going to be run after the data_model step.
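
A sketch for one placeholder day of year, with an assumed 1.0 degree distance limit and the default variable list and factory kwargs:

nsrdb direct -y 2019 -g ./meta/nsrdb_meta_4km.csv -f 5min -od ./out data-model -d 100 -dl 1.0 -w 16 -mwr 16 -ml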

hpc

HPC submission tool for the NSRDB cli.

nsrdb direct data-model hpc [OPTIONS]

Options

-a, --alloc <alloc>

Required HPC allocation account name.

-mem, --memory <memory>

HPC node memory request in GB. Default is None.

-wt, --walltime <walltime>

HPC walltime request in hours. Default is 1.0.

-l, --feature <feature>

Additional flags for SLURM job. Format is "--qos=high" or "--depend=[state:job_id]". Default is None.

-sout, --stdout_path <stdout_path>

Subprocess standard output path. Default is in out_dir.

ml-cloud-fill

Gap fill cloud properties in daily data model outputs using a physics-guided neural network (phygnn).

nsrdb direct ml-cloud-fill [OPTIONS] COMMAND [ARGS]...

Options

-d, --date <date>

Required Single day of data model output to run cloud fill on. Must be a string in YYYYMMDD format.

-all, --fill_all <fill_all>

Flag to fill all cloud properties for all timesteps where cloud_type is cloudy.

-mp, --model_path <model_path>

Directory to load phygnn model from. This is typically a file path to a .pkl file with an accompanying .json file in the same directory.

-ch, --col_chunk <col_chunk>

Required Optional chunking method to gap fill one column chunk at a time to reduce memory requirements. If provided, this should be an int specifying how many columns to work on at one time.

-mw, --max_workers <max_workers>

Required Maximum workers to clean data in parallel. 1 is serial and None uses all available workers.
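
An illustrative invocation with a placeholder date and model path:

nsrdb direct -y 2019 -g ./meta/nsrdb_meta_4km.csv -f 5min -od ./out ml-cloud-fill -d 20190115 -mp ./models/mlclouds_model.pkl -ch 100 -mw 16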

pipeline

NSRDB pipeline from a pipeline config file.

nsrdb pipeline [OPTIONS]

Options

-c, --config_file <config_file>

Required NSRDB pipeline configuration json file.

--cancel

Flag to cancel all jobs associated with a given pipeline.

--monitor

Flag to monitor pipeline jobs continuously. Default is not to monitor (kick off jobs and exit).
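
For example, assuming a pipeline config named config_pipeline.json in the run directory (the file name is a placeholder), a monitored run can be started with:

nsrdb pipeline -c config_pipeline.json --monitor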