nsrdb

NSRDB command line interface.

Try using the following commands to pull up the help pages for the respective NSRDB CLIs:

$ python -m nsrdb.cli --help

$ python -m nsrdb.cli create-configs --help

$ python -m nsrdb.cli pipeline --help

$ python -m nsrdb.cli data-model --help

$ python -m nsrdb.cli ml-cloud-fill --help

$ python -m nsrdb.cli daily-all-sky --help

$ python -m nsrdb.cli collect-data-model --help

$ python -m nsrdb.cli tmy --help

$ python -m nsrdb.cli blend --help

$ python -m nsrdb.cli aggregate --help

Each of these commands can be run with a config file or a dictionary (represented as a string) provided through the -c argument. A typical config file might look like:


{
    "logging": {"log_level": "DEBUG"},
    "<command name>": {"run_name": ...,
                       **kwargs},
    "direct": {more kwargs},
    "execution_control": {"option": "kestrel", ...},
    "another command": {...},
    ...
}

The “run_name” key will be prepended to each kicked-off job, e.g. <run_name>_0, <run_name>_1, … for multiple jobs from the same CLI module. The “direct” key provides arguments to multiple commands, which removes the need for duplication when multiple commands share the same argument values. “execution_control” provides arguments to the SLURM manager for HPC submissions, or selects local execution with {“option”: “local”}.
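
For instance, a minimal config of this shape for a locally executed data-model run might look like the following (a sketch; the "run_name" and "direct" values are illustrative, not defaults):


{
    "logging": {"log_level": "DEBUG"},
    "data-model": {"run_name": "nsrdb_2020"},
    "direct": {"year": 2020, "out_dir": "./"},
    "execution_control": {"option": "local"}
}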

To do a standard CONUS / Full Disc run, use the following commands:

$ CONFIG='{"year": <year>, "out_dir": <out_dir>}'

$ python -m nsrdb.cli create-configs -c ${CONFIG}

$ cd <out_dir>

$ bash run.sh (run this until all main steps are complete)

$ cd post_proc

$ bash run.sh (run this until all post-proc steps are complete)

See the help pages of the module CLIs for more details on the config files for each CLI.

nsrdb [OPTIONS] COMMAND [ARGS]...

Options

--version

Show the version and exit.

-c, --config <config>

NSRDB config file json or dict for a single module.

-v, --verbose

Flag to turn on debug logging. Default is False.
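
For example, the top-level -c and -v options combine with any module command; a debug-logged run might be invoked as follows (the config filename and module are illustrative):

$ python -m nsrdb.cli -c config.json -v data-model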

aggregate

Aggregate data files to a lower resolution.

NOTE: Used to create data files from high-resolution years (2018+) that match the resolution of low-resolution years (pre-2018).
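
Following the invocation pattern used by the other modules, an aggregation run might be kicked off with (the config filename is illustrative):

$ python -m nsrdb.cli -c config_aggregate.json aggregate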

nsrdb aggregate [OPTIONS]

Options

-c, --config <config>

Required Path to config file with kwargs for NSRDB.aggregate_files()

-v, --verbose

Flag to turn on debug logging. Default is False.

all-sky

Run all-sky physics model on collected data model output files.

nsrdb all-sky [OPTIONS]

Options

-c, --config <config>

Required Path to config file or dict with kwargs for NSRDB.run_all_sky()

-v, --verbose

Flag to turn on debug logging. Default is False.

batch

Create and run multiple NSRDB project directories based on batch permutation logic.

The NSRDB batch module (built on the gaps batch functionality) is a way to create and run many NSRDB pipeline projects based on permutations of key-value pairs in the run config files. A user configures the batch file by creating one or more “sets” that contain one or more arguments (keys found in config files) that are to be parameterized. For example, in the config below, two NSRDB pipelines will be created where year is set to 2020 and 2021 in config_nsrdb.json:


{
    "pipeline_config": "./config_pipeline.json",
    "sets": [
      {
        "args": {
          "year": [2020, 2021]
        },
        "files": ["./config_nsrdb.json"],
        "set_tag": "set1"
      }
    ]
}

Run the batch module with:

$ python -m nsrdb.cli -c config_batch.json batch

Note that you can use multiple “sets” to isolate parameter permutations.
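
For example, a batch config with two sets (a sketch; the set tags and argument values are illustrative) keeps the year and freq permutations isolated from each other:


{
    "pipeline_config": "./config_pipeline.json",
    "sets": [
      {
        "args": {"year": [2020, 2021]},
        "files": ["./config_nsrdb.json"],
        "set_tag": "set1"
      },
      {
        "args": {"freq": ["5min", "30min"]},
        "files": ["./config_nsrdb.json"],
        "set_tag": "set2"
      }
    ]
}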

nsrdb batch [OPTIONS] COMMAND [ARGS]...

Options

-c, --config <config>

Required NSRDB batch configuration json or csv file.

--dry-run

Flag to do a dry run (make batch dirs without running).

--cancel

Flag to cancel all jobs associated with a given pipeline.

--delete

Flag to delete all batch job sub directories associated with the batch_jobs.csv in the current batch config directory.

--monitor-background

Flag to monitor all batch pipelines continuously in the background using the nohup command. Note that the stdout/stderr will not be captured, but you can set a pipeline “log_file” to capture logs.

-v, --verbose

Flag to turn on debug logging. Default is False.
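
As a sketch of how these flags are used, a dry run to inspect the generated batch directories followed by a background-monitored launch might look like:

$ python -m nsrdb.cli -c config_batch.json batch --dry-run

$ python -m nsrdb.cli -c config_batch.json batch --monitor-background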

blend

Blend files from separate domains (e.g. east / west) into a single domain.

nsrdb blend [OPTIONS]

Options

-c, --config <config>

Required Path to config file with kwargs for NSRDB.blend_files()

-v, --verbose

Flag to turn on debug logging. Default is False.

cloud-fill

Gap fill cloud properties in a collect-data-model output file, using the legacy gap-fill method.

nsrdb cloud-fill [OPTIONS]

Options

-c, --config <config>

Required Path to config file or dict with kwargs for NSRDB.cloud_fill()

-v, --verbose

Flag to turn on debug logging. Default is False.

collect-aggregate

Collect aggregate data files into a single file with multiple datasets.

nsrdb collect-aggregate [OPTIONS]

Options

-c, --config <config>

Required Path to config file with kwargs for Collector.collect_dir()

-v, --verbose

Flag to turn on debug logging. Default is False.

collect-blend

Collect blended files into a single file with multiple datasets.

nsrdb collect-blend [OPTIONS]

Options

-c, --config <config>

Required Path to config file with kwargs for Collector.collect_dir()

-v, --verbose

Flag to turn on debug logging. Default is False.

collect-daily

Collect daily files into a final file.

nsrdb collect-daily [OPTIONS]

Options

-c, --config <config>

Required Path to config file or dict with kwargs for NSRDB.collect_daily()

-v, --verbose

Flag to turn on debug logging. Default is False.

collect-data-model

Collect data-model output files to a single site-chunked output file.

You would call the nsrdb collect-data-model module using:

$ python -m nsrdb.cli -c config.json collect-data-model

A typical config.json file might look like this:


{
    "collect-data-model": {
        "final": true,
        "max_workers": 10,
        "n_chunks": 1,
        "memory": 178,
        "n_writes": 1,
        "walltime": 48
    },
    "daily-all-sky": {...},
    "data-model": {...},
    "direct": {...},
    "execution_control": {
        "option": "kestrel",
        "alloc": "pxs",
        "feature": "--qos=normal",
        "walltime": 40
    },
    "ml-cloud-fill": {...}
}

nsrdb collect-data-model [OPTIONS]

Options

-c, --config <config>

Required Path to config file or dict with kwargs for NSRDB.collect_data_model()

-v, --verbose

Flag to turn on debug logging. Default is False.

collect-final

Collect chunked files with final data into final full files.

nsrdb collect-final [OPTIONS]

Options

-c, --config <config>

Required Path to config file or dict with kwargs for NSRDB.collect_final()

-v, --verbose

Flag to turn on debug logging. Default is False.

collect-tmy

Collect the previously generated tmy file chunks.

You would call the nsrdb collect-tmy module using:

$ python -m nsrdb.cli -c config.json collect-tmy

A typical config.json file might look like this:


{
    "tmy": {},
    "collect-tmy": {"purge_chunks": true},
    "direct": {
        "sites_per_worker": 50,
        "site_slice": [0, 100],
        "tmy_types": ["tmy", "tdy", "tgy"],
        "nsrdb_base_fp": "./nsrdb_*_{}.h5",
        "years": [2000, ..., 2022],
        "out_dir": "./",
        "fn_out": "tmy_2000_2022.h5"
    }
}

nsrdb collect-tmy [OPTIONS]

Options

-c, --config <config>

Required Path to config file with kwargs for TmyRunner.collect()

-v, --verbose

Flag to turn on debug logging. Default is False.

create-configs

Create config files for standard NSRDB runs using config templates.

To generate all full_disc / conus run directories for east / west regions, each containing main routine config files, run the following:

$ CONFIG='{"year": 2020, "out_dir": "./"}'

$ python -m nsrdb.cli create-configs --run_type full -c ${CONFIG}

Additionally, conus / full_disc blend configs, an aggregation config, a collection config, and a post-processing pipeline config with all these steps will be written to a “post_proc” directory, so that post-processing can be run simply with:

$ python -m nsrdb.cli pipeline -c config_pipeline_post.json

nsrdb create-configs [OPTIONS]

Options

-c, --config <config>

Required Either a path to a .json config file or a dictionary. Needs to include at least a “year” key. If the input is a dictionary it needs to be provided as a string:

$ '{"year": 2019, "freq": "5min"}'

Available keys:
year (year to run),
freq (target time step, e.g. "5min"),
out_dir (parent directory for run directory),
satellite (east/west),
spatial (meta file resolution, e.g. "2km" or "4km"),
extent (full/conus),
basename (string to prepend to files and job names),
meta_file (e.g. "surfrad_meta.csv". Auto-populated if None.),
doy_range (All days of year if None).

default_kwargs = {
    "basename": "nsrdb",
    "freq": "5min",
    "satellite": "east",
    "extent": "conus",
    "out_dir": "./",
    "spatial": "4km",
    "meta_file": None,
    "doy_range": None
}
-r, --run_type <run_type>

Run type to create configs for. Can be “surfrad” (just writes a single template config with any provided kwargs replaced, with a surfrad meta file), “full” (generates all config and pipeline files for the given year, including all domain main runs, blending, aggregation, and collection), “main” (for a standard run without post-processing, with data-model, ml-cloud-fill, all-sky, and collect-data-model), “aggregate” (for aggregating post-2018 data to pre-2018 resolution), “blend” (for blending east and west domains into a single domain), or “post” (for all blending / aggregation / collection for a given year).

-ad, --all_domains

Flag to generate config files for all domains. If True, config files for east/west and conus/full will be generated (just full if year < 2018). satellite, extent, spatial, freq, and meta_file will be auto-populated.

-col, --collect

Flag to generate config files for module collection. This applies to run_type = “aggregate” or “blend”.
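
As a sketch combining these options (the key values are illustrative, not defaults), configs for a standard 5-minute 2020 east / conus run could be generated with:

$ CONFIG='{"year": 2020, "freq": "5min", "satellite": "east", "extent": "conus", "spatial": "4km", "out_dir": "./"}'

$ python -m nsrdb.cli create-configs --run_type main -c ${CONFIG}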

daily-all-sky

Run all-sky physics model on daily data-model output.

You would call the nsrdb daily-all-sky module using:

$ python -m nsrdb.cli -c config.json daily-all-sky

A typical config.json file might look like this:


{
    "collect-data-model": {...},
    "daily-all-sky": {
        "disc_on": false,
        "out_dir": "./all_sky",
        "year": 2018,
        "grid": "/projects/pxs/reference_grids/surfrad_meta.csv",
        "freq": "5min"
    },
    "data-model": {...},
    "direct": {...},
    "execution_control": {
        "option": "kestrel",
        "alloc": "pxs",
        "feature": "--qos=normal",
        "walltime": 40
    },
    "ml-cloud-fill": {...}
}

nsrdb daily-all-sky [OPTIONS]

Options

-c, --config <config>

Required Path to config file or dict with kwargs for NSRDB.run_daily_all_sky()

-v, --verbose

Flag to turn on debug logging. Default is False.

data-model

Run daily data-model and save output files.

You would call the nsrdb data-model module using:

$ python -m nsrdb.cli -c config.json data-model

A typical config.json file might look like this:


{
    "collect-data-model": {...},
    "daily-all-sky": {...},
    "data-model": {
        "dist_lim": 2.0,
        "doy_range": [1, 367],
        "factory_kwargs": {
          "cld_opd_dcomp": ...,
          "cld_press_acha": ...,
          "cld_reff_dcomp": ...,
          "cloud_fraction": ...,
          "cloud_probability": ...,
          "cloud_type": ...,
          "refl_0_65um_nom": ...,
          "refl_0_65um_nom_stddev_3x3": ...,
          "refl_3_75um_nom": ...,
          "surface_albedo": ...,
          "temp_11_0um_nom": ...,
          "temp_11_0um_nom_stddev_3x3": ...,
          "temp_3_75um_nom": ...
        },
        "max_workers": null,
        "max_workers_regrid": 16,
        "mlclouds": true
    },
    "direct": {
        "log_level": "INFO",
        "name": ...,
        "freq": "5min",
        "grid": "/projects/pxs/reference_grids/surfrad_meta.csv",
        "out_dir": "./",
        "max_workers": 32,
        "year": "2018"
    },
    "execution_control": {
        "option": "kestrel",
        "alloc": "pxs",
        "feature": "--qos=normal",
        "walltime": 40
    },
    "ml-cloud-fill": {...}
}

See the other CLI help pages for what the respective module configs require.

nsrdb data-model [OPTIONS]

Options

-c, --config <config>

Required Path to config file or dict of kwargs for NSRDB.run_data_model()

-v, --verbose

Flag to turn on debug logging. Default is False.

ml-cloud-fill

Gap fill cloud properties using mlclouds.

You would call the nsrdb ml-cloud-fill module using:

$ python -m nsrdb.cli -c config.json ml-cloud-fill

A typical config.json file might look like this:


{
    "collect-data-model": {...},
    "daily-all-sky": {...},
    "data-model": {...},
    "direct": {...},
    "execution_control": {
        "option": "kestrel",
        "alloc": "pxs",
        "feature": "--qos=normal",
        "walltime": 40
    },
    "ml-cloud-fill": {
        "col_chunk": 10000,
        "fill_all": false,
        "max_workers": 4
        "model_path": ...
    }
}

See the other CLI help pages for what the respective module configs require.

nsrdb ml-cloud-fill [OPTIONS]

Options

-c, --config <config>

Required Path to config file or dict with kwargs for NSRDB.ml_cloud_fill()

-v, --verbose

Flag to turn on debug logging. Default is False.

pipeline

Execute multiple steps in an NSRDB pipeline.

Typically, a good place to start is to set up an nsrdb job with a pipeline config that points to several NSRDB modules that you want to run in serial. You would call the nsrdb pipeline CLI using:

$ python -m nsrdb.cli -c config_pipeline.json pipeline

A typical nsrdb pipeline config.json file might look like this:


{
    "logging": {"log_level": "DEBUG"},
    "pipeline": [
        {"data-model": "./config_nsrdb.json"},
        {"ml-cloud-fill": "./config_nsrdb.json"},
        {"daily-all-sky": "./config_nsrdb.json"},
        {"collect-data-model": "./config_nsrdb.json"},
    ]
}

See the other CLI help pages for what the respective module configs require.

nsrdb pipeline [OPTIONS] COMMAND [ARGS]...

Options

-c, --config <config>

Required NSRDB pipeline configuration json file.

--cancel

Flag to cancel all jobs associated with a given pipeline.

--monitor

Flag to monitor pipeline jobs continuously. Default is not to monitor (kick off jobs and exit).

--background

Flag to monitor pipeline jobs continuously in the background using the nohup command. This only works with the --monitor flag. Note that the stdout/stderr will not be captured, but you can set a pipeline “log_file” to capture logs.

-v, --verbose

Flag to turn on debug logging. Default is False.
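
As a sketch, kicking off a pipeline and monitoring it continuously in the background might look like:

$ python -m nsrdb.cli -c config_pipeline.json pipeline --monitor --background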

tmy

Create tmy files for given input files.

You would call the nsrdb tmy module using:

$ python -m nsrdb.cli -c config.json tmy

A typical config.json file might look like this:


{
    "tmy": {},
    "collect-tmy": {"purge_chunks": true},
    "direct": {
        "sites_per_worker": 50,
        "site_slice": [0, 100],
        "tmy_types": ["tmy", "tdy", "tgy"],
        "nsrdb_base_fp": "./nsrdb_*_{}.h5",
        "years": [2000, ..., 2022],
        "out_dir": "./",
        "fn_out": "tmy_2000_2022.h5"
    }
}

nsrdb tmy [OPTIONS]

Options

-c, --config <config>

Required Path to config file with kwargs for TmyRunner.tmy()

-v, --verbose

Flag to turn on debug logging. Default is False.