dwind.utils#

Modules

array

Provides a series of generic NumPy and Pandas utility functions.

hpc

Provides the live timing table functionality for the Kestrel MultiProcess class.

loader

Provides the core data loading methods for importing scenario data from flat files or SQL.

Array#

Provides a series of generic NumPy and Pandas utility functions.

Functions

memory_downcaster(df)

Downcasts int and float columns to the lowest memory alternative possible.

split_by_index(arr, n_splits)

Split a DataFrame, Series, or array-like with np.array_split, but only return the start and stop indices, rather than the chunks themselves.

dwind.utils.array.memory_downcaster(df)[source]#

Downcasts int and float columns to the lowest memory alternative possible. For integers this means converting to either signed or unsigned 8-, 16-, 32-, or 64-bit integers, and for floats, converting to np.float32.

Parameters:

df (pd.DataFrame | pd.Series) – DataFrame or Series to have its memory footprint reduced.

Returns:

Reduced footprint version of the passed df.

Return type:

pd.DataFrame | pd.Series
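The downcasting behavior described above can be sketched with pandas' own downcast machinery. This is an illustrative equivalent, not the dwind implementation:

```python
import numpy as np
import pandas as pd

# Illustrative sketch of the described behavior (not the dwind source):
# pd.to_numeric picks the smallest safe integer dtype, and floats are cast
# to np.float32.
df = pd.DataFrame({
    "counts": np.arange(5, dtype=np.int64),  # small values: fits in uint8
    "values": np.linspace(0.0, 1.0, 5),      # float64 by default
})
small = df.copy()
small["counts"] = pd.to_numeric(small["counts"], downcast="unsigned")
small["values"] = small["values"].astype(np.float32)
```

Comparing `df.memory_usage(deep=True)` before and after shows the reduced footprint.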

dwind.utils.array.split_by_index(arr, n_splits)[source]#

Split a DataFrame, Series, or array-like with np.array_split, but only return the start and stop indices, rather than the chunks themselves. For Pandas objects, this is equivalent to arr.iloc[start:end], and for NumPy, arr[start:end]. Splits are made along the 0th dimension.

Parameters:
  • arr (pd.DataFrame | pd.Series | np.ndarray) – The array, data frame, or series to split.

  • n_splits (int) – The number of equal or near-equal splits.

Returns:

Tuple of the start indices and the stop indices for each split.

Return type:

tuple[np.ndarray, np.ndarray]
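The index-only behavior can be reproduced with np.array_split applied to the row indices (an assumed-equivalent sketch, not the dwind source):

```python
import numpy as np

# Split only the row indices, then keep each chunk's boundaries; this avoids
# materializing the chunks themselves.
arr = np.arange(10)
index_chunks = np.array_split(np.arange(arr.shape[0]), 3)
starts = np.array([chunk[0] for chunk in index_chunks])
stops = np.array([chunk[-1] + 1 for chunk in index_chunks])
# arr[starts[i]:stops[i]] then reproduces the i-th chunk on demand.
```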

HPC#

Provides the live timing table functionality for the Kestrel MultiProcess class.

Functions

convert_seconds_for_print(time)

Convert number of seconds to number of hours, minutes, and seconds.

generate_run_status_table(job_status)

Generate the job status run time statistics table.

get_finished_run_status(jobs)

Extracts a dictionary of job_id and status from the sacct output for a single job or series of jobs.

update_status(job_status)

Get an updated status and timing statistics for all running jobs on the HPC.

dwind.utils.hpc.convert_seconds_for_print(time)[source]#

Convert number of seconds to number of hours, minutes, and seconds.

Return type:

str
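A conversion like this is typically a pair of divmod calls; the sketch below is a hypothetical equivalent, and the exact output format of convert_seconds_for_print is an assumption:

```python
def seconds_to_hms(time: float) -> str:
    """Convert a number of seconds to an hours/minutes/seconds string."""
    minutes, seconds = divmod(int(time), 60)
    hours, minutes = divmod(minutes, 60)
    # Output format is an assumption, not the dwind source's exact string.
    return f"{hours}:{minutes:02d}:{seconds:02d}"
```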

dwind.utils.hpc.generate_run_status_table(job_status)[source]#

Generate the job status run time statistics table.

Parameters:

job_status (dict) – Dictionary keyed by job ID, with sub-keys “status”, “start_time” (initial or start-of-run-status time), “wait”, and “run”.

Returns:

rich.Table of human-readable statistics, and a bool that is True if all jobs are complete, otherwise False.

Return type:

tuple[Table, bool]

dwind.utils.hpc.get_finished_run_status(jobs)[source]#

Extracts a dictionary of job_id and status from the sacct output for a single job or series of jobs.

Parameters:

jobs (int | str | list[int | str]) – Single job ID or list of job IDs that have finished running.

Returns:

Dictionary of {job_id_1: status_1, …, job_id_N: status_N}.

Return type:

dict[str, str]
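Parsing sacct output into such a dictionary can be sketched as below. The two-column (JobID, State) layout is an assumption about the sacct format string used, and job steps such as "1001.batch" are skipped so only top-level job IDs remain:

```python
def parse_sacct_output(output: str) -> dict[str, str]:
    """Build {job_id: status} from two-column sacct output (assumed format)."""
    status = {}
    for line in output.strip().splitlines():
        job_id, state = line.split()[:2]
        if "." not in job_id:  # skip job steps like "1001.batch"
            status[job_id] = state
    return status

sample = "1001 COMPLETED\n1001.batch COMPLETED\n1002 FAILED"
```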

dwind.utils.hpc.update_status(job_status)[source]#

Get an updated status and timing statistics for all running jobs on the HPC.

Parameters:

job_status (dict) – Dictionary keyed by job ID, with sub-keys “status”, “start_time” (initial or start-of-run-status time), “wait”, and “run”.

Returns:

Dictionary of updated statuses and timing statistics for all currently queued and running jobs.

Return type:

dict
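The timing bookkeeping implied by the “wait” and “run” sub-keys can be sketched as follows; this is a hypothetical illustration using the field names from the job_status dictionary above, not the dwind implementation:

```python
import time

# Hypothetical job_status entries using the documented sub-keys.
job_status = {
    "1001": {"status": "RUNNING", "start_time": time.time() - 30, "wait": 0.0, "run": 0.0},
    "1002": {"status": "PENDING", "start_time": time.time() - 10, "wait": 0.0, "run": 0.0},
}
now = time.time()
for info in job_status.values():
    elapsed = now - info["start_time"]
    if info["status"] == "PENDING":
        info["wait"] = elapsed  # still queued: accumulate wait time
    else:
        info["run"] = elapsed   # running: wall-clock run time so far
```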

Loader#

Provides the core data loading methods for importing scenario data from flat files or SQL.

Functions

load_df(file_or_table, year[, sql_constructor])

Loads data from either a SQL table or file to a pandas DataFrame.

dwind.utils.loader.load_df(file_or_table, year, sql_constructor=None)[source]#

Loads data from either a SQL table or file to a pandas DataFrame.

Parameters:
  • file_or_table (str | Path) – File name or path object, or SQL table where the data are located.

  • year (dwind.config.Year, optional) – If used, only extracts the single year from a column called "year". Defaults to None.

  • sql_constructor (str | None, optional) – The SQL engine constructor string. Required if extracting from SQL. Defaults to None.
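The file-loading branch of such a dispatcher can be sketched as below. This is a hypothetical re-implementation: the supported file extensions are assumptions, and the SQL branch (which would build a SQLAlchemy engine from sql_constructor and call pd.read_sql) is omitted:

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_df_sketch(file_or_table, year=None, sql_constructor=None):
    """Hypothetical sketch: load a CSV/Parquet file, optionally filter by year."""
    path = Path(file_or_table)
    if path.suffix == ".csv":
        df = pd.read_csv(path)
    elif path.suffix == ".parquet":
        df = pd.read_parquet(path)
    else:
        # The real function would use sql_constructor to read a SQL table here.
        raise NotImplementedError("SQL loading is not sketched in this example")
    if year is not None:
        df = df.loc[df["year"] == year]
    return df


with tempfile.TemporaryDirectory() as tmp:
    csv = Path(tmp) / "scenario.csv"
    pd.DataFrame({"year": [2024, 2025], "value": [1.0, 2.0]}).to_csv(csv, index=False)
    df = load_df_sketch(csv, year=2025)
```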