Distributed Processing

Dataset creation can be sped up by using distributed processing. Before beginning distributed dataset creation, the user must create worker processes through the Distributed package. A simple way to do this when running locally on a single computer is shown below.

# Load the Distributed standard library
import Distributed
# Create worker processes
nproc = 4 # The desired total number of processes to run with
# Clear any existing worker processes if present
Distributed.nprocs() > 1 && Distributed.rmprocs(Distributed.workers())
# Create nproc - 1 workers, giving nproc processes in total
Distributed.addprocs(nproc - 1; exeflags="--project")
# Import functions used on all worker processes
Distributed.@everywhere using OPFLearn

The create_samples function has a distributed alternative, dist_create_samples.

OPFLearn.dist_create_samples — Function

Loads in PowerModels network data given the name of a network case file, then starts creating samples with distributed processing.


Creates an AC OPF dataset for the given PowerModels network dictionary. Generates samples until one of the given stopping criteria is met. Takes options to determine how to sample points, what information to save, and what information is printed.

Arguments

  • 'net::Dict': network information stored in a PowerModels.jl format specified dictionary
  • 'K::Integer': the maximum number of samples before stopping sampling
  • 'U::Float': the minimum % of unique active sets sampled in the previous 1 / U samples to continue sampling
  • 'S::Float': the minimum % of saved samples in the previous 1 / S samples to continue sampling
  • 'V::Float': the minimum % of feasible samples that increase the variance of the dataset in the previous 1 / V samples to continue sampling
  • 'T::Integer': the maximum time for the sampler to run in seconds.
  • 'max_iter::Integer': maximum number of iterations for the sampler to run for.
  • 'nproc::Integer': the number of processors for the sampler to run with. Defaults to the number reported by Distributed.nprocs().
  • 'replace_samples::Bool': whether samples in the samples channel are replaced when a new infeasibility certificate is added. Found to sometimes block progress when turned on.
  • 'sampler::Function': the sampling function to use. This function must take arguments A and b, and can take optional arguments.
  • 'sampler_opts::Dict': a dictionary of optional arguments to pass to the sampler function.
  • 'A::Array': defines the initial sampling space polytope Ax<=b. If not provided, initializes to a default.
  • 'b::Array': defines the initial sampling space polytope Ax<=b. If not provided, initializes to a default.
  • 'pd_max::Array': the maximum active load values to use when initializing the sampling space and constraining the loads. If nothing, finds the maximum load at each bus with the given relaxed model type.
  • 'pd_min::Array': the minimum active load values to use when initializing the sampling space and constraining the loads. If nothing, this is set to 0 for all loads.
  • 'pf_min::Array/Float': the minimum power factor for all loads in the system (Number) or an array of minimum power factors for each load in the system.
  • 'pf_lagging::Bool': indicates whether load power factors can only be lagging (true) or both lagging and leading (false).
  • 'reset_level::Integer': determines how to reset the load point to be inside the polytope before sampling. 2: reset closer to the nominal load & Chebyshev center, 1: reset closer to the Chebyshev center, 0: reset at the Chebyshev center.
  • 'save_certs::Bool': specifies whether the sampling space Ax<=b (the A & b matrices) is saved to the results dictionary.
  • 'save_max_load::Bool': specifies whether the max active load demands used are saved to the results dictionary.
  • 'model_type::Type': an abstract PowerModels type indicating the network model to use for the relaxed AC-OPF formulations (Max Load & Nearest Feasible).
  • 'r_solver': an optimizer constructor used for solving the relaxed AC-OPF optimization problems.
  • 'opf_solver': an optimizer constructor used to find the AC-OPF optimal solution for each sample.
  • 'print_level::Integer': from 0 to 3 indicating the level of info to print to console, with 0 indicating minimum printing.
  • 'stat_track::Integer': from 0 to 3 indicating the level of stats info saved during each iteration. 0: no information saved, 1: feasibility, new certificate, added sample, iteration time, 2: variance for all input & output variables
  • 'save_while::Bool': indicates whether results and stats information is saved to a csv file during processing.
  • 'save_infeasible::Bool': indicates if infeasible samples are saved. If true, saves infeasible samples in a separate file from feasible samples.
  • 'save_path::String': a string with the file path to the desired result save location.
  • 'net_path::String': a string with the file path to the network file.
  • 'variance::Bool': indicates if dataset variance information is tracked for each unique active set.
  • 'discard::Bool': indicates if samples that do not increase the variance within a unique active set are discarded.

See 'OPF-Learn: An Open-Source Framework for Creating Representative AC Optimal Power Flow Datasets' for more information on how the AC OPF datasets are created.

Modified from AgenerateACOPFsamples.m written by Ahmed S. Zamzam


The distributed sample creation function has the same arguments as the single process function, except for the addition of two arguments: nproc and replace_samples.

  • nproc allows the user to specify the number of processors to run the distributed sample creation with.
  • replace_samples specifies whether samples in the sample queue are replaced when a new infeasibility certificate is found.
Warning

The replace samples option has not been fully tested/debugged and may cause the script to freeze.
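
Putting the pieces together, a call might look like the sketch below. The "case5.m" case file, the chosen stopping criteria, and the keyword-style invocation are illustrative assumptions; check the method signatures above for the exact positional/keyword split.

# Worker processes must already exist, as shown at the top of this section
import Distributed
using OPFLearn
# Illustrative sketch: "case5.m" is a placeholder case file and the keyword
# form of the stopping criteria is an assumption about the call signature
results = dist_create_samples("case5.m";
                              K=1000,                      # stop after 1000 samples
                              T=3600,                      # or after one hour of sampling
                              nproc=Distributed.nprocs())  # use all available processes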

Distributed processing separates sampling and result handling from sample processing: one processor handles sampling and result collection, while the remaining processors process the samples.
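
The following generic producer/consumer sketch (not OPFLearn's internal code) illustrates this division of labor using Distributed's RemoteChannel: the main process fills a job queue and drains a result queue while the worker processes do the heavy lifting.

import Distributed
# Channels shared between the main process and the workers
jobs = Distributed.RemoteChannel(() -> Channel{Int}(32))
results = Distributed.RemoteChannel(() -> Channel{Float64}(32))
# Each worker repeatedly takes a job, processes it, and returns a result
Distributed.@everywhere function process_jobs(jobs, results)
    while true
        id = take!(jobs)
        put!(results, id * rand())  # stand-in for solving one AC-OPF sample
    end
end
# Start a consumer on every worker process
for p in Distributed.workers()
    Distributed.remote_do(process_jobs, p, jobs, results)
end
# The main process generates work and collects the results
for id in 1:10
    put!(jobs, id)
end
for _ in 1:10
    println(take!(results))
end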

Note

A significant increase in speed is not seen unless more than 3 processors are used. On the other hand, specifying more processors than are available may result in an error when loading OPFLearn on distributed processes.
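
One way to guard against requesting too many processes (a suggestion, not something OPFLearn does automatically) is to cap the worker count at the number of logical cores Julia reports:

# Cap the requested process count at the number of logical cores available
desired = 8
nproc = min(desired, Sys.CPU_THREADS)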