Adversarial IRL on Network Reconfiguration
Classes and Functions
- phy_train_airl.evaluate_policy(env, model)
Evaluates the learned policy by running episodes in the environment.
- Parameters
env (gym.Env) – The OpenDSS RL environment
model (torch.nn.Module) – Trained policy network model
- Returns
Average episode length and average reward
- Return type
tuple of float
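The evaluation loop described above can be sketched as follows. This is a minimal illustration, not the module's implementation: `CountdownEnv` is a hypothetical stand-in for the OpenDSS environment, the policy is a plain callable rather than a `torch.nn.Module`, the classic Gym API (`reset()` returns an observation, `step()` returns `obs, reward, done, info`) is assumed, and "average reward" is assumed to mean the mean total reward per episode.

```python
class CountdownEnv:
    """Hypothetical stand-in environment following the classic Gym API."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0  # reward action 1 only
        done = self.t >= self.horizon         # episode ends after `horizon` steps
        return self.t, reward, done, {}


def evaluate_policy_sketch(env, policy, n_episodes=10):
    """Return (average episode length, average total reward per episode)."""
    total_steps, total_reward = 0, 0.0
    for _ in range(n_episodes):
        obs, done = env.reset(), False
        while not done:
            action = policy(obs)
            obs, reward, done, _info = env.step(action)
            total_steps += 1
            total_reward += reward
    return total_steps / n_episodes, total_reward / n_episodes


avg_len, avg_reward = evaluate_policy_sketch(CountdownEnv(horizon=5), lambda obs: 1)
print(avg_len, avg_reward)  # 5.0 5.0
```

With a trained network, the callable would typically wrap a forward pass that maps an observation to an action.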
- phy_train_airl.sample_input_for_reward_network(env)
Samples an input that is used when saving the reward network.
- Parameters
env (gym.Env) – The OpenDSS RL environment
- Returns
The state, input, next_state, and done values concatenated into a single input for the reward network
- Return type
torch.Tensor
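As a rough illustration of the concatenation described above, the sketch below flattens one transition into a single vector. It uses plain Python lists so that it runs without dependencies; the function name `flatten_transition` is hypothetical, the "input" in the description is assumed to be the action taken, and the ordering (state, action, next_state, done) is assumed from the description. The actual function returns a torch tensor.

```python
def flatten_transition(state, action, next_state, done):
    """Concatenate one transition into a single flat list of floats."""
    return (
        [float(x) for x in state]
        + [float(x) for x in action]
        + [float(x) for x in next_state]
        + [float(done)]  # done flag encoded as 0.0 or 1.0
    )


sample = flatten_transition([0.5, 1.0], [1], [0.25, 1.0], False)
print(sample)  # [0.5, 1.0, 1.0, 0.25, 1.0, 0.0]
```

The resulting length is the sum of the state, action, and next-state sizes plus one for the done flag, which is the input width a reward network of this kind would expect.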
- phy_train_airl.tensor(data, *, dtype=None, device=None, requires_grad=False, pin_memory=False) → Tensor
Constructs a tensor with data.
Warning
torch.tensor() always copies data. If you have a Tensor data and want to avoid a copy, use torch.Tensor.requires_grad_() or torch.Tensor.detach(). If you have a NumPy ndarray and want to avoid a copy, use torch.as_tensor().
Warning
When data is a tensor x, torch.tensor() reads out 'the data' from whatever it is passed, and constructs a leaf variable. Therefore torch.tensor(x) is equivalent to x.clone().detach() and torch.tensor(x, requires_grad=True) is equivalent to x.clone().detach().requires_grad_(True). The equivalents using clone() and detach() are recommended.
- Args:
- data (array_like): Initial data for the tensor. Can be a list, tuple, NumPy ndarray, scalar, and other types.
- Keyword args:
- dtype (torch.dtype, optional): the desired data type of returned tensor. Default: if None, infers data type from data.
- device (torch.device, optional): the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
- requires_grad (bool, optional): If autograd should record operations on the returned tensor. Default: False.
- pin_memory (bool, optional): If set, returned tensor would be allocated in the pinned memory. Works only for CPU tensors. Default: False.
Example:
    >>> torch.tensor([[0.1, 1.2], [2.2, 3.1], [4.9, 5.2]])
    tensor([[ 0.1000,  1.2000],
            [ 2.2000,  3.1000],
            [ 4.9000,  5.2000]])
    >>> torch.tensor([0, 1])  # Type inference on data
    tensor([ 0,  1])
    >>> torch.tensor([[0.11111, 0.222222, 0.3333333]],
    ...              dtype=torch.float64,
    ...              device=torch.device('cuda:0'))  # creates a torch.cuda.DoubleTensor
    tensor([[ 0.1111,  0.2222,  0.3333]], dtype=torch.float64, device='cuda:0')
    >>> torch.tensor(3.14159)  # Create a scalar (zero-dimensional tensor)
    tensor(3.1416)
    >>> torch.tensor([])  # Create an empty tensor (of size (0,))
    tensor([])
- phy_train_airl.train_and_evaluate(env, airl_train_lens, policy_net_train_len, exp_trajectory_len)
For each AIRL training length in the given list, evaluates the learned policy and saves the reward network.
- Parameters
env (gym.Env) – The OpenDSS RL environment
airl_train_lens (list) – List of AIRL training lengths to evaluate
policy_net_train_len (int) – Number of samples used to train the policy network that is fed to the generator network as the initial policy
exp_trajectory_len (int) – Number of expert demonstration steps used for AIRL training
- Returns
Nothing
- Return type
None
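The sweep over training lengths can be pictured with the sketch below. It only shows the control flow (train, evaluate, save, once per length) under stated assumptions: `sweep_training_lengths`, `train_fn`, `eval_fn`, and `save_fn` are hypothetical names introduced for illustration, and the stubs stand in for AIRL training, policy evaluation, and reward-network export. Unlike the documented function, which returns None, the sketch returns the collected results so they can be inspected.

```python
def sweep_training_lengths(train_lens, train_fn, eval_fn, save_fn):
    """Train, evaluate, and save once per training length; return results by length."""
    results = {}
    for n in train_lens:
        learned = train_fn(n)            # stand-in for AIRL training with n steps
        results[n] = eval_fn(learned)    # e.g. (avg episode length, avg reward)
        save_fn(learned, n)              # stand-in for saving the reward network
    return results


saved = []
results = sweep_training_lengths(
    [100, 200],
    train_fn=lambda n: {"steps": n},
    eval_fn=lambda learned: (5.0, learned["steps"] / 100),
    save_fn=lambda learned, n: saved.append(n),
)
print(results)  # {100: (5.0, 1.0), 200: (5.0, 2.0)}
print(saved)    # [100, 200]
```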