Deep Q Network Learning

Classes and Functions

class dqn.Agent(state_size, action_size, seed)

Bases: object

Interacts with and learns from the environment.
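The typical interaction loop pairs `act` and `step`: the agent picks an action, the environment returns a transition, and the transition is handed back to the agent. Below is a hedged, self-contained sketch of that loop; `_ToyEnv` and `_RandomAgent` are hypothetical stand-ins for a real Gym-style environment and `dqn.Agent`, not part of this module.

```python
import random

class _ToyEnv:
    """Hypothetical stand-in environment: episode ends after 5 steps."""
    def reset(self):
        self.t = 0
        return [0.0]

    def step(self, action):
        self.t += 1
        # (next_state, reward, done, info) -- the Gym-style convention
        return [float(self.t)], 1.0, self.t >= 5, {}

class _RandomAgent:
    """Hypothetical stand-in for dqn.Agent: acts randomly, records transitions."""
    def __init__(self, action_size):
        self.action_size = action_size
        self.transitions = []

    def act(self, state, eps=0.0):
        return random.randrange(self.action_size)

    def step(self, state, action, reward, next_state, done):
        self.transitions.append((state, action, reward, next_state, done))

env, agent = _ToyEnv(), _RandomAgent(action_size=2)
state = env.reset()
done = False
while not done:
    action = agent.act(state, eps=0.1)
    next_state, reward, done, _ = env.step(action)
    agent.step(state, action, reward, next_state, done)
    state = next_state
```

With the real `dqn.Agent`, `step` would additionally store the transition in the replay buffer and periodically trigger `learn`.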

act(state, eps=0)

Returns an action for the given state according to the current policy.

Params
======

state (array_like): current state
eps (float): epsilon, for epsilon-greedy action selection
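Epsilon-greedy selection means: with probability `1 - eps` take the action with the highest estimated value, otherwise take a uniformly random action. A minimal stdlib sketch of that rule (the function name `epsilon_greedy` is illustrative, not part of the module):

```python
import random

def epsilon_greedy(q_values, eps):
    """Pick the greedy action with probability 1 - eps, else explore."""
    if random.random() > eps:
        # exploit: index of the largest action value
        return max(range(len(q_values)), key=lambda a: q_values[a])
    # explore: uniform random action
    return random.randrange(len(q_values))

action = epsilon_greedy([0.1, 0.9, 0.3], eps=0.0)  # greedy -> action 1
```

Setting `eps=0` makes `act` fully greedy, which is why it is the sensible default for evaluation but not for training.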

learn(experiences, gamma)

Update value parameters using the given batch of experience tuples.

Params
======

experiences (Tuple[torch.Variable]): tuple of (s, a, r, s', done) tuples
gamma (float): discount factor
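The core of a DQN `learn` step is the one-step TD target, `r + gamma * max_a' Q_target(s', a')`, with the bootstrap term zeroed out on terminal transitions. A scalar sketch of that target (the batched implementation applies the same formula element-wise over tensors):

```python
def td_target(reward, gamma, max_next_q, done):
    """One-step TD target: r + gamma * max_a' Q_target(s', a') * (1 - done)."""
    return reward + gamma * max_next_q * (1 - done)

td_target(1.0, 0.99, 2.0, done=0)  # 2.98
td_target(1.0, 0.99, 2.0, done=1)  # 1.0 (terminal state: no bootstrap)
```

The local network's prediction `Q_local(s, a)` is then regressed toward this target, typically with an MSE loss.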

soft_update(local_model, target_model, tau)

Soft update model parameters:

    θ_target = τ*θ_local + (1 - τ)*θ_target

Params
======

local_model (PyTorch model): weights will be copied from
target_model (PyTorch model): weights will be copied to
tau (float): interpolation parameter
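The update is a plain element-wise interpolation between the two parameter sets. A stdlib sketch over lists of floats (the real method iterates over the models' `parameters()` and writes into the target tensors in place):

```python
def soft_update(local_params, target_params, tau):
    """theta_target <- tau * theta_local + (1 - tau) * theta_target, element-wise."""
    return [tau * l + (1 - tau) * t for l, t in zip(local_params, target_params)]

soft_update([1.0, 2.0], [0.0, 0.0], tau=0.1)  # [0.1, 0.2]
```

A small `tau` makes the target network trail the local network slowly, which stabilizes the bootstrapped targets used in `learn`.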

step(state, action, reward, next_step, done)
class dqn.QNetwork(state_size, action_size, seed, fc1_unit=64, fc2_unit=64)

Bases: torch.nn.modules.module.Module

Actor (Policy) Model.

forward(x)

Build a network that maps state -> action values.

training: bool
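Given the constructor signature, the network is plausibly a two-hidden-layer MLP: state -> fc1 (ReLU) -> fc2 (ReLU) -> one output per action. A pure-Python sketch of that forward pass, with hand-rolled `linear` and `relu` standing in for `torch.nn.Linear` and `torch.relu` (the parameter layout here is illustrative):

```python
def relu(v):
    """Element-wise max(0, x)."""
    return [max(0.0, x) for x in v]

def linear(x, W, b):
    """y = W x + b, where W is a list of weight rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def forward(state, params):
    """state -> fc1 (ReLU) -> fc2 (ReLU) -> action values."""
    h1 = relu(linear(state, *params["fc1"]))
    h2 = relu(linear(h1, *params["fc2"]))
    return linear(h2, *params["out"])

# Identity weights so the data flow is easy to follow.
identity = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
params = {"fc1": identity, "fc2": identity, "out": identity}
forward([1.0, -2.0], params)  # [1.0, 0.0] -- the -2.0 is clipped by ReLU
```

Note the output layer has no activation: Q-values are unbounded regression targets, so the final `linear` is returned raw.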
class dqn.ReplayBuffer(action_size, buffer_size, batch_size, seed)

Bases: object

Fixed-size buffer to store experience tuples.

add(state, action, reward, next_state, done)

Add a new experience to memory.

sample()

Randomly sample a batch of experiences from memory.
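A fixed-size buffer with these methods maps naturally onto `collections.deque` with `maxlen` (oldest experiences are evicted automatically) plus `random.sample` for uniform sampling. A minimal stdlib sketch of the same interface, without the tensor conversion the real class would do before handing a batch to `learn`:

```python
import random
from collections import deque, namedtuple

Experience = namedtuple("Experience",
                        ["state", "action", "reward", "next_state", "done"])

class ReplayBuffer:
    """Fixed-size buffer: old experiences are evicted once capacity is reached."""

    def __init__(self, buffer_size, batch_size, seed=0):
        self.memory = deque(maxlen=buffer_size)  # eviction handled by deque
        self.batch_size = batch_size
        random.seed(seed)

    def add(self, state, action, reward, next_state, done):
        """Add a new experience to memory."""
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self):
        """Randomly sample a batch of experiences, uniformly without replacement."""
        return random.sample(self.memory, k=self.batch_size)

    def __len__(self):
        return len(self.memory)
```

Sampling uniformly from a large buffer breaks the temporal correlation between consecutive transitions, which is the main reason DQN uses a replay buffer at all.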