types_unique module¶
Types and helper methods for transitions and trajectories. Code adapted from https://github.com/HumanCompatibleAI/imitation.git
- class types_unique.Trajectory(obs: numpy.ndarray, acts: numpy.ndarray, infos: Optional[numpy.ndarray], terminal: bool)¶
Bases: object
A trajectory, e.g. a one-episode rollout from an expert policy.
- acts: numpy.ndarray¶
Actions, shape (trajectory_len, ) + action_shape.
- infos: Optional[numpy.ndarray]¶
An array of info dicts, length trajectory_len.
- obs: numpy.ndarray¶
Observations, shape (trajectory_len + 1, ) + observation_shape.
- terminal: bool¶
Does this trajectory (fragment) end in a terminal state?
Episodes are always terminal. Trajectory fragments are also terminal when they contain the final state of an episode (even if missing the start of the episode).
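A minimal construction sketch (not taken from the source docstrings): the episode length, observation shape, and values below are made up, and only illustrate that obs carries one more entry than acts.

```python
import numpy as np
from types_unique import Trajectory

# Hypothetical 3-step episode with 4-dimensional observations and scalar actions.
traj = Trajectory(
    obs=np.zeros((4, 4), dtype=np.float32),  # (trajectory_len + 1,) + observation_shape
    acts=np.array([0, 1, 0]),                # (trajectory_len,) + action_shape
    infos=None,                              # or an array of trajectory_len info dicts
    terminal=True,                           # a complete episode is always terminal
)
assert len(traj.obs) == len(traj.acts) + 1
```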
- class types_unique.TrajectoryWithRew(obs: numpy.ndarray, acts: numpy.ndarray, infos: Optional[numpy.ndarray], terminal: bool, rews: numpy.ndarray)¶
Bases: types_unique.Trajectory
A Trajectory that additionally includes reward information.
- rews: numpy.ndarray¶
Reward, shape (trajectory_len, ). dtype float.
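The same hypothetical episode as in the Trajectory sketch above, extended with one float reward per action:

```python
import numpy as np
from types_unique import TrajectoryWithRew

traj_rew = TrajectoryWithRew(
    obs=np.zeros((4, 4), dtype=np.float32),
    acts=np.array([0, 1, 0]),
    infos=None,
    terminal=True,
    rews=np.array([0.0, 1.0, 0.5], dtype=np.float32),  # (trajectory_len,), float dtype
)
assert len(traj_rew.rews) == len(traj_rew.acts)
```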
- class types_unique.Transitions(obs: numpy.ndarray, acts: numpy.ndarray, infos: numpy.ndarray, next_obs: numpy.ndarray, dones: numpy.ndarray)¶
Bases: types_unique.TransitionsMinimal
A batch of obs-act-obs-done transitions.
- dones: numpy.ndarray¶
Boolean array indicating episode termination. Shape: (batch_size, ).
dones[i] is true iff next_obs[i] is the last observation of an episode.
- next_obs: numpy.ndarray¶
New observation. Shape: (batch_size, ) + observation_shape.
The i’th observation next_obs[i] in this array is the observation after the agent has taken action acts[i].
- Invariants:
next_obs.dtype == obs.dtype
len(next_obs) == len(obs)
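The invariants above can be checked directly on a batch; the two-transition example below is purely illustrative:

```python
import numpy as np
from types_unique import Transitions

batch = Transitions(
    obs=np.zeros((2, 4), dtype=np.float32),
    acts=np.array([1, 0]),
    infos=np.array([{}, {}]),                    # one (possibly empty) info dict per transition
    next_obs=np.ones((2, 4), dtype=np.float32),
    dones=np.array([False, True]),               # the second transition ends its episode
)
assert batch.next_obs.dtype == batch.obs.dtype
assert len(batch.next_obs) == len(batch.obs)
```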
- class types_unique.TransitionsMinimal(obs: numpy.ndarray, acts: numpy.ndarray, infos: numpy.ndarray)¶
Bases: torch.utils.data.dataset.Dataset
A Torch-compatible Dataset of obs-act transitions.
This class and its subclasses are usually instantiated via imitation.data.rollout.flatten_trajectories.
Indexing an instance trans of TransitionsMinimal with an integer i returns the i-th Dict[str, np.ndarray] sample, whose keys are the dataclass field names and whose values are the i-th elements of each field value.
Slicing returns a possibly empty instance of TransitionsMinimal where each field has been sliced.
- acts: numpy.ndarray¶
Actions. Shape: (batch_size,) + action_shape.
- infos: numpy.ndarray¶
Array of info dicts. Shape: (batch_size,).
- obs: numpy.ndarray¶
Previous observations. Shape: (batch_size, ) + observation_shape.
The i’th observation obs[i] in this array is the observation seen by the agent when choosing action acts[i]. obs[i] is not required to be from the timestep preceding obs[i+1].
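A sketch of the indexing and slicing behaviour described above; the shapes are illustrative and no environment is assumed:

```python
import numpy as np
from types_unique import TransitionsMinimal

trans = TransitionsMinimal(
    obs=np.zeros((3, 4), dtype=np.float32),
    acts=np.array([0, 1, 1]),
    infos=np.array([{}, {}, {}]),
)

sample = trans[0]   # Dict[str, np.ndarray] keyed by "obs", "acts", "infos"
subset = trans[1:]  # a (possibly empty) TransitionsMinimal with every field sliced
assert len(trans) == 3 and len(subset) == 2
```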
- class types_unique.TransitionsWithRew(obs: numpy.ndarray, acts: numpy.ndarray, infos: numpy.ndarray, next_obs: numpy.ndarray, dones: numpy.ndarray, rews: numpy.ndarray)¶
Bases: types_unique.Transitions
A batch of obs-act-obs-rew-done transitions.
- rews: numpy.ndarray¶
Reward. Shape: (batch_size, ). dtype float.
The reward rew[i] at the i’th timestep is received after the agent has taken action acts[i].
- types_unique.dataclass_quick_asdict(obj) → Dict[str, Any]¶
Extract dataclass to items using dataclasses.fields + dict comprehension.
This is a quick alternative to dataclasses.asdict, which expensively (and without documenting it) deep-copies every numpy array value. See https://stackoverflow.com/a/52229565/1091722.
- Args:
obj: A dataclass instance.
- Returns:
A dictionary mapping from obj field names to values.
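A minimal sketch of the approach the docstring describes (dataclasses.fields plus a dict comprehension); the shipped implementation may differ in details:

```python
import dataclasses
from typing import Any, Dict

def dataclass_quick_asdict(obj) -> Dict[str, Any]:
    """Map each dataclass field name to its value without deep-copying arrays."""
    return {f.name: getattr(obj, f.name) for f in dataclasses.fields(obj)}
```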
- types_unique.load(path: Union[str, bytes, os.PathLike]) → Sequence[types_unique.Trajectory]¶
Loads a sequence of trajectories saved by save() from path.
- types_unique.load_with_rewards(path: Union[str, bytes, os.PathLike]) → Sequence[types_unique.TrajectoryWithRew]¶
Loads a sequence of trajectories with rewards from a file.
- types_unique.path_to_str(path: Union[str, bytes, os.PathLike]) → str¶
- types_unique.save(path: Union[str, bytes, os.PathLike], trajectories: Sequence[types_unique.Trajectory])¶
Save a sequence of Trajectories to disk using a NumPy-based format.
We create an .npz dictionary with the following keys:
- obs: flattened observations from all trajectories. Note that the leading dimension of this array will be len(trajectories) longer than the acts and infos arrays, because we always have one more observation than we have actions in any trajectory.
- acts: flattened actions from all trajectories.
- infos: flattened info dicts from all trajectories. Any trajectories with no info dict will have their entry in this array set to the empty dictionary.
- terminal: boolean array indicating whether each trajectory is done.
- indices: indices indicating where to split the flattened action and infos arrays, in order to recover the original trajectories. Will be a 1D array of length len(trajectories).
- Args:
path: Trajectories are saved to this path.
trajectories: The trajectories to save.
- Raises:
- ValueError: If the trajectories are not all of the same type, i.e. some are Trajectory and others are TrajectoryWithRew.
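A short save/load round trip; the file name is arbitrary and the trajectory mirrors the hypothetical construction sketch above:

```python
import numpy as np
import types_unique

traj = types_unique.Trajectory(
    obs=np.zeros((4, 4), dtype=np.float32),
    acts=np.array([0, 1, 0]),
    infos=None,
    terminal=True,
)

types_unique.save("trajectories.npz", [traj])   # writes the .npz dictionary described above
loaded = types_unique.load("trajectories.npz")  # Sequence[Trajectory]
assert len(loaded) == 1 and loaded[0].terminal
```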
- types_unique.transitions_collate_fn(batch: Sequence[Mapping[str, numpy.ndarray]]) → Mapping[str, Union[numpy.ndarray, torch.Tensor]]¶
Custom torch.utils.data.DataLoader collate_fn for TransitionsMinimal.
Use this as the collate_fn argument to DataLoader if using an instance of TransitionsMinimal as the dataset argument.
- Args:
batch: The batch to collate.
- Returns:
A collated batch. Uses Torch’s default collate function for everything except the “infos” key. For “infos”, we join all the info dicts into a list of dicts. (The default behavior would recursively collate every info dict into a single dict, which is incorrect.)
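A usage sketch wiring a TransitionsMinimal instance into a DataLoader with this collate function; batch size and shapes are illustrative:

```python
import numpy as np
import torch.utils.data as th_data
from types_unique import TransitionsMinimal, transitions_collate_fn

dataset = TransitionsMinimal(
    obs=np.zeros((8, 4), dtype=np.float32),
    acts=np.zeros(8, dtype=np.int64),
    infos=np.array([{} for _ in range(8)]),
)
loader = th_data.DataLoader(
    dataset,
    batch_size=4,
    shuffle=True,
    collate_fn=transitions_collate_fn,
)
for batch in loader:
    # "obs" and "acts" are stacked by Torch's default collate;
    # "infos" is left as a plain list of the original info dicts.
    pass
```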