Bayesian Inverse RL

Bayesian IRL aims to learn a reward function that maximizes the posterior distribution of the reward. An advantage of the Bayesian approach is the ability to convey prior information about the reward through a prior distribution. In the current work we can consider leveraging this, since we can use the existing formula of the topological resilience metric; what remains to be determined is which parameter of this prior distribution should change with the topological variation. Another advantage of Bayesian IRL is the ability to account for complex behavior by modeling the reward probabilistically as a mixture of multiple resilience functions.
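As a rough illustration of how such a prior could enter the posterior, the sketch below scores a candidate reward vector with a Boltzmann likelihood of the demonstrated (state, action) pairs plus a Gaussian log-prior centered on resilience-derived values. This is only a sketch under stated assumptions: the function name, the alpha temperature, the demos list, the q_values table, and the resilience_prior_mean vector are illustrative placeholders and are not part of the existing code:

    # Minimal sketch (not the repository implementation): an unnormalised
    # reward log-posterior whose prior is centred on values derived from a
    # topological resilience metric. All names below are assumed placeholders.
    import numpy as np

    def log_posterior(reward, demos, q_values, resilience_prior_mean,
                      alpha=1.0, sigma=1.0):
        """Unnormalised log P(R | D) = log P(D | R) + log P(R)."""
        # Boltzmann likelihood of the demonstrated (state, action) pairs
        # under the Q-values induced by the candidate reward.
        log_lik = 0.0
        for s, a in demos:
            log_lik += alpha * q_values[s, a] \
                       - np.log(np.sum(np.exp(alpha * q_values[s, :])))
        # Gaussian prior centred on the resilience-informed reward guess.
        log_prior = -0.5 * np.sum((reward - resilience_prior_mean) ** 2) / sigma ** 2
        return log_lik + log_prior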

Classes and Functions

Created on Thu Jun 16 15:08:05 2022

@author: abhijeetsahu

This is the tested Bayesian IRL code adapted from: https://github.com/amsterg/birl

Tested on the FrozenLake environment

class birl.Birl(num_states)

Bases: object

mcmc_reward_step(rewards, step_size, r_max)
optimal_q_check(q_values, pi)
policy_walk()
posterior(agent_with_env, prior)
posteriors_ratio(dp, dp_new, prior=1)
sample_random_rewards(n_states, step_size, r_max)

Sample random rewards from the grid points of R^{n_states} / step_size. :param n_states: :param step_size: :param r_max: :return: sampled rewards

sim(agent_with_env)
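The listing above includes sample_random_rewards and mcmc_reward_step, which respectively draw an initial reward vector on a grid and propose a PolicyWalk-style perturbation of it. The following is a minimal generic sketch of such routines, assuming every state's reward lies in [-r_max, r_max]; it is illustrative only and may differ from the repository's implementation:

    # Illustrative sketch only; the repository's sample_random_rewards and
    # mcmc_reward_step may differ in detail.
    import numpy as np

    def sample_random_rewards(n_states, step_size, r_max):
        """Draw an initial reward vector from the grid points of [-r_max, r_max]^n_states."""
        grid = np.arange(-r_max, r_max + step_size, step_size)
        return np.random.choice(grid, size=n_states)

    def mcmc_reward_step(rewards, step_size, r_max):
        """Perturb one randomly chosen state's reward by +/- step_size (PolicyWalk-style proposal)."""
        new_rewards = rewards.copy()
        idx = np.random.randint(len(rewards))
        new_rewards[idx] += np.random.choice([-step_size, step_size])
        # Keep the proposal inside the allowed reward range.
        return np.clip(new_rewards, -r_max, r_max)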
class birl.DP(env, gamma=0.8)

Bases: object

policy_eval()
policy_imp()
policy_iter()
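The DP class wraps the standard policy-iteration loop: policy evaluation followed by policy improvement, repeated until the policy is stable. A generic tabular sketch is shown below; the transition tensor P[s, a, s'] and the per-state reward vector R are assumed representations chosen for illustration, not the DP class's actual interface:

    # Generic tabular policy iteration sketch (assumed representation:
    # P[s, a, s'] transition probabilities and R[s] state rewards);
    # not the DP class itself.
    import numpy as np

    def policy_iteration(P, R, gamma=0.8, tol=1e-6):
        n_states, n_actions, _ = P.shape
        pi = np.zeros(n_states, dtype=int)
        V = np.zeros(n_states)
        while True:
            # Policy evaluation: iterate the Bellman expectation backup
            # for the current policy until the values converge.
            while True:
                V_new = R + gamma * np.einsum('sj,j->s', P[np.arange(n_states), pi], V)
                if np.max(np.abs(V_new - V)) < tol:
                    V = V_new
                    break
                V = V_new
            # Policy improvement: act greedily with respect to V.
            Q = R[:, None] + gamma * np.einsum('saj,j->sa', P, V)
            pi_new = np.argmax(Q, axis=1)
            if np.array_equal(pi_new, pi):
                return V, pi
            pi = pi_new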