Solving Markov games

This project explores planning algorithms that find approximate Markov perfect equilibria in Markov games. As a test environment, I created a two-player grid world with a runner and a chaser. Each player can move up, down, left, or right. To give the chaser a chance to catch up, the runner slips and fails to move with probability 10%.
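The dynamics described above can be sketched in a few lines. This is only an illustration, not the project's actual code: the names `MOVES`, `move`, and `step`, the `(row, col)` layout, the clamping at the grid boundary, and the simultaneous move order are all assumptions. Only the four actions and the 10% slip probability come from the description.

```python
import random

# Hypothetical layout: positions are (row, col) and row 0 is the top row.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
SLIP_PROB = 0.1  # from the text: the runner fails to move 10% of the time


def move(pos, action, size):
    """Apply one move, clamping at the grid boundary (an assumption)."""
    dr, dc = MOVES[action]
    row = min(max(pos[0] + dr, 0), size - 1)
    col = min(max(pos[1] + dc, 0), size - 1)
    return (row, col)


def step(runner, chaser, runner_action, chaser_action, size,
         slip_prob=SLIP_PROB, rng=random):
    """Advance both players by one simultaneous move (assumed ordering)."""
    # rng.random() is uniform on [0, 1), so the runner slips (stays put)
    # exactly when the draw falls below slip_prob.
    if rng.random() >= slip_prob:
        runner = move(runner, runner_action, size)
    # The chaser never slips.
    chaser = move(chaser, chaser_action, size)
    return runner, chaser
```

Passing `slip_prob=0.0` or `slip_prob=1.0` makes the transition deterministic, which is convenient when checking a planner against hand-computed values.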

The runner receives a reward that depends on the cell it moves to, and this reward increases toward the bottom-right of the grid. The chaser is rewarded when it occupies the same position as the runner.
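One possible reward pair matching this description is sketched below. The exact shapes are assumptions made for illustration: a runner reward that grows linearly from 0 at the top-left to 1 at the bottom-right, and a chaser reward of 1 on capture. The function names are hypothetical, and the grid is assumed to be square with `size` greater than 1.

```python
def runner_reward(runner, size):
    """Assumed shaping: 0.0 at the top-left cell, 1.0 at the bottom-right."""
    row, col = runner
    # Normalize by the largest possible row + col on a size x size grid.
    return (row + col) / (2 * (size - 1))


def chaser_reward(runner, chaser):
    """Assumed payoff: 1.0 when the chaser shares the runner's cell, else 0.0."""
    return 1.0 if runner == chaser else 0.0
```

Any function that increases in both the row and column index would fit the description equally well; the linear form is just the simplest choice.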

Policy gradient

When the state space is continuous, these planning algorithms become infeasible. For that setting, I use Proximal Policy Optimization (PPO), a policy gradient algorithm.
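For reference, the core of PPO is its clipped surrogate objective, which limits how far a single update can move the policy away from the one that collected the data. The sketch below computes that objective for a batch using plain Python lists. It is a minimal illustration rather than the training code used here: the function name is hypothetical, and the default `clip_eps=0.2` is a commonly used value, not one taken from this project.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), from log-probabilities.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

In a two-player setting each player would typically evaluate this objective on its own policy and advantages; how the two updates are interleaved is a separate design choice not covered by this sketch.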