MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning

Elise van der Pol, Daniel E. Worrall, Herke van Hoof, Frans A. Oliehoek, Max Welling

Neural Information Processing Systems (NeurIPS) 2020

Correspondence to Elise van der Pol: e.e.vanderpol[at]uva[dot]nl · @ElisevanderPol

Relevant links: Paper · Supplementary · Training code · Symmetrizer code

Many RL problems exhibit symmetries: in CartPole, we can mirror the state to find an equivalent state. If we also mirror the actions ‘left’ and ‘right’, we can re-use experience from one side to learn about the other.

Example of a symmetry in CartPole
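To make the mirror symmetry concrete, here is a minimal sketch in plain Python. It assumes a simplified cart-pole simulator with the state layout (position, velocity, pole angle, angular velocity) and two actions, 0 for left and 1 for right. The constants, the `step` function, and the helper names are illustrative assumptions, not code from the paper. The check at the end confirms that mirroring the state and swapping the action yields the mirrored next state.

```python
import math

# Illustrative constants (assumptions modeled on the classic cart-pole benchmark).
GRAVITY, MASS_CART, MASS_POLE = 9.8, 1.0, 0.1
HALF_POLE_LENGTH, FORCE_MAG, TAU = 0.5, 10.0, 0.02
LEFT, RIGHT = 0, 1


def step(state, action):
    """One Euler step of simplified cart-pole dynamics."""
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == RIGHT else -FORCE_MAG
    total_mass = MASS_CART + MASS_POLE
    pole_mass_length = MASS_POLE * HALF_POLE_LENGTH
    sin_t, cos_t = math.sin(theta), math.cos(theta)

    temp = (force + pole_mass_length * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_POLE_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / total_mass)
    )
    x_acc = temp - pole_mass_length * theta_acc * cos_t / total_mass

    return (
        x + TAU * x_dot,
        x_dot + TAU * x_acc,
        theta + TAU * theta_dot,
        theta_dot + TAU * theta_acc,
    )


def mirror_state(state):
    """Reflect the system left-to-right: every state component changes sign."""
    return tuple(-v for v in state)


def mirror_action(action):
    """Swap 'left' and 'right'."""
    return RIGHT if action == LEFT else LEFT


state, action = (0.1, -0.2, 0.05, 0.3), LEFT
next_state = step(state, action)
mirrored_next = step(mirror_state(state), mirror_action(action))

# Stepping from the mirrored pair should land on the mirror image of next_state.
symmetry_holds = all(
    math.isclose(a, -b, abs_tol=1e-12)
    for a, b in zip(mirrored_next, next_state)
)
```

Because the mirrored transition is also a valid transition, each experienced tuple (state, action, next state) tells us about its mirror image as well.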

Using MDP homomorphisms, we can formalize such symmetries.

Example of an MDP homomorphism in CartPole
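For readers who want the formal statement, the standard definition of an MDP homomorphism can be written as below. The notation is an assumption chosen for this page: the original MDP has reward R and transition function T, the abstract MDP has \bar{R} and \bar{T}, \sigma maps states, and \alpha_s maps actions at state s. The second pair of equations specializes this to a group symmetry, with L_g acting on states and K_g^s acting on actions; in CartPole, L_g negates the state and K_g^s swaps 'left' and 'right'.

```latex
% MDP homomorphism h = (\sigma, \{\alpha_s\}): for all s, s' \in S and a \in A
\begin{align}
  \bar{R}\bigl(\sigma(s), \alpha_s(a)\bigr) &= R(s, a), \\
  \bar{T}\bigl(\sigma(s') \mid \sigma(s), \alpha_s(a)\bigr)
    &= \sum_{s'' \in \sigma^{-1}(\sigma(s'))} T(s'' \mid s, a).
\end{align}

% Group-structured symmetry: for every group element g \in G
\begin{align}
  R(s, a) &= R\bigl(L_g[s], K_g^s[a]\bigr), \\
  T(s' \mid s, a) &= T\bigl(L_g[s'] \mid L_g[s], K_g^s[a]\bigr).
\end{align}
```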

We build networks that are MDP homomorphic under group transformations, so that data collected during training is used more efficiently. Since MDP homomorphisms are problem-dependent, we additionally propose a computational way of constructing equivariant network layers. We present a symmetrizer operator that projects a weight matrix onto a group-specific equivariant subspace. As a result, we can construct equivariant fully connected and convolutional layers numerically, without deriving them by hand for each group. Finally, we confirm empirically that using MDP homomorphic networks leads to faster convergence on tasks that exhibit group symmetries, such as Pong.
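The projection idea can be sketched in a few lines of NumPy. This is a minimal illustration, not the released Symmetrizer code: it assumes a finite group given as matching lists of input and output representation matrices, and it averages K_g^{-1} W L_g over the group. The example uses the CartPole mirror group, where the input representation negates the 4-dimensional state and the output representation swaps the two action logits. The function name `symmetrize` and all variable names are hypothetical.

```python
import numpy as np


def symmetrize(W, in_reps, out_reps):
    """Average K_g^{-1} @ W @ L_g over all group elements g.

    in_reps[i] (L_g) and out_reps[i] (K_g) must belong to the same element g.
    """
    terms = [np.linalg.inv(K) @ W @ L for L, K in zip(in_reps, out_reps)]
    return sum(terms) / len(terms)


# Mirror group {identity, flip} for a CartPole-like task (an assumed setup).
in_reps = [np.eye(4), -np.eye(4)]            # flip negates the state
swap = np.array([[0.0, 1.0], [1.0, 0.0]])
out_reps = [np.eye(2), swap]                 # flip swaps the two action logits

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 4))                  # arbitrary weights: state -> logits
W_eq = symmetrize(W, in_reps, out_reps)

# Equivariance: transforming the input transforms the output accordingly.
s = rng.normal(size=4)
lhs = W_eq @ (in_reps[1] @ s)
rhs = out_reps[1] @ (W_eq @ s)
is_equivariant = np.allclose(lhs, rhs)

# Projection: symmetrizing an already equivariant matrix leaves it unchanged.
is_projection = np.allclose(symmetrize(W_eq, in_reps, out_reps), W_eq)
```

For this group the averaged matrix has two rows that are negatives of each other, so mirroring the state swaps the logits for 'left' and 'right'. Nothing in `symmetrize` is specific to CartPole; other finite groups only require different representation matrices.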

Results of MDP homomorphic networks on Pong

For more results and details, have a look at the paper.