# Papers

Each group picks a reinforcement learning research paper with an associated codebase. This paper forms the core of the course: you will try to understand it, replicate its experiments, test it in a new application, and/or extend or improve it with a new idea.

Note that the papers below are merely suggestions: you are always free to propose a paper of your own interest.

Make sure that your paper comes with a trustworthy public codebase (ideally from the original authors) and a small toy experiment that is certainly computationally feasible, so that you are sure to get some results. Always discuss your choice with the teachers.

## Model-Free RL

Vanilla model-free RL algorithms still form the core of applied reinforcement learning research. These algorithms are relatively stable and well studied, and should be your first choice as a benchmark for an applied problem. Consider one of the algorithms below if you want to study one of the Applications.

Any other RL algorithm with a CleanRL implementation (if you are familiar with JAX, you could consider PureJaxRL, although CleanRL also has JAX implementations).

## Diffusion Models

Diffusion models (covered here) are a very successful approach to generative modelling. Beyond generative modelling, they can also be applied to reinforcement learning, as surveyed here.

Diffuser: Paper. Website. Code.

One interesting approach is Diffuser, which biases the generative sampling process with a reward function and thereby generates entire trajectories (plans) without ever explicitly performing RL credit assignment. You could start by reproducing the 'Maze2D' experiments.
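The idea of biasing sampling with a reward can be sketched in a few lines. This is a minimal sketch in the spirit of Diffuser's reward-guided sampling, not the authors' code. `toy_denoiser_mean` is a hypothetical stand-in for the learned diffusion model, and the return `J(tau) = -||tau - goal||^2` is an illustrative assumption. At each denoising step the predicted mean is shifted along the gradient of the return, scaled by the step variance `sigma**2` and a guidance weight `alpha`.

```python
import numpy as np

def toy_denoiser_mean(x, t):
    # Hypothetical stand-in for a trained denoiser: shrinks the sample toward 0.
    return 0.9 * x

def reward_grad(traj, goal):
    # Gradient of a hypothetical return J(tau) = -||tau - goal||^2.
    return -2.0 * (traj - goal)

def guided_sample(horizon, goal, n_steps=50, sigma=0.1, alpha=0.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(horizon)  # start from pure noise
    for t in reversed(range(n_steps)):
        mu = toy_denoiser_mean(x, t)
        # Guidance: move the mean along the return gradient,
        # scaled by the step variance and the guidance weight alpha.
        mu = mu + alpha * sigma**2 * reward_grad(mu, goal)
        # No noise is added at the final step.
        noise = rng.standard_normal(horizon) if t > 0 else 0.0
        x = mu + sigma * noise
    return x

goal = np.ones(8)
plain = guided_sample(8, goal, alpha=0.0)
guided = guided_sample(8, goal, alpha=20.0)
print(np.linalg.norm(plain - goal), np.linalg.norm(guided - goal))
```

With the same seed, the guided sample ends much closer to `goal` than the unguided one. In Diffuser itself, the denoiser is a trained network over state-action trajectories, and the return gradient comes from a separately trained return predictor rather than a hand-written formula.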

## Generative Flow Networks

Generative Flow Networks (GFlowNets, introduced here) are a relatively new approach in machine learning. They are used to sample from complex distributions through a sequential generation process, and as such form a bridge between deep generative models and deep reinforcement learning.

QGFN: Controllable Greediness with Action Values. Paper. Code.

This paper mixes the GFlowNet flow objective with a greedy RL objective based on action values, so that the greediness of the sampler can be controlled. You could reproduce the 'molecule generation' task.
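"Controllable greediness" can be illustrated with one simple mixing rule. This is a hypothetical sketch under our own assumptions, not taken from the QGFN code. With probability `p` the sampler takes the action with the highest action value, and otherwise it samples from the GFlowNet forward policy. The names `mixed_action`, `gfn_probs` and `q_values` are illustrative. Check the paper for the exact mixing variants it proposes.

```python
import numpy as np

def mixed_action(gfn_probs, q_values, p, rng):
    """Hypothetical p-greedy mix of a GFlowNet policy and action values.

    With probability p, act greedily with respect to q_values;
    otherwise sample an action from the GFlowNet policy gfn_probs.
    """
    if rng.random() < p:
        return int(np.argmax(q_values))
    return int(rng.choice(len(gfn_probs), p=gfn_probs))

rng = np.random.default_rng(0)
gfn_probs = np.array([0.7, 0.2, 0.1])  # GFlowNet prefers action 0
q_values = np.array([0.0, 1.0, 0.5])   # action values prefer action 1
print([mixed_action(gfn_probs, q_values, 0.5, rng) for _ in range(10)])
```

Setting `p=0` recovers plain GFlowNet sampling, which is diverse. Setting `p=1` gives a purely greedy policy. Values in between trade diversity for reward.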

Generative Flow Networks as Entropy-Regularized RL. Paper. Code.

This paper connects the two more fundamentally, casting GFlowNet training as entropy-regularized RL. You could start by reproducing the 'Hypergrid' experiments.
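A tiny example shows why the entropy-regularized view fits. The sketch below is a generic illustration under our own assumptions, not the paper's code. Consider a tree-shaped state space, where every object has exactly one construction path. Use entropy coefficient `tau = 1` and give a reward of `log R(x)` only at the terminal object `x`. Under these conditions, the soft-optimal policy `pi(c | s) = exp((V(c) - V(s)) / tau)`, with soft value `V(s) = tau * log sum_c exp(V(c) / tau)`, reaches each terminal object with probability `R(x) / Z`. This is the GFlowNet sampling target. The tree and rewards are hypothetical.

```python
import numpy as np

def soft_value(node, tree, log_reward, tau=1.0):
    """Soft (log-sum-exp) value of a node in a deterministic tree."""
    children = tree.get(node, [])
    if not children:
        # Terminal object: its value is the terminal reward log R(x).
        return log_reward[node]
    child_values = np.array([soft_value(c, tree, log_reward, tau) for c in children])
    # Soft Bellman backup with zero intermediate rewards.
    return tau * np.logaddexp.reduce(child_values / tau)

def terminal_distribution(tree, log_reward, root, tau=1.0):
    """Probability of reaching each terminal under the soft-optimal policy."""
    probs = {}

    def walk(node, p):
        children = tree.get(node, [])
        if not children:
            probs[node] = probs.get(node, 0.0) + p
            return
        v = soft_value(node, tree, log_reward, tau)
        for c in children:
            # Boltzmann policy: pi(c | node) = exp((V(c) - V(node)) / tau)
            pi = np.exp((soft_value(c, tree, log_reward, tau) - v) / tau)
            walk(c, p * pi)

    walk(root, 1.0)
    return probs

# Hypothetical toy tree: s0 -> {a, b}, a -> {x1, x2}, b -> {x3}
tree = {"s0": ["a", "b"], "a": ["x1", "x2"], "b": ["x3"]}
rewards = {"x1": 1.0, "x2": 2.0, "x3": 3.0}
log_reward = {x: np.log(r) for x, r in rewards.items()}
dist = terminal_distribution(tree, log_reward, "s0")
print({x: round(float(p), 4) for x, p in dist.items()})
# {'x1': 0.1667, 'x2': 0.3333, 'x3': 0.5}
```

Each terminal is reached with probability proportional to its reward (1/6, 2/6, 3/6). In a general state graph, an object can be built along many paths, so this direct correspondence does not carry over as-is. See the paper for how the general case is handled.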

## Multi-Agent RL

Certain applications intrinsically have multiple agents, for which you can use multi-agent reinforcement learning.