# Papers

Each group picks one reinforcement learning research paper with an associated codebase. This paper forms the core of the course: you will try to understand it, replicate its experiments, and then test it in a new application and/or extend or improve it with a new idea.

Note that the papers below are merely suggestions: you are always free to come up with your own paper of interest.

Do make sure that your paper comes with a trustworthy public codebase (ideally from the original authors of the paper) and a smaller toy experiment that is certain to be computationally feasible, so that you can get some results. Always discuss your choice with the teachers.

Vanilla model-free RL algorithms still form the core of applied reinforcement learning research. These algorithms are relatively stable and well-studied, and are the first choice as a benchmark in an applied problem. Consider one of the algorithms below if you want to study one of the Applications.

## RL & Diffusion

Diffusion models (covered here) are a very successful approach to generative modelling. However, they can also be applied to reinforcement learning, as surveyed here. Especially interesting is the Diffuser approach, where we bias the generative sampling process based on a reward function, and as such generate entire policies without ever explicitly performing RL credit assignment.
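To make the idea of biasing the sampling process concrete, here is a minimal sketch. It is not the Diffuser implementation: it assumes a toy setting in which the learned denoiser is replaced by the analytic score of a standard Gaussian prior, and the reward is a hypothetical quadratic with a known gradient. The names `guided_sample`, `goal_reward_grad`, `zero_grad`, `GOAL`, and all parameters are illustrative assumptions.

```python
import random

GOAL = 3.0  # hypothetical target value for every step of the sequence


def goal_reward_grad(seq):
    # Gradient of the hypothetical reward -sum((x - GOAL)^2) w.r.t. each element.
    return [-2.0 * (x - GOAL) for x in seq]


def zero_grad(seq):
    # No reward signal: sampling follows the prior only.
    return [0.0 for _ in seq]


def guided_sample(reward_grad, horizon=8, steps=50, step_size=0.1,
                  guidance_scale=1.0, seed=0):
    """Annealed Langevin-style sampling of a whole sequence at once.

    Assumption: the score of a standard Gaussian (-x) stands in for a
    learned denoising network. The reward gradient is added to that score,
    which biases the samples towards high-reward sequences.
    """
    rng = random.Random(seed)
    # Start from pure noise, one value per time step of the sequence.
    seq = [rng.gauss(0.0, 1.0) for _ in range(horizon)]
    for k in range(steps):
        noise_scale = 1.0 - (k + 1) / steps  # anneal injected noise to zero
        grads = reward_grad(seq)
        updated = []
        for x, g in zip(seq, grads):
            prior_score = -x  # stand-in for the learned model
            drift = prior_score + guidance_scale * g
            noise = (2.0 * step_size) ** 0.5 * noise_scale * rng.gauss(0.0, 1.0)
            updated.append(x + step_size * drift + noise)
        seq = updated
    return seq


def mean(values):
    return sum(values) / len(values)
```

Under these assumptions, `guided_sample(goal_reward_grad)` should return values pulled towards `GOAL`, while `guided_sample(zero_grad)` should stay close to the prior mean of zero. No value function or temporal-difference update appears anywhere: the reward only enters through its gradient during sampling.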

## Generative Flow Networks

Generative Flow (GFlow) Networks (introduced here) are a relatively novel approach in machine learning. They are used to sample from complex distributions through a sequential generation process, and as such form a bridge between deep generative models and deep reinforcement learning. The papers below explicitly combine methods from both GFlow and RL theory.

## Multi-agent RL

Multi-agent reinforcement learning, surveyed here, is the paradigm for problems that contain multiple acting agents. Many problems can be solved in a centralized way (a single decision-making entity that controls all agents), but this might (i) not be feasible in practice due to latency and (ii) considerably slow down training (due to the large single state space). A common approach in multi-agent RL is to train in a centralized fashion (where information is shared between agents to speed up training), but deploy/execute in a decentralized fashion (without information sharing).
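The split between centralized training and decentralized execution can be sketched as follows. This is a toy illustration under strong assumptions, not a specific published algorithm: the game is a hypothetical stateless two-agent coordination problem, the centralized critic is a small table over joint actions, and each actor is a pair of softmax logits updated with a plain policy-gradient step. All names (`joint_reward`, `train`, `act_decentralized`) are made up for this example.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def joint_reward(a1, a2):
    # Hypothetical cooperative game: reward only if both agents pick action 1.
    return 1.0 if (a1 == 1 and a2 == 1) else 0.0


def train(episodes=2000, lr_actor=0.5, lr_critic=0.1, seed=0):
    rng = random.Random(seed)
    # One independent actor per agent: logits over its own two actions.
    actor_logits = [[0.0, 0.0], [0.0, 0.0]]
    # Centralized critic: value estimate indexed by the JOINT action.
    critic = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(episodes):
        probs = [softmax(l) for l in actor_logits]
        actions = [rng.choices([0, 1], weights=p)[0] for p in probs]
        a1, a2 = actions
        r = joint_reward(a1, a2)
        # Training-time information sharing: the critic sees both actions.
        critic[a1][a2] += lr_critic * (r - critic[a1][a2])
        q = critic[a1][a2]
        # Each actor takes a policy-gradient step on its own logits,
        # using the centralized critic's estimate as the learning signal.
        for i in range(2):
            for a in range(2):
                grad_logp = (1.0 if a == actions[i] else 0.0) - probs[i][a]
                actor_logits[i][a] += lr_actor * q * grad_logp
    return actor_logits, critic


def act_decentralized(logits):
    # Execution: an agent acts greedily from its own logits only,
    # without access to the critic or to the other agent.
    return max(range(len(logits)), key=lambda a: logits[a])
```

After `actor_logits, critic = train()`, the critic can be discarded: calling `act_decentralized(actor_logits[0])` and `act_decentralized(actor_logits[1])` needs no shared information. In this toy game there are no observations, so each actor is just a fixed preference over actions; in a realistic setting each actor would condition on its own local observation.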