Papers
Per group you pick a reinforcement learning research paper with an associated codebase. This paper forms the core of the course: you will try to understand it, replicate its experiments, test it in a new application, and/or extend or improve it with a new idea.
Note that the papers below are merely suggestions: you are always free to come up with your own paper of interest.
Do make sure that your paper comes with a trustworthy public codebase (ideally from the original authors of the paper) and a smaller toy experiment that is certain to be computationally feasible, so that you can get at least some results. Always discuss your choice with the teachers.
Model-Free RL
Vanilla model-free RL algorithms still form the core of applied reinforcement learning research. These algorithms are relatively stable and well studied, and they should be your first choice as a benchmark in an applied problem. Consider one of the algorithms below if you want to study one of the Applications.
Any other RL algorithm with a CleanRL implementation (if you are familiar with JAX you could consider PureJaxRL, although CleanRL also has JAX implementations). All of these methods build on the same agent-environment loop, sketched below.
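A minimal sketch of that loop using the Gymnasium API (the environment id and the random policy are placeholders for your own setup):

```python
# Minimal agent-environment loop (Gymnasium API); the algorithms above
# replace the random action below with a learned policy.
import gymnasium as gym

env = gym.make("CartPole-v1")  # placeholder environment
obs, info = env.reset(seed=0)
episode_return = 0.0
for _ in range(500):
    action = env.action_space.sample()  # replace with your agent's action
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward
    if terminated or truncated:
        print(f"episode return: {episode_return}")
        episode_return = 0.0
        obs, info = env.reset()
env.close()
```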
Diffusion Models
Diffusion models (covered here) are a very successful approach to generative modelling. However, they can also be applied to reinforcement learning, as surveyed here.
Diffuser: Paper. Website. Code.
An interesting approach is the Diffuser, which biases the generative sampling process with a reward function, and as such generates entire plans without ever explicitly performing RL credit assignment. You could initially focus on reproducing the 'Maze2D' experiments.
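To make the idea concrete, here is a rough sketch of reward-guided denoising under our own simplifying assumptions (not the authors' implementation): at each reverse-diffusion step the sampled trajectory is nudged along the gradient of a learned return estimate. The denoiser and return model below are stand-in modules.

```python
# Sketch of reward-guided trajectory denoising in the spirit of Diffuser.
# 'denoiser' and 'return_model' are stand-ins for the trained networks.
import torch

horizon, dim, n_steps, guide_scale = 32, 6, 50, 0.1
denoiser = torch.nn.Linear(dim, dim)    # stand-in for the trained denoising network
return_model = torch.nn.Linear(dim, 1)  # stand-in return predictor J(tau)

tau = torch.randn(horizon, dim)         # start from pure noise
for t in range(n_steps):
    tau = tau.detach().requires_grad_(True)
    # gradient of predicted return w.r.t. the trajectory: the guidance signal
    guidance = torch.autograd.grad(return_model(tau).sum(), tau)[0]
    with torch.no_grad():
        tau = denoiser(tau) + guide_scale * guidance  # denoise, biased toward high return
        if t < n_steps - 1:
            tau += 0.01 * torch.randn_like(tau)       # re-inject a little noise
# tau now holds a full trajectory of (state, action) vectors; an MPC-style
# controller would execute its first action and replan.
```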
Generative Flow Networks
Generative Flow (GFlow) Networks (introduced here) are a relatively novel approach in machine learning. They are used to sample from complex distributions through a sequential generation process, and as such form a bridge between deep generative models and deep reinforcement learning.
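As a concrete anchor, one widely used GFlowNet training objective is trajectory balance (one of several options; the snippet below is a sketch with placeholder tensors, not taken from either codebase):

```python
# Trajectory-balance loss for one sampled trajectory: the flow implied by
# the forward policy must match the reward times the backward-policy flow.
import torch

log_Z = torch.zeros((), requires_grad=True)  # learned log partition function
log_pf = torch.randn(8)        # placeholder: log P_F(s_{t+1}|s_t) from the forward policy
log_pb = torch.randn(8)        # placeholder: log P_B(s_t|s_{t+1}), often fixed/uniform
log_reward = torch.tensor(1.5) # placeholder: log R(x) of the terminal object

tb_loss = (log_Z + log_pf.sum() - log_reward - log_pb.sum()) ** 2
tb_loss.backward()  # in practice the gradient also flows into the policy networks
```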
QGFN: Controllable Greediness with Action Values. Paper. Code.
This paper mixes the flow objective with the greedy RL objective; an illustrative sketch of such mixing follows below. You could start by reproducing the 'molecule generation' task.
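The rough flavour of such a mixture, as our own illustrative p-greedy sketch rather than the paper's exact variants:

```python
# Illustrative p-greedy mixing of a GFlowNet forward policy with action
# values: occasionally act greedily under Q, otherwise sample from P_F.
import torch

def mixed_action(pf_logits: torch.Tensor, q_values: torch.Tensor, p: float) -> int:
    if torch.rand(()).item() < p:
        return int(q_values.argmax())  # greedy, reward-seeking step
    return int(torch.distributions.Categorical(logits=pf_logits).sample())

a = mixed_action(torch.randn(5), torch.randn(5), p=0.3)  # placeholder logits/values
```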
Generative Flow Networks as Entropy-Regularized RL. Paper. Code.
This paper interpolates between the two more fundamentally, casting GFlowNets as entropy-regularized RL. You could look at reproducing the 'Hypergrid' experiments.
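The key object in that view is the soft (entropy-regularized) Bellman backup, where the hard max over actions becomes a log-sum-exp; a minimal sketch with placeholder values:

```python
# Soft Bellman backup used in entropy-regularized RL: V(s') is a
# log-sum-exp of Q-values rather than a hard max over actions.
import torch

def soft_target(reward: float, q_next: torch.Tensor,
                gamma: float = 1.0, alpha: float = 1.0) -> torch.Tensor:
    v_next = alpha * torch.logsumexp(q_next / alpha, dim=-1)
    return reward + gamma * v_next

target = soft_target(reward=0.5, q_next=torch.randn(4))  # placeholder Q-values
```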
Multi-Agent RL
Certain applications intrinsically have multiple agents, for which you can use multi-agent reinforcement learning.
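For a feel of the setting, here is a minimal multi-agent interaction loop using the PettingZoo parallel API (the particular environment is only an example, and the import path may differ between PettingZoo versions):

```python
# Minimal multi-agent loop with PettingZoo's parallel API: all agents
# act simultaneously and receive their own observations and rewards.
from pettingzoo.mpe import simple_spread_v3

env = simple_spread_v3.parallel_env()
observations, infos = env.reset(seed=0)
while env.agents:  # the agent list empties once the episode is over
    actions = {a: env.action_space(a).sample() for a in env.agents}  # random placeholder policies
    observations, rewards, terminations, truncations, infos = env.step(actions)
env.close()
```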