You can choose to focus your project on an RL research paper. Your goal is to thoroughly understand the paper, replicate its experiments, and optionally extend/improve it with your own new idea.
(You can come up with your own paper of interest, but do discuss this with the teachers. Make sure the paper comes with 1) a public codebase and 2) a smaller toy experiment that is definitely computationally feasible to reproduce.)
To deploy reinforcement learning algorithms in the real world, we typically need to satisfy certain safety constraints. But how can we achieve this?
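One common way to make this concrete is to treat safety as a constrained optimization problem: maximize return subject to an expected cost staying below a budget, handling the constraint with a Lagrange multiplier. Below is a minimal, illustrative Python sketch of such a Lagrangian update; the names (`cost_limit`, `lam_lr`) and the episode statistics are hypothetical and not taken from any particular paper.

```python
# Minimal sketch of Lagrangian-based constrained RL (illustrative, hypothetical names).
# The constrained objective  max_pi E[return]  s.t.  E[cost] <= cost_limit
# is relaxed to  max_pi min_{lambda >= 0}  E[return] - lambda * (E[cost] - cost_limit),
# with lambda updated by dual (gradient) ascent on the constraint violation.

cost_limit = 25.0   # hypothetical safety budget per episode
lam = 0.0           # Lagrange multiplier
lam_lr = 0.01       # dual-ascent step size

def penalized_return(ep_return: float, ep_cost: float) -> float:
    """Scalar objective the policy is trained to maximize."""
    return ep_return - lam * (ep_cost - cost_limit)

def update_multiplier(ep_cost: float) -> None:
    """Increase lambda when the constraint is violated, decrease it otherwise."""
    global lam
    lam = max(0.0, lam + lam_lr * (ep_cost - cost_limit))

# Toy usage with fabricated episode statistics:
for ep_return, ep_cost in [(100.0, 40.0), (90.0, 30.0), (85.0, 20.0)]:
    obj = penalized_return(ep_return, ep_cost)
    update_multiplier(ep_cost)
    print(f"objective={obj:.1f}  lambda={lam:.3f}")
```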
Traditionally, RL trains on a single reward/goal specification. Ideally, however, we want solutions that generalize over tasks, i.e., that give good zero-shot/few-shot performance on new task/reward specifications. A small illustrative sketch of such task conditioning follows the papers below.
Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings. Paper. Code.
URLB: Unsupervised Reinforcement Learning Benchmark. Paper. Code.
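As a toy illustration of the task conditioning mentioned above, the sketch below feeds a task/reward embedding to the policy as an extra input, so a new task can be specified at test time by swapping the embedding rather than retraining the weights. All names and dimensions are made up for illustration; this is not the method of either paper above.

```python
# Minimal sketch of a task-conditioned policy for zero-shot generalization (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
state_dim, task_dim, action_dim = 4, 3, 2

# A single linear policy over the concatenated (state, task-embedding) input.
W = rng.normal(size=(action_dim, state_dim + task_dim))

def policy(state: np.ndarray, task_embedding: np.ndarray) -> np.ndarray:
    """Deterministic action for a given state under a given task specification."""
    return W @ np.concatenate([state, task_embedding])

state = rng.normal(size=state_dim)
task_a = np.array([1.0, 0.0, 0.0])   # hypothetical encoding of "reach goal A"
task_b = np.array([0.0, 1.0, 0.0])   # hypothetical encoding of "reach goal B"

print("action for task A:", policy(state, task_a))
print("action for task B:", policy(state, task_b))   # zero-shot: same weights, new task
```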
Instead of (slowly) training the weights of a network to learn a task, can we adapt quickly by conditioning on new experience in the context window of a neural network (typically an LLM)? A small illustrative sketch follows the paper below.
LMAct: A Benchmark for In-Context Imitation Learning with Long Multimodal Demonstrations. Paper. Code.
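The sketch below illustrates the general idea of in-context imitation: demonstrations are serialized into the prompt and the model is queried for the next action, with no gradient updates. `query_model` is a stand-in for whatever LLM backend you use, and the prompt format is purely illustrative, not LMAct's.

```python
# Minimal sketch of in-context imitation (illustrative, hypothetical prompt format).
from typing import Callable

def build_prompt(demonstrations: list[tuple[str, str]], new_observation: str) -> str:
    """Serialize demonstrations followed by the current observation."""
    lines = ["You control an agent. Given the observation, output the next action."]
    for obs, act in demonstrations:
        lines.append(f"Observation: {obs}\nAction: {act}")
    lines.append(f"Observation: {new_observation}\nAction:")
    return "\n\n".join(lines)

def act_in_context(query_model: Callable[[str], str],
                   demonstrations: list[tuple[str, str]],
                   new_observation: str) -> str:
    """Zero gradient steps: adaptation happens purely through the context window."""
    return query_model(build_prompt(demonstrations, new_observation)).strip()

# Toy usage with a stub "model" that always moves right:
demos = [("goal is to the right", "move right"), ("goal is to the left", "move left")]
print(act_in_context(lambda prompt: " move right", demos, "goal is to the right"))
```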
In model-based RL we integrate/iterate planning and learning: a learned policy and/or value network can be used to guide a planning procedure, while the output of the plan can be used to select actions and/or serve as a training target for the learned networks.
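As a rough illustration of this loop, the sketch below uses a random-shooting planner over a toy model, bootstraps leaf states with a learned value estimate, and then uses the plan's return both to select the action and as a regression target for the value function. Everything here (the toy dynamics, the tabular value store, the hyperparameters) is a made-up minimal example, not a specific algorithm from the literature.

```python
# Minimal sketch of the planning/learning loop in model-based RL (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.99

def model(state, action):
    """Hypothetical known/learned dynamics: next state and reward."""
    next_state = state + 0.1 * action
    reward = -abs(next_state)          # reward for staying near 0
    return next_state, reward

value = {}  # crude tabular value estimates, keyed by rounded state

def V(state):
    return value.get(round(state, 1), 0.0)

def plan(state, horizon=3, n_candidates=16):
    """Random-shooting planner: roll action sequences through the model,
    bootstrap with the learned value at the leaf, return the best first action."""
    best_action, best_return = None, -np.inf
    for _ in range(n_candidates):
        actions = rng.choice([-1.0, 1.0], size=horizon)
        s, ret, discount = state, 0.0, 1.0
        for a in actions:
            s, r = model(s, a)
            ret += discount * r
            discount *= gamma
        ret += discount * V(s)         # the learned value guides the plan
        if ret > best_return:
            best_action, best_return = actions[0], ret
    return best_action, best_return

state = 1.0
for _ in range(5):
    action, plan_return = plan(state)
    # Plan output is used (1) to select the action and (2) as a training target.
    key = round(state, 1)
    value[key] = value.get(key, 0.0) + 0.5 * (plan_return - value.get(key, 0.0))
    state, _ = model(state, action)
    print(f"state={state:+.2f}  plan_return={plan_return:+.2f}")
```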