Applications
A nice direction for the course is to apply reinforcement learning to a new application.
This year we will primarily focus on reinforcement learning for sustainable energy.
If you focus on an application, we advise picking a well-known RL algorithm from CleanRL and using that implementation.
Note: always make sure your algorithm matches the application (e.g., you cannot apply multi-agent RL to a single-agent problem).
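Before committing to an algorithm, check the environment's observation and action spaces, since those determine which algorithm families apply. A minimal sketch using the Gymnasium API (CartPole-v1 is only a stand-in; substitute your application's environment):

    import gymnasium as gym

    env = gym.make("CartPole-v1")  # stand-in: replace with your application's environment
    print("Observation space:", env.observation_space)
    print("Action space:", env.action_space)

    # Discrete action spaces suit DQN or PPO with a categorical policy;
    # continuous (Box) action spaces call for SAC, TD3, DDPG, or PPO with
    # a Gaussian policy head.
    if isinstance(env.action_space, gym.spaces.Discrete):
        print("Discrete actions: DQN or (discrete) PPO are natural choices.")
    elif isinstance(env.action_space, gym.spaces.Box):
        print("Continuous actions: consider SAC, TD3, or (continuous) PPO.")

CleanRL implements each algorithm as a single self-contained script, so once you know which family fits, you can run that script directly against your environment.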
SustainGym: Paper. Website. GitHub.
SustainGym is a suite of reinforcement learning environments focused on the transition towards sustainable energy. You could pick one of the environments and try to apply reinforcement learning to it. Several of our PhD students also work on RL for sustainability and may be able to provide additional support. Some options (a rollout sketch follows the list):
Electric Vehicle Charging: You need to charge a stream of incoming cars with variable (and uncertain) arrival times. There are constraints on the total available amount of electricity at any point in time, and your agent needs to select the charging rate per car.
Electricity Market Bidding: The environment contains a range of energy generators (e.g., a wind farm) and batteries that can all deliver energy to the market. Your agent needs to bid the price at which it can deliver energy, after which the market operator buys the required energy at the best available price (i.e., if you bid too high, you might be skipped and earn nothing).
Datacenter Job Scheduling: A large fraction of the jobs in a datacenter have low priority. We would therefore like to schedule them to moments when plenty of green energy is available, to reduce carbon emissions. Your agent needs to decide what capacity of the datacenter is made available for compute at every timestep.
Building Management: You need to manage the temperature in the different zones of a building. Depending on the weather, the amount of sunlight on a zone, and the number of people in a zone, your agent needs to determine how much heating to supply to every zone.
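Whichever environment you pick, it pays to first run an untrained (random) policy to verify the interface and get a baseline return. A minimal rollout sketch, assuming the environment follows the standard Gymnasium API (the commented SustainGym import below is an assumption about the package layout; check the project's GitHub for the exact class names and constructor arguments):

    import gymnasium as gym
    # from sustaingym.envs import ...  # assumed layout: see the SustainGym GitHub

    env = gym.make("CartPole-v1")  # stand-in: replace with your chosen SustainGym environment
    obs, info = env.reset(seed=0)
    total_reward, done = 0.0, False
    while not done:
        action = env.action_space.sample()  # random policy as a cheap baseline
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        done = terminated or truncated
    print("Return of a random policy:", total_reward)

Any trained agent should comfortably beat this random-policy return; if it does not, debug the setup before tuning hyperparameters.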
Electron Microscopy Aberration Correction - with ThermoFisher Scientific: You need to control the parameters of an electron microscope to correct for aberrations (disturbances) in the obtained image. The correct adjustments cannot be inferred from a single observation, so we face a partially observable MDP (POMDP).
Example paper -- supervised learning: this paper studies the above problem from a supervised learning point of view. However, it is hard to obtain correct supervised action labels at every step, and reinforcement learning (where we only need a reward on the final image) is a promising alternative.
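Because a single observation does not identify the aberration parameters, the policy needs access to some history. A common, simple remedy is to stack the last few observations with a standard Gymnasium wrapper. A minimal sketch (the wrapper is real; the microscope environment is hypothetical, so CartPole-v1 again stands in):

    import gymnasium as gym
    from gymnasium.wrappers import FrameStackObservation  # called FrameStack in older Gymnasium versions

    env = gym.make("CartPole-v1")  # stand-in for a (hypothetical) microscope environment
    env = FrameStackObservation(env, 4)  # the policy now sees the last 4 observations

    obs, info = env.reset(seed=0)
    print(obs.shape)  # leading dimension 4: the stacked observation history

An alternative that avoids a fixed history window is a recurrent policy (e.g., an LSTM), for which CleanRL also includes a reference PPO variant.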