Project IV (MATH4072) 2024-25 (MT/AW)

Reinforcement Learning

Matthias Troffaes and Andrew Wade

Description

One way or another, the reason that we (students and lecturers) are here is learning. The goals, techniques, modalities, and assessment of learning are addressed by subjects across the academic spectrum, from education through psychology to biosciences and mathematics. Some of the most striking insights into learning have come rather recently, as humans have thought about ways in which computers, or abstract state machines, can "learn". In many contexts, from Pavlov through to artificial intelligence, learning proceeds by trial-and-reinforcement. The learner tries a strategy, which describes how they respond to various input data, and gauges the reward from the (noisy) feedback that the system returns; the strategy is then adjusted to try to improve performance. While the topic has been reinvigorated by advances in computation, machine learning, and artificial intelligence, the basic idea of optimizing sequential decision-making is one of the pillars of classical statistics.

In this project, we will look at some particular mathematical aspects of reinforcement learning as a way to optimize strategies in situations where uncertainty plays a role, such as when there is a lack of information, randomness, or an inscrutable adversary. A flexible framework for addressing these problems is that of Markov decision problems, which would be a good starting point for investigation. Those of you who have taken the Operations Research course will have seen some of the structure and the mathematics involved (elementary probability, Bellman's optimality equations, policy improvement, etc.). In more advanced settings, these ideas reach the forefront of mathematical research, as well as being of great use in applications.
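To make the ingredients concrete, here is a minimal sketch of policy iteration (alternating policy evaluation and policy improvement) on a tiny Markov decision problem. The two-state "machine maintenance" example, its transition probabilities, its rewards, and the discount factor 0.9 are all invented for illustration; they are assumptions and are not taken from any of the modules mentioned above.

```python
# Hypothetical two-state maintenance problem; every number here is illustrative.
# P[state][action] is a list of (probability, next_state, reward) triples.
P = {
    "good": {
        "run":    [(0.8, "good", 5.0), (0.2, "worn", 5.0)],
        "repair": [(1.0, "good", -1.0)],
    },
    "worn": {
        "run":    [(1.0, "worn", 2.0)],
        "repair": [(1.0, "good", -3.0)],
    },
}
GAMMA = 0.9  # assumed discount factor


def q_value(P, V, s, a, gamma):
    """Expected discounted return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def evaluate(P, policy, gamma, tol=1e-10):
    """Policy evaluation by repeated sweeps until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v_new = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V


def policy_iteration(P, gamma):
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = {s: sorted(P[s])[0] for s in P}  # arbitrary initial policy
    while True:
        V = evaluate(P, policy, gamma)
        stable = True
        for s in P:
            best = max(P[s], key=lambda a: q_value(P, V, s, a, gamma))
            # Only switch when the new action is strictly better.
            if q_value(P, V, s, best, gamma) > q_value(P, V, s, policy[s], gamma) + 1e-9:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


policy, V = policy_iteration(P, GAMMA)
print(policy)
print(V)
```

When the loop stops, no single change of action improves the value in any state, which is exactly the condition expressed by Bellman's optimality equations for this toy problem.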

Possible directions include looking at algorithms for optimizing gameplay in games that involve one or more players. This could range from simple games, such as tic-tac-toe ("noughts-and-crosses") or Tower of Hanoi (Dr Who's "trilogic game"), to more complex situations which can nevertheless be usefully formulated in terms of games. Topics could include games of the student's choosing, or issues such as optimizing gameplay against specific opponents, dealing with large state spaces, or stress-testing games for game design.

The scope of applications is vast, from traditional "operations research" like scheduling, maintenance, inventory management, and so on, to smart devices at large scales (motorways) and small (medical implants), engines in computer games, or development of computers that can beat humans at chess (easy), Go (harder), or cricket (maybe next year).

There is considerable scope for simulations in this project.
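As a flavour of what such a simulation might look like, the following is a minimal sketch of trial-and-reinforcement on a three-armed bandit, using an epsilon-greedy rule. The function name `epsilon_greedy`, the Bernoulli reward probabilities, and the parameter values are assumptions chosen purely for illustration.

```python
import random


def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy learner on Bernoulli arms (illustrative only)."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    k = len(true_means)
    counts = [0] * k           # number of times each arm has been tried
    estimates = [0.0] * k      # running average reward of each arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: pick an arm at random
        else:
            arm = max(range(k), key=lambda a: estimates[a])  # exploit
        # Noisy feedback: reward 1 with the arm's (unknown) success probability.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return estimates, counts, total / steps


estimates, counts, average_reward = epsilon_greedy([0.2, 0.5, 0.8])
print(estimates, counts, average_reward)
```

The learner never sees `true_means` directly; it only adjusts its estimates from the rewards it observes, which is the adjust-after-feedback loop described above in its simplest form.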

Prior modules

This project requires students to have taken one or more of the following essential prior modules:

Markov Chains II (MATH2707)
Operations Research III (MATH3141)

For simulations, enthusiasm for R, Python, or another suitable language will be necessary.

Resources

For some background on what may be involved, you could:

consult resources on the web by searching for some of the key words in the above description;
look at some of the recommended literature (or other literature you find) to see which look most interesting and/or helpful;
if you've done Operations Research III, think about how you might encode a basic policy improvement algorithm to train a computer to play tic-tac-toe!
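For the last suggestion, one possible starting point is sketched below. It rests on several assumptions that are not part of the project description: the computer plays X and moves first, the opponent O picks uniformly at random among the empty cells, the reward is +1 for a win, -1 for a loss, and 0 for a draw, and a board is encoded as a 9-character string. Because the game always ends within nine moves, policy evaluation can be done exactly by recursion.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(b):
    """Return "X" or "O" if that player has three in a row, else None."""
    for i, j, k in LINES:
        if b[i] != " " and b[i] == b[j] == b[k]:
            return b[i]
    return None


def empty(b):
    return [i for i in range(9) if b[i] == " "]


def place(b, i, mark):
    return b[:i] + mark + b[i + 1:]


def after_x(b, value_of_state):
    """Expected return once X has just moved to board b and O replies at random."""
    if winner(b) == "X":
        return 1.0
    cells = empty(b)
    if not cells:
        return 0.0  # board full, draw
    total = 0.0
    for j in cells:
        b2 = place(b, j, "O")
        total += -1.0 if winner(b2) == "O" else value_of_state(b2)
    return total / len(cells)


def x_states():
    """All reachable boards with X to move and the game not yet decided."""
    seen, stack = set(), [" " * 9]
    while stack:
        b = stack.pop()
        if b in seen:
            continue
        seen.add(b)
        for i in empty(b):
            bx = place(b, i, "X")
            if winner(bx) or not empty(bx):
                continue
            for j in empty(bx):
                bo = place(bx, j, "O")
                if not winner(bo):
                    stack.append(bo)
    return seen


def evaluate(policy, states):
    """Exact policy evaluation by memoised recursion (the game has no cycles)."""
    V = {}

    def v(b):
        if b not in V:
            V[b] = after_x(place(b, policy[b], "X"), v)
        return V[b]

    for b in states:
        v(b)
    return V


def policy_iteration():
    states = x_states()
    policy = {b: empty(b)[0] for b in states}  # start: take the first empty cell
    while True:
        V = evaluate(policy, states)
        changed = False
        for b in states:
            q = {a: after_x(place(b, a, "X"), V.__getitem__) for a in empty(b)}
            best = max(q, key=q.get)
            if q[best] > q[policy[b]] + 1e-12:  # greedy improvement step
                policy[b] = best
                changed = True
        if not changed:
            return policy, V


policy, V = policy_iteration()
print("opening move:", policy[" " * 9], "expected return:", V[" " * 9])
```

The resulting policy is tuned to a random opponent only; replacing the uniform average in `after_x` by a different opponent model is one natural way to explore "optimizing gameplay against specific opponents".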

Reading list. Two excellent sources as starting points are the following books.

Sutton, R. S. and Barto, A. G. (2018) Reinforcement Learning: An Introduction, 2nd ed., Cambridge, MIT Press.
Kulkarni, P. (2012) Reinforcement and Systemic Machine Learning for Decision Making, Hoboken, Wiley & Sons.

Supervision

Prof Troffaes (Term 1) and Prof Wade (Term 2).

Get in touch if you have any questions! Email us