Reinforcement Learning for Optimization of COVID-19 Mitigation Policies

Varun Kompella

Roberto Capobianco

Stacy Jong*

Jonathan Browne*

Spencer Fox*

Lauren Meyers*

Peter Wurman

Peter Stone

* External authors

AAAI Fall Symposium on AI for Social Good



The year 2020 has seen the COVID-19 virus lead to one of the worst global pandemics in history. As a result, governments around the world face the challenge of protecting public health while keeping the economy running to the greatest extent possible. Epidemiological models provide insight into the spread of these types of diseases and predict the effects of possible intervention policies. However, to date, even the most data-driven intervention policies rely on heuristics. In this paper, we study how reinforcement learning (RL) can be used to optimize mitigation policies that minimize the economic impact without overwhelming hospital capacity. Our main contributions are (1) a novel agent-based pandemic simulator which, unlike traditional models, is able to model fine-grained interactions among people at specific locations in a community; and (2) an RL-based methodology for optimizing fine-grained mitigation policies within this simulator. Our results validate both the overall simulator behavior and the learned policies under realistic conditions.
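The abstract states the optimization goal only in words: minimize economic impact without overwhelming hospital capacity. As a rough illustration of how such a goal could be expressed as a per-step RL reward, the Python sketch below scores one simulated day. It is not the paper's reward function. The `DayOutcome` record, the `mitigation_reward` function, the linear economic cost per restriction stage, and both weights are hypothetical assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DayOutcome:
    """Hypothetical summary of one simulated day (not the paper's actual interface)."""
    stage: int              # restriction level chosen by the policy; 0 means no restrictions
    num_stages: int         # number of available restriction levels (assumed >= 2)
    critical_cases: int     # people who need hospital care on this day
    hospital_capacity: int  # hospital beds available in the simulated community


def mitigation_reward(day: DayOutcome,
                      economic_weight: float = 1.0,
                      overflow_weight: float = 10.0) -> float:
    """Toy reward that penalizes restrictions and any hospital overflow.

    Both weights are illustrative assumptions; the overflow weight is set much
    larger so that exceeding capacity dominates the economic cost.
    """
    if day.num_stages < 2:
        raise ValueError("need at least two restriction stages")
    if day.hospital_capacity <= 0:
        raise ValueError("hospital capacity must be positive")
    # Assumed economic cost: grows linearly with restriction strictness, from 0.0 to 1.0.
    economic_cost = day.stage / (day.num_stages - 1)
    # Critical cases beyond capacity, expressed as a fraction of capacity.
    overflow = max(0, day.critical_cases - day.hospital_capacity) / day.hospital_capacity
    penalty = economic_weight * economic_cost + overflow_weight * overflow
    return -penalty


if __name__ == "__main__":
    relaxed = DayOutcome(stage=1, num_stages=5, critical_cases=8, hospital_capacity=10)
    lockdown = DayOutcome(stage=4, num_stages=5, critical_cases=2, hospital_capacity=10)
    overwhelmed = DayOutcome(stage=0, num_stages=5, critical_cases=15, hospital_capacity=10)
    for name, day in [("relaxed", relaxed), ("lockdown", lockdown), ("overwhelmed", overwhelmed)]:
        print(name, mitigation_reward(day))
    # relaxed -0.25
    # lockdown -1.0
    # overwhelmed -5.0
```

Setting the overflow weight well above the economic weight is one way to treat "without overwhelming hospital capacity" as a near-hard constraint: an RL agent maximizing the sum of such rewards would prefer mild restrictions, accept a full lockdown over an overwhelmed hospital, and avoid overflow whenever possible.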



