Adaptive Experimentation and Policy Learning
Paper Session
Friday, Jan. 6, 2023 8:00 AM - 10:00 AM (CST)
- Chair: Toru Kitagawa, Brown University
Treatment Choice with Nonlinear Regret
Abstract
The literature on treatment choice focuses on the mean of welfare regret. Ignoring other features of the regret distribution, however, can lead to an undesirable rule due to sampling uncertainty. Instead, we propose to minimize the mean of a nonlinear transformation of welfare regret. This paradigm shift alters optimal rules drastically. We show that for a wide class of nonlinear criteria, admissible rules are fractional. Focusing on mean square regret, we derive the closed-form probabilities of randomization for finite-sample Bayes and minimax optimal rules when data are normal with known variance. The minimax rule is a simple logit based on the sample mean and agrees with the posterior probability for positive treatment effect under the least favorable prior. The Bayes rule with an uninformative prior is different but produces quantitatively comparable mean square regret. We extend these results to limit experiments and discuss our findings through sample size calculations.
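A minimal sketch of the kind of fractional rule described in the abstract, assuming normal data with known variance; the helper name fractional_treatment_rule and the slope parameter k are illustrative placeholders, since the abstract does not state the closed-form minimax constant.

```python
import numpy as np

def fractional_treatment_rule(xbar, sigma, n, k=1.0):
    """Fractional (randomized) treatment rule: a logit transform of the
    standardized sample mean. The slope k is illustrative; the paper derives
    a specific constant for the mean-square-regret minimax rule."""
    t = np.sqrt(n) * xbar / sigma  # standardized estimate of the treatment effect
    return 1.0 / (1.0 + np.exp(-k * t))  # probability of assigning treatment

# Example: a mildly positive estimated effect yields a treatment probability
# strictly between 0 and 1, rather than a 0/1 plug-in choice.
print(fractional_treatment_rule(xbar=0.3, sigma=1.0, n=100))
```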
Policy Design in Experiments with Unknown Interference
Abstract
In this talk, I will discuss the problem of experimental design for estimation and inference on welfare-maximizing policies in the presence of spillover effects. As a first contribution, I introduce a single-wave experiment that estimates the marginal effect of a change in treatment probabilities, taking spillover effects into account. Using the marginal effect, I propose a practical test for policy optimality. The idea is that researchers should report the marginal effect and test for policy optimality: the marginal effect indicates the direction for a welfare improvement, and the test provides evidence on whether it is worth conducting additional experiments to estimate a welfare-improving treatment allocation. As a second contribution, I design a multiple-wave experiment to estimate welfare-maximizing treatment assignment rules, and I derive guarantees on the proposed procedure. I illustrate the benefits of the method in simulations calibrated to existing experiments on information diffusion and cash-transfer programs.
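A stylized sketch of the marginal-effect idea, assuming cluster-level outcomes observed under treatment probabilities perturbed up and down by a small eps; the function marginal_effect_test and the finite-difference estimator are hypothetical simplifications of the single-wave design described above, not the paper's actual estimator.

```python
import numpy as np
from scipy import stats

def marginal_effect_test(y_hi, y_lo, eps):
    """Hypothetical sketch: clusters are randomized to treatment probability
    p + eps (outcomes y_hi) or p - eps (outcomes y_lo). The finite-difference
    slope estimates the marginal welfare effect of raising p, and a two-sample
    t-test asks whether it differs from zero (evidence that p is suboptimal)."""
    mde = (np.mean(y_hi) - np.mean(y_lo)) / (2 * eps)  # marginal effect estimate
    tstat, pval = stats.ttest_ind(y_hi, y_lo, equal_var=False)
    return mde, pval

# Simulated example with 200 clusters per perturbation arm.
rng = np.random.default_rng(0)
y_hi = rng.normal(1.05, 1.0, size=200)  # cluster-level average outcomes
y_lo = rng.normal(1.00, 1.0, size=200)
print(marginal_effect_test(y_hi, y_lo, eps=0.05))
```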
Adaptivity and Confounding in Multi-Armed Bandit Experiments
Abstract
Multi-armed bandit algorithms minimize experimentation costs required to converge on optimal behavior. They do so by rapidly adapting experimentation effort away from poorly performing actions as feedback is observed. But this desirable feature makes them sensitive to confounding, which is the primary concern underlying classical randomized controlled trials. We highlight, for instance, that popular bandit algorithms cannot address the problem of identifying the best action when day-of-week effects may confound inferences. In response, this paper proposes deconfounded Thompson sampling, which makes simple, but critical, modifications to the way Thompson sampling is usually applied. Theoretical guarantees suggest the algorithm strikes a delicate balance between adaptivity and robustness to confounding. It attains asymptotic lower bounds on the number of samples required to confidently identify the best action (suggesting optimal adaptivity), but also satisfies strong performance guarantees in the presence of day-of-week effects and delayed observations (suggesting unusual robustness). At the core of the paper is a new model of contextual bandit experiments in which issues of delayed learning and distribution shift arise organically.
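For orientation, a minimal sketch of the standard Gaussian Thompson sampling loop that the abstract modifies; the deconfounding modifications themselves are not reproduced here, and the helper names are illustrative.

```python
import numpy as np

def thompson_step(post_mean, post_var, rng):
    """One step of standard Gaussian Thompson sampling: draw one sample from
    each arm's posterior and play the argmax. If rewards also depend on an
    unmodeled context (e.g., day of week), the posteriors absorb those context
    effects, and the comparison across arms can be confounded; that is the
    failure mode the deconfounded variant is designed to avoid."""
    draws = rng.normal(post_mean, np.sqrt(post_var))
    return int(np.argmax(draws))

def update(post_mean, post_var, arm, reward, noise_var=1.0):
    """Conjugate normal posterior update for the chosen arm."""
    prec = 1.0 / post_var[arm] + 1.0 / noise_var
    post_mean[arm] = (post_mean[arm] / post_var[arm] + reward / noise_var) / prec
    post_var[arm] = 1.0 / prec
    return post_mean, post_var

# Usage: three arms with standard-normal priors, one sampling/update cycle.
rng = np.random.default_rng(0)
mean, var = np.zeros(3), np.ones(3)
arm = thompson_step(mean, var, rng)
mean, var = update(mean, var, arm, reward=rng.normal(0.5))
```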
Discussant(s)
- Toru Kitagawa, Brown University
- Kaito Ariu, CyberAgent, Inc.
- Davide Viviano, University of California-San Diego
- Chao Qin, Columbia University
JEL Classifications
- C4 - Econometric and Statistical Methods: Special Topics
- C9 - Design of Experiments