
Personalizing Treatments using Machine Learning

Paper Session

Sunday, Jan. 3, 2021 10:00 AM - 12:00 PM (EST)

Hosted By: American Economic Association
  • Chair: Gregory Lewis, Microsoft Research

Targeting for Long-Term Outcomes

Jeremy Yang, Massachusetts Institute of Technology
Dean Eckles, Massachusetts Institute of Technology
Paramveer Dhillon, University of Michigan
Sinan Aral, Massachusetts Institute of Technology

Abstract

Decision-makers often want to target interventions (e.g., marketing campaigns) so as to maximize an outcome that is observed only in the long term. This typically requires delaying decisions until the outcome is observed or relying on simple short-term proxies for the long-term outcome. Here we build on the statistical surrogacy and off-policy learning literatures to impute the missing long-term outcomes and then approximate the optimal targeting policy on the imputed outcomes via a doubly-robust approach. We apply our approach in large-scale proactive churn management experiments at The Boston Globe by targeting optimal discounts to its digital subscribers to maximize their long-term revenue. We first show that conditions for validity of average treatment effect estimation with imputed outcomes are also sufficient for valid policy evaluation and optimization; furthermore, these conditions can be somewhat relaxed for policy optimization. We then validate this approach empirically by comparing it with a policy learned on the ground-truth long-term outcomes and show that they are statistically indistinguishable. Our approach also outperforms a policy learned on short-term proxies for the long-term outcome. In a second field experiment, we implement the optimal targeting policy with additional randomized exploration, which allows us to update the optimal policy for each new cohort of customers to account for potential non-stationarity. Over three years, our approach had a net-positive revenue impact in the range of $4-5 million compared to The Boston Globe's current policies.
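
The recipe the abstract describes (impute long-term outcomes from short-term surrogates, then evaluate and optimize a targeting policy with doubly-robust scores) can be sketched in a few lines. The sketch below uses synthetic data, a binary discount, and off-the-shelf scikit-learn models; all variable names and modeling choices are illustrative assumptions, not the authors' implementation.

```python
# Sketch: surrogate-index imputation + doubly-robust (AIPW) policy learning.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 5000, 5
X = rng.normal(size=(n, p))                           # subscriber features
W = rng.binomial(1, 0.5, size=n)                      # randomized discount (binary here)
S = X[:, 0] + W * (1 + X[:, 1]) + rng.normal(size=n)  # short-term surrogate (e.g., engagement)
Y = 2 * S + X[:, 2] + rng.normal(size=n)              # long-term revenue (observed only historically)

# 1) Surrogate index: learn E[Y | S, X] where the long-term outcome is observed,
#    then impute it for everyone in the current cohort.
hist = rng.random(n) < 0.5                            # pretend half the data is historical
index = GradientBoostingRegressor().fit(np.column_stack([S[hist], X[hist]]), Y[hist])
Y_hat = index.predict(np.column_stack([S, X]))

# 2) Doubly-robust (AIPW) scores per arm on the imputed outcome.
e = LogisticRegression().fit(X, W).predict_proba(X)[:, 1]           # propensity
mu1 = GradientBoostingRegressor().fit(X[W == 1], Y_hat[W == 1]).predict(X)
mu0 = GradientBoostingRegressor().fit(X[W == 0], Y_hat[W == 0]).predict(X)
dr1 = mu1 + W * (Y_hat - mu1) / e
dr0 = mu0 + (1 - W) * (Y_hat - mu0) / (1 - e)

# 3) Learn the policy from the outcome models, evaluate it with the DR scores.
treat = (mu1 - mu0 > 0).astype(int)
value = np.mean(treat * dr1 + (1 - treat) * dr0)
print(f"estimated long-term value of the learned targeting policy: {value:.2f}")
```

Because the discount is randomized, the propensity model is only a nuisance check here; the doubly-robust scores keep the value estimate valid even if either the outcome models or the propensity model is misspecified.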

Dynamically Optimal Treatment Allocation Using Reinforcement Learning

Karun Adusumilli, University of Pennsylvania
Friedrich Geiecke, London School of Economics
Claudio Schilter, University of Zurich

Abstract

Devising guidance on how to assign individuals to treatment is an important goal of empirical research. In practice, individuals often arrive sequentially, and the planner faces various constraints such as a limited budget/capacity, borrowing constraints, or the need to place people in a queue. For instance, a governmental body may receive a budget outlay at the beginning of a year and need to decide how best to allocate resources within the year to individuals who arrive sequentially. In this and other examples involving inter-temporal tradeoffs, previous work on devising optimal policy rules in a static context is either not applicable or sub-optimal. Here we show how one can use offline observational data to estimate an optimal policy rule that maximizes ex-ante expected welfare in this dynamic context. We allow the class of policy rules to be restricted for computational, legal, or incentive-compatibility reasons. The problem is equivalent to one of optimal control under a constrained policy class, and we exploit recent developments in Reinforcement Learning (RL) to propose an algorithm to solve it. The algorithm is easily implementable, with speedups achieved through multiple RL agents learning in parallel processes. We also characterize the statistical regret from using our estimated policy rule by casting the evolution of the value function under each policy in a Partial Differential Equation (PDE) form and using the theory of viscosity solutions to PDEs. We find that the policy regret decays at an n^{-1/2} rate in most examples; this is the same rate as in the static case.
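
As a rough illustration of the dynamic setting the abstract describes (sequential arrivals, a hard budget, a restricted parametric policy class trained by reinforcement learning), the sketch below runs a simple REINFORCE loop on a toy simulator. The CATE function, the policy form, and all constants are assumptions for illustration only, not the authors' algorithm.

```python
# Sketch: budgeted sequential assignment with a parametric policy trained by REINFORCE.
import numpy as np

rng = np.random.default_rng(1)
T, BUDGET = 100, 30                      # arrivals per "year", treatments the budget allows

def cate(x):
    # Stand-in for a treatment-effect estimate learned from offline observational data.
    return x                             # toy: treatment helps more when x is larger

def run_episode(theta):
    budget, steps, total = BUDGET, [], 0.0
    for t in range(T):
        x = rng.normal()                 # covariate of the arriving individual
        s = np.array([1.0, x, budget / BUDGET, (T - t) / T])   # intercept, x, budget left, time left
        p = 1.0 / (1.0 + np.exp(-s @ theta)) if budget > 0 else 0.0
        treat = rng.random() < p
        if treat:
            budget -= 1
            total += cate(x) + rng.normal(scale=0.5)           # noisy realized welfare gain
        if p > 0:
            steps.append((s, p, 1.0 if treat else 0.0))
    return total, steps

theta = np.zeros(4)
for _ in range(2000):                    # policy-gradient updates (no baseline, for brevity)
    R, steps = run_episode(theta)
    for s, p, a in steps:
        theta += 1e-3 * R * (a - p) * s  # gradient of log Bernoulli(p) likelihood
print("learned policy weights:", theta)
```

Including the remaining budget and remaining time in the state is what lets the learned rule trade off treating someone now against saving capacity for better candidates later, which is the inter-temporal tradeoff the static literature does not capture.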

Personalizing Treatments for Habit Formation: Learning Optimal Treatment Rules from a Multi-Arm Experiment

Rahul Ladhania, University of Pennsylvania; University of Michigan (Effective July 2020)
Jann Spiess, Stanford University
Katherine Milkman, University of Pennsylvania
Sendhil Mullainathan, University of Chicago
Lyle Ungar, University of Pennsylvania

Abstract

Across social science and health policy settings, there has been strong interest in heterogeneity in treatment effects: identifying the subgroups of a population for whom each intervention in a candidate set is most effective. In this paper, we learn the optimal treatment assignment rule in an experimental setting with a large number of discrete treatment arms. We propose an 'honest' recursive-partitioning tree- and forest-based approach for the multiple discrete treatment arm setting to learn and validate individualized assignment rules. We apply this method using data from a large-scale randomized controlled trial conducted in collaboration with a national gym chain, involving over 50 behavioral interventions promoting the formation of lasting exercise habits. We compare our approach to regression-based Q-learning and classification-based sequential outcome-weighted learning approaches.
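
A minimal sketch of the 'honest' idea with many arms: one half of the experimental sample fits per-arm outcome models, and the other half is used only to evaluate the learned assignment rule. The data, the number of arms, and the use of scikit-learn random forests are illustrative assumptions, not the paper's estimator.

```python
# Sketch: honest multi-arm assignment rule with split-sample evaluation.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
n, p, K = 6000, 4, 10                         # participants, features, treatment arms
X = rng.normal(size=(n, p))
A = rng.integers(0, K, size=n)                # uniformly randomized arm assignment
arm_effect = np.linspace(0, 1, K)             # toy arm-specific effects
Y = X[:, 0] + arm_effect[A] * (X[:, 1] > 0) + rng.normal(size=n)   # e.g., weekly gym visits

# Honest split: fit models on one half, estimate and validate on the other.
fit_idx = rng.random(n) < 0.5
est_idx = ~fit_idx

models = []
for k in range(K):                            # one outcome forest per arm
    sel = fit_idx & (A == k)
    models.append(RandomForestRegressor(n_estimators=100).fit(X[sel], Y[sel]))

# Assignment rule: give each person the arm with the highest predicted outcome.
pred = np.column_stack([m.predict(X[est_idx]) for m in models])
assigned = pred.argmax(axis=1)

# Honest evaluation on the held-out half via inverse-probability weighting
# (each arm had probability 1/K in the experiment).
match = (A[est_idx] == assigned)
ipw_value = np.mean(match * Y[est_idx] * K)
print(f"IPW value of learned rule: {ipw_value:.3f}  vs. overall mean outcome: {Y[est_idx].mean():.3f}")
```

The split keeps the rule-learning step and the evaluation step independent, which is what prevents the value estimate from being optimistic when many arms are compared.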
Discussant(s)
Jeffrey McCullough, University of Michigan
Jann Spiess, Stanford University
Mert Demirer, Massachusetts Institute of Technology
JEL Classifications
  • C1 - Econometric and Statistical Methods and Methodology: General
  • C5 - Econometric Modeling