Machine Learning and High Dimensional Methods for Causal and Policy Learning

Paper Session

Friday, Jan. 6, 2023 8:00 AM - 10:00 AM (CST)

Hilton Riverside, Durham
Hosted By: Econometric Society
  • Chair: Ying-Ying Lee, University of California-Irvine

Risk and Optimal Policies in Bandit Experiments

Karun Adusumilli, University of Pennsylvania

Abstract

We provide a decision-theoretic analysis of bandit experiments. The setting corresponds to a dynamic programming problem, but solving this directly is typically infeasible. Working within the framework of diffusion asymptotics, we define suitable notions of asymptotic Bayes and minimax risk for bandit experiments. For normally distributed rewards, the minimal Bayes risk can be characterized as the solution to a nonlinear second-order partial differential equation (PDE). Using a limit of experiments approach, we show that this PDE characterization also holds asymptotically under both parametric and non-parametric distributions of the rewards. The approach further identifies the state variables to which it is asymptotically sufficient to restrict attention, and therefore suggests a practical strategy for dimension reduction. The upshot is that we can approximate the dynamic programming problem defining the bandit experiment with a PDE, which can be efficiently solved using sparse matrix routines. We derive the optimal Bayes and minimax policies from the numerical solutions to these equations. The proposed policies substantially dominate existing methods such as Thompson sampling. The framework also allows for substantial generalizations of the bandit problem, such as time discounting and pure exploration motives.
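
For context, here is a minimal sketch of the Thompson sampling baseline that the proposed PDE-derived policies are compared against: a k-armed bandit with normally distributed rewards and conjugate normal posterior updates. The prior variance, reward noise, and horizon are illustrative assumptions, not values from the paper, and the paper's own policies (obtained by numerically solving the PDE) are not reproduced here.

```python
import numpy as np

def thompson_gaussian(true_means, T=1000, sigma=1.0, prior_var=1.0, seed=0):
    """Thompson sampling for a k-armed bandit with N(mean, sigma^2) rewards."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    post_mean = np.zeros(k)            # N(0, prior_var) prior on each arm mean
    post_var = np.full(k, prior_var)
    total_reward = 0.0
    for _ in range(T):
        # Draw once from each arm's posterior and pull the argmax (the TS rule).
        arm = int(np.argmax(rng.normal(post_mean, np.sqrt(post_var))))
        r = rng.normal(true_means[arm], sigma)
        total_reward += r
        # Conjugate normal update for the pulled arm (known reward variance).
        precision = 1.0 / post_var[arm] + 1.0 / sigma**2
        post_mean[arm] = (post_mean[arm] / post_var[arm] + r / sigma**2) / precision
        post_var[arm] = 1.0 / precision
    return total_reward

# Example: a two-armed experiment with a small gap between arm means.
print(thompson_gaussian(np.array([0.0, 0.2])))
```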

Automated Inference on Sharp Bounds

Vira Semenova, University of California-Berkeley

Abstract

Many causal parameters involving the joint distribution of potential outcomes in the treated and control states cannot be point-identified; they can only be bounded from above and below. The bounds can be further tightened by conditioning on pre-treatment covariates, and the sharp version of the bounds corresponds to using the full covariate vector. This paper gives a method for estimation and inference on sharp bounds determined by a linear system of under-identified equalities (e.g., as in Heckman et al. (ReStud, 1997)). In the sharp-bounds case, the right-hand side of this system involves a nuisance function of (many) covariates (e.g., the conditional probability of employment in the treated or control state). Combining Neyman-orthogonality and sample splitting, I provide an asymptotically Gaussian estimator of the sharp bounds that does not require solving the linear system in closed form. I demonstrate the method in an empirical application to Connecticut's Jobs First welfare reform experiment.
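
As a stylized illustration of the Neyman-orthogonality-plus-sample-splitting ingredient, the sketch below cross-fits a standard doubly robust (AIPW-type) score for theta = E[E[Y | X, D = 1]], the kind of conditional-probability nuisance the abstract mentions. The learners and fold count are assumptions; the paper's estimator targets the bounds of an under-identified linear system rather than this single moment.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

def dml_mean_outcome_treated(Y, D, X, n_folds=2, seed=0):
    """Cross-fitted Neyman-orthogonal estimate of theta = E[E[Y | X, D=1]]."""
    psi = np.zeros(len(Y))
    for train, test in KFold(n_folds, shuffle=True, random_state=seed).split(X):
        # Outcome regression mu(x) = E[Y | X=x, D=1], fit on treated training units.
        mu = RandomForestRegressor(random_state=seed).fit(
            X[train][D[train] == 1], Y[train][D[train] == 1])
        # Propensity pi(x) = P(D=1 | X=x), fit on the same training fold.
        pi = LogisticRegression().fit(X[train], D[train])
        m = mu.predict(X[test])
        p = np.clip(pi.predict_proba(X[test])[:, 1], 0.01, 0.99)
        # Orthogonal score: plug-in plus inverse-propensity correction term.
        psi[test] = m + D[test] / p * (Y[test] - m)
    theta = psi.mean()
    se = psi.std(ddof=1) / np.sqrt(len(Y))   # asymptotically Gaussian
    return theta, se
```

Because the nuisances are fit out-of-fold and the score is orthogonal, first-stage machine learning errors enter the estimate only at second order, which is what delivers the Gaussian limit.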

Causal Inference with Corrupted Data: Measurement Error, Missing Values, Discretization, and Differential Privacy

Anish Agarwal, Massachusetts Institute of Technology
Rahul Singh, Massachusetts Institute of Technology

Abstract

The US Census Bureau will deliberately corrupt data sets derived from the 2020 US Census in an effort to maintain privacy, suggesting a painful trade-off between the privacy of respondents and the precision of economic analysis. To investigate whether this trade-off is inevitable, we formulate a semiparametric model of causal inference with high dimensional corrupted data. We propose a procedure for data cleaning, estimation, and inference with data cleaning-adjusted confidence intervals. We prove consistency, Gaussian approximation, and semiparametric efficiency by finite sample arguments, with a rate of n^{-1/2} for semiparametric estimands that degrades gracefully for nonparametric estimands. Our key assumption is that the true covariates are approximately low rank, which we interpret as approximate repeated measurements and validate in the Census. In our analysis, we provide nonasymptotic theoretical contributions to matrix completion, statistical learning, and semiparametric statistics. Calibrated simulations verify the coverage of our data cleaning-adjusted confidence intervals and demonstrate the relevance of our results for 2020 Census data.
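
A minimal sketch of the approximate-low-rank data-cleaning step: zero-fill-and-rescale the observed entries, then denoise by truncated SVD (singular value thresholding). The rank input and the simple propensity rescaling are illustrative assumptions, not the paper's tuning choices.

```python
import numpy as np

def clean_covariates(Z, rank):
    """Denoise a corrupted covariate matrix Z (NaN = missing) via truncated SVD."""
    mask = ~np.isnan(Z)
    p_obs = mask.mean()                      # crude estimate of observation propensity
    Z_filled = np.where(mask, Z, 0.0) / p_obs  # zero-fill and rescale observed entries
    U, s, Vt = np.linalg.svd(Z_filled, full_matrices=False)
    s[rank:] = 0.0                           # keep only the leading singular values
    return (U * s) @ Vt                      # low-rank reconstruction of true covariates
```

The cleaned matrix would then feed into downstream estimation, with confidence intervals widened to account for the cleaning step.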

Testing Monotonicity of Mean Potential Outcomes in a Continuous Treatment with High-Dimensional Data

Ying-Ying Lee, University of California-Irvine
Yu-Chin Hsu, Academia Sinica
Martin Huber, University of Fribourg
Chu-An Liu, Academia Sinica

Abstract

While most treatment evaluations focus on binary interventions, a growing literature also considers continuously distributed treatments. We propose a Cramér-von Mises type test of whether the mean potential outcome given a specific treatment has a weakly monotonic relationship with the treatment dose under a weak unconfoundedness assumption. In a nonseparable structural model, applying our method amounts to testing monotonicity of the average structural function in the continuous treatment of interest. To flexibly control for a possibly high-dimensional set of covariates in our testing approach, we propose a double debiased machine learning estimator that accounts for them in a data-driven way. We show that the proposed test controls asymptotic size and is consistent against any fixed alternative. These theoretical findings are supported by Monte Carlo simulations. As an empirical illustration, we apply our test to the Job Corps study and reject a weakly negative relationship between the treatment (hours in academic and vocational training) and labor market performance among relatively low treatment values.
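
A stylized sketch of a Cramér-von Mises type monotonicity statistic on an estimated dose-response grid: it aggregates squared violations of weak monotonicity across ordered treatment pairs. The construction is illustrative, not the paper's exact statistic, and it omits both the double debiased estimation of mu_hat and the (e.g., multiplier bootstrap) critical values needed for valid inference.

```python
import numpy as np

def cvm_monotonicity_stat(mu_hat, decreasing=False):
    """Sum of squared violations of weak monotonicity over ordered dose pairs."""
    n = len(mu_hat)
    stat = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            diff = mu_hat[j] - mu_hat[i]          # change from lower to higher dose
            violation = diff if decreasing else -diff
            stat += max(0.0, violation) ** 2      # only wrong-direction changes count
    return stat / n**2

# Example: a curve that dips in the middle violates weak monotonicity.
print(cvm_monotonicity_stat(np.array([0.0, 0.5, 0.3, 0.8])))
```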
JEL Classifications
  • C1 - Econometric and Statistical Methods and Methodology: General