Machine Learning and High Dimensional Methods for Causal and Policy Learning
Paper Session
Friday, Jan. 6, 2023 8:00 AM - 10:00 AM (CST)
- Chair: Ying-Ying Lee, University of California-Irvine
Automated Inference on Sharp Bounds
Abstract
Many causal parameters involving the joint distribution of potential outcomes in the treated and control states cannot be point-identified; they can only be bounded from above and below. The bounds can be further tightened by conditioning on pre-treatment covariates, and the sharp version of the bounds corresponds to using the full covariate vector. This paper gives a method for estimation and inference on sharp bounds determined by a linear system of under-identified equalities (e.g., as in Heckman et al. (ReStud, 1997)). In the sharp-bounds case, the right-hand side of this system involves a nuisance function of (many) covariates (e.g., the conditional probability of employment in the treated or control state). Combining Neyman orthogonality and sample splitting, I provide an asymptotically Gaussian estimator of the sharp bounds that does not require solving the linear system in closed form. I demonstrate the method in an empirical application to Connecticut’s Jobs First welfare reform experiment.
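A minimal sketch of the cross-fitting pattern the abstract describes: nuisance functions are fit on auxiliary folds, and a Neyman-orthogonal score is averaged over the held-out folds. The AIPW score for E[Y(1)] below is a stand-in for the paper's orthogonal moment for sharp bounds; the function name and the random-forest learners are illustrative assumptions, not the paper's implementation.

```python
# Sketch: cross-fitting with a Neyman-orthogonal (AIPW) score.
# y: outcomes, d: binary treatment (0/1), X: covariate matrix (numpy arrays).
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold

def cross_fitted_estimate(y, d, X, n_folds=5, seed=0):
    """Cross-fitted estimate and standard error from an orthogonal score."""
    n = len(y)
    scores = np.empty(n)
    for train, test in KFold(n_splits=n_folds, shuffle=True,
                             random_state=seed).split(X):
        # Nuisance functions are fit on the auxiliary fold only.
        mu = RandomForestRegressor(random_state=seed).fit(
            X[train][d[train] == 1], y[train][d[train] == 1])
        ps = RandomForestClassifier(random_state=seed).fit(X[train], d[train])
        m = mu.predict(X[test])
        p = np.clip(ps.predict_proba(X[test])[:, 1], 0.01, 0.99)
        # Orthogonal score evaluated on the held-out fold; averaging these
        # scores yields an asymptotically Gaussian estimator.
        scores[test] = m + d[test] * (y[test] - m) / p
    return scores.mean(), scores.std(ddof=1) / np.sqrt(n)
```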
Causal Inference with Corrupted Data: Measurement Error, Missing Values, Discretization, and Differential Privacy
Abstract
The US Census Bureau will deliberately corrupt data sets derived from the 2020 US Census in an effort to maintain privacy, suggesting a painful trade-off between the privacy of respondents and the precision of economic analysis. To investigate whether this trade-off is inevitable, we formulate a semiparametric model of causal inference with high-dimensional corrupted data. We propose a procedure for data cleaning, estimation, and inference, with data cleaning-adjusted confidence intervals. We prove consistency, Gaussian approximation, and semiparametric efficiency by finite-sample arguments, with a rate of n^{-1/2} for semiparametric estimands that degrades gracefully for nonparametric estimands. Our key assumption is that the true covariates are approximately low rank, which we interpret as approximate repeated measurements and validate in the Census. In our analysis, we provide nonasymptotic theoretical contributions to matrix completion, statistical learning, and semiparametric statistics. Calibrated simulations verify the coverage of our data cleaning-adjusted confidence intervals and demonstrate the relevance of our results for 2020 Census data.
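A minimal sketch of the low-rank data-cleaning step, assuming the cleaning takes the form of a truncated SVD of the corrupted covariate matrix with missing entries mean-imputed first; the rank choice and the imputation rule are illustrative assumptions, not the paper's tuning procedure.

```python
# Sketch: approximate the true covariates by the best rank-r fit to the
# corrupted matrix, exploiting the assumption that true covariates are
# approximately low rank.
import numpy as np

def clean_covariates(X_corrupt, r):
    """Rank-r reconstruction of a corrupted covariate matrix.
    Missing entries (NaN) are filled with column means before the SVD."""
    X = X_corrupt.copy()
    col_means = np.nanmean(X, axis=0)
    nan_mask = np.isnan(X)
    X[nan_mask] = np.take(col_means, np.where(nan_mask)[1])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]
```

The cleaned covariates would then enter a downstream semiparametric estimator, with confidence intervals widened to account for the cleaning step (the data cleaning-adjusted intervals in the abstract).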
Testing Monotonicity of Mean Potential Outcomes in a Continuous Treatment with High-Dimensional Data
Abstract
While most treatment evaluations focus on binary interventions, a growing literature also considers continuously distributed treatments. We propose a Cramér-von Mises-type test of whether the mean potential outcome given a specific treatment has a weakly monotonic relationship with the treatment dose under a weak unconfoundedness assumption. In a nonseparable structural model, applying our method amounts to testing monotonicity of the average structural function in the continuous treatment of interest. To flexibly control for a possibly high-dimensional set of covariates in our testing approach, we propose a double debiased machine learning estimator that accounts for covariates in a data-driven way. We show that the proposed test controls asymptotic size and is consistent against any fixed alternative. These theoretical findings are supported by Monte Carlo simulations. As an empirical illustration, we apply our test to the Job Corps study and reject a weakly negative relationship between the treatment (hours in academic and vocational training) and labor market performance among relatively low treatment values.
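A stylized sketch of a Cramér-von Mises-type monotonicity statistic: squared violations of weak monotonicity in an estimated dose-response curve, summed over a treatment grid. The input `mu_hat` is a placeholder; in the paper the curve is built with double debiased machine learning and critical values come from a resampling scheme, neither of which is reproduced here.

```python
# Sketch: CvM-type measure of departures from weak monotonicity.
import numpy as np

def cvm_monotonicity_stat(mu_hat):
    """mu_hat: estimated mean potential outcomes on an increasing
    treatment grid. Large values indicate decreases in the curve,
    i.e., evidence against weak monotonicity."""
    increments = np.diff(mu_hat)
    violations = np.minimum(increments, 0.0)  # keep only negative steps
    return np.sum(violations ** 2)
```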
JEL Classifications
- C1 - Econometric and Statistical Methods and Methodology: General