The social sciences pose particular challenges for statistics: randomized experiments are often difficult to conduct, variation among humans is large, complete datasets are hard to collect, and data at the human scale are typically unstructured. At the same time, new technology allows for increased computation and data recording, which has in turn brought forth new methods of analysis.
Because of these challenges and innovations, statistics in the social sciences is currently a thriving, vibrant field.
This dissertation is an argument for evaluating statistical methodology in the social sciences along four major axes: \emph{validity}, \emph{interpretability}, \emph{transparency}, and \emph{employability}. Through three case studies, we illustrate how one might develop methods that achieve these four goals.
The first is an analysis of post-stratification, a form of covariate adjustment used to estimate treatment effects. In contrast to recent results showing that regression adjustment can be problematic under the Neyman-Rubin model, we show that post-stratification, which can easily be done in, e.g., natural experiments, has precision similar to that of a randomized block trial as long as there are not too many strata; the difference is $O(1/n^2)$. Post-stratification thus potentially allows for transparently exploiting predictive covariates and random mechanisms in observational data. This case study illustrates the value of analyzing a simple estimator under weak assumptions, and of finding similarities between different methodological approaches so as to carry earlier findings into a new domain.
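To sketch the setting (in our notation here, not necessarily that of the chapter): under the Neyman-Rubin model each unit $i$ has potential outcomes $y_i(1)$ and $y_i(0)$, and the post-stratified estimator reweights the within-stratum difference-in-means estimates by stratum size,
\[
  \hat{\tau}_{\mathrm{ps}} \;=\; \sum_{k=1}^{K} \frac{n_k}{n}
    \left( \bar{y}^{\,\mathrm{trt}}_k - \bar{y}^{\,\mathrm{ctl}}_k \right),
\]
which is the estimator a randomized block trial would use, computed after randomization rather than guaranteed by the design. The $O(1/n^2)$ statement above refers to the gap between the variances of these two estimators when the number of strata $K$ is small relative to the sample size $n$.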
We then present a framework for building statistical tools that extract topic-specific key-phrase summaries of large text corpora (e.g., the New York Times), along with a human validation experiment to determine best practices for this approach. These tools, built from high-dimensional, sparse classifiers such as $L_1$-penalized logistic regression and the Lasso, can be used to, for example, translate essential concepts across languages, investigate massive databases of aviation reports, or understand how different topics of interest are covered by various media outlets. This case study demonstrates how more modern methods can be evaluated with external validation to show that they produce meaningful and comprehensible results that can be broadly used.
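As an illustrative sketch only (the toy corpus, labels, and parameter choices below are ours, not the chapter's), such a key-phrase extractor can be assembled from off-the-shelf pieces: fit an $L_1$-penalized logistic regression separating on-topic documents from the rest of the corpus, and read off the phrases to which the sparse fit assigns positive weight.

\begin{verbatim}
# Sketch: topic-specific key-phrase extraction via a sparse classifier.
# The corpus and labels are toy stand-ins for a real labeled corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

docs = [
    "the election audit counted ballots by hand",      # on-topic
    "officials certified the election audit results",  # on-topic
    "the team won the championship game last night",   # background
    "stocks fell as markets reacted to the report",    # background
]
labels = [1, 1, 0, 0]  # 1 = on-topic, 0 = background

# Unigram and bigram indicators give the candidate key-phrases.
vectorizer = CountVectorizer(ngram_range=(1, 2), binary=True)
X = vectorizer.fit_transform(docs)

# The L1 penalty drives most coefficients to exactly zero,
# leaving a short list of phrases that characterize the topic.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
clf.fit(X, labels)

phrases = vectorizer.get_feature_names_out()
weights = clf.coef_[0]
key_phrases = [p for p, w in sorted(zip(phrases, weights),
                                    key=lambda t: -t[1]) if w > 0]
print(key_phrases)  # phrases weighted toward the topic
\end{verbatim}

The human validation experiment described above is what checks that lists produced this way actually read as faithful summaries of the topic.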
The third chapter presents the trinomial bound, a new election-auditing technique that rests on minimal assumptions. We demonstrated the usability of this technique in November 2008 by auditing contests in Santa Cruz and Marin counties, California.
The audits were risk-limiting, meaning they had a pre-specified minimum chance of requiring a full hand count if the outcomes were wrong. The trinomial bound gave better results than the Stringer bound, a tool common in accounting for analyzing financial audit samples drawn with probability proportional to an error bound. This case study focuses on developing methods that are employable and transparent so as to serve a public need.
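Schematically (with $\alpha$ denoting the risk limit, our notation here), the risk-limiting guarantee is
\[
  \Pr\bigl(\text{audit escalates to a full hand count} \bigm| \text{reported outcome is wrong}\bigr) \;\ge\; 1 - \alpha,
\]
so a wrong outcome survives unaudited with probability at most $\alpha$. The name ``trinomial'' refers to classifying the error (taint) observed on each sampled audit unit into one of three categories, e.g., no error, error below a small threshold, and error at the bound, and computing an upper confidence bound on the total overstatement from the resulting trinomial counts.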
Throughout, we argue that, especially in the difficult domain of the social sciences, we must pay extra attention to the first axis, validity. This motivates our use of the Neyman-Rubin model in analyzing post-stratification, our development of an external, model-independent validation approach for the key-phrase extraction tools, and our insistence on minimal assumptions for election auditing.