This dissertation explores methodological topics in the analysis of randomized experiments, with a focus on weakening the assumptions of conventional models.
Chapter 1 gives an overview of the dissertation, emphasizing connections with other areas of statistics (such as survey sampling) and other fields (such as econometrics and psychometrics).
Chapter 2 reexamines Freedman's critique of ordinary least squares regression adjustment in randomized experiments. Using Neyman's model for randomization inference, Freedman argued that adjustment can lead to worsened asymptotic precision, invalid measures of precision, and small-sample bias. This chapter shows that in sufficiently large samples, those problems are minor or easily fixed. OLS adjustment cannot hurt asymptotic precision when a full set of treatment-covariate interactions is included. Asymptotically valid confidence intervals can be constructed with the Huber-White sandwich standard error estimator. Checks on the asymptotic approximations are illustrated with data from a randomized evaluation of strategies to improve college students' achievement. The strongest reasons to support Freedman's preference for unadjusted estimates are transparency and the dangers of specification search.
Chapter 3 extends the discussion and analysis of the small-sample bias of OLS adjustment. The leading term in the bias of adjustment for multiple covariates is derived and can be estimated empirically, as was done in Chapter 2 for the single-covariate case. Possible implications for choosing a regression specification are discussed.
Chapter 4 explores and modifies an approach suggested by Rosenbaum for analysis of treatment effects when the outcome is censored by death. The chapter is motivated by a randomized trial that studied the effects of an intensive care unit staffing intervention on length of stay in the ICU. The proposed approach estimates effects on the distribution of a composite outcome measure based on ICU mortality and survivors' length of stay, addressing concerns about selection bias by comparing the entire treatment group with the entire control group. Strengths and weaknesses of possible primary significance tests (including the Wilcoxon-Mann-Whitney rank sum test and a heteroskedasticity-robust variant due to Brunner and Munzel) are discussed and illustrated.