The research presented in this thesis focuses on the analysis of data arising from matched case-control designs, with particular emphasis on case-crossover designs. We begin with a scientific example that motivates the research and highlights the statistical issues raised in addressing the scientific goal of the study. We then provide background and notation that lay the foundation for the remainder of the dissertation. The occurrence of repeated events per patient or cluster and an imbalance in cluster sizes pose statistical challenges in the analysis of case-crossover studies (and, more generally, of matched case-control studies). After reviewing existing methods, we focus on methods to estimate association parameters in matched case-control designs while accounting for within-subject correlation in the data. The methods discussed assume a willingness to break the individual matched case-control bonds within matched sets, thereby accounting for within-subject correlation directly in the estimation procedure. It is illustrated that existing estimation procedures can result in severe bias, depending upon the number of repeated events per patient or cluster and the magnitude of the covariate effect on the response.
Next, methods are discussed for settings in which it is no longer acceptable to break the matched case-control bonds. These methods employ substantially different weighting schemes to obtain parameter estimates, and the estimand that each procedure consistently estimates is investigated. We focus on the scenario of varying matched set sizes (varying cluster sizes), where effect modification exists across clusters. It is shown that currently implemented frequentist methods for analyzing case-crossover data with unbalanced cluster sizes force one to choose between weighting schemes that estimate marginal or conditionally weighted covariate effects.
To directly model and contrast marginal and subject-specific estimates of association in matched case-control studies, a novel estimation method is developed. The proposed methodology allows for simultaneous estimation of both marginal and subject-specific covariate effects by implementing a semi-parametric Bayesian hierarchical framework.
Throughout, the utility of the resulting methodology is illustrated using data obtained from a case-crossover study of children sampled from Orange County, CA, that sought to quantify the effect of air pollution exposure on the risk of asthma-related hospital encounters.