In quantifying heterogeneity in time-to-event data, biomarker measurements and demographic information reflecting key pathophysiology at the individual level are increasingly available. Typical analyses of time-to-event data rely on observed events; to ensure adequate statistical power, longer follow-up periods or additional participants are required, adding enormously to study costs. Incorporating biomarker information can reduce the cost of data collection and maximize statistical power.
There are, however, several drawbacks to incorporating biomarkers directly into identification and analysis. Specifically, most models assume time-invariant effects, independent censoring, and complete information, and little work has been done to assess the robustness of currently used models when these assumptions are violated.
The Cox proportional hazards model is the most commonly used model for time-to-event data. At each event time, it compares the covariates of the subject who experienced the event with a weighted average of the covariates in the risk set. Under proportional hazards, the Cox model consistently estimates a weighted time-averaged effect. In the field of Alzheimer's disease, however, the proportional hazards assumption often fails because covariate effects diminish as the disease progresses, and applying the model directly ignores these changes in the underlying effect. In addition, the assumption of independent censoring may overlook possible associations between biomarkers and the missing-data mechanism; as a result, events at later follow-up times may be underrepresented relative to the overall population because of censoring. To our knowledge, no existing research addresses these issues simultaneously while remaining robust to violations of the assumptions and adaptive to various types of biomarker information.
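To make the risk-set comparison concrete, the following is a minimal sketch of the score of the Cox partial likelihood for a single covariate; the function name and toy data are hypothetical illustrations, not material from the dissertation.

```python
import math

def cox_score(beta, times, events, x):
    """Score of the Cox partial likelihood for one covariate.

    Each observed event contributes the subject's covariate minus the
    exp(beta * x)-weighted average of covariates in the risk set.
    """
    score = 0.0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored observations contribute no score term
        # Risk set: subjects still under observation at times[i].
        risk = [j for j in range(len(times)) if times[j] >= times[i]]
        w = [math.exp(beta * x[j]) for j in risk]
        weighted_avg = sum(wj * x[j] for wj, j in zip(w, risk)) / sum(w)
        score += x[i] - weighted_avg
    return score

# Hypothetical toy data: follow-up times, event indicators, one covariate.
times = [2.0, 3.0, 5.0, 7.0]
events = [1, 0, 1, 1]
x = [1.0, 0.0, 1.0, 0.0]

# At beta = 0 all weights equal 1, so each event term reduces to the
# observed covariate minus the plain mean of the risk set.
print(cox_score(0.0, times, events, x))  # -> 1.0
```

Solving cox_score(beta, ...) = 0 gives the maximum partial-likelihood estimate; the weighted time-averaged interpretation arises because each event time contributes its own risk-set comparison.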
Given the increasing availability of repeated biomarker measurements, there is growing interest in jointly modeling longitudinal and time-to-event data to maximize the statistical information on the association between potential biomarkers and disease progression. Within longitudinal data, subgroups often exist in the population, and the underlying trajectories can be clustered into distinct patterns. These clustering patterns offer valuable insights into the natural history of diseases, including their progression, risk factors, and potential causes. In practice, longitudinal data frequently contain missing values. Typical clustering methods handle intermittent missingness through imputation; in disease studies, however, monotone missingness often arises when measurements are lost after a certain time because of terminal events or censoring. Existing imputation methods may introduce bias when used to recover trajectories to a common length, yet there has been little examination of current longitudinal clustering methods under monotone missingness.
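One way to see how monotone missingness can be handled without imputation is to compare trajectories only over their observed overlap. The sketch below is a hypothetical illustration (it is not the shape-based partial-mapping method developed in Chapter 5): a correlation-based shape distance computed on the common prefix of two series of unequal length.

```python
import math

def shape_distance(a, b):
    """Correlation-based shape distance on the overlapping prefix of two
    trajectories; under monotone missingness the shorter series simply
    stops early, so no imputation is performed."""
    n = min(len(a), len(b))
    x, y = a[:n], b[:n]
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    if sx == 0.0 or sy == 0.0:
        return 1.0  # a flat series carries no shape information
    r = sum((u - mx) * (v - my) for u, v in zip(x, y)) / (sx * sy)
    return 1.0 - r  # 0 for identical shapes, 2 for opposite shapes

# A trajectory truncated by censoring but with the same increasing shape
# is at (numerically) zero distance from the full trajectory.
print(shape_distance([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0]))
```

A distance of this kind can be plugged into hierarchical or k-medoids clustering, so trajectories of different observed lengths are grouped by shape rather than by a common imputed length.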
This dissertation aims to develop a flexible and robust statistical model for evaluating predictors of time-to-event outcomes. The resulting estimator is consistent and robust under violations of standard assumptions and accommodates repeated measures. In Chapter 3, we propose a reweighted censoring-robust estimator based on censoring weights and the conditional covariate variance. In Chapter 4, we introduce a robust variance estimator, based on the influence function, for the estimator proposed in Chapter 3. In Chapter 5, we investigate the performance of a current longitudinal clustering method under the monotone missingness induced by censoring, and propose a shape-based longitudinal partial-mapping clustering method to complement the estimator of Chapter 3.
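Censoring weights of the kind used in reweighted estimators are commonly of the inverse-probability-of-censoring (IPCW) form. The sketch below illustrates one standard construction, a Kaplan-Meier estimate of the censoring survival function G(t) with each observed event weighted by 1/G(t-); the function names and toy data are illustrative assumptions and do not reproduce the Chapter 3 estimator.

```python
def censoring_survival(times, events):
    """Kaplan-Meier estimate of the censoring survival function G(t),
    treating censoring (event indicator 0) as the event of interest."""
    surv, s = {}, 1.0
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)
        censored = sum(1 for u, d in zip(times, events) if u == t and d == 0)
        s *= 1.0 - censored / at_risk
        surv[t] = s
    return surv

def ipcw_weights(times, events):
    """Weight each observed event by 1 / G(t-); censored rows get 0."""
    surv = censoring_survival(times, events)
    grid = sorted(surv)
    weights = []
    for t, d in zip(times, events):
        if not d:
            weights.append(0.0)
            continue
        g = 1.0  # G evaluated just before t
        for u in grid:
            if u < t:
                g = surv[u]
        weights.append(1.0 / g)
    return weights

# Hypothetical toy data: the subject at time 2 is censored, so events
# observed after time 2 are upweighted to compensate.
print(ipcw_weights([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1]))
```

Intuitively, the weights restore the representation of late events that censoring would otherwise erode, which is the mechanism motivating a reweighted censoring-robust estimator.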