We analyze data from the Los Angeles LGBT Center, a community-based healthcare organization. When patients visit the clinic, they are given a comprehensive risk-assessment questionnaire. We develop three methods that allow us to identify the risk factors associated with HIV seroconversion and predict who is most likely to become HIV positive.
First, we construct a two-stage multivariate logistic regression model, where stage one models a patient's history of illicit drug use and their history of STIs other than HIV, and stage two models their risk of contracting HIV. Each stage of the model has ZIP code random effects that are correlated over space. We propose a statistic called the geometric mean ratio (GMR), which measures how much of the variability in the ZIP code random effects for HIV is explained by the stage one random effects. We find that the stage one random effects are negligible in the HIV model and that where a person lives is not predictive of their risk of contracting HIV.
Next, we jointly model a patient's time until HIV seroconversion and their clinic visit frequency through shared frailties. We show that if clinic visit frequency is correlated with survival, then the censoring is informative, and we examine how the informativeness of the censoring depends on the frailty distributions. We find that patients who visit the clinic more frequently tend to have a higher probability of contracting HIV, suggesting that these patients accurately perceive their elevated risk.
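The link between a shared frailty and informative censoring can be illustrated with a small simulation. This is only a sketch under assumptions that do not come from the analysis itself: a gamma frailty with mean 1, an exponential time to seroconversion, Poisson clinic visits over a fixed window, and a censoring time equal to the patient's last visit. The function name and all parameter values are hypothetical.

```python
import numpy as np

def censoring_correlation(shared, n=20000, theta=1.0, tau=5.0,
                          base_hazard=0.2, base_visit_rate=1.0, seed=0):
    """Correlation between log event time and censoring (last-visit) time.

    Illustrative assumptions: gamma frailties with mean 1 and variance
    theta, exponential event times, Poisson visits on [0, tau].
    """
    rng = np.random.default_rng(seed)
    # Frailty acting on the seroconversion hazard.
    w_event = rng.gamma(shape=1.0 / theta, scale=theta, size=n)
    # Frailty acting on the visit rate: the same draw if shared,
    # otherwise an independent draw from the same distribution.
    if shared:
        w_visit = w_event
    else:
        w_visit = rng.gamma(shape=1.0 / theta, scale=theta, size=n)
    # Time to seroconversion: exponential with rate base_hazard * w_event.
    t_event = rng.exponential(scale=1.0 / (base_hazard * w_event))
    # Number of visits in [0, tau]: Poisson with mean rate * tau.
    n_visits = rng.poisson(base_visit_rate * w_visit * tau)
    # Given k visits, the last visit is the maximum of k uniforms on
    # [0, tau], which is distributed as tau * U**(1/k); 0 if no visits.
    u = rng.uniform(size=n)
    last_visit = np.where(n_visits > 0,
                          tau * u ** (1.0 / np.maximum(n_visits, 1)),
                          0.0)
    return np.corrcoef(np.log(t_event), last_visit)[0, 1]

r_shared = censoring_correlation(shared=True)
r_independent = censoring_correlation(shared=False)
print(f"shared frailty:        {r_shared:.3f}")
print(f"independent frailties: {r_independent:.3f}")
```

Under these assumptions, the shared-frailty case yields a clearly negative correlation between event time and censoring time, while the independent case yields a correlation near zero, which is the sense in which visit frequency makes the censoring informative.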
Finally, we use a factor analysis model to reduce the items from the risk-assessment questionnaire to a set of latent measures of patient risk. Because patients come to the clinic multiple times, we allow the factors to be correlated within a patient over time and between patients over space. We then use the factor scores from one visit to predict whether or not a patient will seroconvert by their next visit. We show that this model is equivalent to a larger longitudinal factor model and that the factor scores are predictive of future risk of HIV.
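The idea of turning questionnaire items into factor scores and relating them to later seroconversion can be sketched as follows. This is a deliberately simplified illustration, not the model described above: it assumes a single factor, continuous items, known loadings and uniquenesses, and no temporal or spatial correlation. The scores use the standard regression (posterior mean) formula, and all names and numbers are hypothetical.

```python
import numpy as np

def regression_factor_scores(x, loadings, uniquenesses):
    """Posterior-mean factor scores for centered items x.

    x: (n, p) items; loadings: (p, q); uniquenesses: (p,) variances.
    Computes (I + L' Psi^-1 L)^-1 L' Psi^-1 x for each row of x.
    """
    psi_inv_l = loadings / uniquenesses[:, None]           # Psi^-1 L, (p, q)
    m = np.eye(loadings.shape[1]) + loadings.T @ psi_inv_l  # (q, q)
    return np.linalg.solve(m, psi_inv_l.T @ x.T).T          # (n, q)

def simulate_scores(n=5000, seed=0):
    rng = np.random.default_rng(seed)
    # Assumed one-factor structure with six standardized items.
    loadings = np.array([[0.8], [0.7], [0.6], [0.5], [0.7], [0.6]])
    uniquenesses = 1.0 - (loadings ** 2).ravel()
    # Latent risk factor and noisy questionnaire items at one visit.
    f = rng.normal(size=(n, 1))
    noise = rng.normal(size=(n, loadings.shape[0])) * np.sqrt(uniquenesses)
    x = f @ loadings.T + noise
    # Assumed outcome model: seroconversion by the next visit depends
    # on the latent factor through a logistic link.
    prob = 1.0 / (1.0 + np.exp(-(-2.0 + 1.0 * f.ravel())))
    y = rng.uniform(size=n) < prob
    scores = regression_factor_scores(x, loadings, uniquenesses).ravel()
    return scores, f.ravel(), y

scores, f, y = simulate_scores()
print("corr(score, true factor):", np.corrcoef(scores, f)[0, 1])
print("mean score, converters:    ", scores[y].mean())
print("mean score, non-converters:", scores[~y].mean())
```

In this toy setting the scores track the latent factor closely and are higher on average among simulated converters, which is the pattern a predictive factor score would show. In practice the loadings would be estimated rather than assumed known.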