Self-rated health is an independent predictor of future health outcomes, including morbidity and mortality. Therefore, in a public health context, a high proportion of a population reporting very good or excellent subjective health is in itself an important end point. To achieve this, public health interventions need to be informed by knowledge of the determinants of both illness and wellbeing in different groups. In this study, I used self-rated health as the outcome measure and compared the characteristics of individuals in a population who rated their health as excellent or very good (classed as "good" self-rated health) with those of individuals who rated it as only good, fair, or poor (classed as "poor" self-rated health). A broad range of risk and protective determinants of health, spanning multiple domains, was included in the analysis as predictor variables. The study data were drawn from the CARDIA (Coronary Artery Risk Development in Young Adults) study, a United States cohort started in 1985 to investigate the development of coronary artery disease risk factors in young adults.
In the first analysis, I used classification tree methods to segment the study sample of 3,649 individuals into subgroups with shared characteristics and relatively homogeneous self-rated health status. Lifestyle, social and community influences, and living and working conditions were all associated with self-rated health, and the combinations of these factors differed by population subgroup. Physical activity rating emerged as the most important variable in the single-tree classification, and the model suggested interactions of lifestyle and medical factors with the socioeconomic factors of income and education.
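The core step of a classification tree can be sketched as below, assuming the Gini impurity criterion used by CART-style trees (the software and split criterion used in this study are not stated here). For each candidate predictor and threshold, the function measures how much a binary split reduces impurity in the 0/1 self-rated health outcome; a full tree repeats this search recursively within each resulting subgroup until a stopping rule is met. The feature names and toy records are hypothetical and are not CARDIA data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 outcome labels (1 = "good" self-rated health)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature, threshold, impurity_decrease) for the best binary split.

    rows: list of dicts mapping a feature name to a numeric value.
    labels: list of 0/1 outcomes aligned with rows.
    """
    parent = gini(labels)
    n = len(labels)
    best = (None, None, 0.0)
    for feature in rows[0]:
        values = sorted({r[feature] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[feature] <= t]
            right = [y for r, y in zip(rows, labels) if r[feature] > t]
            # Size-weighted impurity of the two child nodes.
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            decrease = parent - child
            if decrease > best[2]:
                best = (feature, t, decrease)
    return best


# Hypothetical toy records (not study data): activity rating 0-3, income category 1-4.
rows = [
    {"activity": 0, "income": 1},
    {"activity": 1, "income": 3},
    {"activity": 2, "income": 2},
    {"activity": 3, "income": 1},
    {"activity": 3, "income": 4},
    {"activity": 0, "income": 4},
]
labels = [0, 0, 1, 1, 1, 0]

print(best_split(rows, labels))
```

On these toy records, splitting activity at 1.5 separates the two outcome classes completely, so it is preferred over any income split.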
In the second analysis, I first divided the study sample into subsets based on total family income and then used classification trees to investigate the characteristics associated with self-rated health within each income subset. The findings suggested a social gradient for several health determinants: the proportion of participants reporting good self-rated health increased with income category, and the proportion reporting poor self-rated health decreased. Across the income-stratified subgroups, both the combinations of factors associated with self-rated health and the predictor variable ranked most important differed. This suggests potentially important differences in the factors underlying self-rated health and health inequalities among income groups with dissimilar social, cultural, and economic contexts.
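The stratified design can be sketched as follows: records are grouped by a hypothetical `income` category, and within each stratum the function reports the number of records, the proportion with good self-rated health, and the predictor whose best single split most reduces Gini impurity. Using one split as a stand-in for a full per-stratum tree is a simplification for brevity, the Gini criterion is an assumption, and the toy data are invented for illustration.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity of a list of 0/1 outcome labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def top_split_feature(rows, labels, features):
    """Feature whose best single threshold split most reduces Gini impurity."""
    parent, n = gini(labels), len(labels)
    best_feature, best_decrease = None, 0.0
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            decrease = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if decrease > best_decrease:
                best_feature, best_decrease = f, decrease
    return best_feature


def stratified_summary(rows, labels, stratum_key, features):
    """For each stratum: (n, proportion with good self-rated health, top split feature)."""
    groups = defaultdict(lambda: ([], []))
    for r, y in zip(rows, labels):
        groups[r[stratum_key]][0].append(r)
        groups[r[stratum_key]][1].append(y)
    summary = {}
    for stratum in sorted(groups):
        g_rows, g_labels = groups[stratum]
        summary[stratum] = (
            len(g_labels),
            sum(g_labels) / len(g_labels),
            top_split_feature(g_rows, g_labels, features),
        )
    return summary


# Hypothetical toy data (not study data); label 1 = "good" self-rated health.
rows = [
    {"income": 1, "activity": 0, "smoking": 1},
    {"income": 1, "activity": 2, "smoking": 1},
    {"income": 1, "activity": 0, "smoking": 0},
    {"income": 1, "activity": 2, "smoking": 0},
    {"income": 2, "activity": 0, "smoking": 0},
    {"income": 2, "activity": 3, "smoking": 0},
    {"income": 2, "activity": 3, "smoking": 1},
    {"income": 2, "activity": 3, "smoking": 0},
]
labels = [0, 0, 1, 1, 0, 1, 1, 1]

print(stratified_summary(rows, labels, "income", ["activity", "smoking"]))
```

On this toy data the proportion with good self-rated health rises from 0.5 in the lower-income stratum to 0.75 in the higher one, and the top predictor differs between strata (smoking versus activity). This only illustrates the shape of the analysis; it reproduces no study result.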
The third analysis extended the single classification tree analysis by applying random forests. This method produces an ensemble of classification trees, which improved predictive accuracy and yielded more robust variable importance measures. Despite the inclusion of a wide range of predictor variables representing fixed factors, lifestyle and medical conditions, social and community influences, and living and working conditions, education and income ranked highest in variable importance for self-rated health in the study sample. This highlights the importance of addressing the social determinants of health and health inequities.
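A random forest combines bootstrap resampling of records with a random subset of candidate predictors at each split, and impurity-based importance averages each variable's Gini decrease across trees. The sketch below is deliberately simplified: each tree is a single split (a stump) to keep the example short, whereas standard implementations such as R's randomForest package grow deep trees and resample predictors at every node. Whether this study used Gini-based or permutation-based importance is not stated here; the names `random_forest`, `fit_stump`, and `predict`, and the toy data, are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 outcome labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most common label (ties go to the label seen first); 0 if empty."""
    return Counter(labels).most_common(1)[0][0] if labels else 0


def fit_stump(rows, labels, features):
    """Best single threshold split among `features`.

    Returns (feature, threshold, impurity_decrease, left_pred, right_pred),
    or None if none of the candidate features varies in this sample.
    """
    parent, n = gini(labels), len(labels)
    best = None
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            decrease = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or decrease > best[2]:
                best = (f, t, decrease, majority(left), majority(right))
    return best


def random_forest(rows, labels, n_trees=100, mtry=1, seed=0):
    """Fit bootstrap stumps; return (forest, mean Gini decrease per feature)."""
    rng = random.Random(seed)
    features = list(rows[0])
    importance = {f: 0.0 for f in features}
    forest = []
    n = len(rows)
    for _ in range(n_trees):
        # Bootstrap sample: n draws with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        b_rows = [rows[i] for i in idx]
        b_labels = [labels[i] for i in idx]
        # Random subset of mtry candidate features for this tree's split.
        candidates = rng.sample(features, mtry)
        stump = fit_stump(b_rows, b_labels, candidates)
        if stump is None:
            continue
        forest.append(stump)
        importance[stump[0]] += stump[2] / n_trees
    return forest, importance


def predict(forest, row):
    """Majority vote of the stumps' predictions for one record."""
    votes = [lp if row[f] <= t else rp for f, t, _, lp, rp in forest]
    return majority(votes)


# Hypothetical toy data (not study data): the outcome follows education only.
rows = [
    {"education": 1, "activity": 1}, {"education": 1, "activity": 2},
    {"education": 2, "activity": 0}, {"education": 2, "activity": 3},
    {"education": 3, "activity": 1}, {"education": 3, "activity": 2},
    {"education": 4, "activity": 0}, {"education": 4, "activity": 3},
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest, importance = random_forest(rows, labels)
print(importance)
print(predict(forest, {"education": 4, "activity": 1}))
```

Because the toy outcome depends only on education, its mean Gini decrease should dominate that of activity, and majority voting across the stumps recovers the outcome for new records. Averaging over many resampled trees is what makes ensemble importance rankings less sensitive to any single tree's splits.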
This dissertation contributes to the literature on the determinants of self-rated health and adds a novel application of classification tree and random forest methods to the study of self-rated health. Capturing the complex interplay of factors affecting health in populations can be difficult with parametric multivariable regression, and such models may not capture the full array of variables influencing health. Recursive partitioning methods can serve as an initial tool to suggest population subgroups that may share homogeneous risks of an outcome and to identify the relative importance of risk and protective factors within those subgroups for further inquiry. This knowledge is valuable for developing appropriate, targeted public health interventions that address specific needs.