Predicting Discrete Outcomes with Many Correlated Predictors
Abstract

This dissertation develops classification methods for predicting discrete outcomes when the predictors are many and highly correlated.

In Chapter 1, I present a comprehensive introduction and motivate the topic in detail.

In Chapter 2, I develop the Factor-Adjusted Naive Bayes (FANB) classification method to handle correlated predictors in classification with Linear Discriminant Analysis (LDA). The FANB method de-correlates highly correlated predictors by learning the latent factors and idiosyncratic components from the factor structure of the predictors, uses them as a new predictor set, and applies the Naive Bayes independence rule to this new set for classification. A componentwise FANB AdaBoost classification method provides a variable selection scheme that selects the best predictors among the weakly correlated factors and idiosyncratic components. To improve forecast performance for discrete time series outcomes, the Markov chain property of the outcome variable is incorporated: the current state and the transition probabilities are used in estimating the prior probabilities of LDA, capturing the dependence in discrete outcomes. I illustrate the effectiveness of the methods in Monte Carlo simulations and in an application to forecasting economic recessions.
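To make the factor-adjustment step concrete, the sketch below estimates latent factors by principal components, forms idiosyncratic components as the residuals, and applies a Gaussian Naive Bayes rule to the combined predictor set. This is a minimal sketch, not the dissertation's implementation: the function name fanb_fit_predict, the fixed number of factors, and the use of scikit-learn's PCA and GaussianNB are illustrative assumptions, and the componentwise AdaBoost selection and the Markov-chain prior adjustment are omitted.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB

def fanb_fit_predict(X_train, y_train, X_test, n_factors=3):
    """Rough FANB-style sketch: PCA factor adjustment + Naive Bayes."""
    # Standardize predictors (a common preliminary step for PCA-based factor estimation).
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0) + 1e-12
    Z_train = (X_train - mu) / sd
    Z_test = (X_test - mu) / sd

    # Estimate latent factors by principal components; residuals play the
    # role of idiosyncratic components.
    pca = PCA(n_components=n_factors)
    F_train = pca.fit_transform(Z_train)                 # estimated factors
    F_test = pca.transform(Z_test)
    U_train = Z_train - pca.inverse_transform(F_train)   # idiosyncratic components
    U_test = Z_test - pca.inverse_transform(F_test)

    # Apply the Naive Bayes independence rule to the new, weakly correlated
    # predictor set (factors plus idiosyncratic components).
    W_train = np.hstack([F_train, U_train])
    W_test = np.hstack([F_test, U_test])
    clf = GaussianNB().fit(W_train, y_train)
    return clf.predict(W_test)
```

In practice the number of factors would be chosen by an information criterion or cross-validation rather than fixed as in this sketch.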

In Chapter 3, I develop a new classification method within Quadratic Discriminant Analysis (QDA) to deal with correlated predictors. The proposed method, Factor-Adjusted Naive Bayes QDA, transforms the correlated predictors through a class-dependent factor adjustment procedure and applies the Naive Bayes QDA classification rule to the resulting weakly correlated factors and idiosyncratic components. Unlike the conventional factor model, the class-dependent factor adjustment allows factor loadings to vary by class: factors and loadings are estimated by applying PCA to class-specific covariance matrices weighted by a discrete kernel function. I demonstrate the good performance of the new classification method in extensive Monte Carlo simulations and in an application to forecasting stock index directions.
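The class-dependent adjustment can be sketched as follows: for each class, a discrete-kernel-weighted covariance matrix yields class-specific loadings, the predictors are projected onto those loadings, and an independent-Gaussian (Naive Bayes QDA) score is evaluated. The kernel form, the bandwidth lam, and the helper names below are hypothetical; this is a rough illustration under an approximate factor model, not the chapter's exact estimator.

```python
import numpy as np
from scipy.stats import norm

def class_kernel_weights(y, g, lam=0.1):
    # Illustrative discrete kernel: full weight for class g, weight lam otherwise.
    return np.where(y == g, 1.0, lam)

def fit_fanb_qda(X, y, n_factors=2, lam=0.1):
    classes = np.unique(y)
    model = {}
    for g in classes:
        w = class_kernel_weights(y, g, lam)
        mu_g = np.average(X, axis=0, weights=w)
        Xc = X - mu_g
        # Kernel-weighted class-specific covariance and its leading eigenvectors
        # give class-dependent factor loadings.
        S_g = (Xc * w[:, None]).T @ Xc / w.sum()
        _, vecs = np.linalg.eigh(S_g)
        L_g = vecs[:, -n_factors:]
        F = Xc[y == g] @ L_g                      # factors for class-g observations
        U = Xc[y == g] - F @ L_g.T                # idiosyncratic components
        W = np.hstack([F, U])
        model[g] = dict(mu=mu_g, L=L_g,
                        m=W.mean(axis=0), s=W.std(axis=0) + 1e-8,
                        prior=np.mean(y == g))
    return model

def predict_fanb_qda(model, X):
    scores, classes = [], list(model.keys())
    for g in classes:
        p = model[g]
        Xc = X - p["mu"]
        F = Xc @ p["L"]
        U = Xc - F @ p["L"].T
        W = np.hstack([F, U])
        # Naive Bayes QDA score: independent Gaussians with class-specific variances.
        scores.append(norm.logpdf(W, p["m"], p["s"]).sum(axis=1) + np.log(p["prior"]))
    return np.array(classes)[np.argmax(np.vstack(scores), axis=0)]
```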

In Chapter 4, I develop a multi-class classification method for settings with highly correlated predictors: the Multi-class Markov Chain Factor-Adjusted Naive Bayes AdaBoost (MCFANB Multi-class AdaBoost). The method is equivalent to fitting a forward stagewise additive model with a multi-class exponential loss function, using the Multi-class Markov Chain Factor-Adjusted Naive Bayes classifier as the base learner. It outperforms many competing methods in Monte Carlo simulations, and an empirical application to forecasting five-state mortgage delinquency in a big data environment demonstrates the merits of the MCFANB Multi-class AdaBoost.
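Fitting a forward stagewise additive model under the multi-class exponential loss corresponds to the SAMME form of multi-class AdaBoost. The sketch below shows that boosting loop with a generic weighted base learner; GaussianNB stands in for the MCFANB base learner, which is not reproduced here, and the number of rounds is an arbitrary choice.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def samme_fit(X, y, n_rounds=50, make_base=lambda: GaussianNB()):
    """Multi-class AdaBoost (SAMME): stagewise additive modeling under
    multi-class exponential loss, with a stand-in weighted base learner."""
    n, K = len(y), len(np.unique(y))
    w = np.full(n, 1.0 / n)
    learners, alphas = [], []
    for _ in range(n_rounds):
        clf = make_base().fit(X, y, sample_weight=w)
        pred = clf.predict(X)
        err = np.clip(np.sum(w * (pred != y)) / np.sum(w), 1e-10, 1 - 1e-10)
        alpha = np.log((1 - err) / err) + np.log(K - 1)
        if alpha <= 0:          # base learner no better than random guessing
            break
        w *= np.exp(alpha * (pred != y))   # up-weight misclassified observations
        w /= w.sum()
        learners.append(clf)
        alphas.append(alpha)
    return learners, alphas, np.unique(y)

def samme_predict(learners, alphas, classes, X):
    votes = np.zeros((len(X), len(classes)))
    for clf, a in zip(learners, alphas):
        pred = clf.predict(X)
        for k, c in enumerate(classes):
            votes[:, k] += a * (pred == c)
    return classes[np.argmax(votes, axis=1)]
```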

Chapter 5 concludes the dissertation.
