ACCURACY AND PRIVACY IN SPEECH-BASED MODELING OF MAJOR DEPRESSION: INNOVATIVE APPROACHES THROUGH DATA AUGMENTATION, AND SPEAKER IDENTITY DISENTANGLEMENT
- Ravi, Vijay
- Advisor(s): Alwan, Abeer
Abstract
Major Depressive Disorder (MDD) is a prevalent mental illness that affects a significant portion of the global population. Despite the disorder's severity, traditional diagnostic methods often fail to identify and treat MDD effectively, highlighting the need for automated diagnostic tools. Recent research has identified speech signals as promising biomarkers for objectively detecting depression. However, the development of speech-based depression detection systems faces several challenges, including data scarcity and privacy preservation. The sensitive nature of mental health data makes it difficult to collect the large datasets required for training robust models. Moreover, many current approaches rely on features that can compromise patient confidentiality, hindering the adoption of these systems in clinical settings. This thesis presents novel methods to address these challenges and to enhance the performance and privacy of speech-based depression detection. The contributions include a frame rate-based data augmentation technique (FrAUG) that increases the amount of training data while preserving depression-related acoustic information. Additionally, five speaker identity disentanglement methods are proposed: adversarial loss maximization; three forms of loss equalization, based on cross-entropy, variance, and KL divergence; and unsupervised speaker disentanglement via cosine similarity minimization. These methods aim to reduce the reliance on speaker identity during depression detection. The proposed techniques are evaluated on multiple datasets in two languages, English (DAIC-WoZ dataset) and Mandarin (EATD and CONVERGE datasets), demonstrating improved depression detection accuracy and reduced speaker separability compared to state-of-the-art approaches. Furthermore, the privacy preservation capabilities of these methods are quantified using gain of voice distinctiveness and de-identification scores, showcasing their potential for safeguarding patient privacy.
By advancing speech-based depression detection in terms of accuracy and privacy, this thesis aims to facilitate the development of effective and secure diagnostic tools that can be readily adopted in clinical settings.