The rapid advancement of large language models (LLMs) has opened new possibilities across various domains. However, adapting these models for specialized applications remains challenging. This thesis proposes several innovative fine-tuning strategies to address these challenges, focusing in particular on data scarcity, task alignment, and interpretability and control in specialized domains. These strategies include: 1) Customizing loss functions to reflect task-specific priorities. 2) Incorporating specialized embeddings and additional prediction heads, combined with Language Modeling (LM), to further align training objectives. 3) Integrating Variational Autoencoders (VAEs) with Transformers to smooth the embedding space, yielding a more interpretable representation of complex data distributions and enabling the generation of new synthetic data points by sampling from the latent space.
This thesis applies and evaluates these strategies in two critical areas: online health support and cybersecurity applications. In the context of online health support, this thesis develops a novel approach for intent detection in smoking cessation support groups, addressing the challenges of noisy, sparse data and overlapping intents. By fine-tuning an LLM with customized loss functions that account for intent priorities and class imbalances, this approach achieves 95.5% accuracy across 24 intent categories, significantly outperforming existing approaches for online health group intent detection.
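The customized loss idea can be illustrated with a minimal sketch. The snippet below is a hypothetical, simplified version of a class-weighted cross-entropy in plain NumPy: per-class weights (here derived from inverse label frequency, a common heuristic) stand in for the thesis's intent priorities and imbalance correction; the actual loss design, class counts, and weighting scheme are assumptions for illustration, not the thesis's implementation.

```python
import numpy as np

def weighted_cross_entropy(logits, labels, class_weights):
    """Cross-entropy where each example is scaled by its class weight.

    Rare or high-priority intent classes receive larger weights, so errors
    on them contribute more to the loss (illustrative sketch only).
    """
    # Numerically stable softmax over the class dimension.
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    n = len(labels)
    nll = -np.log(probs[np.arange(n), labels])  # per-example negative log-likelihood
    w = class_weights[labels]                   # weight of each example's true class
    return float((w * nll).sum() / w.sum())     # weighted mean

# Hypothetical 3-class setup: inverse-frequency weights from label counts.
counts = np.array([100.0, 10.0, 5.0])
class_weights = counts.sum() / (len(counts) * counts)
```

In practice such a loss would replace the default cross-entropy during LLM fine-tuning, leaving the rest of the training loop unchanged.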
For online mental health support conversations, this thesis proposes an innovative model for generating empathetic, counseling-style reflections. Through the integration of multiple text-based empathetic factors, including emotion, intent, and three mechanisms for communicating empathy, this model enhances both context and response representations. This approach significantly improves the quality of reflective, empathetic responses, as validated through automated metrics and human evaluations. This advancement has the potential to augment online mental health support services by providing more nuanced and psychologically appropriate responses.
In the cybersecurity domain, this thesis presents WebTransVAE, a Transformer-based Variational Autoencoder for generating synthetic web traffic data. This model uniquely combines the strengths of Transformers in handling sequential data with the VAE's ability to create a structured latent space. By implementing techniques such as pooling, denoising, and encoder pretraining, this approach mitigates the issue of posterior collapse. Empirical evaluations demonstrate that the model successfully generates four key variables of HTTP web traffic data through latent manipulation, maintaining an average validity rate of 99.6% for each generated variable.
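The latent-space mechanics behind such generation can be sketched briefly. The snippet below shows only the generic VAE building blocks, namely the reparameterization trick and linear interpolation between two latent codes; the latent dimension, the `decode` step (left as a comment), and the traffic samples are hypothetical placeholders, not WebTransVAE's actual architecture or data.

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps, the standard VAE reparameterization trick.

    Keeps sampling differentiable with respect to mu and logvar during training.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

rng = np.random.default_rng(0)

# Hypothetical latent codes for two encoded web-traffic samples (dim=16 assumed).
z_a = rng.standard_normal(16)
z_b = rng.standard_normal(16)

# Latent manipulation: interpolate between the two codes. Passing z_mid through
# the decoder would yield a novel synthetic traffic sample blending both inputs.
z_mid = 0.5 * z_a + 0.5 * z_b
```

A smooth, structured latent space is what makes such interpolation meaningful: nearby codes decode to similar, still-valid traffic records, which is why mitigating posterior collapse matters for this approach.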
These strategies collectively demonstrate significant improvements in adapting LLMs for specialized tasks. In online health support, the enhanced models show a better understanding of user needs and context, generating higher-quality counseling responses by better integrating the nuances of empathetic reflective listening, potentially leading to more effective interventions. In cybersecurity, the approach enables the generation of more realistic and varied synthetic data, which is crucial for robust system testing and development. The methodologies presented in this thesis address the core challenges of data scarcity, task alignment, and interpretability and control, establishing a framework for adapting language models to diverse specialized domains. This research offers potential solutions for other applications facing similar limitations, paving the way for more effective AI utilization in critical real-world scenarios.