eScholarship
Open Access Publications from the University of California

UCLA Previously Published Works

Utility of word embeddings from large language models in medical diagnosis

Abstract

Objective

This study evaluates the utility of word embeddings, generated by large language models (LLMs), for medical diagnosis by comparing the semantic proximity of symptoms to their eponymic disease embedding ("eponymic condition") and the mean of all symptom embeddings associated with a disease ("ensemble mean").

Materials and methods

Symptom data for 5 diagnostically challenging pediatric diseases (CHARGE syndrome, Cowden disease, POEMS syndrome, Rheumatic fever, and Tuberous sclerosis) were collected from PubMed. Using the Ada-002 embedding model, disease names and symptoms were translated into vector representations in a high-dimensional space. Euclidean and Chebyshev distance metrics were used to classify symptoms based on their proximity to both the eponymic condition and the ensemble mean of the condition's symptoms.
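The classification step described above can be sketched as a nearest-reference rule: each symptom vector is assigned to the disease whose reference vector (either the disease-name embedding or the mean of that disease's symptom embeddings) lies closest under the chosen metric. The sketch below is illustrative only and is not the authors' code. The three-dimensional vectors are made-up stand-ins rather than Ada-002 output, the disease labels and function names are hypothetical, and it assumes each symptom is included in its own disease's mean, a detail the abstract does not specify.

```python
import math

def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def chebyshev(a, b):
    """Chebyshev (L-infinity) distance: largest per-coordinate gap."""
    return max(abs(x - y) for x, y in zip(a, b))

def ensemble_mean(vectors):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def classify(vector, references, metric):
    """Return the disease whose reference vector is nearest to `vector`."""
    return min(references, key=lambda disease: metric(vector, references[disease]))

def accuracy(symptoms, references, metric):
    """Per-disease fraction of symptoms assigned to their own disease."""
    scores = {}
    for disease, vectors in symptoms.items():
        hits = sum(classify(v, references, metric) == disease for v in vectors)
        scores[disease] = hits / len(vectors)
    return scores

# Toy stand-ins for embeddings (assumption: real vectors would come from
# an embedding model and have far more dimensions).
toy_symptoms = {
    "disease_a": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "disease_b": [[0.0, 1.0, 0.0], [0.0, 0.8, 0.2]],
}
# Hypothetical disease-name vectors for the "eponymic condition" reference.
toy_names = {
    "disease_a": [0.5, 0.5, 0.0],
    "disease_b": [0.0, 0.5, 0.5],
}

# Reference set 1: ensemble mean of each disease's symptom vectors.
means = {d: ensemble_mean(v) for d, v in toy_symptoms.items()}

mean_scores_l2 = accuracy(toy_symptoms, means, euclidean)
mean_scores_linf = accuracy(toy_symptoms, means, chebyshev)
# Reference set 2: the disease-name vectors themselves.
name_scores_l2 = accuracy(toy_symptoms, toy_names, euclidean)
```

Swapping `means` for `toy_names` is the only change needed to move between the two approaches, which keeps the comparison between reference sets and between metrics like-for-like.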

Results

The ensemble mean approach showed significantly higher classification accuracy, correctly classifying between 80% (Cowden disease) and 100% (Tuberous sclerosis) of the sample disease symptoms using the Euclidean distance metric. In contrast, the eponymic condition approach, under both the Euclidean and Chebyshev distance metrics, generally showed poor symptom classification performance: results were erratic (0%-100% accuracy), with most falling between 0% and 3%.

Discussion

The ensemble mean captures a disease's collective symptom profile, providing a more nuanced representation than the disease name alone. However, some misclassifications were due to superficial semantic similarities, highlighting the need for LLMs trained on medical corpora.

Conclusion

The ensemble mean of symptom embeddings improves classification accuracy over the eponymic condition approach. Future efforts should focus on medical-specific training of LLMs to enhance their diagnostic accuracy and clinical utility.

