eScholarship
Open Access Publications from the University of California

Language Models Show Within- and Cross-language Similarities in Concrete Noun Meaning, but not Differences Between L1 and L2 English Speakers

Creative Commons 'BY' version 4.0 license
Abstract

Monolingual and bilingual speakers of the same languages derive unique meanings for words, partly based on between-language differences in the meaning of dictionary-translated words. Do language models also capture this variability between speakers? We compared several models of lexical semantic representation and their correspondence to a word-word meaning similarity rating task completed by both L1 and L2 English speakers. We found that most language models do not correlate differently with L1 vs. L2 English speakers. Further, these models exhibit more cross-language similarity between Mandarin and English representations than is supported by psycholinguistic research. Only GloVe and OpenAI’s Davinci models correlated more strongly with L1 speakers than with L2 speakers, but individual participants’ similarity to these models did not relate to language history variables that might otherwise predict bilingual lexical semantic native-likeness. We conclude that language models are not yet reliable references for tracking lexical semantic learning and discuss future directions for computational linguistics and psycholinguistics.
