Scholarly Works (6 results)

Article
Peer Reviewed

Can Peanuts Fall in Love with Distributional Semantics?

Proceedings of the Annual Meeting of the Cognitive Science Society, Volume 45 (2023)

Context changes expectations about upcoming words: following a story involving an anthropomorphic peanut, comprehenders expect the sentence "the peanut was in love" more than "the peanut was salted," as indexed by N400 amplitude (Nieuwland & van Berkum, 2006). This updating of expectations has been explained using Situation Models, mental representations of a described event. However, recent work showing that N400 amplitude is predictable from distributional information alone raises the question of whether situation models are necessary for these contextual effects. We model the results of Nieuwland and van Berkum (2006) using six computational language models and three sets of word vectors, none of which have explicit situation models or semantic grounding. We find that a subset of these can fully model the effect found by Nieuwland and van Berkum (2006). Thus, at least some processing effects normally explained through situation models may not in fact require explicit situation models.

Creative Commons 'BY' version 4.0 license

Thesis
Peer Reviewed

Understanding the role of statistics in the predictive processing of language

Michaelov, James Asamoah
Advisor(s): Bergen, Benjamin K

UC San Diego Electronic Theses and Dissertations (2024)

In recent years, converging evidence has suggested that prediction plays a role in language comprehension, as it appears to do in information processing in a range of cognitive domains. Much of the evidence for this comes from the N400, a neural index of the processing of meaningful stimuli which has been argued to index the extent to which a word was predicted before it was encountered. The main aim of this thesis is to investigate the extent to which this prediction can be explained as arising from the statistics of the linguistic inputs we receive over the course of our lives, in line with predictive processing in other cognitive domains. To do this, I turn to language models—computational systems that can calculate the probability of a word given its context based on the statistics of language—and investigate how well their predictions correlate with the N400. The results show that probabilities calculated using language models are highly correlated with N400 amplitude, in many cases better than human-derived metrics such as cloze probability and plausibility, previously the best predictors of the N400. I also show that language model probabilities are able to qualitatively model a wide range of effects, showing significant differences based on the same experimental manipulations that lead to significant differences in N400 amplitude. In addition, the results show that language models that are better able to predict the next word in a sequence are better able to model N400 amplitude in both of these ways, showing both a closer fit to the data and more of the qualitative effects. Taken together, these results show a high degree of correlation between the N400 and predictions based on the statistics of language, consistent with the idea that the predictions indexed by the N400 are at least partly based on language statistics.

Article
Peer Reviewed

Do Large Language Models Know What Humans Know?

UC San Diego Previously Published Works (2023)

Humans can attribute beliefs to others. However, it is unknown to what extent this ability results from an innate biological endowment or from experience accrued through child development, particularly exposure to language describing others' mental states. We test the viability of the language exposure hypothesis by assessing whether models exposed to large quantities of human language display sensitivity to the implied knowledge states of characters in written passages. In pre-registered analyses, we present a linguistic version of the False Belief Task to both human participants and a large language model, GPT-3. Both are sensitive to others' beliefs, but while the language model significantly exceeds chance behavior, it does not perform as well as the humans, nor does it explain the full extent of their behavior, despite being exposed to more language than a human would in a lifetime. This suggests that while statistical learning from language exposure may in part explain how humans develop the ability to reason about the mental states of others, other mechanisms are also responsible.

Article
Peer Reviewed

Different kinds of cognitive plausibility: why are transformers better than RNNs at predicting N400 amplitude?

Proceedings of the Annual Meeting of the Cognitive Science Society, Volume 43 (2021)

Despite being designed for performance rather than cognitive plausibility, transformer language models have been found to be better at predicting metrics used to assess human language comprehension than language models with other architectures, such as recurrent neural networks. Based on how well they predict the N400, a neural signal associated with processing difficulty, we propose and provide evidence for one possible explanation—their predictions are affected by the preceding context in a way analogous to the effect of semantic facilitation in humans.

Article
Peer Reviewed

Strong Prediction: Language Model Surprisal Explains Multiple N400 Effects

UC San Diego Previously Published Works (2023)

Theoretical accounts of the N400 are divided as to whether the amplitude of the N400 response to a stimulus reflects the extent to which the stimulus was predicted, the extent to which the stimulus is semantically similar to its preceding context, or both. We use state-of-the-art machine learning tools to investigate which of these three accounts is best supported by the evidence. GPT-3, a neural language model trained to compute the conditional probability of any word based on the words that precede it, was used to operationalize contextual predictability. In particular, we used an information-theoretic construct known as surprisal (the negative logarithm of the conditional probability). Contextual semantic similarity was operationalized by using two high-quality co-occurrence-derived vector-based meaning representations for words: GloVe and fastText. The cosine between the vector representation of the sentence frame and final word was used to derive contextual cosine similarity estimates. A series of regression models were constructed, where these variables, along with cloze probability and plausibility ratings, were used to predict single trial N400 amplitudes recorded from healthy adults as they read sentences whose final word varied in its predictability, plausibility, and semantic relationship to the likeliest sentence completion. Statistical model comparison indicated GPT-3 surprisal provided the best account of N400 amplitude and suggested that apparently disparate N400 effects of expectancy, plausibility, and contextual semantic similarity can be reduced to variation in the predictability of words. The results are argued to support predictive coding in the human language network.
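
As a rough illustration of the two quantities this abstract names, the sketch below computes surprisal from a conditional probability and the cosine between two vectors. It is not the authors' code: the function names, the choice of base-2 logarithms (the abstract does not specify a base), and the toy probability and vector values are all assumptions made for the example. How the sentence-frame vector is built from word vectors is not stated in the abstract, so it is simply taken as given here.

```python
import math


def surprisal(prob: float) -> float:
    """Surprisal of a word: the negative logarithm of its conditional probability.

    Base 2 (bits) is an assumption; the abstract does not state the base.
    """
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical values, for illustration only.
p_final_word = 0.25                 # made-up probability of the final word given its context
frame_vector = [1.0, 2.0, 0.0]      # made-up vector for the sentence frame
word_vector = [2.0, 4.0, 0.0]       # made-up vector for the final word

print(surprisal(p_final_word))
print(cosine_similarity(frame_vector, word_vector))
```

Lower-probability words get higher surprisal, whereas the cosine depends only on the direction of the two vectors and not on how likely the word was, which is why the two measures can be entered as separate predictors in a regression.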

Creative Commons 'BY-NC-ND' version 4.0 license

Article
Peer Reviewed

Distributional Semantics Still Can't Account for Affordances

Proceedings of the Annual Meeting of the Cognitive Science Society, Volume 44 (2022)

Can we know a word by the company it keeps? Aspects of meaning that concern physical interactions might be particularly difficult to learn from language alone. Glenberg & Robertson (2000) found that although human comprehenders were sensitive to the distinction between afforded and nonafforded actions, distributional semantic models were not. We tested whether technological advances have made distributional models more sensitive to affordances by replicating their experiment with modern Neural Language Models (NLMs). We found that only one NLM (GPT-3) was sensitive to the affordedness of actions. Moreover, GPT-3 accounted for only one third of the effect of affordedness on human sensibility judgments. These results imply that people use processes that go beyond distributional statistics to understand linguistic expressions, and that NLP systems may need to be augmented with such capabilities.