Article
Peer Reviewed

Language use is only sparsely compositional: The case of English adjective-noun phrases in humans and large language models

Proceedings of the Annual Meeting of the Cognitive Science Society, Volume 46 (2024)

Compositionality is considered a key hallmark of human language. However, most research focuses on item-level compositionality, e.g., to what extent the meanings of phrases are composed of the meanings of their sub-parts, rather than on language-level compositionality, which is the degree to which possible combinations are utilized in practice during language use. Here, we propose a novel way to quantify the degree of language-level compositionality and apply it in the case of English adjective-noun combinations. Using corpus analyses, large language models, and human acceptability ratings, we find that (1) English only sparsely utilizes the compositional potential of adjective-noun combinations; and (2) LLMs struggle to predict human acceptability judgments of rare combinations. Taken together, our findings shed new light on the role of compositionality in language and highlight a challenging area for further improving LLMs.

Creative Commons 'BY' version 4.0 license

Article
Peer Reviewed

Probabilistic atlas for the language network based on precision fMRI data from >800 individuals

UCLA Previously Published Works (2022)

Two analytic traditions characterize fMRI language research. One relies on averaging activations across individuals. This approach has limitations: because of inter-individual variability in the locations of language areas, any given voxel/vertex in a common brain space is part of the language network in some individuals but may belong to a distinct network in others. An alternative approach relies on identifying language areas in each individual using a functional 'localizer'. Because of its greater sensitivity, functional resolution, and interpretability, functional localization is gaining popularity, but it is not always feasible, and cannot be applied retroactively to past studies. To bridge these disjoint approaches, we created a probabilistic functional atlas using fMRI data for an extensively validated language localizer in 806 individuals. This atlas enables estimating the probability that any given location in a common space belongs to the language network, and thus can help interpret group-level activation peaks and lesion locations, or select voxels/electrodes for analysis. More meaningful comparisons of findings across studies should increase robustness and replicability in language research.

Article
Peer Reviewed

Conventional and frugal methods of estimating COVID-19-related excess deaths and undercount factors.

UC Davis Previously Published Works (2024)

Across the world, the officially reported number of COVID-19 deaths is likely an undercount. Establishing true mortality is key to improving data transparency and strengthening public health systems to tackle future disease outbreaks. In this study, we estimated excess deaths during the COVID-19 pandemic in the Pune region of India. Excess deaths are defined as the number of additional deaths relative to those expected from pre-COVID-19-pandemic trends. We integrated data from: (a) epidemiological modeling using pre-pandemic all-cause mortality data, (b) discrepancies between media-reported death compensation claims and officially reported mortality, and (c) the wisdom of crowds public surveying. Our results point to an estimated 14,770 excess deaths [95% CI 9820-22,790] in Pune from March 2020 to December 2021, of which 9093 were officially counted as COVID-19 deaths. We further calculated the undercount factor, defined as the ratio of excess deaths to officially reported COVID-19 deaths. Our results point to an estimated undercount factor of 1.6 [95% CI 1.1-2.5]. Besides providing similar conclusions about excess death estimates across different methods, our study demonstrates the utility of frugal methods such as the analysis of death compensation claims and the wisdom of crowds in estimating excess mortality.
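
The undercount factor quoted in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the figures stated above; the function name `undercount_factor` is illustrative, and scaling the interval bounds by the fixed official count is an assumption about how the reported interval could be reproduced, not a description of the study's actual procedure.

```python
# Figures quoted in the abstract (Pune, March 2020 to December 2021).
excess_deaths = 14_770            # point estimate of excess deaths
excess_ci = (9_820, 22_790)       # 95% CI bounds for excess deaths
official_covid_deaths = 9_093     # officially counted COVID-19 deaths


def undercount_factor(excess: float, reported: float) -> float:
    """Ratio of excess deaths to officially reported COVID-19 deaths."""
    return excess / reported


point = undercount_factor(excess_deaths, official_covid_deaths)

# Assumption: treat the official count as fixed and scale each CI bound by it.
ci = tuple(undercount_factor(bound, official_covid_deaths) for bound in excess_ci)

print(f"undercount factor: {point:.1f} [{ci[0]:.1f}-{ci[1]:.1f}]")
```

Rounded to one decimal place, the result agrees with the reported factor and interval.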