eScholarship
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Previously Published Works

A Computational Approach to Identifying Cultural Keywords Across Languages

Published Web Location

https://doi.org/10.1111/cogs.13402
No data is associated with this publication.
Creative Commons 'BY' version 4.0 license
Abstract

Distinctive aspects of a culture are often reflected in the meaning and usage of words in the language spoken by bearers of that culture. Keywords such as душа (soul) in Russian, hati (heart) in Indonesian and Malay, and gezellig (convivial/cosy/fun) in Dutch are held to be especially culturally revealing, and scholars have identified a number of such keywords using careful linguistic analyses (Peeters, 2020b; Wierzbicka, 1990). Because keywords are expected to have different statistical properties than related words in other languages, we argue that a quantitative comparison of word usage across languages can help to identify cultural keywords. To support this claim, we describe a computational method that compares word frequencies across languages, and apply it to both linguistic corpora and word association data. The method identifies culturally specific words that range from "obvious" examples, such as Amsterdam in Dutch, to non-obvious yet independently proposed examples, such as hati (heart) in Indonesian. We show in addition that linguistic corpora and word association data provide converging evidence about culturally specific words. Our results therefore show how computational analyses and behavioral experiments can supplement the methods previously used by linguists to identify culturally salient words across languages.
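The idea of comparing word frequencies across languages can be illustrated with a minimal sketch. It is not the method from the paper, whose details are not given in this abstract. It assumes that words have already been aligned across languages under a shared gloss, and it scores each word by the log ratio of its relative frequency in a target language to its mean relative frequency in the other languages. The function names, the smoothing constant, and the toy counts are all hypothetical and chosen only for illustration.

```python
import math


def relative_frequencies(counts):
    """Convert raw counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def specificity_scores(counts_by_lang, target, smoothing=1e-6):
    """Score each gloss in `target` by how over-represented it is.

    The score is log(f_target / mean f_other), with a small smoothing
    term so that glosses missing from other languages do not divide by
    zero. This scoring rule is an assumption, not the paper's measure.
    """
    freqs = {lang: relative_frequencies(c) for lang, c in counts_by_lang.items()}
    others = [lang for lang in freqs if lang != target]
    if not others:
        raise ValueError("need at least one comparison language")
    scores = {}
    for gloss, f_target in freqs[target].items():
        baseline = sum(freqs[lang].get(gloss, 0.0) for lang in others) / len(others)
        scores[gloss] = math.log((f_target + smoothing) / (baseline + smoothing))
    return scores


# Toy counts keyed by English gloss; the numbers are invented, not real data.
toy_counts = {
    "id": {"heart": 90, "house": 50, "water": 60},
    "nl": {"heart": 20, "house": 80, "water": 100},
    "en": {"heart": 30, "house": 70, "water": 100},
}

scores = specificity_scores(toy_counts, "id")
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In this toy setting the gloss "heart" ranks first for the target language because its relative frequency there exceeds the average elsewhere. The same scoring could in principle be applied to word association counts instead of corpus counts.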
