eScholarship
Open Access Publications from the University of California

Information Distribution Depends on Language-Specific Features

Abstract

Language can be thought of as a code: a system for packaging a speaker's thoughts into a signal that a listener must decode to recover some intended meaning. If language is a near-optimal code, then speakers should structure information in their utterances to minimize the impact of errors in production or comprehension. To examine the distribution of information within utterances, we apply information-theoretic methods to a diverse set of languages in various spoken and written corpora. We find reliably non-uniform and cross-linguistically variable information distributions across languages. These distributions are consistent across contexts, and are predictable from typological features, most notably canonical word order. However, when we include even a small amount of predictive context (bigrams or trigrams), the language-specific shapes disappear, and all languages are characterized by uniform information distribution. Despite cross-linguistic variability in communicative codes, speakers structure their utterances to preserve uniform information distribution and support successful communication.
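
To make the contrast between context-free and context-conditioned information concrete, the following is a minimal sketch of how mean surprisal per word position could be computed with and without bigram context. It is not the authors' pipeline: the function name `positional_surprisal`, the whitespace tokenization, the use of raw (unnormalized) word positions, and the unsmoothed maximum-likelihood estimates taken from the same utterances being scored are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def positional_surprisal(utterances, use_bigrams=False):
    """Return mean surprisal in bits at each word position.

    Hypothetical helper for illustration only. Probabilities are
    unsmoothed maximum-likelihood estimates from the input itself.
    """
    tokens = [u.split() for u in utterances]

    # Unigram counts: how often each word occurs overall.
    unigrams = Counter(w for u in tokens for w in u)
    total = sum(unigrams.values())

    # Bigram counts, with "<s>" as an assumed utterance-start symbol.
    bigrams = Counter()
    context = Counter()
    for u in tokens:
        prev = "<s>"
        for w in u:
            bigrams[(prev, w)] += 1
            context[prev] += 1
            prev = w

    # Accumulate surprisal, -log2 p(word), per position.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for u in tokens:
        prev = "<s>"
        for i, w in enumerate(u):
            if use_bigrams:
                p = bigrams[(prev, w)] / context[prev]
            else:
                p = unigrams[w] / total
            sums[i] += -math.log2(p)
            counts[i] += 1
            prev = w

    return [sums[i] / counts[i] for i in range(len(sums))]


toy = ["the dog runs", "the cat runs"]
unigram_curve = positional_surprisal(toy)
bigram_curve = positional_surprisal(toy, use_bigrams=True)
```

On this toy input the unigram curve peaks at the middle position, where the two utterances differ. A real analysis would need held-out estimation, smoothing, and some way of comparing utterances of different lengths, none of which this sketch attempts.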
