Automatic Estimation of Lexical Concreteness in 77 Languages
Abstract
We estimate lexical Concreteness for millions of words across 77 languages. Using a simple regression framework, we combine vector-based models of lexical semantics with experimental norms of Concreteness in English and Dutch. By applying techniques to align vector-based semantics across distinct languages, we compute and release Concreteness estimates at scale in numerous languages for which experimental norms are not currently available. This paper lays out the technique and its efficacy. Although this is a difficult dataset to evaluate immediately, Concreteness estimates computed from English correlate with Dutch experimental norms at ρ = .75 in the vocabulary at large, increasing to ρ = .8 among Nouns. Our predictions also recapitulate attested relationships with word frequency. The approach we describe can be readily applied to numerous lexical measures beyond Concreteness.
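The general idea of regressing a lexical norm on word vectors and then scoring words from another language in a shared vector space can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the data are synthetic, the "aligned" space is simulated by drawing both languages' vectors from the same distribution, ridge regression stands in for whatever regression model the paper actually uses, and the names `fit_ridge`, `predict`, `spearman`, and `sample` are hypothetical.

```python
import random

def fit_ridge(X, y, lam=1e-3):
    """Solve (X^T X + lam*I) w = X^T y by Gaussian elimination."""
    d = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    for col in range(d):  # forward elimination with partial pivoting
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * d
    for i in reversed(range(d)):  # back substitution
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, d))) / A[i][i]
    return w

def predict(X, w):
    return [sum(wi * xi for wi, xi in zip(w, row)) for row in X]

def ranks(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def spearman(a, b):
    """Spearman rank correlation (assumes no tied values)."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5

# Synthetic stand-in for aligned word vectors and concreteness norms.
rng = random.Random(0)
DIM = 4
TRUE_W = [0.7, 0.5, -0.2, -0.5]  # hypothetical "concreteness direction"

def sample(n):
    # Each row is a DIM-dimensional vector plus a constant 1.0 bias feature.
    X = [[rng.gauss(0, 1) for _ in range(DIM)] + [1.0] for _ in range(n)]
    y = [sum(w * x for w, x in zip(TRUE_W, row)) + 3.0 + rng.gauss(0, 0.1)
         for row in X]
    return X, y

X_src, y_src = sample(200)   # plays the role of English vectors with norms
X_tgt, y_tgt = sample(100)   # plays the role of Dutch vectors, norms held out

w = fit_ridge(X_src, y_src)
rho = spearman(predict(X_tgt, w), y_tgt)
print(f"held-out Spearman rho: {rho:.2f}")
```

The held-out rank correlation in this sketch plays the same role as the ρ values reported above, but it is computed on noise-controlled synthetic data and says nothing about the accuracy of the released estimates.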