Computational models of semantics have emerged as
powerful tools for natural language processing. Recent
work has developed models to handle compositionality,
but these models have typically been evaluated on large,
uncontrolled corpora. In this paper, we constructed
a controlled set of phrase pairs and collected phrase
similarity judgments, revealing novel insights into human
semantic representation. None of the computational
models that we considered were able to capture
the pattern of human judgments. The results of a second
experiment, using the same stimuli with a transformational
judgment task, support a transformational
account of similarity, according to which the similarity
between phrases is inversely related to the number of edits
required to transform one mental model into another.
Taken together, our results indicate that popular models
of compositional semantics do not capture important
facets of human semantic representation.