Distributional information, in the form of simple,
locally computed statistics of an input corpus, provides
a potential means of establishing initial syntactic
categories (noun, verb, etc.). Finch and Chater
(1991, 1992) clustered words hierarchically, according
to the distribution of local contexts in which
they appeared in large, written English corpora,
obtaining clusters that corresponded well with the
standard syntactic categories. Here, a stronger
demonstration of their method is provided, using 'real'
data, namely the language to which children are exposed
during category acquisition, taken from the CHILDES corpus.
For 2.5 million words of adult speech, clustering on
syntactic and semantic bases was observed, with a
high degree of clear differentiation between syntactic
categories. For child data, some noun and verb
clusters emerged, with some evidence of other categories,
but the data set was too small for reliable
trends to emerge. Some initial results investigating
the possibility of classifying novel words using only
the immediate context of a single instance are also
presented. These results demonstrate that statistical
information may play an important role in the
processes of early language acquisition.
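
The general idea of clustering words by their local contexts can be illustrated with a minimal sketch. It is not the procedure of Finch and Chater, whose similarity measure and clustering details are not given here. The sketch assumes a context of one word on either side of the target, cosine similarity between context-count vectors, and average-linkage agglomerative clustering. The toy corpus and the names `context_vectors`, `cosine` and `cluster` are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(tokens, targets):
    """Count the words immediately before and after each target word."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        if word not in targets:
            continue
        if i > 0:
            vectors[word][("prev", tokens[i - 1])] += 1
        if i + 1 < len(tokens):
            vectors[word][("next", tokens[i + 1])] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(vectors, n_clusters):
    """Average-linkage agglomerative clustering down to n_clusters groups."""
    clusters = [[w] for w in sorted(vectors)]

    def linkage(c1, c2):
        sims = [cosine(vectors[x], vectors[y]) for x in c1 for y in c2]
        return sum(sims) / len(sims)

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the highest average similarity.
        i, j = max(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical toy corpus; a real study would use millions of words.
corpus = (
    "the dog sees the cat . the cat chases the dog . "
    "a dog chases a cat . a cat sees a dog ."
).split()
targets = {"dog", "cat", "sees", "chases"}
groups = cluster(context_vectors(corpus, targets), n_clusters=2)
print(groups)
```

In this toy corpus the two nouns share the same preceding and following words, as do the two verbs, so the nouns fall into one cluster and the verbs into the other. With realistic data the separation would be graded rather than exact, and the choice of context window and similarity measure would matter.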