Reading to young, pre-literate children is associated with better language and reading outcomes, but the underlying
mechanisms are poorly understood. This work aims to clarify those mechanisms. We hypothesized that vocabulary diversity
and sentence complexity might differ between picture books and child-directed speech, and we set out to quantify any such
differences. We built a corpus containing the text of 100 picture books that caregivers might read to pre-literate
children, and we compared its vocabulary distribution and its proportion of certain complex sentence types with those of
child-directed speech from the CHILDES corpus. Picture books contained more unique word types for a given number of
tokens, and a higher proportion of complex sentences. One way shared book reading may contribute to improved language
outcomes, then, is by exposing children to words and sentence structures that they would not otherwise encounter.
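Comparing "unique word types for a given number of tokens" controls for the fact that raw type counts grow with corpus length. The abstract does not say how this comparison was computed, so the following is only a minimal sketch under stated assumptions: it uses a simple lowercase regex tokenizer and averages the number of distinct types over consecutive, non-overlapping windows of a fixed token count. The function names `tokenize` and `types_per_window` and the two example sentences are hypothetical and invented for illustration; they are not the authors' method or data from the study.

```python
import re
from statistics import mean


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters/apostrophes.

    Assumption: a deliberately simple tokenizer, not the one used in the study.
    """
    return re.findall(r"[a-z']+", text.lower())


def types_per_window(tokens: list[str], window: int) -> float:
    """Mean number of distinct word types per non-overlapping window of `window` tokens.

    Fixing the window size lets corpora of different lengths be compared fairly.
    A trailing partial window is dropped.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    counts = [
        len(set(tokens[start:start + window]))
        for start in range(0, len(tokens) - window + 1, window)
    ]
    if not counts:
        raise ValueError("corpus is shorter than one window")
    return mean(counts)


# Hypothetical toy examples (not study data).
book_text = "The very hungry caterpillar ate through one red apple."
speech_text = "No, no, no. You want the ball? The ball!"
print(types_per_window(tokenize(book_text), 3))    # 3
print(types_per_window(tokenize(speech_text), 3))  # 2
```

In the book sentence every 3-token window contains three distinct types, while the repetitive speech sentence yields windows with 1, 3, and 2 types. A real comparison would use much larger windows over the full corpora.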