We propose a neural network model that accounts for the emer-gence of the taxonomic constraint and for the whole objectconstraint in early word learning. Our proposal is based onMayor and Plunkett (2010)’s neurocomputational model of thetaxonomic constraint and extends it in two directions. Firstly,we deal with realistic visual and acoustic stimuli. Secondly,we model the well-known whole object constraint in the visualcomponent. We show that, despite the augmented input com-plexity, the proposed model compares favorably with respectto previous systems.