An associative network was trained on a speech recognition task using continuous speech. The input speech was processed to produce a spectral representation incorporating some of the transformations introduced by the peripheral auditory system before the signal reaches the brain. Input nodes to the network represented a 150-millisecond time window through which the transformed speech passed in 2-millisecond steps. Output nodes represented elemental speech sounds (demisyllables) whose target values were specified based on a human listener's ability to identify the sounds in the same input segment. The work reported here focuses on the experience and training conditions needed to produce natural generalization between training and test utterances.
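The sliding-window input framing described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the choice of 16 spectral bands, and the use of NumPy are assumptions; only the 150-millisecond window and 2-millisecond step come from the text.

```python
import numpy as np

def frame_windows(spectrogram, frame_ms=2, window_ms=150):
    """Slide a window_ms-wide window over a spectrogram whose columns are
    frame_ms-spaced spectral frames, advancing one frame (2 ms) at a time.
    Each output row is one flattened window, i.e. one input presentation
    to the network. (Illustrative sketch; not the paper's actual code.)"""
    frames_per_window = window_ms // frame_ms  # 150 / 2 = 75 frames per window
    n_bands, n_frames = spectrogram.shape
    windows = [
        spectrogram[:, i:i + frames_per_window].ravel()
        for i in range(n_frames - frames_per_window + 1)
    ]
    return np.array(windows)

# Example: 16 spectral bands (assumed), 1 second of speech at 2-ms frames
spec = np.random.rand(16, 500)
X = frame_windows(spec)
print(X.shape)  # (426, 1200): 426 window positions, 16 bands x 75 frames
```

Each row of `X` corresponds to one 150-millisecond view of the transformed speech; successive rows differ by a single 2-millisecond step, so the network sees heavily overlapping segments.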