Expectation, or prediction, has become a major theme in cognitive science. Music offers a powerful system for studying how expectations are formed and deployed in the processing of richly structured sequences that unfold rapidly in time. We ask to what extent expectations about an upcoming note in a melody are driven by two distinct factors: Gestalt-like principles grounded in the auditory system (e.g., a preference for subsequent notes to move in small intervals), and statistical learning of melodic structure. We use multinomial regression modeling to evaluate the predictions of computationally implemented models of melodic expectation against behavioral data from a musical cloze task, in which participants hear a novel melodic opening and are asked to sing the note they expect to come next. We demonstrate that both Gestalt-like principles and statistical learning contribute to listeners' online expectations. In conjunction with results in the domain of language, our results point to a larger-than-previously-assumed role for statistical learning in predictive processing across cognitive domains, even in cases that seem potentially governed by a smaller set of theoretically motivated rules. However, we also find that both of the models tested here leave much variance in the human data unexplained, pointing to a need for models of melodic expectation that incorporate underlying hierarchical and/or harmonic structure. We propose that our combined behavioral (melodic cloze) and modeling (multinomial regression) approach provides a powerful method for further testing and development of models of melodic expectation.