eScholarship
Open Access Publications from the University of California

Temporal event clustering in speech versus music

Abstract

Both speech and music can be organized as hierarchical, nested groupings of units. In speech, for instance, phonemes can group to form syllables, which group to form words, which group to form sentences, and so on. In music, notes can group to form phrases, which group to form chord progressions, which group to form verses, and so on. We present a new method for extracting events (amplitude peaks in Hilbert envelopes of filter banks) from speech and music recordings, and for quantifying the degree of nesting in temporal clusters of events across timescales (using Allan Factor analysis). We apply this method to monologue recordings of speech (TED talks) and to solo musical performances of similar lengths. We found that both types of recordings exhibit nested clustering, revealing similar organizational principles, but that clustering is more pronounced on shorter timescales (milliseconds) for speech and on longer timescales (seconds and above) for music.
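The abstract names Allan Factor analysis as the clustering measure. As a minimal sketch of that technique (using the standard Allan Factor definition for a point process; the authors' filter-bank and peak-extraction details are not given in the abstract, so the function name and windowing choices below are illustrative assumptions):

```python
import numpy as np

def allan_factor(event_times, window_sizes):
    """Estimate the Allan Factor AF(T) for a set of event times.

    AF(T) = <(N_{i+1} - N_i)^2> / (2 <N_i>), where N_i is the number
    of events falling in the i-th contiguous window of length T.
    AF(T) ~ 1 for a homogeneous Poisson process; values rising with T
    indicate clustering (nesting) at that timescale.
    """
    event_times = np.asarray(event_times, dtype=float)
    duration = event_times.max()
    results = []
    for T in window_sizes:
        n_windows = int(duration // T)
        # Count events in each of n_windows contiguous bins of width T.
        edges = np.linspace(0.0, n_windows * T, n_windows + 1)
        counts, _ = np.histogram(event_times, bins=edges)
        diffs = np.diff(counts)
        results.append(np.mean(diffs ** 2) / (2.0 * np.mean(counts)))
    return np.array(results)
```

For a homogeneous (unclustered) Poisson event stream this estimator returns values near 1 at every timescale, so elevated AF(T) over a band of window sizes T is the signature of clustering at those timescales that the abstract describes.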
