Objects can be represented either as independent individuals (“object-files”) or as members of a collection (an “ensemble”). Work over the past 40 years has explored these representational systems, largely in the visual domain. Far less is known about auditory objects. Here, we show that a property characteristic of visual object representation – that it can be modulated by linguistic framing – also applies to auditory objects. In particular, we show that using the expression “each sound” versus “every sound” can bias auditory object construal in the same way that using “each circle” versus “every circle” can bias visual object construal. These findings support the idea that object-files and ensembles are not limited to the visual domain but are representational formats found more generally throughout cognition.