Humans are surrounded by dynamic, continuous streams of stimuli, yet the human mind segments these stimuli and organizes them into discrete event units. Theories of language production assume that segmenting and construing an event provides the starting point for speaking about it (Levelt, 1989; Konopka & Brown-Schmidt, 2018). However, the precise units of event representation and their mapping to language remain elusive. In this work, we examine event unit formation in linguistic and conceptual event representations. Given cross-linguistic differences in motion event encoding (satellite-framed vs. verb-framed languages), we investigate the extent to which differences in how languages form linguistic motion event units affect how their speakers form cognitive event units in non-linguistic tasks. We test English (satellite-framed) and Turkish (verb-framed) speakers on verbal and non-verbal motion event tasks. Our results show that speakers do not rely on the same event unit representations when verbalizing motion events as when identifying motion event units in non-verbal tasks. We therefore suggest that conceptual and linguistic event representations are related but distinct levels of event structure.