Everyday interactions often depend on thinking about space and time: collaborators need to know where events take place – and in what order – to, e.g., communicate driving directions, build pieces of furniture, or carry out strategic operations in military and sports settings (Núñez & Cooperrider, 2013). A simple set of driving directions may require a listener to interpret and reason about the spatial relations – such as next to and behind – and the temporal relations – such as after and during – that a speaker describes. The speaker may also use gestures to substitute for, supplement, or disambiguate linguistic descriptions (Holle & Gunter, 2007; Perzanowski, Schultz, & Williams, 1998). Such rapid, rich, and productive interactions are transient and difficult to analyze behaviorally, and so they pose a challenge for experimenters. They are grounded in the physical world, and accordingly challenge computational models that cannot digest rich perceptual and environmental input in real time. Robotic systems are geared towards processing and acting upon the physical world – and they increasingly support human-robot interaction (e.g., Fong et al., 2006; Kawamura et al., 2003; Kortenkamp et al., 1999). But they, too, are uniquely challenged in maintaining productive interactive exchanges with human teammates, because they must be tolerant of human idiosyncrasies, preferences, limitations, and errors (Trafton et al., 2013). Because these challenges cut across broad interests in cognitive science – such as linguistics, artificial intelligence, robotics, and psychology – progress is unlikely without the engagement of multiple approaches, from psychological experimentation to the construction of autonomous, embodied systems.
In recent years, progress towards understanding interactive spatiotemporal cognition has accelerated along parallel paths: there exist new behavioral and imaging methodologies to study event segmentation (e.g., Radvansky & Zacks, 2014), spatial inference (e.g., Knauff & Ragni, 2013), and gestural cognition (e.g., Novack et al., 2016); novel computational theories of physical reasoning (e.g., Battaglia et al., 2013) and mental simulation (e.g., Khemlani & Johnson-Laird, 2013); cognitive architectures that support rich interactivity (Huffman & Laird, 2014; Trafton et al., 2013); and a wide variety of technological platforms on which to transform theory into embodied interaction.

The goal of the workshop is to allow these parallel approaches to converge. Discussants will share recent data and theory, consider novel architectural approaches, and demonstrate burgeoning technological advances that further the science of spatiotemporal inference. The workshop will promote interdisciplinary collaboration by focusing on three unifying themes