When integrating information in real time from multiple modalities or sources, such as when navigating with the help of GPS voice instructions along with a visual map, a decision-maker faces a difficult cue integration problem. The two sources, in this case visual and spoken, may have very different interpretations or presumed reliability. When making decisions in real time, how do we combine cues coming from visual and linguistic evidence sources? In a sequence of three studies, we asked participants to navigate through a set of virtual mazes using a head-mounted virtual reality display. Each maze consisted of a series of T intersections, at each of which the subject was presented with a visual cue and a spoken cue, each separately indicating which direction to continue through the maze. However, the two cues did not always agree, forcing the subject to decide which cue to “trust.” Each type of cue had a fixed reliability (the probability of providing correct guidance), independent of the other cue. Subjects learned over the course of trials how much to follow each cue, but we found that they generally trusted spoken cues more than visual ones, even when the two cues’ reliability levels were objectively matched. Finally, we show how subjects’ tendency to favor the spoken cue can be modeled as a Bayesian prior that favors trusting spoken sources over visual ones.
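One way to formalize the kind of model the final sentence describes is sketched below; the notation ($d$, $c_v$, $c_a$, $\hat{r}_v$, $\hat{r}_a$, $\alpha$, $\beta$) is illustrative and not taken from the studies themselves. Each intersection can be treated as Bayesian inference over the correct direction $d \in \{L, R\}$ given the visual cue $c_v$ and the spoken cue $c_a$:
\[
P(d \mid c_v, c_a) \;\propto\; P(c_v \mid d)\, P(c_a \mid d)\, P(d),
\]
where $P(c \mid d)$ equals the decision-maker's estimated reliability of that cue when the cue points toward $d$, and one minus that estimate otherwise. Under this sketch, the observed bias toward spoken cues corresponds to giving the two reliability estimates different priors, for example
\[
\hat{r}_a \sim \mathrm{Beta}(\alpha_a, \beta), \qquad \hat{r}_v \sim \mathrm{Beta}(\alpha_v, \beta), \qquad \alpha_a > \alpha_v,
\]
so that, before any trial evidence accumulates, the spoken source is presumed more reliable than the visual one.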