Although the facilitation of visual search by contextual information is well established, little is known about the independent contributions of different types of contextual cues in scenes. Similarly, little work has quantified the time course of the influence of various contextual cues, or how readily cue information can be extracted across the visual field. Here we manipulated three types of spatial contextual information: object co-occurrence, multiple object configurations, and scene gist. We measured the spatial informativeness of each cue for target localization and isolated each cue's benefit to target detectability, its impact on decision bias, and its guidance of eye movements. To assess how cues are combined, we compared observed sensitivity during detection with multiple cues against a theoretical optimal combination of the individual cues. We also used a novel paradigm in which scene viewing time was contingent on the number of fixations observers made within the scene. To assess observers' ability to extract cue information across the visual field, we measured their performance at detecting cues in scenes presented exclusively in the visual periphery.
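As a point of reference for how detectability and decision bias are typically quantified in this kind of design, the sketch below computes the standard equal-variance signal detection indices (d' for sensitivity, c for criterion) from hit and false-alarm rates. It is a minimal illustration of the standard formulas, not the study's actual analysis pipeline; the rates shown are placeholders, and the function name is our own.

from scipy.stats import norm

def dprime_and_criterion(hit_rate, fa_rate):
    """Equal-variance signal detection indices:
    d' = z(H) - z(F) indexes target detectability;
    c = -(z(H) + z(F)) / 2 indexes decision bias
    (c > 0: conservative, c < 0: liberal)."""
    z_h = norm.ppf(hit_rate)
    z_f = norm.ppf(fa_rate)
    return z_h - z_f, -(z_h + z_f) / 2

# Illustrative rates for a single cue condition (not actual data)
d, c = dprime_and_criterion(hit_rate=0.85, fa_rate=0.20)
print(f"d' = {d:.2f}, criterion c = {c:.2f}")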
We find that object-based information guides eye movements and facilitates perceptual judgments more than background information. Despite its relatively weaker influence on search behavior, background information was the most easily extracted across the visual field and likely supports the parsing of multiple object configurations in scenes. Multiple object configuration information, in particular, is implicated in guiding the initial stages of search, providing coarse guidance that is subsequently refined by co-occurring object information. The degree of guidance and facilitation provided by each contextual cue can be related to its contribution to reducing spatial uncertainty about the target location, as measured by explicit human judgments of likely target locations. Comparison of target detectability across cue conditions suggests that the performance improvements with multiple cues are consistent with an optimal integration of independent cues.
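To make the optimality benchmark concrete, the sketch below computes the standard prediction for an ideal observer combining statistically independent cues: under equal-variance Gaussian assumptions, the individual d' values add in quadrature. The single-cue sensitivities here are hypothetical values for illustration, not measurements from this study.

import math

def optimal_combined_dprime(single_cue_dprimes):
    """Predicted sensitivity for an ideal observer combining
    statistically independent cues: individual d' values add
    in quadrature (square root of the sum of squares)."""
    return math.sqrt(sum(d ** 2 for d in single_cue_dprimes))

# Hypothetical single-cue sensitivities (not actual data):
# object co-occurrence, multiple object configuration, scene gist
predicted = optimal_combined_dprime([1.2, 0.9, 0.6])
print(f"Predicted optimal d' with all cues: {predicted:.2f}")
# An observed multi-cue d' at or near this prediction is consistent
# with optimal integration; a value well below it would suggest
# suboptimal use of the available cues.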
In addition to exploring the influence of spatial cues on visual search performance, we assessed the effect of a non-spatial cue on target detection and eye movement guidance. We manipulated the scale of a target object relative to its surroundings and found that observers were significantly worse at detecting mis-scaled targets than normally sized targets. Unsurprisingly, this non-spatial cue did not have as dramatic an effect on eye movement guidance as the three spatial cues. However, the finding underscores the importance of considering non-spatial scene information as a possible contextual influence on human behavior.
Overall, our results improve the understanding of the interplay of distinct contextual scene components and their contributions to search guidance, a landmark behavior that differentiates human and other biological vision from basic machine vision. The results also inform the types of information that might improve computer-based object detection and scene understanding.