
UC Berkeley Electronic Theses and Dissertations

Computational Models of Visual Attention within a Probabilistic Inference Framework

Abstract

Attention is a well-studied and complex topic that spans many fields of research. Effects of attention are ubiquitous throughout the brain and can be differentiated by sensory modality (e.g., visual vs. auditory), volition (e.g., exogenous vs. endogenous), and application (e.g., covert vs. overt). Over the past few decades, researchers have proposed many computational models of visual attention, and with the rise of machine learning tools, more have been proposed to solve computer vision problems as well. In this dissertation, I focus on a specific subset of models that place visual attention within a probabilistic inference framework, in which humans use attention to infer the current state of the world from noisy sensory information. Across three experiments, I propose and evaluate computational models of visual attention that address endogenous spatial attention, feature and spatial attention during covert visual search, and bottom-up and top-down attention during free viewing of natural images. Each model builds on the previous one in an effort to identify common principles at work across different tasks and applications.
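
To make the framing concrete, here is a minimal sketch (not from the dissertation; all values and the noise-reduction reading of attention are illustrative assumptions) of inference over a world state from a noisy measurement, where "attending" is modeled as reducing sensory noise and thereby sharpening the posterior:

```python
import numpy as np

# Hypothetical illustration: attention as probabilistic inference.
# A world state s is inferred from a noisy measurement x ~ N(s, sensory_var);
# attending is modeled here as reducing sensory_var, which tightens p(s | x).

def posterior_over_state(x, prior_mean, prior_var, sensory_var):
    """Conjugate Gaussian update: posterior p(s | x) for x ~ N(s, sensory_var)."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / sensory_var)
    post_mean = post_var * (prior_mean / prior_var + x / sensory_var)
    return post_mean, post_var

x = 1.2  # one noisy sensory sample
unattended = posterior_over_state(x, prior_mean=0.0, prior_var=1.0, sensory_var=1.0)
attended   = posterior_over_state(x, prior_mean=0.0, prior_var=1.0, sensory_var=0.25)
print(unattended, attended)  # the attended posterior has lower variance
```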

In the first experiment, I propose a computational model of spatial attention that uses a dynamic pooling mechanism to simulate the receptive field changes observed in neurophysiological studies of endogenous spatial attention. The model can be viewed as a spatial prior over a region of the visual field that reduces uncertainty in visual processing by enhancing local spatial resolution. By reproducing well-characterized perceptual phenomena from the visual crowding literature, we conclude that reducing the spatial uncertainty of encoded feature representations relieves crowding. This decrease in uncertainty influences crowding mainly by increasing the redundancy of encoded representations, with effects on fidelity playing a more limited role.

In the second experiment, I extend this model by incorporating spatial attention into a hierarchical generative model to simulate a covert visual search task for digits among non-digit distractors. The generative model learns top-down priors over digit features, and during search these priors disambiguate low-level target features from distractor features, highlighting regions that are likely to contain the target. By spatially attending to predicted target locations generated with or without top-down priors, we show that top-down priors improve downstream target classification accuracy beyond the improvement from spatial attention alone.

Finally, in the third experiment, I introduce a model of bottom-up and top-down attention at multiple levels of feature complexity and spatial scale to account for gaze behavior in a free-viewing experiment across many categories of natural images. Extending the second experiment, priors here influence bottom-up as well as top-down attention: bottom-up attention is measured as the surprise, in an information-theoretic sense, of a viewed scene relative to the priors. By learning priors both within and across categories, we demonstrate that surprise computed from category-specific priors over high-level features best accounts for gaze behavior across the majority of scene types.
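
A minimal sketch of the first experiment's core idea, under the assumption (mine, not the dissertation's implementation) that attention narrows a Gaussian pooling region so that less flanker signal mixes into the target's encoded representation:

```python
import numpy as np

# Hypothetical sketch: attention as a spatial prior that shrinks pooling
# regions. Feature values at nearby positions are averaged with Gaussian
# weights; a narrower pooling width (attended) admits less flanker signal.

def pooled_response(positions, features, center, width):
    """Gaussian-weighted pooling of feature values around `center`."""
    w = np.exp(-0.5 * ((positions - center) / width) ** 2)
    return np.sum(w * features) / np.sum(w)

positions = np.array([-2.0, 0.0, 2.0])  # flanker, target, flanker (deg)
features  = np.array([0.2, 1.0, 0.3])   # illustrative encoded feature values

broad  = pooled_response(positions, features, center=0.0, width=2.0)  # unattended
narrow = pooled_response(positions, features, center=0.0, width=0.5)  # attended
print(broad, narrow)  # narrow pooling stays much closer to the target's value
```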
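
For the second experiment, the prior-driven disambiguation can be sketched as a Bayesian combination of a bottom-up likelihood map with a learned top-down prior over locations; the maps and numbers below are illustrative assumptions, not values from the dissertation:

```python
import numpy as np

# Hypothetical sketch: a bottom-up likelihood map (how target-like each
# location looks) is multiplied by a top-down prior over locations, and
# covert attention is directed to the posterior's peak.

likelihood = np.array([0.2, 0.6, 0.5, 0.1])  # per-location evidence for "digit"
prior      = np.array([0.1, 0.1, 0.7, 0.1])  # learned top-down prior

posterior = likelihood * prior
posterior /= posterior.sum()                 # normalize to a distribution

attend_to = int(np.argmax(posterior))        # covertly attend the best location
print(posterior, attend_to)  # the prior shifts attention from index 1 to index 2
```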
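
For the third experiment, one standard information-theoretic reading of "surprise relative to priors" is Shannon surprise, -log p(observation | prior); the code below is a sketch under that assumption, with illustrative probabilities:

```python
import numpy as np

# Hypothetical sketch: bottom-up salience as Shannon surprise under a
# learned prior. A category-specific prior assigns higher probability to
# typical features, so atypical regions score as more surprising and are
# predicted to attract gaze. All values are illustrative.

def shannon_surprise(p):
    """Surprise in bits for observations with probability p under the prior."""
    return -np.log2(p)

# Probability of each image region's features under two priors:
p_generic  = np.array([0.10, 0.10, 0.10, 0.10])  # prior learned across categories
p_category = np.array([0.30, 0.25, 0.02, 0.30])  # category prior; region 2 atypical

print(shannon_surprise(p_generic))   # flat surprise map
print(shannon_surprise(p_category))  # category prior singles out region 2
```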
