Skip to main content
eScholarship
Open Access Publications from the University of California

Do Humans Look Where Deep Convolutional Neural Networks “Attend”?

Abstract

Convolutional Neural Networks (CNNs) have recently begunto exhibit human level performance on some visual percep-tion tasks. Performance remains relatively poor on vision taskslike object detection. We hypothesized that this gap is largelydue to the fact that humans exhibit selective attention, whilemost object detection CNNs have no corresponding mecha-nism. We investigated some well-known attention mechanismsin the deep learning literature, identifying their weaknessesand leading us to propose a novel CNN approach to objectdetection: the Densely Connected Attention Model. We thenmeasured human spatial attention, in the form of eye trackingdata, during the performance of an analogous object detectiontask. By comparing the learned representations produced byvarious CNNs with that exhibited by human viewers, we iden-tified some relative strengths and weaknesses of the examinedattention mechanisms. The resulting comparisons provide in-sights into the relationship between CNN object detection sys-tems and the human visual system.

Main Content
For improved accessibility of PDF content, download the file to your device.
Current View