During speech perception, humans constantly sense multiple sources of information. As such, the McGurk effect has typically been regarded as a prototypical example of multi-modal integration in language; specifically, it has been used to study audio-visual integration during speech. The McGurk effect arises when incongruent audio-visual stimuli are paired and perceived as a different syllable (auditory /ba/ + visual /ga/ = percept "da").
We developed a hierarchical computational model, based on self-organizing maps and Hebbian learning, to study multi-modal integration in speech. Our architecture allows studying the McGurk effect purely through bottom-up processing of audio-visual information; a minimal sketch of such a hierarchy is given below.
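The exact map sizes, feature dimensions, and learning parameters of the model are not specified in this excerpt, so the following is only a minimal sketch, under assumed values, of how a hierarchy of this kind can be wired: two unimodal self-organizing maps (auditory and visual) feed a multimodal map through Hebbian connections.

```python
# Minimal sketch of a hierarchical SOM + Hebbian architecture. All sizes,
# dimensions, and learning rates below are illustrative assumptions, not the
# values used in the paper.
import numpy as np

class SOM:
    """A small self-organizing map with a Gaussian neighborhood on a 1-D grid."""
    def __init__(self, n_units, dim, lr=0.2, sigma=2.0, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=(n_units, dim))   # unit weight vectors
        self.coords = np.arange(n_units)[:, None]  # 1-D map layout
        self.lr, self.sigma = lr, sigma

    def activate(self, x):
        """Graded activation: units closer to the input respond more strongly."""
        d = np.linalg.norm(self.w - x, axis=1)
        return np.exp(-d**2)

    def train_step(self, x):
        d = np.linalg.norm(self.w - x, axis=1)
        bmu = np.argmin(d)                                    # best-matching unit
        grid_dist = np.abs(self.coords - self.coords[bmu]).ravel()
        h = np.exp(-grid_dist**2 / (2 * self.sigma**2))       # neighborhood kernel
        self.w += self.lr * h[:, None] * (x - self.w)

# Unimodal maps for auditory and visual features (dimensions are placeholders).
aud_som = SOM(n_units=20, dim=12, seed=1)
vis_som = SOM(n_units=20, dim=8, seed=2)
# Multimodal map receives the concatenated unimodal activations (20 + 20 = 40).
multi_som = SOM(n_units=30, dim=40, seed=3)

# Hebbian weights linking unimodal activity to the multimodal layer.
W_hebb = np.zeros((30, 40))
eta = 0.05  # Hebbian learning rate (assumed)

def train_pair(aud_x, vis_x):
    """One bottom-up training pass on a congruent audio-visual pair."""
    global W_hebb
    aud_som.train_step(aud_x)
    vis_som.train_step(vis_x)
    pre = np.concatenate([aud_som.activate(aud_x), vis_som.activate(vis_x)])
    multi_som.train_step(pre)
    post = multi_som.activate(pre)
    W_hebb += eta * np.outer(post, pre)  # Hebbian co-activation update

def multimodal_response(aud_x, vis_x):
    """Bottom-up multimodal activation for any (possibly incongruent) pair."""
    pre = np.concatenate([aud_som.activate(aud_x), vis_som.activate(vis_x)])
    return W_hebb @ pre
```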
We trained several versions of the model and measured the activation similarity between McGurk-like stimuli and congruent ones by means of mutual information. Our results suggest that the illusory percept arises from the congruent representation that best reduces uncertainty. Furthermore, the reliability of each sensory modality determines the best congruent representation.
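As a hedged illustration of the comparison step (not the paper's exact procedure), one common way to quantify similarity between two activation patterns with mutual information is to discretize them and estimate MI from the joint histogram; the binning scheme and variable names below are assumptions.

```python
# Hedged sketch: estimate mutual information between two activation vectors by
# binning them and using their joint histogram. The number of bins and the
# variable names are assumptions, not the paper's exact procedure.
import numpy as np

def activation_mi(act_a, act_b, n_bins=8):
    """Mutual information (in bits) between two binned activation vectors."""
    joint, _, _ = np.histogram2d(act_a, act_b, bins=n_bins)
    pxy = joint / joint.sum()            # joint distribution over bins
    px = pxy.sum(axis=1, keepdims=True)  # marginal of act_a
    py = pxy.sum(axis=0, keepdims=True)  # marginal of act_b
    nz = pxy > 0                         # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

# Hypothetical usage with the earlier sketch: compare the McGurk response to each
# congruent response and select the congruent percept sharing the most information.
# mcgurk_act = multimodal_response(aud_ba, vis_ga)
# congruent = {"ba": act_ba, "da": act_da, "ga": act_ga}
# best = max(congruent, key=lambda s: activation_mi(mcgurk_act, congruent[s]))
```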