There is growing support for Temporal Difference (TD) Learning
as a formal account of the role of the midbrain dopamine
system and the basal ganglia in learning from reinforcement.
This account is challenged, however, by the fact that realistic
implementations of TD Learning have been shown to fail on
some fairly simple learning tasks that are well within the capabilities
of humans and non-human animals. We hypothesize
that such failures do not arise in natural learning systems
because of the ubiquitous presence of lateral inhibition in
the cortex, producing sparse conjunctive internal representations
that support the learning of predictions of future reward.
We provide support for this conjecture through computational
simulations that compare TD Learning systems with and without
lateral inhibition, demonstrating the benefits of sparse conjunctive
codes for reinforcement learning.