Model-free (MF) and model-based (MB) reinforcement learning (RL) have provided a successful framework for understanding both human behavior and neural data. These two systems are usually thought to compete for control of behavior. However, it has also been proposed that they can be integrated in a cooperative manner. For example, the Dyna algorithm uses MB replay of past experience to train the MF system, and has inspired research examining whether human learners do something similar. Here we introduce an approach that links MF and MB learning in a new way: via the reward function. Given a model of the learning environment, dynamic programming is used to iteratively approximate state values that monotonically converge to the state values under the optimal decision policy. Pseudorewards are calculated from these values and used to shape the reward function of an MF learner in a way that is guaranteed not to change the optimal policy. We show that this method offers computational advantages over Dyna in two classic problems. It also offers a new way to think about integrating MF and MB RL: that our knowledge of the world doesn't just provide a source of simulated experience for training our instincts, but that it shapes the rewards that those instincts latch onto. We discuss psychological phenomena that this theory could apply to, including moral emotions.
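To make the described pipeline concrete, the sketch below (an illustration under our own assumptions, not the paper's code) shows one way the two stages could fit together in a small tabular setting: a few sweeps of dynamic programming over a known model yield approximate state values, which are then used as a potential function to generate pseudorewards F(s, s') = γV(s') − V(s) for a model-free Q-learner, the form of potential-based shaping known to leave the optimal policy unchanged. The environment interface (n_states, n_actions, reset, step, random_action) and all parameter values are hypothetical.

```python
import numpy as np

def approximate_values(P, R, gamma, sweeps):
    """Partial value iteration on a known model.

    P: list of transition matrices, one per action (P[a][s, s'] = prob of s' given s, a).
    R: vector of state rewards. Returns approximate state values after a fixed
    number of Bellman backups (converging toward the optimal values).
    """
    V = np.zeros(R.shape[0])
    for _ in range(sweeps):
        # Bellman optimality backup: best expected one-step return from each state
        V = R + gamma * np.max([P[a] @ V for a in range(len(P))], axis=0)
    return V

def shaped_q_learning(env, V, gamma=0.95, alpha=0.1, epsilon=0.1, episodes=500):
    """Tabular Q-learning whose reward is augmented with the pseudoreward
    F(s, s') = gamma * V[s'] - V[s] derived from the model-based values."""
    Q = np.zeros((env.n_states, env.n_actions))
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            a = env.random_action() if np.random.rand() < epsilon else int(Q[s].argmax())
            s_next, r, done = env.step(a)
            f = gamma * V[s_next] - V[s]          # potential-based pseudoreward
            target = r + f + gamma * Q[s_next].max() * (not done)
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```

Because the pseudoreward is a difference of potentials, it reshapes the learning signal without altering which policy is optimal; the model's contribution is folded entirely into the reward the model-free learner receives.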