While current deep learning systems excel at tasks such as object classification, language processing, and gameplay, few can construct or modify a complex system such as a tower of blocks. We hypothesize that what these systems lack is a “relational inductive bias”: a capacity for reasoning about inter-object relations and making choices over a structured description of a scene. To test this hypothesis, we focus on a task that involves gluing pairs of blocks together to stabilize a tower, and quantify how well humans perform. We then introduce a deep reinforcement learning agent which uses object- and relation-centric scene and policy representations and apply it to the task. Our results show that these structured representations allow the agent to outperform both humans and more naïve approaches, suggesting that relational inductive bias is an important component in solving structured reasoning problems and for building more intelligent, flexible machines.