Learning complex robot manipulation policies for real-world objects is extremely challenging and often requires significant tuning within controlled environments. In this thesis, we learn an And-Or graph-based model to execute tasks with highly variable structure and multiple stages, which are typically not well suited to most policy-learning approaches. The model is learned from human demonstrations using a tactile glove that measures both hand pose and contact forces. The tactile glove enables observation of visually latent changes in the scene, specifically the forces applied to unlock safety mechanisms. From these observations, a stochastic grammar model is learned that represents both the compositional structure of the task sequence and the compatibility of that sequence with the observed tactile feedback. We present a method for transferring this human-specific knowledge to a robot platform, and demonstrate that the robot can successfully manipulate unseen objects with similar task structure.
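To make the And-Or grammar idea concrete, the following is a minimal sketch of sampling a task sequence from a stochastic And-Or graph. It is illustrative only: the node types, action names, and branching probabilities are hypothetical and are not taken from the thesis.

```python
import random

class OrNode:
    """Or-node: selects one child branch according to learned branching probabilities."""
    def __init__(self, children, probs):
        self.children = children
        self.probs = probs

    def sample(self):
        child = random.choices(self.children, weights=self.probs)[0]
        return child.sample()

class AndNode:
    """And-node: expands all children in order, concatenating their action sequences."""
    def __init__(self, children):
        self.children = children

    def sample(self):
        seq = []
        for child in self.children:
            seq.extend(child.sample())
        return seq

class Terminal:
    """Leaf node: a primitive action (names here are purely illustrative)."""
    def __init__(self, name):
        self.name = name

    def sample(self):
        return [self.name]

# Hypothetical grammar for opening a child-safe bottle: the 'push' primitive
# stands in for a visually latent force that only tactile sensing would reveal.
grasp, push, twist, pull = (Terminal(n) for n in ("grasp", "push", "twist", "pull"))
unlock = OrNode([AndNode([push, twist]), AndNode([twist])], probs=[0.7, 0.3])
open_bottle = AndNode([grasp, unlock, pull])

print(open_bottle.sample())  # e.g. ['grasp', 'push', 'twist', 'pull']
```

An Or-node captures the variability across demonstrations (different unlocking strategies), while an And-node captures the fixed multi-stage ordering; sampling top-down yields one valid task sequence per draw.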