Generative modeling promises an elegant solution to learning about distributions over high-dimensional data such as images and videos. But how can we expose and utilize the rich structure these models discover? Rather than merely drawing new samples, how can an agent harness p(x) as a source of knowledge about how our world works? This thesis explores scalable inductive biases that unlock a generative model's understanding of the entities latent in visual data, enabling much richer interaction with the model.