Visual perception is a fundamental task of computer vision. Subtasks within perception can be decomposed into two types: reasoning about the generative process of images or phenomena themselves (i.e., a prior on the sensory input we expect to perceive) and discriminating high-level structural information contained within them. In this dissertation, we first explore how these two types are not as decoupled as they might appear. In particular, we show that the act of discrimination is sufficient to produce a model with generative ability. Furthermore, while vision algorithms are required to discriminate among the visible aspects of a scene, it is often useful to reason about what cannot be seen within a 2D observation. We explore a practical method for eliciting so-called 2.1D information on two popular, large-scale datasets and demonstrate how this generative information can improve certain scene-level segmentation tasks. Next, we explicitly build a model of scene layout that relies on an amodal understanding (of both what can and cannot be seen) of so-called "stuff" (background classes) to accurately place "things" (objects) in a scene. Finally, we revisit binary mask prediction, the de facto method of instance segmentation in the modern age of computer vision, and question whether other approaches, namely boundary-based regression, can serve as suitable alternatives. We validate for the first time that continuous, boundary-based regression can match mask-based prediction under a variety of notions of parity, while offering numerous advantages in differentiability and sparsity. We believe this opens up further research into the segmentation algorithms of the future.