Performance in estimating the depth and shape of an ellipse from stereo, motion, and vergence angle information was compared across three models of visual depth cue combination. The three models were a weak model (strict modularity, with no interaction between motion and stereo cues), a modified weak model (restricted interaction allowed between motion and stereo cues), and a strong model (unconstrained interaction between all visual cues). The modified weak model performed best overall, indicating that its structure, which contains both modular and interactive features, has advantages over both the extreme modular organization of the weak model and the extreme interactive organization of the strong model. In addition, the modified weak model weighted motion and stereo cues differently in the depth and shape judgment tasks, which provides a motivation for multiple visual representations of three-dimensional space.