The distribution of fitness effects (DFE) of new mutations plays a fundamental role in evolutionary genetics. However, the extent to which the DFE differs across species has yet to be systematically investigated. Furthermore, the biological mechanisms determining the DFE in natural populations remain unclear. Here, we show that theoretical models emphasizing different biological factors in determining the DFE, such as protein stability, back-mutations, species complexity, and mutational robustness, make distinct predictions about how the DFE will differ between species. Analyzing amino acid-changing variants from natural populations in a comparative population genomic framework, we find that humans have a higher proportion of strongly deleterious mutations than Drosophila melanogaster. Furthermore, when comparing the DFE across yeast, Drosophila, mice, and humans, the average selection coefficient becomes more deleterious with increasing species complexity. Last, pleiotropic genes have a DFE that is less variable than that of nonpleiotropic genes. Of the four categories of theoretical models compared, only Fisher's geometrical model (FGM) is consistent with our findings. FGM assumes that multiple phenotypes are under stabilizing selection, with the number of phenotypes defining the complexity of the organism. Our results suggest that long-term population size and the cost of complexity drive the evolution of the DFE, with many implications for evolutionary and medical genomics.
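
To make the FGM intuition concrete, the following is a minimal illustrative sketch, not the inference procedure used in this study. It assumes a Gaussian fitness function w(z) = exp(-|z|^2 / 2) with the optimum at the origin, and mutations that perturb every trait independently with a fixed per-trait standard deviation `sigma`. The function name, `sigma`, `distance`, and the sample sizes are all hypothetical choices made for illustration. Under these assumptions the expected log-fitness effect of a mutation is -n * sigma^2 / 2, so the mean selection coefficient becomes more deleterious as the number of traits n (the "complexity") grows.

```python
import random


def sample_selection_coefficients(n_traits, n_mutations=5000, sigma=0.1,
                                  distance=0.5, seed=0):
    """Sample log-fitness effects of random mutations under an assumed
    Gaussian fitness landscape with its optimum at the origin."""
    rng = random.Random(seed)
    # Current phenotype sits `distance` away from the optimum on the first axis.
    z = [distance] + [0.0] * (n_traits - 1)
    d2_before = sum(x * x for x in z)
    coefficients = []
    for _ in range(n_mutations):
        # A mutation shifts every trait by an independent Gaussian step.
        mutant = [x + rng.gauss(0.0, sigma) for x in z]
        d2_after = sum(x * x for x in mutant)
        # log w = -d^2 / 2, so s = log(w_after / w_before).
        coefficients.append(-(d2_after - d2_before) / 2.0)
    return coefficients


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    for n in (2, 10, 50):
        s = sample_selection_coefficients(n)
        beneficial = sum(1 for x in s if x > 0) / len(s)
        print(f"traits={n:3d}  mean s={mean(s):+.4f}  "
              f"fraction beneficial={beneficial:.3f}")
```

Holding the per-trait mutation size fixed as traits are added is itself a modeling assumption. Other scalings (for example, fixing the total mutation size) lead to different quantitative behavior.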