The number and quality of large-scale structure (LSS) surveys are already stretching current methods of learning about cosmology from data to their capacity. The pace of incoming data will nevertheless increase significantly, with, at the time of writing, data from the Dark Energy Spectroscopic Instrument (DESI), Euclid, Rubin, the Spectro-Photometer for the History of the Universe, Epoch of Reionization and Ices Explorer (SPHEREx), Spec-S5, and Roman joining existing data from the Baryon Oscillation Spectroscopic Survey (BOSS), the Dark Energy Survey (DES), and the Kilo-Degree Survey (KiDS), as well as CMB lensing data. Maximally extracting cosmological information from these datasets depends, of course, on a high-fidelity understanding of the instruments and observational effects involved in generating the data. More fundamentally, however, the cosmological information accessed with these surveys is generated by non-linear, non-perturbative, and high-dynamic-range physical processes. In the face of highly constraining data, accurate and precise models for such processes are necessarily complex, which creates several challenges for using them to access cosmological information. At the same time, recent advances in high-performance computing have enabled larger and higher-resolution cosmological numerical simulations than ever before. Simultaneously, the growth of Graphics Processing Units (GPUs) and related tooling, such as methods of automatic differentiation, has led to an explosion of machine learning architectures and methods for working with high-dimensional models and data, and the LSS community has applied these methods to cosmological problems. These computational advances have been, and will continue to be, drafted into the service of extracting cosmological information from high-quality data.
This thesis highlights several areas where recent technological developments can be directly translated into improved methods for large-scale structure simulation and analysis.
Increasingly complex models generate additional parameters, which, though typically not of cosmological interest, must be included and varied, leading to a more challenging inference problem and, frequently, to less interpretable phenomenological models of LSS. One machine-learning-informed strategy to address this issue in the context of simulation-based prior assumptions is outlined in Chapter 2, while a more direct strategy, which uses machine learning methods to speed up the initial phase of the inference procedure in a more general inference context, is the subject of Chapter 5. More straightforwardly, raw computational cost also grows when numerical models are asked to describe a wider range of scales, which will be necessary for high-density LSS tracer samples covering large spatial volumes and redshift ranges. Chapter 3 details a scheme for improving numerical simulation efficiency, thereby reducing this growing computational cost, in the context of modeling the cosmological impact of massive neutrinos on LSS. Numerical simulations that attempt to model galaxy formation in a cosmological context are also increasingly being used to inform LSS tracer properties. As these simulations become more robust in their determination of tracer population properties, strategies for leveraging these properties will increase the cosmological information accessible from surveys. An example of such a strategy, based on machine learning and applied in the context of primordial non-Gaussianity, is outlined in Chapter 4. Extending methods in similar directions going forward will enable the LSS community to learn as much as possible from large-scale structure survey data.