Advances in next-generation sequencing coupled with comparative analyses have had
tremendous implications for oncology. Differentially expressed genomic features have revealed
molecular pathways, oncogenic drivers, and resistance patterns targeted by drug development,
leading to significant decreases in cancer mortality. The introduction of high-throughput
multiomics, including transcriptomics, epigenomics, and proteomics, promised to scale translational
innovation. Increasing feature complexity, however, cannot be comparatively resolved, and many
efforts in this space have failed due to inconclusive or conflicting results. Computational systems
biology and functional genomics have proposed dynamic integration of multiple molecular
pathway models for data harmonization, yet the requirement for complete information has biased
discovery. Similarly, the incorporation of probabilistic approaches that constrain features may
obscure incremental biologic effects. This problem is exemplified by the several dozen cancer
genomic biomarkers and models from peer-reviewed, high-impact publications that do not meet
statistical significance when applied beyond their training and validation datasets.
This dissertation seeks to employ a scientifically rigorous process of evaluation using iterative
modeling to benchmark multiomic comparisons. First, this research develops a framework for
characterizing multimodal data based on structural and functional information. Second, this
research benchmarks similarity metrics using varied data structures, offering techniques to reveal
key biologic differences using probabilistic modeling (hierarchical clustering). This framework is
then applied to neoepitope prediction incorporating human leukocyte antigen-B supertypes and
used to resolve previously inconclusive and conflicting results, including the difference in survival
based on B44 supertype in patients with non-small cell lung cancer (NSCLC) and melanoma
treated with immune checkpoint blockade (ICB).
Ultimately, this dissertation advances our understanding of the interaction between feature
selection and power in multiomic analyses and offers recommendations to enhance the reliability
of these investigations.