- Chen, Binbin
- Khodadoust, Michael S
- Olsson, Niclas
- Wagar, Lisa E
- Fast, Ethan
- Liu, Chih Long
- Muftuoglu, Yagmur
- Sworder, Brian J
- Diehn, Maximilian
- Levy, Ronald
- Davis, Mark M
- Elias, Joshua E
- Altman, Russ B
- Alizadeh, Ash A
Accurate prediction of antigen presentation by human leukocyte antigen (HLA) class II molecules would be valuable for vaccine development and cancer immunotherapies. Current computational methods trained on in vitro binding data are limited by insufficient training data and algorithmic constraints. Here we describe MARIA (major histocompatibility complex analysis with recurrent integrated architecture; https://maria.stanford.edu/), a multimodal recurrent neural network for predicting the likelihood of antigen presentation from a gene of interest in the context of specific HLA class II alleles. In addition to in vitro binding measurements, MARIA is trained on peptide HLA ligand sequences identified by mass spectrometry, expression levels of antigen genes and protease cleavage signatures. Because it leverages these diverse training data and our improved machine learning framework, MARIA (area under the curve = 0.89-0.92) outperformed existing methods in validation datasets. Across independent cancer neoantigen studies, peptides with high MARIA scores are more likely to elicit strong CD4+ T cell responses. MARIA allows identification of immunogenic epitopes in diverse cancers and autoimmune disease.
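
For readers unfamiliar with the reported metric, the following minimal sketch shows how an area under the curve can be computed from prediction scores, assuming the figure refers to the receiver operating characteristic curve. The function name `roc_auc` and the example scores and labels are hypothetical illustrations; they are not MARIA outputs or data from the study.

```python
def roc_auc(scores, labels):
    """Rank-based ROC AUC: the probability that a randomly chosen
    positive example scores higher than a randomly chosen negative one
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical presentation scores for six peptides (1 = presented, 0 = not).
example_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
example_labels = [1, 1, 1, 0, 0, 0]
example_auc = roc_auc(example_scores, example_labels)
print(round(example_auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of presented from non-presented peptides.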