- Yang, Xiaoxu;
- Xu, Xin;
- Breuss, Martin W;
- Antaki, Danny;
- Ball, Laurel L;
- Chung, Changuk;
- Shen, Jiawei;
- Li, Chen;
- George, Renee D;
- Wang, Yifan;
- Bae, Taejeong;
- Cheng, Yuhe;
- Abyzov, Alexej;
- Wei, Liping;
- Alexandrov, Ludmil B;
- Sebat, Jonathan L;
- Gleeson, Joseph G
Mosaic variants (MVs) reflect mutagenic processes during embryonic development and environmental exposure, accumulate with aging and underlie diseases such as cancer and autism. The detection of noncancer MVs has been computationally challenging due to the sparse representation of nonclonally expanded MVs. Here we present DeepMosaic, combining an image-based visualization module for single nucleotide MVs and a convolutional neural network-based classification module for control-independent MV detection. DeepMosaic was trained on 180,000 simulated or experimentally assessed MVs, and was benchmarked on 619,740 simulated MVs and 530 independent biologically tested MVs from 16 genomes and 181 exomes. DeepMosaic achieved higher accuracy than existing methods on biological data, with a sensitivity of 0.78, specificity of 0.83 and positive predictive value of 0.96 on noncancer whole-genome sequencing data, and more than doubled the validation rate of previous best-practice methods on noncancer whole-exome sequencing data (0.43 versus 0.18). DeepMosaic represents an accurate MV classifier for noncancer samples that can be implemented as an alternative or complement to existing methods.
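
For readers less familiar with the three benchmark metrics quoted above, the following minimal sketch shows how sensitivity, specificity and positive predictive value are conventionally derived from validation counts. It is not taken from DeepMosaic; the `ConfusionCounts` type, the function names and the example counts are hypothetical and chosen only for illustration, and they do not correspond to the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Validation outcome counts for a variant classifier (hypothetical type)."""
    tp: int  # called mosaic, experimentally confirmed
    fp: int  # called mosaic, not confirmed
    tn: int  # called negative, confirmed negative
    fn: int  # called negative, but a true mosaic variant


def sensitivity(c: ConfusionCounts) -> float:
    # Fraction of true mosaic variants that the classifier recovers.
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Fraction of true negatives that the classifier correctly rejects.
    return c.tn / (c.tn + c.fp)


def positive_predictive_value(c: ConfusionCounts) -> float:
    # Fraction of positive calls that turn out to be true variants.
    return c.tp / (c.tp + c.fp)


# Illustrative counts only; NOT values from the DeepMosaic benchmark.
counts = ConfusionCounts(tp=80, fp=5, tn=45, fn=20)

if __name__ == "__main__":
    print(f"sensitivity = {sensitivity(counts):.2f}")
    print(f"specificity = {specificity(counts):.2f}")
    print(f"PPV         = {positive_predictive_value(counts):.2f}")
```

Unlike sensitivity and specificity, positive predictive value depends on the proportion of true variants among the tested candidates, so it can be high even when specificity is moderate.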