Article
Peer Reviewed

Mutation-Attention (MuAt): deep representation learning of somatic mutations for tumour typing and subtyping.

UC San Francisco Previously Published Works (2023)

BACKGROUND: Cancer genome sequencing enables accurate classification of tumours and tumour subtypes. However, prediction performance is still limited using exome-only sequencing and for tumour types with low somatic mutation burden such as many paediatric tumours. Moreover, the ability to leverage deep representation learning in discovery of tumour entities remains unknown.

METHODS: We introduce here Mutation-Attention (MuAt), a deep neural network to learn representations of simple and complex somatic alterations for prediction of tumour types and subtypes. In contrast to many previous methods, MuAt utilizes the attention mechanism on individual mutations instead of aggregated mutation counts.

RESULTS: We trained MuAt models on 2587 whole cancer genomes (24 tumour types) from the Pan-Cancer Analysis of Whole Genomes (PCAWG) and 7352 cancer exomes (20 types) from the Cancer Genome Atlas (TCGA). MuAt achieved prediction accuracy of 89% for whole genomes and 64% for whole exomes, and a top-5 accuracy of 97% and 90%, respectively. MuAt models were found to be well-calibrated and perform well in three independent whole cancer genome cohorts with 10,361 tumours in total. We show MuAt to be able to learn clinically and biologically relevant tumour entities including acral melanoma, SHH-activated medulloblastoma, SPOP-associated prostate cancer, microsatellite instability, POLE proofreading deficiency, and MUTYH-associated pancreatic endocrine tumours without these tumour subtypes and subgroups being provided as training labels. Finally, scrutiny of MuAt attention matrices revealed both ubiquitous and tumour-type specific patterns of simple and complex somatic mutations.

CONCLUSIONS: Integrated representations of somatic alterations learnt by MuAt were able to accurately identify histological tumour types and identify tumour entities, with potential to impact precision cancer medicine.

Article
Peer Reviewed

EPHB2 germline variants in patients with colorectal cancer or hyperplastic polyposis

UC Davis Previously Published Works (2006)

Background

Ephrin receptor B2 (EPHB2) has recently been proposed as a novel tumor suppressor gene in colorectal cancer (CRC). Inactivation of the gene has been shown to correlate with progression of colorectal tumorigenesis, and somatic mutations have been reported in both colorectal and prostate tumors.

Methods

Here we have analyzed the EPHB2 gene for germline alterations in 101 individuals either with 1) CRC and a personal or family history of prostate cancer (PC), or 2) intestinal hyperplastic polyposis (HPP), a condition associated with malignant degeneration such as serrated adenoma and CRC.

Results

Four previously unknown missense alterations were observed, which may be associated with the disease phenotype. Two of the changes, I361V and R568W, were identified in Finnish CRC patients, but not in over 300 Finnish familial CRC or PC patients or more than 200 population-matched healthy controls. The third change, D861N, was observed in a UK HPP patient, but not in an additional 40 UK HPP patients or in 200 UK healthy controls. The fourth change, R80H, originally identified in a Finnish CRC patient, was also found in 1/106 familial CRC patients and in 9/281 healthy controls and is likely to be a neutral polymorphism.

Conclusion

We detected novel germline EPHB2 alterations in patients with colorectal tumors. The results suggest a limited role for these EPHB2 variants in colon tumor predisposition. Further studies including functional analyses are needed to confirm this.

Article
Peer Reviewed

Refinement of the associations between risk of colorectal cancer and polymorphisms on chromosomes 1q41 and 12q13.13

UC Davis Previously Published Works (2012)

In genome-wide association studies (GWASs) of colorectal cancer, we have identified two genomic regions in which pairs of tagging-single nucleotide polymorphisms (tagSNPs) are associated with disease; these comprise chromosomes 1q41 (rs6691170, rs6687758) and 12q13.13 (rs7136702, rs11169552). We investigated these regions further, aiming to determine whether they contain more than one independent association signal and/or to identify the SNPs most strongly associated with disease. Genotyping of additional sample sets at the original tagSNPs showed that, for both regions, the two tagSNPs were unlikely to identify a single haplotype on which the functional variation lay. Conversely, one of the pair of SNPs did not fully capture the association signal in each region. We therefore undertook more detailed analyses, using imputation, logistic regression, genealogical analysis using the GENECLUSTER program and haplotype analysis. In the 1q41 region, the SNP rs11118883 emerged as a strong candidate based on all these analyses, sufficient to account for the signals at both rs6691170 and rs6687758. rs11118883 lies within a region with strong evidence of transcriptional regulatory activity and has been associated with expression of PDGFRB mRNA. For 12q13.13, a complex situation was found: SNP rs7972465 showed stronger association than either rs11169552 or rs7136702, and GENECLUSTER found no good evidence for a two-SNP model. However, logistic regression and haplotype analyses supported a two-SNP model, in which a signal at the SNP rs706793 was added to that at rs11169552. Post-GWAS fine-mapping studies are challenging, but the use of multiple tools can assist in identifying candidate functional variants in at least some cases.

Article
Peer Reviewed

Meta-analysis of three genome-wide association studies identifies susceptibility loci for colorectal cancer at 1q41, 3q26.2, 12q13.13 and 20q13.33

UC Davis Previously Published Works (2010)

Genome-wide association studies (GWAS) have identified ten loci harboring common variants that influence risk of developing colorectal cancer (CRC). To enhance the power to identify additional CRC risk loci, we conducted a meta-analysis of three GWAS from the UK which included a total of 3,334 affected individuals (cases) and 4,628 controls followed by multiple validation analyses including a total of 18,095 cases and 20,197 controls. We identified associations at four new CRC risk loci: 1q41 (rs6691170, odds ratio (OR) = 1.06, P = 9.55 × 10⁻¹⁰ and rs6687758, OR = 1.09, P = 2.27 × 10⁻⁹), 3q26.2 (rs10936599, OR = 0.93, P = 3.39 × 10⁻⁸), 12q13.13 (rs11169552, OR = 0.92, P = 1.89 × 10⁻¹⁰ and rs7136702, OR = 1.06, P = 4.02 × 10⁻⁸) and 20q13.33 (rs4925386, OR = 0.93, P = 1.89 × 10⁻¹⁰). In addition to identifying new CRC risk loci, this analysis provides evidence that additional CRC-associated variants of similar effect size remain to be discovered.

Article
Peer Reviewed

Multiple Common Susceptibility Variants near BMP Pathway Loci GREM1, BMP4, and BMP2 Explain Part of the Missing Heritability of Colorectal Cancer

UC Davis Previously Published Works (2011)

Genome-wide association studies (GWAS) have identified 14 tagging single nucleotide polymorphisms (tagSNPs) that are associated with the risk of colorectal cancer (CRC), and several of these tagSNPs are near bone morphogenetic protein (BMP) pathway loci. The penalty of multiple testing implicit in GWAS increases the attraction of complementary approaches for disease gene discovery, including candidate gene- or pathway-based analyses. The strongest candidate loci for additional predisposition SNPs are arguably those already known both to have functional relevance and to be involved in disease risk. To investigate this proposition, we searched for novel CRC susceptibility variants close to the BMP pathway genes GREM1 (15q13.3), BMP4 (14q22.2), and BMP2 (20p12.3) using sample sets totalling 24,910 CRC cases and 26,275 controls. We identified new, independent CRC predisposition SNPs close to BMP4 (rs1957636, P = 3.93 × 10⁻¹⁰) and BMP2 (rs4813802, P = 4.65 × 10⁻¹¹). Near GREM1, we found using fine-mapping that the previously-identified association between tagSNP rs4779584 and CRC actually resulted from two independent signals represented by rs16969681 (P = 5.33 × 10⁻⁸) and rs11632715 (P = 2.30 × 10⁻¹⁰). As low-penetrance predisposition variants become harder to identify, owing to small effect sizes and/or low risk allele frequencies, approaches based on informed candidate gene selection may become increasingly attractive. Our data emphasise that genetic fine-mapping studies can deconvolute associations that have arisen owing to independent correlation of a tagSNP with more than one functional SNP, thus explaining some of the apparently missing heritability of common diseases.

Article
Peer Reviewed

Pan-cancer analysis of whole genomes

UCLA Previously Published Works (2020)

Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale [1-3]. Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4-5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter [4]; identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation [5,6]; analyses timings and patterns of tumour evolution [7]; describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity [8,9]; and evaluates a range of more-specialized features of cancer genomes [8,10-18].
