
Article
Peer Reviewed

Evaluation of a genetic risk score computed using human chromosomal-scale length variation to predict breast cancer.

UC Irvine Previously Published Works (2023)

Introduction

The ability to accurately predict whether a woman will develop breast cancer later in life should reduce the number of breast cancer deaths. Different predictive models for breast cancer exist, based on family history, BRCA status, and SNP analysis. The best of these models achieves an area under the receiver operating characteristic curve (AUC) of about 0.65. We have developed computational methods that characterize a genome by a small set of numbers representing the lengths of chromosomal segments, called chromosomal-scale length variation (CSLV).
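Because both abstracts report model performance as AUC, a minimal sketch of how that number can be computed may help. It uses the Mann-Whitney rank formulation: the AUC equals the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as one half. The function name `auc_rank` and the toy scores are illustrative assumptions, not taken from the study.

```python
def auc_rank(case_scores, control_scores):
    """AUC via the Mann-Whitney U statistic: the probability that a random
    case scores higher than a random control, with ties counted as 1/2."""
    n1, n0 = len(case_scores), len(control_scores)
    if n1 == 0 or n0 == 0:
        raise ValueError("need at least one case and one control")
    # Pool scores with labels (1 = case, 0 = control) and sort by score.
    pooled = sorted([(s, 1) for s in case_scores] + [(s, 0) for s in control_scores],
                    key=lambda t: t[0])
    # Assign 1-based ranks, giving tied scores the average of their ranks.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    # Rank sum of the cases minus its minimum possible value gives U.
    rank_sum_cases = sum(r for r, (_, label) in zip(ranks, pooled) if label == 1)
    u = rank_sum_cases - n1 * (n1 + 1) / 2
    return u / (n1 * n0)

# Toy risk scores (made up, not study data): 8 of 9 case/control pairs are ordered correctly.
print(auc_rank([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))  # 8/9, about 0.889
```

Sorting keeps this O(n log n), which matters for cohorts the size of the UK Biobank subset (1534 × 4391 case/control pairs).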

Methods

We built machine learning models to differentiate, based on their CSLV characterization, between women who had breast cancer and women who did not. We applied this procedure to two datasets: the UK Biobank (1534 women with breast cancer and 4391 without) and the Cancer Genome Atlas (TCGA; 874 women with breast cancer and 3381 without).
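The abstract does not say how the data were split for training and evaluation. As an assumption, the sketch below shows one common way to preserve the 1534:4391 case-to-control ratio of the UK Biobank cohort across cross-validation folds. `stratified_kfold` is a hypothetical helper written with the Python standard library only, and the classifier itself is deliberately left out.

```python
import random

def stratified_kfold(labels, k=5, seed=0):
    """Hypothetical helper: split sample indices into k folds so that each
    fold keeps roughly the same case:control ratio as the full dataset."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal this class's indices round-robin across the folds.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

# Case/control counts from the UK Biobank analysis (1 = breast cancer, 0 = control).
labels = [1] * 1534 + [0] * 4391
folds = stratified_kfold(labels, k=5)
for held_out in folds:
    held = set(held_out)
    train_idx = [i for i in range(len(labels)) if i not in held]
    # A classifier would be fit on train_idx and scored on held_out here.
    print(len(train_idx), len(held_out), sum(labels[i] for i in held_out))
```

Stratifying matters here because the classes are imbalanced; an unstratified split could leave some folds with noticeably fewer cases, which makes per-fold AUC estimates noisier.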

Results

We found a machine learning model that could predict breast cancer with an AUC of 0.836 (95% CI 0.830, 0.843) in the UK Biobank data. Using a similar approach with the TCGA data, we obtained a model with an AUC of 0.704 (95% CI 0.702, 0.706). Variable importance analysis indicated that no single chromosomal region was responsible for a significant fraction of the model results.
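The abstract does not state how the 95% confidence intervals were obtained. One common option, assumed here purely for illustration, is a percentile bootstrap that resamples cases and controls separately and recomputes the AUC each time. The helper names and toy scores are hypothetical.

```python
import random

def auc(cases, controls):
    """Pairwise Mann-Whitney AUC estimate; ties count as 1/2."""
    wins = sum(1.0 if c > n else 0.5 if c == n else 0.0
               for c in cases for n in controls)
    return wins / (len(cases) * len(controls))

def bootstrap_auc_ci(cases, controls, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the AUC (an assumed method, for illustration).
    Cases and controls are resampled separately to keep group sizes fixed."""
    rng = random.Random(seed)
    stats = sorted(
        auc(rng.choices(cases, k=len(cases)), rng.choices(controls, k=len(controls)))
        for _ in range(n_boot)
    )
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return auc(cases, controls), (lo, hi)

# Toy held-out risk scores (made up, not study data).
cases = [0.9, 0.8, 0.4, 0.75, 0.6]
controls = [0.7, 0.3, 0.2, 0.5, 0.1]
point, (lo, hi) = bootstrap_auc_ci(cases, controls)
print(f"AUC = {point:.3f} (95% CI {lo:.3f}, {hi:.3f})")
```

The pairwise `auc` here is O(n·m) per resample, which is fine for toy data; for cohort-sized score lists a rank-based AUC would be the practical choice inside the bootstrap loop.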

Conclusion

In this retrospective study, chromosomal-scale length variation could effectively predict whether or not a woman enrolled in the UK Biobank study developed breast cancer.


Thesis
Peer Reviewed

Cancer Risk Determination through Chromosomal Scale Length Variations of Germline DNA

Ko, Charmeine Shumeng
Advisor(s): Brody, James

UC Irvine Electronic Theses and Dissertations (2022)

Cancer is a complex disease with significant genetic components. Previous efforts to uncover the genetic basis of carcinogenesis have tended to focus on linear combinations of single genetic mutations, ignoring the complex, non-linear network of interactions known to regulate cellular processes. The goal of this line of research is to predict whether a person will develop a specific cancer later in life. This study evaluates how well machine learning classification algorithms trained on germline chromosomal scale length variation (CSLV) data from cancer patients can make that prediction. CSLVs were developed to condense pertinent copy number variation (CNV) information into a small number of parameters, which makes machine learning models practical. We investigated cancer risk prediction and diagnosis classification from germline CSLV data alone. Our findings indicate that CSLVs contribute to inherited cancer likelihood through a complex network of interactions.
We first tested 33 types of cancer using the 11,000 patients in the Cancer Genome Atlas (TCGA). Lung squamous cell carcinoma (AUC = 0.69), glioblastoma multiforme (AUC = 0.78), colon adenocarcinoma (AUC = 0.67), and many others could be distinguished from other cancer types better than random chance.
We also evaluated the method in a second dataset, the UK Biobank. Each cancer type dataset was paired with an age- and gender-matched randomized control set. For use as model features, 125 CSLVs were computed: 4 averages and 1 standard deviation for each of the 22 autosomes and the 3 sex-chromosome categories (X, Y, and XY). The AUC was 0.597 for lung cancer, 0.567 for brain cancer, and 0.565 for colorectal cancer. These results were comparable to currently published risk scores and demonstrate the viability of CSLVs as genetic risk scores for certain cancer types.
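The feature layout described above (5 statistics for each of 25 chromosome categories, giving 125 CSLVs) can be sketched as follows. The thesis abstract does not define the four averages or the input measurements, so `cslv_features` is hypothetical: it assumes per-probe values ordered along each chromosome and takes the means of four contiguous quarters plus the overall sample standard deviation.

```python
import statistics

# 22 autosomes plus the three sex-chromosome categories named in the abstract.
CHROMOSOMES = [str(i) for i in range(1, 23)] + ["X", "Y", "XY"]
# HYPOTHETICAL statistic names: four averages and one standard deviation.
STATS = ["avg1", "avg2", "avg3", "avg4", "sd"]

def feature_names():
    """Names for the 25 x 5 = 125 CSLV features, in chromosome-major order."""
    return [f"chr{c}_{s}" for c in CHROMOSOMES for s in STATS]

def cslv_features(values_by_chrom):
    """HYPOTHETICAL: values_by_chrom maps each chromosome label to a list of
    per-probe measurements ordered by position. The four averages are taken
    here as means of four contiguous quarters and 'sd' as the sample standard
    deviation of the whole chromosome; the thesis does not define them."""
    feats = []
    for c in CHROMOSOMES:
        v = values_by_chrom[c]
        n = len(v)
        if n < 4:
            raise ValueError(f"chromosome {c} needs at least 4 values")
        # Quarter boundaries; integer division keeps every segment non-empty.
        bounds = [q * n // 4 for q in range(5)]
        feats.extend(statistics.fmean(v[bounds[q]:bounds[q + 1]]) for q in range(4))
        feats.append(statistics.stdev(v))
    return feats

# Toy input: the same eight values for every chromosome category.
toy = {c: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0] for c in CHROMOSOMES}
features = cslv_features(toy)
print(len(features), feature_names()[:5], features[:5])
```

Only the 25 × 5 = 125 shape comes from the abstract; a real pipeline would substitute the thesis's own definitions of the averages and the underlying measurements.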
Using germline chromosomal scale length variation data from large public databases and machine learning models, we developed a novel and promising method to predict cancer diagnosis. The technique could be further improved and augmented to increase its clinical relevance, and it could benefit personalized diagnostics and cancer prevention.
