Scholarly Works (32 results)

Article
Peer Reviewed

The anatomy of a distributed predictive modeling framework: online learning, blockchain network, and consensus algorithm

Kuo, Tsung-Ting

UC San Diego Previously Published Works (2020)

Objective

Cross-institutional distributed healthcare/genomic predictive modeling is an emerging technology that meets two needs: building more generalizable models and protecting patient data, because only the models, not the patient data, are exchanged. In this article, the implementation details are presented for one specific blockchain-based approach, ExplorerChain, from a software development perspective. The healthcare/genomic use cases of myocardial infarction, cancer biomarker, and length of hospitalization after surgery are also described.

Materials and methods

ExplorerChain's 3 main technical components, including online machine learning, metadata of transaction, and the Proof-of-Information-Timed (PoINT) algorithm, are introduced in this study. Specifically, the 3 algorithms (ie, core, new network, and new site/data) are described in detail.

Results

ExplorerChain was implemented, and its design details were illustrated, especially the development configurations in a practical setting. The system architecture and programming languages are also introduced. The code was released in an open-source repository at https://github.com/tsungtingkuo/explorerchain.

Discussion

The design considerations of the semi-trust assumption, data format normalization, and non-determinism were discussed. The limitations of the implementation include a fixed number of participating sites, limited join-or-leave capability during initialization, advanced privacy technology yet to be included, and the need for further investigation of ethical, legal, and social implications.

Conclusion

This study can serve as a reference for the researchers who would like to implement and even deploy blockchain technology. Furthermore, the off-the-shelf software can also serve as a cornerstone to accelerate the development and investigation of future healthcare/genomic blockchain studies.

Article
Peer Reviewed

Generalizable prediction of COVID-19 mortality on worldwide patient data

UC San Diego Previously Published Works (2022)

Objective

Predicting Coronavirus disease 2019 (COVID-19) mortality for patients is critical for early-stage care and intervention. Existing studies mainly built models on datasets with limited geographical range or size. In this study, we developed COVID-19 mortality prediction models on worldwide, large-scale "sparse" data and on a "dense" subset of the data.

Materials and methods

We evaluated 6 classifiers, including logistic regression (LR), support vector machine (SVM), random forest (RF), multilayer perceptron (MLP), AdaBoost (AB), and Naive Bayes (NB). We also conducted temporal analysis and calibrated our models using Isotonic Regression.

Results

The results showed that AB outperformed the other classifiers for the sparse dataset, while LR provided the highest-performing results for the dense dataset (with area under the receiver operating characteristic curve, or AUC ≈ 0.7 for the sparse dataset and AUC = 0.963 for the dense one). We also identified impactful features such as symptoms, countries, age, and the date of death/discharge. All our models are well-calibrated (P > .1).
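As a reading aid for the AUC figures above, here is a minimal sketch of how AUC can be computed from its pairwise definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auc` and the toy labels and scores are illustrative assumptions and are not taken from the study's released code.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 outcomes (1 = positive class, e.g. death).
    scores: iterable of predicted risks, same length as labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 3 of the 4 positive/negative pairs are ordered correctly.
print(auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))  # 0.75
```

The quadratic loop is chosen for clarity; rank-based formulas give the same value more efficiently on large datasets.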

Discussion

Our results highlight the tradeoff of using sparse training data to increase generalizability versus training on denser data, which produces higher discrimination results. We found that covariates such as patient information on symptoms, countries (where the case was reported), age, and the date of discharge from the hospital or death were the most important for mortality prediction.

Conclusion

This study is a stepping-stone towards improving healthcare quality during the COVID-19 era and potentially other pandemics. Our code is publicly available at: https://doi.org/10.5281/zenodo.6336231.

Article
Peer Reviewed

CertificateChain: decentralized healthcare training certificate management system using blockchain and smart contracts

UC San Diego Previously Published Works (2022)

Objective

Managing training certificates is an important issue in research that can lead to serious problems if not addressed properly. For institutions that currently do not have a dedicated management system for these training certificates, a central database is the most typical solution. However, such a system suffers from several risks, such as a single-point-of-failure.

Materials and methods

To address this issue, we developed and evaluated CertificateChain, a decentralized training certificate management system by using peer-to-peer blockchain and automated smart contracts. We developed an efficient certificate dividing-and-merging algorithm to overcome the transaction size limit on blockchain.
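The general idea of dividing a record to fit under a per-transaction size limit and merging it back on retrieval can be sketched as follows. This is only an illustration under stated assumptions, not the CertificateChain algorithm: the function names, the chunk dictionary layout, the SHA-256 integrity check, and the 1024-byte limit are all hypothetical.

```python
import hashlib


def split_certificate(data: bytes, max_chunk: int) -> list:
    """Divide a certificate into indexed chunks of at most max_chunk bytes."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    digest = hashlib.sha256(data).hexdigest()
    total = max(1, -(-len(data) // max_chunk))  # ceiling division, at least 1
    return [
        {
            "index": i,
            "total": total,
            "sha256": digest,  # digest of the whole certificate
            "payload": data[i * max_chunk:(i + 1) * max_chunk],
        }
        for i in range(total)
    ]


def merge_certificate(chunks: list) -> bytes:
    """Reassemble chunks in index order and verify the overall digest."""
    ordered = sorted(chunks, key=lambda c: c["index"])
    if not ordered or len(ordered) != ordered[0]["total"]:
        raise ValueError("missing chunks")
    data = b"".join(c["payload"] for c in ordered)
    if hashlib.sha256(data).hexdigest() != ordered[0]["sha256"]:
        raise ValueError("checksum mismatch")
    return data


# Hypothetical usage: a 2500-byte certificate under a 1024-byte limit.
cert = b"x" * 2500
parts = split_certificate(cert, 1024)
print(len(parts))                         # 3
print(merge_certificate(parts) == cert)   # True
```

Each chunk would be stored in its own transaction; carrying the index, total count, and digest in every chunk lets a reader detect missing or corrupted pieces regardless of retrieval order.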

Results

We performed experiments on the system to evaluate its performance, then created a web app and tested the system in a real-world scenario. CertificateChain scaled linearly in terms of time compared with the total number of certificates added and could be quickly queried for existing data stored on-chain.

Discussion

CertificateChain was able to store and retrieve the training certificates on the blockchain network. Limitations that remain to be addressed include a comparative analysis with other systems, evaluation of different consensus protocols, examination of certificates off-chain, a thorough comparison with a centralized system, and extension to the main public Ethereum network.

Conclusion

We believe that these results indicate that blockchain technology could be a viable decentralized alternative to traditional databases in this use case. Our software is publicly available at: https://doi.org/10.5281/zenodo.6257094.

Article
Peer Reviewed

Detecting model misconducts in decentralized healthcare federated learning

UC San Diego Previously Published Works (2022)

Background

To accelerate healthcare/genomic medicine research and facilitate quality improvement, researchers have started cross-institutional collaborations to use artificial intelligence on clinical/genomic data. However, there are real-world risks of incorrect models being submitted to the learning process, due to either unforeseen accidents or malicious intent. This may reduce the incentives for institutions to participate in the federated modeling consortium. Existing methods to deal with this "model misconduct" issue mainly focus on modifying the learning methods, and therefore are more specifically tied with the algorithm.

Basic procedures

In this paper, we aim at solving the problem in an algorithm-agnostic way by (1) designing a simulator to generate various types of model misconduct, (2) developing a framework to detect the model misconducts, and (3) providing a generalizable approach to identify model misconducts for federated learning. We considered the following three categories: Plagiarism, Fabrication, and Falsification, and then developed a detection framework with three components: Auditing, Coefficient, and Performance detectors, with greedy parameter tuning.

Main findings

We generated 10 types of misconducts from models learned on three datasets to evaluate our detection method. Our experiments showed high recall with low added computational cost. Our proposed detection method can best identify the misconduct on specific sites from any learning iteration, whereas it is more challenging to precisely detect misconducts for a specific site and at a specific iteration.

Principal conclusions

We anticipate our study can support the enhancement of the integrity and reliability of federated machine learning on genomic/healthcare data.

Article
Peer Reviewed

Quorum-based model learning on a blockchain hierarchical clinical research network using smart contracts

UC San Diego Previously Published Works (2023)

Background

Collaborative privacy-preserving modeling across several healthcare institutions allows for the construction of more generalizable predictive models while protecting patient privacy.

Objective

We aim at addressing the site availability issue on a hierarchical network by designing an immutable/transparent/source-verifiable quorum mechanism.

Methods

We developed an approach to combine a hierarchical learning algorithm, a novel Proof-of-Quorum (PoQ) consensus protocol, and a design of blockchain smart contracts. We constructed QuorumChain as an example and evaluated the scenarios of site-unavailability during the initialization and/or iteration phases of the modeling process on three healthcare/genomic datasets.
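The basic notion of a quorum, continuing a modeling round with the available sites once enough of them are present, can be illustrated with a short sketch. It is not the Proof-of-Quorum protocol described in the paper: the function name, the strict-majority default threshold, and the site labels are assumptions made purely for illustration.

```python
def select_participants(all_sites, available, min_fraction=0.5):
    """Return the sites that proceed with this round, or None without a quorum.

    all_sites:    ordered list of every site in the network.
    available:    set of sites that responded for this round.
    min_fraction: the available share must strictly exceed this value
                  (hypothetical default: strict majority).
    """
    present = [site for site in all_sites if site in available]
    if len(present) > min_fraction * len(all_sites):
        return present
    return None


# Hypothetical four-site network.
sites = ["A", "B", "C", "D"]
print(select_participants(sites, {"A", "B", "D"}))  # ['A', 'B', 'D']
print(select_participants(sites, {"A", "C"}))       # None (2 of 4 is not a strict majority)
```

In a blockchain setting, a rule of this kind would typically live in a smart contract so that every site evaluates the same threshold against the same recorded availability.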

Results

When one or more sites became unavailable, HierarchicalChain could not function, whereas QuorumChain improved predictive correctness significantly (the full Area Under the receiver operating characteristic Curve, or AUC, improved from 0.068 to 0.441, all with p-values < 0.001).

Conclusion

By constructing a quorum to continue the modeling process, QuorumChain possesses the capability to tackle the situation of sites being unavailable. It inherits the capability of learning on network-of-networks, improves learning continuity, and provides data/software immutability, transparency, and provenance, which can be important in expediting clinical research.

Thesis
Peer Reviewed

Strengthening Health Research Workflow for Research-Oriented Sharing, Predictive Modeling, and Cross-Institutional Collaboration

Pham, Anh
Advisor(s): Kuo, Tsung-Ting

UC San Diego Electronic Theses and Dissertations (2024)

Machine learning and artificial intelligence (AI) hold the promise to innovate clinical practices and to improve quality of care. Naturally, the health research workflow may require modern adaptations to best capitalize on the fast growth of AI. One challenge along the pipeline from health data to AI products is the balance between privacy-focused patients and access-focused researchers. It is thus crucial for scientists to overcome sharing barriers of health-research activities without doing harm to data security and privacy, so as to enable the meaningful use of electronic health records (EHR) in research. In particular, the sharing of research-oriented activities to benefit both patients and researchers can be facilitated through establishing a secured, blockchain-based informed consent infrastructure with which patients can grant data access to researchers. Other processes such as the management of clinical research activities in data consortia may also take advantage of such platforms. Following this feasibility of a robust, prolific data pipeline, the next step in health AI is the use of EHR for predictive analysis. Specifically, the COVID-19 pandemic has cast light on the critical value of predictive modeling in the fight against infectious diseases. A meaningful demonstration of EHR use in AI may include the construction of models to predict the risk of Clostridioides difficile infection. Last but not least, the efficiency and usability of cross-institutional research collaboration may also stand to benefit from workflow improvement, as it may enable better use of data from multiple sources. For instance, the credential-verification procedure by which researchers must abide to access external datasets may be made more efficient through secured automation. Similarly, the use of privacy-preserving algorithms can be promoted by providing non-technical users with intuitive tools to participate in federated learning schemas.
Together, it is seen that the health research workflow can be bolstered through fortifying the sharing framework of research-oriented activities, meaningfully utilizing EHR in predictive analysis, and improving the efficiency and usability of cross-institutional research. Such developments may boost scientific growth by keeping the right equilibrium among data quantity, data privacy, and research efficiency and usability, permitting the simultaneous expansion of patients’ autonomy and researchers’ innovation.

Article
Peer Reviewed

Privacy-preserving model learning on a blockchain network-of-networks

UC San Diego Previously Published Works (2020)

Objective

To facilitate clinical/genomic/biomedical research, constructing generalizable predictive models using cross-institutional methods while protecting privacy is imperative. However, state-of-the-art methods assume a "flattened" topology, while real-world research networks may consist of "network-of-networks" which can imply practical issues including training on small data for rare diseases/conditions, prioritizing locally trained models, and maintaining models for each level of the hierarchy. In this study, we focus on developing a hierarchical approach to inherit the benefits of the privacy-preserving methods, retain the advantages of adopting blockchain, and address practical concerns on a research network-of-networks.

Materials and methods

We propose a framework to combine level-wise model learning, blockchain-based model dissemination, and a novel hierarchical consensus algorithm for model ensemble. We developed an example implementation HierarchicalChain (hierarchical privacy-preserving modeling on blockchain), evaluated it on 3 healthcare/genomic datasets, as well as compared its predictive correctness, learning iteration, and execution time with a state-of-the-art method designed for flattened network topology.

Results

HierarchicalChain improves predictive correctness for small training datasets and provides correctness comparable to the competing method, with more learning iterations and similar per-iteration execution time. It inherits the benefits of privacy-preserving learning and the advantages of blockchain technology, and immutably records models for each level.

Discussion

HierarchicalChain is independent of the core privacy-preserving learning method, as well as of the underlying blockchain platform. Further studies are warranted for various types of network topology, complex data, and privacy concerns.

Conclusion

We demonstrated the potential of utilizing the information from the hierarchical network-of-networks topology to improve prediction.

Article
Peer Reviewed

NLM’s sponsorship of research in biomedical informatics (1985–2016)

UC San Diego Previously Published Works (2022)

The U.S. National Library of Medicine's (NLM) funding for biomedical informatics research in the 1980s and 1990s focused on clinical decision support systems, which were also the focus of research for Donald A.B. Lindberg, M.D., prior to becoming NLM's director. The portfolio of projects expanded over the years. At NLM, Dr. Lindberg supported various large infrastructure programs that enabled biomedical informatics research, as well as investigator-initiated research projects that increasingly included biotechnology/bioinformatics and health services research. The authors review NLM's sponsorship of research during Dr. Lindberg's tenure as its Director. NLM's funding significantly increased in the 2000s and beyond. The authors report an analysis of R01 topics from 1985 to 2016 using data from NIH RePORTER. Dr. Lindberg's legacy for biomedical informatics research is reflected by the research NLM supported under his leadership. The number of R01s remained steady over the years, but the funds provided within awards increased over time. A significant amount of NLM funds listed in RePORTER went into various types of infrastructure projects that laid a solid foundation for biomedical informatics research over multiple decades.

Article
Peer Reviewed

Previewable Contract-Based On-Chain X-Ray Image Sharing Framework for Clinical Research

UC San Diego Previously Published Works (2021)

Background

An image sharing framework is important to support downstream data analysis, especially for pandemics like Coronavirus Disease 2019 (COVID-19). Current centralized image sharing frameworks become dysfunctional if any part of the framework fails. Existing decentralized image sharing frameworks do not store the images on the blockchain, thus the data themselves are not highly available, immutable, and provable. Meanwhile, storing images on the blockchain provides availability/immutability/provenance to the images, yet produces challenges such as large-image handling, high viewing latency, and software inconsistency while storing/loading images.

Objective

This study aims to store chest x-ray images using a blockchain-based framework to handle large images, improve viewing latency, and enhance software consistency.

Basic procedures

We developed a splitting and merging function to handle large images, a feature that allows previewing an image earlier to improve viewing latency, and a smart contract to enhance software consistency. We used 920 publicly available images to evaluate the storing and loading methods through time measurements.

Main findings

The blockchain network successfully shares large images up to 18 MB and supports smart contracts to provide code immutability, availability, and provenance. With the preview feature, images were shared 93% faster than without it.

Principal conclusions

The findings of this study can guide future studies to generalize our framework to other forms of data to improve sharing and interoperability.

Article
Peer Reviewed

ModelChain: Decentralized Privacy-Preserving Healthcare Predictive Modeling Framework on Private Blockchain Networks

UC San Diego Previously Published Works (2018)

Cross-institutional healthcare predictive modeling can accelerate research and facilitate quality improvement initiatives, and thus is important for national healthcare delivery priorities. For example, a model that predicts risk of re-admission for a particular set of patients will be more generalizable if developed with data from multiple institutions. While privacy-protecting methods to build predictive models exist, most are based on a centralized architecture, which presents security and robustness vulnerabilities such as single-point-of-failure (and single-point-of-breach) and accidental or malicious modification of records. In this article, we describe a new framework, ModelChain, to adapt Blockchain technology for privacy-preserving machine learning. Each participating site contributes to model parameter estimation without revealing any patient health information (i.e., only model data, no observation-level data, are exchanged across institutions). We integrate privacy-preserving online machine learning with a private Blockchain network, apply transaction metadata to disseminate partial models, and design a new proof-of-information algorithm to determine the order of the online learning process. We also discuss the benefits and potential issues of applying Blockchain technology to solve the privacy-preserving healthcare predictive modeling task and to increase interoperability between institutions, to support the Nationwide Interoperability Roadmap and national healthcare delivery priorities such as Patient-Centered Outcomes Research (PCOR).