Data-driven Computing and Analysis with Contrasting Statistical Developments in Real-world Applications
- Liao, Shuting
- Advisor(s): Hsieh, Fushing; Paul, Debashis
Abstract
Real data generated by real-world complex systems generally embody sophisticated deterministic and stochastic structures at multiple scales. Such structural complexity induces challenging learning problems and poses difficult data-analysis issues. Data from the diverse complex systems studied across scientific fields are often found to preserve pattern information in diverse ways. This diversity in how information is encoded is due in part to the constraints between the data’s sophisticated deterministic and stochastic structures. Data scientists therefore need to adapt to such constraints by adopting data-driven computing approaches when analyzing data from real-world complex systems. That is, to gain authentic information from data, it is essential to develop data-analysis methodologies according to the data’s intrinsic characteristics. In this dissertation, we develop and propose data-driven adaptive computational methods and statistical frameworks based on specific data structures, including digital images, data on Alzheimer’s disease, and limited data from biochemical experiments. In a project evaluating the effectiveness of chemical spraying by an unmanned aerial vehicle (UAV), we prescribe a computational approach that uses color-identification algorithms and minimum spanning trees (MSTs) to analyze the spatial distribution of dots of various sizes and colors on an image. We succeeded in testing the evenness of mechanical spray via color-dot testing papers. In a project studying the aging effects on a series of three of Van Gogh’s Sunflowers in a vase, we develop a computational approach to restore the original color and vibrancy in a reverse-engineering fashion. Their faded or brownish-yellow backgrounds are successfully revived computationally to shed yellow-oriented light.
In a project analyzing time-to-event data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database, we employ conditional entropy to unravel heterogeneity among subjects and evaluate the potential factors that affect the diagnosis of Alzheimer’s disease. Our data-driven results are compared with those of Cox’s proportional hazards modeling and demonstrate better capability in identifying significant factors. In a contrasting fashion, we also study a statistical problem in modeling biochemical experiments whose data are limited in size and scope. Under such constraints, we propose a flexible methodology for analyzing the variability of smooth functionals of the growth or production trajectories associated with temporally measured biochemical processes across different experimental conditions. We demonstrate, through numerical experiments and real data analysis, the effectiveness of the statistical inference for key parameters of interest and the flexibility to extend to correlated structures. We conclude that data-driven approaches are necessary when analyzing big data sets, while statistical modeling has its merit when data are limited.