Improving Efficiency and Quality of Data Collection with Machine Learning and Citizen Science
- Khan, Fahim Hasan
- Advisor(s): Pang, Alex
Abstract
Working with data is a fundamental and essential aspect of computer science, particularly in machine learning (ML), data science, AI applications, scientific analysis, and decision-making. Efficiency in data collection is crucial, as many scientific investigations, including those in computer science, rely on large volumes of data. Additionally, data quality significantly influences the overall effectiveness and performance of systems and algorithms. Citizen science facilitates public participation in scientific research, contributing to data collection, analysis, and reporting. This dissertation addresses two main challenges in the data collection process: improving efficiency and ensuring data quality. To tackle these challenges, I propose an approach that integrates ML with citizen science to enhance data collection. This synergy can improve data collection efficiency and quality, as ML algorithms assist citizen science participants in accurately identifying relevant data, filtering out label noise, and validating gathered data. Primarily, I focus on the potential of using computer vision ML models to guide and automate the collection process of visual data, such as images and videos. In this dissertation, I introduce a set of systems designed to improve the data collection process, including SmartCS, a platform for creating ML-powered citizen science applications without writing code; RipFinder, a mobile application that uses ML to guide the collection of rip current data; and RipScout, a drone-based system for the automated collection of rip current data. These systems address data quality earlier in the collection pipeline, rather than gathering and cleaning data afterward. Another contribution of my dissertation is engaging the general public in scientific research, demonstrated through my work on involving young students in research through these systems. 
Overall, my approach and developed systems advance the state of the art in modern data collection processes by uniquely combining citizen science and ML, demonstrating their significance in enhancing data quality and efficiency.