Sensors permeate daily life, with applications ranging from healthcare and climate modeling to home automation and robotics. These sensors generate abundant time-series data that helps us understand a wide range of real-life processes. However, much of this data is unlabeled: it carries no annotations indicating what it represents in a given application. Unlike images or text, time-series data is difficult for humans to interpret directly; whereas a person can readily identify objects in a picture, attaching useful labels to time-series data typically requires domain experts to analyze it with specialized tools. This makes labeling expensive, especially in real-time settings, and it does not scale because expert annotators are costly and scarce. Moreover, labeling data collected from human subjects can compromise privacy and security, further exacerbating the shortage of labeled data.
Self-supervised learning, in which algorithms learn from unlabeled data by automatically generating pseudo-labels from the data itself, has been used to address this label scarcity. However, these algorithms perform suboptimally on sensor data because they have not been adapted or customized for the sensory domain.
In this dissertation, we develop methods that adapt traditional self-supervised learning algorithms to incorporate sensory domain-specific information, mitigating the label scarcity problem. Our approach consists of three steps that can be applied progressively for increasing effectiveness. First, we integrate the time-interval information of unlabeled data into self-supervised algorithms to build a pre-trained model. Next, we fine-tune the pre-trained model by incorporating application-specific knowledge into a self-supervised algorithm that improves the fine-tuning process. Finally, we propose a sensor context-aware self-supervised algorithm that enhances classical fine-tuning so that the model generalizes to novel classes during testing.
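To make the first step concrete, the following is a minimal sketch, assuming a PyTorch-style encoder, of how time-interval information might be folded into a contrastive self-supervised objective: sensor windows recorded close together in time are treated as positive pairs, so the pre-trained model learns temporal structure from unlabeled data alone. The names `Encoder` and `time_gap_thresh`, and the shapes used below, are illustrative rather than the dissertation's actual design.

```python
# Illustrative sketch (not the dissertation's actual implementation) of
# time-aware self-supervised pre-training: windows recorded close together
# in time act as positives in an InfoNCE-style contrastive loss, so the
# encoder absorbs time-interval information from unlabeled data.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Encoder(nn.Module):
    """Toy 1-D CNN encoder mapping a fixed-length sensor window to an embedding."""
    def __init__(self, in_channels=3, embed_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.proj = nn.Linear(32, embed_dim)

    def forward(self, x):                        # x: (batch, channels, time)
        h = self.net(x).squeeze(-1)              # (batch, 32)
        return F.normalize(self.proj(h), dim=-1)


def time_aware_infonce(z, timestamps, time_gap_thresh=60.0, temperature=0.1):
    """Contrastive loss where two windows count as a positive pair
    if their start times differ by less than `time_gap_thresh` seconds."""
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = z @ z.t() / temperature                          # pairwise similarities
    gaps = (timestamps[:, None] - timestamps[None, :]).abs()
    pos_mask = (gaps < time_gap_thresh) & ~eye             # temporal positives
    logits = sim.masked_fill(eye, float("-inf"))           # exclude self-similarity
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    # Average the log-probability over each anchor's temporal positives.
    per_anchor = log_prob.masked_fill(~pos_mask, 0.0).sum(1) / pos_mask.sum(1).clamp(min=1)
    return -per_anchor.mean()


# One unsupervised pre-training step on a batch of unlabeled windows.
encoder = Encoder()
x = torch.randn(16, 3, 128)                      # 16 windows, 3 channels, 128 samples
t = torch.rand(16) * 300.0                       # window start times in seconds
loss = time_aware_infonce(encoder(x), t)
loss.backward()
```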
We conduct extensive experiments across various sensory data domains, including motion, audio, electroencephalogram (EEG), and human activity recognition (HAR), comparing our methods to leading statistical and deep learning models. By adapting self-supervised algorithms to sensory data with time-awareness, task-specificity, and sensor context-awareness, our methods improve few-shot learning performance by 10%, fully supervised learning by 3.6%, and zero-shot learning by 20% over the best baselines. Our framework demonstrates state-of-the-art performance across sensing systems of various scales, from small-scale personal healthcare monitoring, human action recognition, and smart home automation to large-scale smart building control, smart city planning, climate modeling, and beyond.