Smart IoT devices, smartphones, and wearables are penetrating every aspect of our daily lives. These devices are equipped with various sensing modalities, including video, audio, inertial sensors, and lidar, that enable a wide range of sensing applications. Research has shown that, rather than operating each sensor in isolation, combining information from multiple sensing streams boosts performance. This approach is known as multimodal sensor fusion, and Human Activity Recognition (HAR) is one of the applications that benefit from using multiple sensors. In recent years, deep learning algorithms have been shown to achieve high accuracy in HAR using multimodal sensor data. However, to design a reliable HAR system, the following challenges still need to be addressed. The first challenge is the heterogeneity of the sensing devices: the set of devices monitoring a person may vary over time, and the devices may differ in sampling frequency. The second challenge is that deep neural networks (DNNs) are considered black boxes, because studying their structure often provides little to no insight into their underlying mechanics. It is hard to look ``into'' the network and ascertain why the model selects specific features over others during training, which makes DNN predictions untrustworthy to end-users. This lack of trust prevents the adoption of DNN models in health-related and other high-stakes applications, where sensitive decisions mandate a sufficient accompanying explanation. Therefore, this dissertation proposes methods that generate accurate predictions robust to device heterogeneity by making opportunistic use of information from the available devices, and that provide end-users with human-understandable explanations accompanying each prediction.
First, we propose a solution that addresses the challenges of sensor-device heterogeneity for activity recognition in our work \emph{SenseHAR}. We design a scalable deep learning-based solution in which each device learns its own sensor fusion model that maps raw sensor values to a shared low-dimensional latent space, which we call the \emph{SenseHAR} virtual activity sensor. The virtual sensor has the same format and behavior regardless of the subset of devices, sensor availability, sampling rate, or device location. \emph{SenseHAR} helps machine learning engineers develop their application-specific models (e.g., from gesture recognition to activities of daily life) in a hardware-agnostic manner on top of this virtual activity sensor.
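To make the idea concrete, the following is a minimal PyTorch sketch of per-device encoders that project heterogeneous sensor streams into a shared latent space on which a hardware-agnostic classifier operates. The layer choices, latent dimensionality, and the simple averaging fusion are illustrative assumptions, not the actual \emph{SenseHAR} architecture.
\begin{verbatim}
# Illustrative sketch only: per-device fusion models emitting a shared
# "virtual sensor" representation. Sizes and the GRU encoder are assumptions.
import torch
import torch.nn as nn

LATENT_DIM = 16  # dimensionality of the shared virtual-sensor space (assumed)

class DeviceEncoder(nn.Module):
    """Maps one device's raw sensor window to the shared latent space."""
    def __init__(self, num_channels: int, hidden: int = 32):
        super().__init__()
        self.rnn = nn.GRU(num_channels, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, LATENT_DIM)

    def forward(self, x):          # x: (batch, time, channels)
        _, h = self.rnn(x)         # h: (layers, batch, hidden)
        return self.proj(h[-1])    # (batch, LATENT_DIM)

class ActivityHead(nn.Module):
    """Application-specific classifier built on the virtual sensor only."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.fc = nn.Linear(LATENT_DIM, num_classes)

    def forward(self, z):
        return self.fc(z)

# Each device gets its own encoder (different channel counts, window lengths,
# sampling rates), but all emit the same LATENT_DIM-dimensional virtual sensor.
phone_enc = DeviceEncoder(num_channels=6)   # e.g., accelerometer + gyroscope
watch_enc = DeviceEncoder(num_channels=3)   # e.g., accelerometer only
head = ActivityHead(num_classes=5)

phone_window = torch.randn(8, 100, 6)       # 8 windows, 100 samples, 6 channels
watch_window = torch.randn(8, 50, 3)        # different length and channel count

z = (phone_enc(phone_window) + watch_enc(watch_window)) / 2  # fuse available devices
logits = head(z)                            # hardware-agnostic prediction
\end{verbatim}
Because the application model consumes only the fixed-format latent vector, it does not need to change when devices appear, disappear, or differ in sampling rate.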
Next, we address the issue of explainability of deep learning models for activity recognition. We first identify the most preferred post-hoc explanation technique for classification tasks across different modalities from an end-user perspective. To this end, we conducted a large-scale Amazon Mechanical Turk study comparing popular state-of-the-art explanation methods to determine empirically which ones better explain model decisions. Our results show that explanation by examples was the most preferred type of explanation. We also offer an open-source library, \emph{ExMatchina}, providing a readily available and widely applicable implementation of explanation by examples. Then, we focus on interpretable DNN models, especially models that provide concept-based explanations. We propose \emph{CoDEx}, an automatic Concept Discovery and Extraction module that identifies a rich set of complex concepts from natural language explanations of videos, obviating the need to predefine the amorphous set of concepts. Finally, we introduce \emph{XCHAR}, an Explainable Complex Human Activity Recognition model that accurately predicts complex activities and provides explanations in the form of human-understandable temporal concepts.
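The sketch below illustrates the general explanation-by-examples idea: for a given test input, retrieve the training samples whose learned feature representations lie closest to it and present them as the explanation. The feature dimensions and the use of scikit-learn's \texttt{NearestNeighbors} are assumptions for illustration; this is not the \emph{ExMatchina} API.
\begin{verbatim}
# Minimal sketch of explanation by examples (not the ExMatchina implementation):
# return the training samples nearest to a test input in feature space.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def explain_by_examples(train_features, test_feature, k=3):
    """Indices of the k training examples nearest to the test input
    in the model's (e.g., penultimate-layer) feature space."""
    index = NearestNeighbors(n_neighbors=k).fit(train_features)
    _, idx = index.kneighbors(test_feature.reshape(1, -1))
    return idx[0]

# Hypothetical usage with random stand-ins for extracted features.
rng = np.random.default_rng(0)
train_features = rng.normal(size=(1000, 64))  # features of 1000 training windows
test_feature = rng.normal(size=64)            # feature of one test window
print(explain_by_examples(train_features, test_feature))  # 3 nearest training indices
\end{verbatim}
The retrieved training examples are then shown to the end-user alongside the prediction, which is the form of explanation our study found most preferred.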