As data of all kinds becomes more readily available, there is an increasing demand for algorithms that can jointly process data from multiple sources, also called modalities. The current state of the art offers many such algorithms, but for the most part each method is designed with a specific application in mind. In this work we aim to create more general, data-driven methods for a variety of multimodal machine learning problems. We approach this difficult problem from the standpoint of graph methods, as graph representations have historically been robust to many possible data formats while retaining sufficient information to produce state-of-the-art results.
The first problem considered here is segmentation of co-registered, multimodal datasets. We perform data fusion at the level of graph representations, concentrating specifically on finding and exploiting the unique information that each modality brings to the overall scene. Standard image segmentation techniques are then applied to the fused graph.
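To make the pipeline concrete, the following is a minimal sketch of one way such a fusion could be carried out: each modality yields its own similarity graph over the co-registered pixels, the graphs are combined edge-wise, and a standard spectral segmentation is run on the fused graph. The element-wise minimum fusion rule, the Gaussian kernel, and all parameter values are illustrative assumptions, not necessarily the choices made in this work.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def modality_similarity(features, sigma=1.0):
    """Dense Gaussian similarity matrix for one modality's pixel features."""
    d2 = cdist(features, features, metric="sqeuclidean")
    return np.exp(-d2 / (2.0 * sigma**2))

def fuse_graphs(similarities):
    """Element-wise minimum: an edge is weakened if ANY modality sees a difference,
    so information unique to one modality survives the fusion (illustrative rule)."""
    return np.minimum.reduce(similarities)

def spectral_segment(W, n_segments=3):
    """Standard normalized-Laplacian spectral segmentation on the fused graph."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
    L_sym = np.eye(W.shape[0]) - D_inv_sqrt @ W @ D_inv_sqrt
    # The smallest eigenvectors of the normalized Laplacian embed the pixels.
    _, vecs = np.linalg.eigh(L_sym)
    return KMeans(n_clusters=n_segments, n_init=10).fit_predict(vecs[:, :n_segments])

# Toy example: two co-registered "modalities" over the same 100 pixels.
rng = np.random.default_rng(0)
optical = rng.normal(size=(100, 3))   # e.g. per-pixel RGB features
lidar = rng.normal(size=(100, 1))     # e.g. per-pixel height features
W = fuse_graphs([modality_similarity(optical), modality_similarity(lidar)])
labels = spectral_segment(W, n_segments=3)
```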
We also consider a matching problem between arbitrary datasets. Once again, graph representations are used to preserve relevant topological information while filtering out format-specific details. We then match graph nodes using spectral information. By using the Nyström extension eigensolver to quickly compute approximate graph eigenfunctions, and by using a hierarchical matching algorithm to narrow the matching search space, we obtain a runtime and space complexity that is, to the best of our knowledge, superior to the current state of the art.
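As a rough illustration of the efficiency gain, the sketch below shows the standard Nyström extension: eigenvectors are computed only for a small landmark-landmark block and then extrapolated to the remaining nodes, avoiding an eigendecomposition of the full graph. The landmark count, kernel, and sampling scheme are assumptions for the example, not the exact settings used in this work.

```python
import numpy as np
from scipy.spatial.distance import cdist

def nystrom_eigenvectors(features, n_landmarks=50, k=10, sigma=1.0, seed=0):
    """Approximate the leading eigenvectors of a Gaussian affinity graph
    using only a small set of randomly chosen landmark nodes."""
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    idx = rng.choice(n, size=n_landmarks, replace=False)
    rest = np.setdiff1d(np.arange(n), idx)

    # A: landmark-landmark affinities; B: landmark-to-remaining affinities.
    A = np.exp(-cdist(features[idx], features[idx], "sqeuclidean") / (2 * sigma**2))
    B = np.exp(-cdist(features[idx], features[rest], "sqeuclidean") / (2 * sigma**2))

    # Eigendecompose only the small landmark block (n_landmarks x n_landmarks).
    vals, U = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1][:k]   # keep the k leading components
    vals, U = vals[order], U[:, order]

    # Nystrom extension: extrapolate the eigenvectors to the non-landmark nodes.
    U_full = np.empty((n, k))
    U_full[idx] = U
    U_full[rest] = B.T @ U @ np.diag(1.0 / vals)
    return U_full, vals

# Toy usage: spectral signatures for 1000 nodes from a 50-landmark sample,
# which could then feed a (hierarchical) node-matching step.
X = np.random.default_rng(1).normal(size=(1000, 4))
phi, lam = nystrom_eigenvectors(X, n_landmarks=50, k=10)
```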