Spatial transcriptomics enables the simultaneous measurement of morphological features and transcriptional profiles of the same cells or regions in tissues. Here we present multi-modal structured embedding (MUSE), an approach to characterize cells and tissue regions by integrating morphological and spatially resolved transcriptional data. We demonstrate that MUSE can discover tissue subpopulations missed by either modality alone and can compensate for modality-specific noise. We applied MUSE to diverse datasets containing spatial transcriptomics (seqFISH+, STARmap or Visium) and imaging (hematoxylin and eosin or fluorescence microscopy) modalities. MUSE identified biologically meaningful tissue subpopulations and stereotyped spatial patterning in healthy brain cortex and intestinal tissues. In diseased tissues, MUSE revealed gene biomarkers for proximity to the tumor region and heterogeneity of amyloid precursor protein processing across brain regions in Alzheimer's disease. MUSE enables the integration of multi-modal data to provide insights into the states, functions and organization of cells in complex biological tissues.