Vibrational sum-frequency generation (VSFG), a second-order nonlinear optical process, has traditionally been used as a spectroscopic technique to study molecules at interfaces, with a spatial resolution of ~100 µm. At this resolution, however, the spectroscopy is insensitive to heterogeneity within a sample. To study mesoscopically heterogeneous samples, we, along with others, pushed the resolution limit of VSFG spectroscopy down to the ~1 µm level and constructed a VSFG microscope. This technique not only resolves sample morphology through imaging but also records a broadband VSFG spectrum at every pixel of the image. Because VSFG is a second-order nonlinear optical process, its selection rule enables the visualization of non-centrosymmetric or chiral self-assembled structures commonly found in biology, materials science, and bioengineering, among other fields. This article guides the audience through an inverted transmission design that allows imaging of unfixed samples. It also shows that VSFG microscopy, combined with a neural network function solver, can resolve chemical-specific geometric information of individual self-assembled sheets. Lastly, images of various samples obtained under brightfield, second-harmonic generation (SHG), and VSFG configurations are used to briefly discuss the unique information revealed by VSFG imaging.