Simulating the long-term dynamics of multi-scale and multi-physics systems
poses a significant challenge in understanding complex phenomena across science
and engineering. The complexity arises from the intricate interactions between
scales and the interplay of diverse physical processes. Neural operators have
emerged as promising models for predicting such dynamics due to their
flexibility and computational efficiency. However, they often fail to
effectively capture multi-scale interactions or quantify the uncertainties
inherent in their predictions. These limitations lead to rapid error
accumulation, particularly in long-term forecasting of systems characterized by
complex and coupled dynamics. To address these challenges, we propose a
spatio-temporal Fourier transformer (StFT), in which each transformer block is
designed to learn dynamics at a specific scale. By leveraging a structured
hierarchy of StFT blocks, the model explicitly captures dynamics across both
macro- and micro-spatial scales. Furthermore, a generative residual correction
mechanism is integrated to estimate and mitigate predictive uncertainties,
enhancing both the accuracy and reliability of long-term forecasts. Evaluations
conducted on three benchmark datasets (plasma, fluid, and atmospheric dynamics)
demonstrate the advantages of our approach over state-of-the-art ML methods.
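A minimal sketch of the hierarchical multi-scale idea described above, assuming a PyTorch-style implementation; the class names (StFTBlock, HierarchicalStFT), mode counts, and layer choices are illustrative assumptions rather than the authors' code, and both the temporal attention component and the generative residual correction stage are omitted for brevity. Each block mixes channels over a truncated set of low Fourier modes, so early blocks (few modes) model macro-scale dynamics while later blocks (more modes) refine micro-scale detail.

import torch
import torch.nn as nn

class StFTBlock(nn.Module):
    """One scale-specific block: spectral channel mixing restricted to an
    n_modes x n_modes low-frequency corner, plus a pointwise MLP (residual)."""
    def __init__(self, channels: int, n_modes: int):
        super().__init__()
        self.n_modes = n_modes
        # Complex weights over the retained Fourier modes (hypothetical choice).
        self.w = nn.Parameter(
            0.02 * torch.randn(channels, channels, n_modes, n_modes,
                               dtype=torch.cfloat))
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels, 1), nn.GELU(),
            nn.Conv2d(channels, channels, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        f = torch.fft.rfft2(x)                            # (B, C, H, W//2 + 1)
        m = self.n_modes                                  # requires m <= min(H, W//2 + 1)
        out = torch.zeros_like(f)
        # Mix channels only on this block's scale (the lowest retained modes).
        out[:, :, :m, :m] = torch.einsum(
            "bixy,ioxy->boxy", f[:, :, :m, :m], self.w)
        x = x + torch.fft.irfft2(out, s=x.shape[-2:])
        return x + self.mlp(x)

class HierarchicalStFT(nn.Module):
    """Blocks stacked coarse-to-fine: increasing mode counts move from
    macro-scale to micro-scale dynamics."""
    def __init__(self, channels: int, modes=(4, 16, 32)):
        super().__init__()
        self.blocks = nn.ModuleList(StFTBlock(channels, m) for m in modes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for blk in self.blocks:
            x = blk(x)
        return x

# Usage: advance a 3-channel 64x64 field by one step.
model = HierarchicalStFT(channels=3)
u_next = model(torch.randn(8, 3, 64, 64))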