Speculative Parallel Execution for Local Timestepping
Published Web Location
https://dl.acm.org/doi/10.1145/3437959.3459257?sid=SCITRUS
Abstract
Synchronous timestepping for fluid and plasma simulations requires a global time step that conservatively satisfies stability conditions everywhere. This approach causes substantial unnecessary work when element sizes or local wavespeeds vary widely. Local timestepping can significantly reduce work by allowing subdomains to take steps according to local rather than global stability constraints, but parallelizing the algorithm is difficult. Because the stability condition depends on the state of the submesh and its neighbors, dependencies are irregular and may change dynamically as neighbors take smaller or larger timesteps. Furthermore, coarsening and refining timesteps introduces dynamic load imbalance. To resolve these dependencies correctly in a distributed setting, we parallelize the local timestepping algorithm using an optimistic (Timewarp-based) parallel discrete event simulation. We introduce waiting heuristics to eliminate misspeculation when dependencies can be identified early, and present a semi-static load balancing strategy to improve scalability. We present detailed performance characterizations of event overheads, misspeculation, and scalability of our approach. Our numerical experiments demonstrate up to a 2.8x speedup versus a baseline unoptimized approach; a 4x improvement in per-node throughput compared to an MPI parallelization of synchronous timestepping; and scalability up to 3,072 cores on NERSC Cori's Haswell partition.
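The work reduction that motivates local timestepping can be illustrated with a small counting sketch. It is not the paper's implementation. It assumes a CFL-type stability bound dt <= cfl * h / c for each element, and it ignores the neighbor coupling that, as the abstract notes, makes the real algorithm hard to parallelize. The function names, the cfl value, and the four-element mesh are hypothetical and chosen only for illustration.

```python
import math

def stable_dt(h, wavespeed, cfl=0.5):
    """Largest stable step for one element under an assumed CFL bound."""
    return cfl * h / wavespeed

def synchronous_updates(sizes, speeds, t_end, cfl=0.5):
    """Element updates when every element uses the smallest stable step."""
    dt_global = min(stable_dt(h, c, cfl) for h, c in zip(sizes, speeds))
    steps = math.ceil(t_end / dt_global)
    return steps * len(sizes)

def local_updates(sizes, speeds, t_end, cfl=0.5):
    """Element updates when each element uses its own stable step.

    Idealized: neighbor constraints on the step size are ignored.
    """
    return sum(
        math.ceil(t_end / stable_dt(h, c, cfl))
        for h, c in zip(sizes, speeds)
    )

# Hypothetical mesh: three coarse elements and one element 8x smaller.
sizes = [1.0, 1.0, 1.0, 0.125]
speeds = [1.0, 1.0, 1.0, 1.0]

sync_work = synchronous_updates(sizes, speeds, t_end=8.0)
local_work = local_updates(sizes, speeds, t_end=8.0)
print(sync_work, local_work)  # prints: 512 176
```

In this toy case a single small element forces all four elements to take 128 steps under synchronous timestepping, while local timestepping lets the three coarse elements take 16 steps each. The resulting ratio of about 2.9 in update counts is an upper bound for this mesh only and says nothing about the speedups reported in the abstract, which also reflect event overheads, misspeculation, and load balance.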