Article

Towards an Accurate Performance Modeling of Parallel Sparse Factorization

Lawrence Berkeley National Laboratory (2006)

We present a performance model to analyze a parallel sparse LU factorization algorithm on modern cache-based, high-end parallel architectures. Our model characterizes the algorithmic behavior by taking into account the underlying processor speed, memory system performance, and interconnect speed. The model is validated using the SuperLU_DIST linear system solver, sparse matrices from real applications, and an IBM POWER3 parallel machine. Our modeling methodology can be easily adapted to study the performance of other types of sparse factorizations, such as Cholesky or QR.

Article

A new scheduling algorithm for parallel sparse LU factorization with static pivoting

Lawrence Berkeley National Laboratory (2002)

In this paper we present a static scheduling algorithm for parallel sparse LU factorization with static pivoting. The algorithm is divided into mapping and scheduling phases, using the symmetric pruned graphs of L' and U to represent dependencies. The scheduling algorithm is designed to drive the parallel execution of the factorization on a distributed-memory architecture. Experimental results and comparisons with SuperLU_DIST are reported after applying this algorithm to real-world application matrices on an IBM SP RS/6000 distributed-memory machine.

Article

Performance analysis of parallel supernodal sparse LU factorization

LBL Publications (2004)

We investigate performance characteristics for the LU factorization of large matrices with various sparsity patterns. We consider supernodal right-looking parallel factorization on a bi-dimensional grid of processors, making use of static pivoting. We develop a performance model and validate it using the implementation in SuperLU_DIST, real matrices, and the IBM Power3 machine at NERSC. We use this model to obtain performance bounds on parallel computers, to perform scalability analysis, and to identify performance bottlenecks. We also discuss the role of load balance and data distribution in this approach.

Article
Peer Reviewed

A 3D Parallel Algorithm for QR Decomposition

LBL Publications (2018)

Interprocessor communication often dominates the runtime of large matrix computations. We present a parallel algorithm for computing QR decompositions whose bandwidth cost (communication volume) can be decreased at the cost of increasing its latency cost (number of messages). By varying a parameter to navigate the bandwidth/latency tradeoff, we can tune this algorithm for machines with different communication costs.

Article
Peer Reviewed

Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication

UC Berkeley Previously Published Works (2016)

Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high-performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. The scaling of existing parallel implementations of SpGEMM is heavily bound by communication. Even though 3D (or 2.5D) algorithms have been proposed and theoretically analyzed in the flat MPI model on Erdős-Rényi matrices, those algorithms had not been implemented in practice and their complexities had not been analyzed for the general case. In this work, we present the first implementation of the 3D SpGEMM formulation that exploits multiple (intranode and internode) levels of parallelism, achieving significant speedups over the state-of-the-art publicly available codes at all levels of concurrencies. We extensively evaluate our implementation and identify bottlenecks that should be subject to further research.

Article
Peer Reviewed

Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication.

UC Berkeley Previously Published Works (2016)

Article

Enhancing Scalability of Sparse Direct Methods

Lawrence Berkeley National Laboratory (2008)