eScholarship
Open Access Publications from the University of California

VAN-DAMME: GPU-accelerated and symmetry-assisted quantum optimal control of multi-qubit systems

(2025)

We present an open-source software package, VAN-DAMME (Versatile Approaches to Numerically Design, Accelerate, and Manipulate Magnetic Excitations), for massively parallelized quantum optimal control (QOC) calculations of multi-qubit systems. To enable large QOC calculations, VAN-DAMME combines symmetry-based techniques with custom GPU-enhanced algorithms. This combined approach allows hundreds of matrix exponential propagators to be computed simultaneously, efficiently leveraging the intra-GPU parallelism of high-performance GPUs. To maximize the computational efficiency of the code, we also carried out extensive tests of data layout, computational complexity, memory requirements, and performance. These analyses allowed us to develop computationally efficient approaches for evaluating complex-valued matrix exponential propagators based on Padé approximants. To assess the performance of our GPU-accelerated code, we carried out QOC calculations of systems containing 10 to 15 qubits, which showed that our GPU implementation is 18.4× faster than the corresponding CPU implementation. These GPU-accelerated enhancements enable efficient calculations of multi-qubit systems and can support QOC applications across multiple domains.

Program summary:
Program Title: VAN-DAMME
CPC Library link to program files: https://doi.org/10.17632/zcgw2n5bjf.1
Licensing provisions: GNU General Public License 3
Programming language: C++ and CUDA
Nature of problem: The VAN-DAMME software package uses GPU-accelerated routines and new algorithmic improvements to compute optimized time-dependent magnetic fields that can drive a system from a known initial qubit configuration to a specified target state with a large (≈1) transition probability.
Solution method: Quantum control, GPU acceleration, analytic gradients, matrix exponential, and gradient ascent optimization.
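As a rough illustration of a Padé-based propagator, the sketch below computes U = exp(-iHΔt) with the textbook scaling-and-squaring method and a [6/6] diagonal Padé approximant, then evaluates a transition probability |⟨target|U|initial⟩|². It is a pure-Python sketch for small dense matrices under stated assumptions: the function names, the fixed Padé degree, and the 0.5 norm threshold are illustrative choices, not details of VAN-DAMME's batched C++/CUDA routines.

```python
import math

def identity(n):
    return [[1 + 0j if i == j else 0j for j in range(n)] for i in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def add_scaled(a, b, c):
    """Return a + c * b element-wise."""
    return [[a[i][j] + c * b[i][j] for j in range(len(a[0]))] for i in range(len(a))]

def inf_norm(a):
    return max(sum(abs(x) for x in row) for row in a)

def solve(a, b):
    """Solve a @ x = b (b may have several columns) by Gaussian elimination with partial pivoting."""
    n, m = len(a), len(b[0])
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + m):
                aug[r][c] -= f * aug[col][c]
    x = [[0j] * m for _ in range(n)]
    for i in range(n - 1, -1, -1):  # back substitution on the upper-triangular system
        for j in range(m):
            s = aug[i][n + j] - sum(aug[i][k] * x[k][j] for k in range(i + 1, n))
            x[i][j] = s / aug[i][i]
    return x

def expm_pade(a, p=6):
    """exp(a) via scaling and squaring with a [p/p] diagonal Pade approximant."""
    norm = inf_norm(a)
    s = 0 if norm <= 0.5 else math.ceil(math.log2(norm / 0.5))
    scaled = [[x / 2 ** s for x in row] for row in a]
    n = len(a)
    num, den, power = identity(n), identity(n), identity(n)
    for k in range(1, p + 1):
        c = (math.factorial(2 * p - k) * math.factorial(p)) / (
            math.factorial(2 * p) * math.factorial(k) * math.factorial(p - k))
        power = matmul(power, scaled)
        num = add_scaled(num, power, c)              # N(A) = sum_k c_k A^k
        den = add_scaled(den, power, c * (-1) ** k)  # D(A) = sum_k c_k (-A)^k
    result = solve(den, num)                         # D(A)^{-1} N(A)
    for _ in range(s):                               # exp(A) = exp(A / 2^s)^(2^s)
        result = matmul(result, result)
    return result

def propagator(h, dt):
    """U = exp(-i H dt) for a Hermitian Hamiltonian h (hbar = 1)."""
    return expm_pade([[-1j * dt * x for x in row] for row in h])

def transition_probability(u, init, target):
    """|<target| U |init>|^2."""
    evolved = [sum(u[i][j] * init[j] for j in range(len(init))) for i in range(len(u))]
    amp = sum(complex(t).conjugate() * e for t, e in zip(target, evolved))
    return abs(amp) ** 2

sigma_x = [[0, 1], [1, 0]]
U = propagator(sigma_x, math.pi / 2)   # a pi pulse about x
print(transition_probability(U, [1, 0], [0, 1]))
```

For this π pulse the exact propagator is -iσx, so the printed probability should be 1 up to rounding. Production codes typically pick the Padé degree and scaling from a norm-dependent error bound rather than fixing them as done here.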


A graphics processing unit accelerated sparse direct solver and preconditioner with block low rank compression

(2025)

We present the GPU implementation efforts and challenges of the sparse solver package STRUMPACK. The code is publicly available on GitHub under a permissive BSD license. STRUMPACK implements an approximate multifrontal solver, that is, a sparse LU factorization that uses compression methods to reduce time to solution and memory usage. Multiple compression schemes based on rank-structured and hierarchical matrix approximations are supported, including hierarchically semi-separable, hierarchically off-diagonal butterfly, and block low rank. In this paper, we present the GPU implementation of the block low rank (BLR) compression method within a multifrontal solver. Our GPU implementation relies on highly optimized vendor libraries such as cuBLAS and cuSOLVER for NVIDIA GPUs, rocBLAS and rocSOLVER for AMD GPUs, and the Intel oneAPI Math Kernel Library (oneMKL) for Intel GPUs. Additionally, we rely on external open-source libraries such as SLATE (Software for Linear Algebra Targeting Exascale), MAGMA (Matrix Algebra on GPU and Multi-core Architectures), and KBLAS (KAUST BLAS). SLATE is used as a GPU-capable ScaLAPACK replacement. From MAGMA, we use variable-size batched dense linear algebra operations such as GEMM, TRSM, and LU with partial pivoting. KBLAS provides efficient (batched) low rank matrix compression for NVIDIA GPUs using an adaptive randomized sampling scheme. The resulting sparse solver and preconditioner run on NVIDIA, AMD, and Intel GPUs. Interfaces are available from PETSc, Trilinos, and MFEM, or the solver can be used directly in user code. We report results for a range of benchmark applications on the Perlmutter system at NERSC, Frontier at ORNL, and Aurora at ALCF. For a high frequency wave equation on a regular mesh, using 32 Perlmutter compute nodes, the factorization phase of the exact GPU solver is about 6.5× faster than the CPU-only solver. The BLR-enabled GPU solver is about 13.8× faster than the CPU exact solver.
For a collection of SuiteSparse matrices, the STRUMPACK exact factorization on a single GPU is on average 1.9× faster than NVIDIA’s cuDSS solver.
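The core idea of BLR compression is that many off-diagonal blocks are numerically low rank, so an m×n block can be stored as a sum of k rank-one terms u vᵀ using k(m + n) numbers instead of mn. The sketch below illustrates this with adaptive cross approximation using full pivoting on a real-valued block. This is a deliberately simpler, swapped-in technique: STRUMPACK and KBLAS use their own compression kernels (KBLAS uses adaptive randomized sampling), and the function names and tolerance here are hypothetical.

```python
def aca_full_pivot(block, rel_tol=1e-10):
    """Greedy cross approximation with full pivoting: block ~= sum_k u_k v_k^T.

    Stops when the largest remaining entry is at most rel_tol times the first pivot."""
    m, n = len(block), len(block[0])
    resid = [list(map(float, row)) for row in block]
    us, vs = [], []
    first = None
    while len(us) < min(m, n):
        i, j = max(((r, c) for r in range(m) for c in range(n)),
                   key=lambda rc: abs(resid[rc[0]][rc[1]]))
        pivot = resid[i][j]
        if first is None:
            first = abs(pivot)
        if abs(pivot) <= rel_tol * first:
            break
        u = [resid[r][j] for r in range(m)]          # pivot column of the residual
        v = [resid[i][c] / pivot for c in range(n)]  # pivot row, scaled by the pivot
        for r in range(m):                           # rank-one update of the residual
            for c in range(n):
                resid[r][c] -= u[r] * v[c]
        us.append(u)
        vs.append(v)
    return us, vs

def reconstruct(us, vs, m, n):
    return [[sum(u[r] * v[c] for u, v in zip(us, vs)) for c in range(n)] for r in range(m)]

def storage(m, n, rank):
    """Number of stored entries: dense form vs. low-rank form."""
    return m * n, rank * (m + n)

# A 4 x 5 block that is exactly rank 2.
u1, v1 = [1, 2, 3, 4], [1, 0, 1, 0, 2]
u2, v2 = [0, 1, 0, 1], [0, 1, 1, 1, 0]
A = [[u1[r] * v1[c] + u2[r] * v2[c] for c in range(5)] for r in range(4)]
us, vs = aca_full_pivot(A)
print(len(us), storage(1000, 1000, len(us)))
```

For an exactly rank-2 block the loop stops after two terms. The savings grow with block size: a 1000×1000 block of rank 20 needs 40,000 stored entries instead of 1,000,000.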


Evolving to Find Optimizations Humans Miss: Using Evolutionary Computation to Improve GPU Code for Bioinformatics Applications

(2024)

GPUs are used in many settings to accelerate large-scale scientific computation, including simulation, computational biology, and molecular dynamics. However, optimizing codes to run efficiently on GPUs requires developers to have both a detailed understanding of the application logic and significant knowledge of parallel programming and GPU architectures. This paper shows that an automated GPU program optimization tool, GEVO, can leverage evolutionary computation to find code edits that reduce the runtime of three important applications (multiple sequence alignment, agent-based simulation, and molecular dynamics) by 28.9%, 29%, and 17.8%, respectively. The paper presents an in-depth analysis of the discovered optimizations, revealing that (1) several of the most important optimizations involve significant epistasis, (2) the primary sources of improvement are application-specific, and (3) many of the optimizations generalize across GPU architectures. In general, the discovered optimizations are not straightforward even for a human GPU expert, showcasing the potential of automated program optimization tools both to reduce the optimization burden for human domain experts and to provide new insights for GPU experts.
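To make "epistasis" concrete: two edits are epistatic when their combined effect differs from the sum of their individual effects. The toy below evolves bitmasks of candidate edits under a made-up runtime model in which edits 2 and 5 each slow the program down alone but speed it up together, so a greedy one-edit-at-a-time search from the baseline would reject both. This is a hypothetical sketch: GEVO applies edits to real GPU programs and measures their actual runtime, and every constant and function name here is invented for illustration.

```python
import random

BASELINE_MS = 100.0
# Hypothetical per-edit change in runtime (ms) when each edit is applied alone.
SOLO_EFFECT = [-3.0, 2.0, 5.0, -1.0, 0.0, 5.0, -4.0, 1.0]
PAIR = (2, 5)          # hypothetical epistatic pair: harmful alone, helpful together
PAIR_BONUS = -30.0

def runtime(genome):
    """Toy stand-in for measuring a GPU program after applying the selected edits."""
    t = BASELINE_MS + sum(e for e, on in zip(SOLO_EFFECT, genome) if on)
    if genome[PAIR[0]] and genome[PAIR[1]]:
        t += PAIR_BONUS
    return t

def evolve(pop_size=20, generations=40, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    n = len(SOLO_EFFECT)
    pop = [[False] * n for _ in range(pop_size)]   # start from the unmodified program
    for _ in range(generations):
        pop.sort(key=runtime)
        parents = pop[: pop_size // 2]             # truncation selection keeps the fastest half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]              # one-point crossover
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = parents + children
    return min(pop, key=runtime)

best = evolve()
print([i for i, on in enumerate(best) if on], runtime(best))
```

Because the fastest half of each generation survives unchanged, the best runtime never gets worse than the baseline; crossover and mutation give the search a chance to switch on both halves of the epistatic pair at once.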


Photoinduced Electron–Nuclear Dynamics of Fullerene and Its Monolayer Networks in Solvated Environments

(2024)

The recently synthesized monolayer fullerene network in a quasi-hexagonal phase (qHP-C60) exhibits superior electron mobility and optoelectronic properties compared to molecular fullerene (C60), making it highly promising for a variety of applications. However, the microscopic carrier dynamics of qHP-C60 remain unclear, particularly in realistic environments, which are of significant importance for optoelectronic device applications. Traditional ab initio methods are too computationally expensive to capture the real-time carrier dynamics of such large systems. In this work, we present the first real-time electron-nuclear dynamics study of qHP-C60 using velocity-gauge density functional tight binding, which enables several picoseconds of excited-state electron-nuclear dynamics simulations for nanoscale systems with periodic boundary conditions. Applying this approach to C60, qHP-C60, and their solvated counterparts, we demonstrate that water/moisture significantly increases the electron-hole recombination time in C60 but has little impact on qHP-C60. Our excited-state electron-nuclear dynamics calculations reveal the distinctive behavior of qHP-C60 and enable exploration of time-resolved dynamics for understanding excited-state processes of large systems in complex, solvated environments.
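For context on why a velocity-gauge formulation suits periodic systems, the standard single-electron light-matter coupling in atomic units within the dipole approximation is shown below. This is the general textbook form, not the specific DFTB Hamiltonian used in the paper. Because the vector potential A(t) is uniform in space, the velocity-gauge Hamiltonian preserves lattice periodicity, whereas the position operator r in the length-gauge term E(t)·r is incompatible with periodic boundary conditions.

```latex
% Velocity gauge (atomic units, dipole approximation, spatially uniform field)
\hat{H}(t) = \tfrac{1}{2}\bigl(\hat{\mathbf{p}} + \mathbf{A}(t)\bigr)^{2} + \hat{V},
\qquad
\mathbf{E}(t) = -\frac{\partial \mathbf{A}(t)}{\partial t}

% Length gauge, for comparison
\hat{H}(t) = \tfrac{1}{2}\hat{\mathbf{p}}^{2} + \hat{V} + \mathbf{E}(t)\cdot\hat{\mathbf{r}}
```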


Parallel Runtime Interface for Fortran (PRIF) Specification, Revision 0.5

(2024)

This document specifies an interface to support the parallel features of Fortran, named the Parallel Runtime Interface for Fortran (PRIF). PRIF is a proposed solution in which the runtime library is primarily responsible for implementing coarray allocation, deallocation and accesses, image synchronization, atomic operations, events, teams, and collective subroutines. In this interface, the compiler is responsible for transforming the invocation of Fortran-level parallel features into procedure calls to the necessary PRIF subroutines. The interface is designed for portability across shared- and distributed-memory machines, different operating systems, and multiple architectures. Implementations of this interface are intended to augment the compiler's own runtime library. Because the interface is implementation-agnostic, alternative parallel runtime libraries that support the same interface may be developed. One benefit of this approach is the ability to vary the communication substrate. A central aim of this document is to define a parallel runtime interface in standard Fortran syntax, which enables us to leverage Fortran to succinctly express various properties of the procedure interfaces, including argument attributes.
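The division of labor described above, in which the compiler lowers parallel language features into calls to a runtime interface and interchangeable runtimes implement that interface, can be sketched as follows. This is a hypothetical Python illustration only: PRIF itself is specified in standard Fortran, and the class and method names here (ParallelRuntime, put, get, sync_all, InProcessRuntime) are invented for the example, not PRIF procedure names.

```python
from abc import ABC, abstractmethod

class ParallelRuntime(ABC):
    """Hypothetical runtime interface: the only thing lowered code is allowed to call."""

    @abstractmethod
    def num_images(self) -> int: ...

    @abstractmethod
    def put(self, coarray: str, image: int, value: float) -> None: ...

    @abstractmethod
    def get(self, coarray: str, image: int) -> float: ...

    @abstractmethod
    def sync_all(self) -> None: ...

class InProcessRuntime(ParallelRuntime):
    """Toy substrate: all images' coarray data lives in one dictionary."""

    def __init__(self, n_images: int):
        self.n_images = n_images
        self.memory = {}
        self.barriers = 0

    def num_images(self) -> int:
        return self.n_images

    def put(self, coarray, image, value):
        self.memory[(coarray, image)] = value

    def get(self, coarray, image):
        return self.memory[(coarray, image)]

    def sync_all(self):
        self.barriers += 1  # a real substrate would block here until all images arrive

def lowered_statement(rt: ParallelRuntime, this_image: int) -> None:
    # What a compiler might emit for the Fortran statements
    #   x[mod(this_image(), num_images()) + 1] = 42.0
    #   sync all
    neighbor = this_image % rt.num_images() + 1   # images are numbered from 1
    rt.put("x", neighbor, 42.0)
    rt.sync_all()

rt = InProcessRuntime(4)
for image in range(1, 5):    # run each image's code in turn (serially, for illustration)
    lowered_statement(rt, image)
print(rt.get("x", 1))
```

Varying the communication substrate then amounts to writing another ParallelRuntime subclass (for example, one backed by a network library) without touching the lowered code.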


The Genome Architecture of the Copepod Eurytemora carolleeae - the Highly Invasive Atlantic Clade of the Eurytemora affinis Species Complex.

(2024)

Copepods are among the most abundant organisms on the planet and play critical roles in aquatic ecosystems. Among copepods, populations of the Eurytemora affinis species complex are numerically dominant in many coastal habitats and serve as food sources for major fisheries. Intriguingly, certain populations possess the unusual capacity to invade novel salinities on rapid time scales. Despite their ecological importance, high-quality genomic resources have been absent for calanoid copepods, limiting our ability to comprehensively dissect the genome architecture underlying the highly invasive and adaptive capacity of certain populations. Here, we present the first chromosome-level genome of a calanoid copepod, from the Atlantic clade (Eurytemora carolleeae) of the E. affinis species complex. This genome was assembled using high-coverage PacBio long-read and Hi-C sequences of an inbred line generated through 30 generations of full-sib mating. The 529.3 Mb genome (contig N50 = 4.2 Mb, scaffold N50 = 140.6 Mb) was anchored onto four chromosomes. Genome annotation predicted 20,262 protein-coding genes, and comparative analyses with 12 additional arthropod genomes showed that ion transport-related gene families were substantially expanded. We also found genome-wide signatures of historical gene body methylation of the ion transport-related genes and significant clustering of these genes on each chromosome. This genome represents one of the most contiguous copepod genomes to date and is among the highest quality marine invertebrate genomes. As such, it provides an invaluable resource for gaining fundamental insights into the ability of this copepod to adapt to rapidly changing environments.
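The contig and scaffold N50 values quoted above follow the standard assembly-contiguity metric: sort sequences by length in descending order and report the length at which the running total first reaches at least half of the total assembly length. A minimal Python sketch of that definition (the function name is our own):

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:   # integer-safe "running >= total / 2"
            return length
    raise ValueError("n50() requires at least one sequence")

print(n50([2, 3, 4, 5, 6]))
```

Read this way, a scaffold N50 of 140.6 Mb means that scaffolds at least 140.6 Mb long together contain at least half of the assembled sequence.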


QRCODE: Massively parallelized real-time time-dependent density functional theory for periodic systems

(2024)

We present a new software module, QRCODE (Quantum Research for Calculating Optically Driven Excitations), for massively parallelized real-time time-dependent density functional theory (RT-TDDFT) calculations of periodic systems in the open-source Qbox software package. Our approach uses a custom implementation of a fast Fourier transform scheme that significantly reduces inter-node message passing interface (MPI) communication in the major computational kernel and scales well up to 16,344 CPU cores. In addition to improving computational performance, QRCODE includes a suite of time propagators for accurate RT-TDDFT calculations. As benchmark applications of QRCODE, we calculate the current density and optical absorption spectra of hexagonal boron nitride (h-BN) and the photo-driven reaction dynamics of the ozone-oxygen reaction. We also calculate the second and higher harmonic generation of monolayer and multi-layer boron nitride structures as examples of large material systems. Our optimized implementation of RT-TDDFT in QRCODE enables large-scale calculations of real-time electron dynamics in chemical and material systems, with enhanced computational performance and efficient scaling across several thousand CPU cores.
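The abstract does not name the individual propagators in QRCODE's suite. As a generic illustration of what an RT-TDDFT time propagator does, the sketch below implements the Crank-Nicolson scheme, one common choice, for a two-level toy Hamiltonian rather than a Kohn-Sham system. It is not QRCODE's implementation; the function names and the midpoint evaluation of H(t) are assumptions of this sketch.

```python
import math

def cn_step(h, psi, dt):
    """One Crank-Nicolson step for a two-level system:
    solve (I + i dt H / 2) psi_new = (I - i dt H / 2) psi."""
    a = [[(1 if r == c else 0) + 0.5j * dt * h[r][c] for c in range(2)] for r in range(2)]
    b = [[(1 if r == c else 0) - 0.5j * dt * h[r][c] for c in range(2)] for r in range(2)]
    rhs = [b[0][0] * psi[0] + b[0][1] * psi[1],
           b[1][0] * psi[0] + b[1][1] * psi[1]]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(a[1][1] * rhs[0] - a[0][1] * rhs[1]) / det,   # Cramer's rule for the 2x2 solve
            (a[0][0] * rhs[1] - a[1][0] * rhs[0]) / det]

def propagate(h_of_t, psi0, t_final, steps):
    """Evolve psi0 to t_final, evaluating H at the midpoint of each step."""
    dt = t_final / steps
    psi = list(psi0)
    for k in range(steps):
        psi = cn_step(h_of_t((k + 0.5) * dt), psi, dt)
    return psi

def norm2(psi):
    return sum(abs(x) ** 2 for x in psi)

sigma_x = [[0, 1], [1, 0]]
psi = propagate(lambda t: sigma_x, [1, 0], math.pi / 2, 1000)
print(norm2(psi), abs(psi[1]) ** 2)
```

Crank-Nicolson is the Cayley transform of -iHΔt, so for a Hermitian H it preserves the norm exactly (up to rounding) at any step size, while its phase error shrinks as O(Δt²); here the population is transferred almost completely to the second state after a π pulse.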


Roadmap on methods and software for electronic structure based simulations in chemistry and materials

(2024)

This Roadmap article provides a succinct, comprehensive overview of the state of electronic structure (ES) methods and software for molecular and materials simulations. Seventeen distinct sections collect insights from 51 leading scientists in the field. Each contribution addresses the status of a particular area, as well as current challenges and anticipated future advances, with a particular eye towards software-related aspects, and provides key references for further reading. Foundational sections cover density functional theory and its implementation in real-world simulation frameworks, Green’s function based many-body perturbation theory, wave-function based and stochastic ES approaches, relativistic effects, and semiempirical ES theory approaches. Subsequent sections cover nuclear quantum effects, real-time propagation of the ES, challenges for computational spectroscopy simulations, and exploration of complex potential energy surfaces. The final sections summarize practical aspects, including computational workflows for complex simulation tasks, the impact of current and future high-performance computing architectures, software engineering practices, education and training to maintain and broaden the community, and the status of and needs for ES-based modeling from the vantage point of industry environments. Overall, the field of ES software and method development continues to unlock immense opportunities for future scientific discovery, based on the growing ability of computations to reveal, with high precision, complex phenomena, processes, and properties that are determined by the make-up of matter at the atomic scale.