eScholarship
Open Access Publications from the University of California

Unlikelihood of a phonon mechanism for the high-temperature superconductivity in La3Ni2O7

(2025)

The discovery of ~80 K superconductivity in the nickelate La3Ni2O7 under pressure has ignited intense interest. Here, we present a comprehensive first-principles study of the electron-phonon (e-ph) coupling in La3Ni2O7 and its implications for the observed superconductivity. Our results indicate that the e-ph coupling is too weak (with a coupling constant λ ≲ 0.5) to account for the high Tc, although interesting many-electron correlation effects exist. While Coulomb interactions (via the GW self-energy and a Hubbard U) enhance the e-ph coupling strength, electron doping (via oxygen vacancies) introduces no major changes. Additionally, different structural phases display varying characteristics near the Fermi level but do not alter the conclusion. The e-ph coupling landscape of La3Ni2O7 is intrinsically different from that of the infinite-layer nickelates. These findings suggest that a phonon-mediated mechanism is unlikely to be responsible for the observed superconductivity in La3Ni2O7, pointing instead to an unconventional nature.
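To put a coupling constant of λ ≲ 0.5 in perspective, the McMillan/Allen-Dynes formula gives a rough upper bound on a phonon-mediated Tc. The sketch below is illustrative only: the logarithmic phonon frequency ω_log ≈ 600 K and Coulomb pseudopotential μ* = 0.1 are assumed representative values, not numbers taken from the paper.

```python
import math

def allen_dynes_tc(lam, omega_log_K, mu_star=0.1):
    """Allen-Dynes (modified McMillan) estimate of Tc in kelvin."""
    denom = lam - mu_star * (1.0 + 0.62 * lam)
    if denom <= 0:
        return 0.0  # coupling too weak to overcome the Coulomb pseudopotential
    return (omega_log_K / 1.2) * math.exp(-1.04 * (1.0 + lam) / denom)

# lambda <= 0.5 per the paper; omega_log ~ 600 K is an assumed typical scale.
tc = allen_dynes_tc(0.5, 600.0)
print(f"Tc ~ {tc:.1f} K")  # single-digit kelvin, far below the observed ~80 K
```

Even with a generous phonon frequency scale, λ = 0.5 yields a Tc an order of magnitude below the observed value, which is the quantitative content of the abstract's conclusion.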


Artificial intelligence driven laser parameter search: Inverse design of photonic surfaces using greedy surrogate-based optimization

(2025)

Photonic surfaces designed with specific optical characteristics are becoming increasingly crucial for novel energy harvesting and storage systems. These surfaces can be designed by texturing materials with lasers, but optimally adjusting the laser fabrication parameters to achieve target surface optical properties remains an open challenge. We therefore develop a surrogate-based optimization approach. Our framework employs the Random Forest algorithm to model the forward relationship between the laser fabrication parameters and the resulting optical characteristics. During the optimization process, we use a greedy, prediction-based exploration strategy that iteratively selects batches of laser parameters for experimentation by minimizing the predicted discrepancy between the surrogate model's outputs and the user-defined target optical characteristics. This strategy efficiently identifies optimal fabrication parameters without modeling the error landscape directly. We demonstrate the efficiency and effectiveness of our approach on two synthetic benchmarks and two experimental photonic surface inverse design targets. Averaged across all benchmarks, our algorithm performs twice as well as other state-of-the-art optimization methods. Additionally, a warm-starting inverse design technique for changed target optical characteristics further enhances the performance of the introduced approach.
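The greedy, prediction-based loop described above can be sketched in a few lines. The paper's surrogate is a Random Forest; here a simple nearest-neighbor regressor stands in to keep the sketch dependency-free, and `fabricate()` is a hypothetical stand-in for running the laser experiment (the real parameter space and targets differ).

```python
import numpy as np

rng = np.random.default_rng(0)

def fabricate(x):
    # toy forward map: laser parameter -> optical property
    return np.sin(3 * x) + 0.5 * x

def surrogate_predict(X, y, q):
    # 1-nearest-neighbor prediction (stand-in for the Random Forest surrogate)
    return y[np.abs(X[:, None] - q[None, :]).argmin(axis=0)]

target = 0.8                      # user-defined target optical characteristic
X = rng.uniform(0, 2, 10)         # initial experiments
y = fabricate(X)

for _ in range(5):
    cand = rng.uniform(0, 2, 200)
    pred = surrogate_predict(X, y, cand)
    # greedily pick the batch whose *predicted* output is closest to the target
    batch = cand[np.argsort(np.abs(pred - target))[:3]]
    X = np.concatenate([X, batch])
    y = np.concatenate([y, fabricate(batch)])

best = X[np.argmin(np.abs(y - target))]
print(f"best parameter {best:.3f}, achieved {fabricate(best):.3f}")
```

The key design choice is that candidates are ranked by predicted closeness to the target, so no separate model of the error landscape is ever fit.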


VAN-DAMME: GPU-accelerated and symmetry-assisted quantum optimal control of multi-qubit systems

(2025)

We present an open-source software package, VAN-DAMME (Versatile Approaches to Numerically Design, Accelerate, and Manipulate Magnetic Excitations), for massively parallelized quantum optimal control (QOC) calculations of multi-qubit systems. To enable large QOC calculations, the VAN-DAMME software package combines symmetry-based techniques with custom GPU-enhanced algorithms. This combined approach allows for the simultaneous computation of hundreds of matrix exponential propagators, efficiently leveraging the intra-GPU parallelism found in high-performance GPUs. In addition, to maximize the computational efficiency of the VAN-DAMME code, we carried out extensive tests on data layout, computational complexity, memory requirements, and performance. These analyses allowed us to develop computationally efficient approaches for evaluating complex-valued matrix exponential propagators based on Padé approximants. To assess the computational performance of our GPU-accelerated VAN-DAMME code, we carried out QOC calculations of systems containing 10-15 qubits, which showed that our GPU implementation is 18.4× faster than the corresponding CPU implementation. These enhancements enable efficient calculations of multi-qubit systems and support QOC applications across multiple domains.

Program summary:
Program Title: VAN-DAMME
CPC Library link to program files: https://doi.org/10.17632/zcgw2n5bjf.1
Licensing provisions: GNU General Public License 3
Programming language: C++ and CUDA
Nature of problem: Compute optimized time-dependent magnetic fields that drive a system from a known initial qubit configuration to a specified target state with a large (≈1) transition probability.
Solution method: Quantum control, GPU acceleration, analytic gradients, matrix exponentials, and gradient ascent optimization.
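The Padé-based propagator evaluation mentioned above can be sketched compactly. This is a toy CPU version in NumPy, assuming a fixed [6/6] diagonal Padé approximant with simple scaling and squaring; the actual VAN-DAMME kernels are batched C++/CUDA implementations with more careful order selection.

```python
import numpy as np

def expm_pade(A, m=6):
    """exp(A) via a diagonal [m/m] Pade approximant with scaling and squaring."""
    # scale so the 1-norm of A/2^s is at most ~1
    s = max(0, int(np.ceil(np.log2(max(np.linalg.norm(A, 1), 1e-16)))))
    A = A / 2**s
    n = A.shape[0]
    N = np.eye(n, dtype=complex)   # numerator polynomial
    D = np.eye(n, dtype=complex)   # denominator polynomial
    X = np.eye(n, dtype=complex)   # running power of A
    c = 1.0
    for k in range(1, m + 1):
        c *= (m - k + 1) / (k * (2 * m - k + 1))   # Pade coefficient recurrence
        X = X @ A
        N += c * X
        D += (-1) ** k * c * X
    E = np.linalg.solve(D, N)      # exp(A/2^s) ~ D^{-1} N
    for _ in range(s):             # square back up: exp(A) = exp(A/2^s)^(2^s)
        E = E @ E
    return E

# sanity check against the closed form exp(tJ) = cos(t) I + sin(t) J
J = np.array([[0, 1], [-1, 0]], dtype=complex)
print(expm_pade(J).real)
```

In the QOC setting, one such propagator is needed per time step per control trajectory, which is why batching hundreds of these small matrix exponentials on a GPU pays off.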


Probing Rotational Decoherence with a Trapped-Ion Planar Rotor

(2025)

The quantum rotor is one of the simplest model systems in quantum mechanics, but only in recent years has theoretical work revealed general fundamental scaling laws for its decoherence. For example, a superposition of orientations decoheres at a rate proportional to the sine squared of the angle between them. Here, we observe scaling laws for rotational decoherence dynamics for the first time, using a 4 μm diameter planar rotor composed of two Paul-trapped ions. We prepare the rotational motion of the ion crystal into superpositions of angular momentum with well-defined differences of 1ℏ to 3ℏ and measure the rate of decoherence. We also tune the system-environment interaction strength by introducing resonant electric field noise. The observed scaling relationships for decoherence are in excellent agreement with recent theoretical work and are directly relevant to the growing development of rotor-based quantum applications.
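The orientational scaling law quoted above is easy to tabulate numerically. In this sketch, `gamma_max` is an arbitrary overall rate constant chosen for illustration, not a measured quantity from the experiment.

```python
import numpy as np

# Relative decoherence rates for superpositions of two rotor orientations,
# using the sin^2 scaling of the separation angle quoted above.
gamma_max = 1.0
angles_deg = np.array([15, 30, 60, 90])
rates = gamma_max * np.sin(np.deg2rad(angles_deg)) ** 2
for a, r in zip(angles_deg, rates):
    print(f"separation {a:3d} deg -> relative rate {r:.3f}")
```

The takeaway is that widely separated orientations decohere far faster than nearby ones, which is the scaling the experiment probes.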


A graphics processing unit accelerated sparse direct solver and preconditioner with block low rank compression

(2025)

We present the GPU implementation efforts and challenges of the sparse solver package STRUMPACK. The code is publicly available on GitHub under a permissive BSD license. STRUMPACK implements an approximate multifrontal solver, a sparse LU factorization that uses compression methods to accelerate time to solution and reduce memory usage. Multiple compression schemes based on rank-structured and hierarchical matrix approximations are supported, including hierarchically semi-separable, hierarchically off-diagonal butterfly, and block low rank. In this paper, we present the GPU implementation of the block low rank (BLR) compression method within a multifrontal solver. Our GPU implementation relies on highly optimized vendor libraries: cuBLAS and cuSOLVER for NVIDIA GPUs, rocBLAS and rocSOLVER for AMD GPUs, and the Intel oneAPI Math Kernel Library (oneMKL) for Intel GPUs. Additionally, we rely on external open-source libraries such as SLATE (Software for Linear Algebra Targeting Exascale), MAGMA (Matrix Algebra on GPU and Multi-core Architectures), and KBLAS (KAUST BLAS). SLATE is used as a GPU-capable ScaLAPACK replacement. From MAGMA we use variable-sized batched dense linear algebra operations such as GEMM, TRSM, and LU with partial pivoting. KBLAS provides efficient (batched) low-rank matrix compression for NVIDIA GPUs using an adaptive randomized sampling scheme. The resulting sparse solver and preconditioner runs on NVIDIA, AMD, and Intel GPUs. Interfaces are available from PETSc, Trilinos, and MFEM, or the solver can be used directly in user code. We report results for a range of benchmark applications, using the Perlmutter system from NERSC, Frontier from ORNL, and Aurora from ALCF. For a high-frequency wave equation on a regular mesh, using 32 Perlmutter compute nodes, the factorization phase of the exact GPU solver is about 6.5× faster than the CPU-only solver. The BLR-enabled GPU solver is about 13.8× faster than the CPU exact solver.
For a collection of SuiteSparse matrices, the STRUMPACK exact factorization on a single GPU is on average 1.9× faster than NVIDIA’s cuDSS solver.
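The core of BLR compression is replacing off-diagonal tiles of the dense frontal matrices with low-rank factors. A minimal, dependency-light sketch follows, using a truncated SVD in place of STRUMPACK's adaptive randomized sampling, on a toy tile with an assumed tolerance:

```python
import numpy as np

def compress_block(B, tol=1e-10):
    """Replace a dense tile B with low-rank factors U, V so that B ~ U @ V."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))          # numerical rank at tolerance tol
    return U[:, :r] * s[:r], Vt[:r, :]

# A smooth (Cauchy-like) kernel block is numerically low rank, so the factors
# store far fewer entries than the dense 64x64 tile.
x = np.linspace(0, 1, 64)
B = 1.0 / (1.5 + np.subtract.outer(x, x))
Uf, Vf = compress_block(B)
rel_err = np.linalg.norm(B - Uf @ Vf) / np.linalg.norm(B)
print("rank:", Uf.shape[1], "relative error:", rel_err)
```

In the solver, many such tiles are compressed and multiplied in batches, which is exactly the workload the batched MAGMA/KBLAS routines mentioned above are used for.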


Inverse design of photonic surfaces via multi fidelity ensemble framework and femtosecond laser processing

(2025)

We demonstrate a multi-fidelity (MF) machine learning ensemble framework for the inverse design of photonic surfaces, trained on a dataset of 11,759 samples that we fabricate using high-throughput femtosecond laser processing. The MF ensemble combines an initial low-fidelity model for generating design solutions with a high-fidelity model that refines these solutions through local optimization. The combined MF ensemble can generate multiple disparate sets of laser-processing parameters that each produce the same target input spectral emissivity with high accuracy (root-mean-squared errors < 2%). SHapley Additive exPlanations (SHAP) analysis provides transparent interpretability of the complex relationship between laser parameters and spectral emissivity. Finally, the MF ensemble is experimentally validated by fabricating and evaluating the photonic surface designs it generates for energy harvesting devices with improved efficiency. Our approach provides a powerful tool for advancing the inverse design of photonic surfaces in energy harvesting applications.
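The two-stage structure (a cheap low-fidelity model proposes, a high-fidelity model locally refines) can be sketched as follows. Both surrogates here are toy closed-form stand-ins, not the paper's trained models, and the target value is arbitrary.

```python
import numpy as np

def hf_model(x):
    # high-fidelity surrogate (stand-in): design parameter -> emissivity-like value
    return np.sin(3 * x) * np.exp(-0.3 * x)

def lf_model(x):
    # low-fidelity surrogate: cheaper, slightly biased version of hf_model
    return np.sin(3 * x) * np.exp(-0.2 * x) + 0.05

def inverse_design(target, n_lf=400, n_refine=20, step=0.08):
    # stage 1: coarse LF grid search proposes a candidate design
    xs = np.linspace(0, 3, n_lf)
    x = xs[np.argmin(np.abs(lf_model(xs) - target))]
    # stage 2: HF local refinement with a shrinking search step
    for _ in range(n_refine):
        trio = np.array([x - step, x, x + step])
        x = trio[np.argmin(np.abs(hf_model(trio) - target))]
        step *= 0.85
    return x

x_star = inverse_design(0.4)
print(f"design x = {x_star:.3f}, achieved = {hf_model(x_star):.3f}")
```

Because the LF stage only needs to land in the right basin, it can be run densely and cheaply; the expensive HF model is then evaluated only a handful of times during refinement.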


PETSc/TAO developments for GPU-based early exascale systems

(2025)

The Portable Extensible Toolkit for Scientific Computation (PETSc) library provides scalable solvers for nonlinear time-dependent differential and algebraic equations and for numerical optimization via the Toolkit for Advanced Optimization (TAO). PETSc is used in dozens of scientific fields and is an important building block for many simulation codes. During the U.S. Department of Energy’s Exascale Computing Project, the PETSc team has made substantial efforts to enable efficient utilization of the massive fine-grain parallelism present within exascale compute nodes and to enable performance portability across exascale architectures. We recap some of the challenges that designers of numerical libraries face in such an endeavor, and then discuss the many developments we have made, which include the addition of new GPU backends, features supporting efficient on-device matrix assembly, better support for asynchronicity and GPU kernel concurrency, and new communication infrastructure. We evaluate the performance of these developments on some pre-exascale systems as well as the early exascale systems Frontier and Aurora, using compute kernel, communication layer, solver, and mini-application benchmark studies, and then close with a few observations drawn from our experiences on the tension between portable performance and other goals of numerical libraries.
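Many of these GPU backends are exposed to users as runtime options rather than code changes. As an illustration (the executable name and grid options here are hypothetical; `-vec_type` and `-mat_type` are standard PETSc runtime flags), a CUDA backend can typically be selected like:

```shell
# Run an existing PETSc application with CUDA-backed vectors and a
# cuSPARSE-backed matrix, plus performance logging. "ex50" is a placeholder
# for any PETSc-based executable.
./ex50 -da_grid_x 512 -da_grid_y 512 \
       -vec_type cuda -mat_type aijcusparse \
       -ksp_type cg -pc_type jacobi -log_view
```

The same application can target other vendors by swapping the backend options (e.g., HIP- or SYCL-based types), which is the performance-portability model the abstract describes.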


Evolving to Find Optimizations Humans Miss: Using Evolutionary Computation to Improve GPU Code for Bioinformatics Applications

(2024)

GPUs are used in many settings to accelerate large-scale scientific computation, including simulation, computational biology, and molecular dynamics. However, optimizing codes to run efficiently on GPUs requires developers to have both a detailed understanding of the application logic and significant knowledge of parallel programming and GPU architectures. This paper shows that an automated GPU program optimization tool, GEVO, can leverage evolutionary computation to find code edits that reduce the runtime of three important applications (multiple sequence alignment, agent-based simulation, and molecular dynamics) by 28.9%, 29%, and 17.8%, respectively. The paper presents an in-depth analysis of the discovered optimizations, revealing that (1) several of the most important optimizations involve significant epistasis, (2) the primary sources of improvement are application-specific, and (3) many of the optimizations generalize across GPU architectures. In general, the discovered optimizations are not straightforward even for a human GPU expert, showcasing the potential of automated program optimization tools to both reduce the optimization burden for human domain experts and provide new insights for GPU experts.
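The evolutionary search at the heart of this approach can be sketched with a toy genetic algorithm. Individuals here are bit strings standing in for sets of code edits, and the fitness function rewards a hypothetical beneficial edit set; GEVO itself mutates LLVM IR and scores individuals by measured runtime and output correctness.

```python
import random

random.seed(1)
TARGET = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical "best" set of code edits

def fitness(ind):
    # stand-in for "measured speedup": edits matching the beneficial set
    return sum(a == b for a, b in zip(ind, TARGET))

def mutate(ind, p=0.1):
    # flip each edit on/off with probability p
    return [bit ^ (random.random() < p) for bit in ind]

def crossover(a, b):
    # single-point crossover recombines two parents' edit sets
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

pop = [[random.randint(0, 1) for _ in TARGET] for _ in range(30)]
for gen in range(60):
    pop.sort(key=fitness, reverse=True)
    elite = pop[:10]                 # elitism: keep the best unchanged
    pop = elite + [mutate(crossover(random.choice(elite), random.choice(elite)))
                   for _ in range(20)]

best = max(pop, key=fitness)
print("best fitness:", fitness(best), "of", len(TARGET))
```

The interaction between edits (fitness depending on *combinations* of bits rather than single flips) is where epistasis, highlighted in the paper's analysis, would enter a more realistic fitness function.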