Search

Scholarly Works (7 results)

Sort By:

Thesis
Peer Reviewed

Advanced Database Techniques for Processing Scientific Multi-Dimensional Data

Zhao, Weijie
Advisor(s): Rusu, Florin

UC Merced Electronic Theses and Dissertations (2018)

Scientific applications are generating an ever-increasing volume of multi-dimensional data that are largely processed inside distributed array databases and frameworks. Traditional databases are not equipped with the adequate functionality to handle the volume and variety of ``Big Data''. Scientific data have dual structure. Raw data are preponderantly ordered multi-dimensional arrays or sequences while metadata and derived data are best represented as unordered relations. Scientific data processing requires complex operations over arrays and relations. These operations cannot be expressed using only standard linear and relational algebra operators, respectively.

Cover page: Advanced Database Techniques for Processing Scientific Multi-Dimensional Data

Creative Commons 'BY-ND' version 4.0 license

Article
Peer Reviewed

Automatic identification and classification of Palomar Transient Factory astrophysical objects in GLADE

UC Berkeley Previously Published Works (2018)

Article
Peer Reviewed

Automatic identification and classification of Palomar Transient Factory astrophysical objects in GLADE

UC Merced Previously Published Works (2018)

Palomar Transient Factory (PTF) is a comprehensive detection system for the identification and classification of transient astrophysical objects. In this paper, we make two significant contributions to the PTF pipeline. First, we present an experimental study that evaluates a novel implementation of the real-time classifier in GLADE - a parallel data processing system that combines the efficiency of a database with the extensibility of map-reduce. We show how each stage in the classifier maps optimally into GLADE tasks by taking advantage of the unique features of the system - range-based data partitioning, columnar storage, multi-query execution, and in-database support for complex aggregate computation. Second, we introduce a novel parallel similarity join algorithm for advanced transient classification. We implement this algorithm in GLADE and execute it on a massive supercomputer with more than 3,000 threads, achieving more than three orders of magnitude improvement over the PostgreSQL solution.

Article
Peer Reviewed

Distributed Caching for Complex Querying of Raw Arrays

UC Berkeley Previously Published Works (2018)

As applications continue to generate multi-dimensional data at exponentially increasing rates, fast analytics to extract meaningful results is becoming extremely important. The database community has developed array databases that alleviate this problem through a series of techniques. In-situ mechanisms provide direct access to raw data in the original format---without loading and partitioning. Parallel processing scales to the largest datasets. In-memory caching reduces latency when the same data are accessed across a workload of queries. However, we are not aware of any work on distributed caching of multi-dimensional raw arrays. In this paper, we introduce a distributed framework for cost-based caching of multi-dimensional arrays in native format. Given a set of files that contain portions of an array and an online query workload, the framework computes an effective caching plan in two stages. First, the plan identifies the cells to be cached locally from each of the input files by continuously refining an evolving R-tree index. In the second stage, an optimal assignment of cells to nodes that collocates dependent cells in order to minimize the overall data transfer is determined. We design cache eviction and placement heuristic algorithms that consider the historical query workload. A thorough experimental evaluation over two real datasets in three file formats confirms the superiority -- by as much as two orders of magnitude -- of the proposed framework over existing techniques in terms of cache overhead and workload execution time.

Cover page: Distributed Caching for Complex Querying of Raw Arrays

Article
Peer Reviewed

Machine learning applied to retrieval of temperature and concentration distributions from infrared emission measurements

UC Merced Previously Published Works (2019)

Inversion of temperature and species concentration distributions from radiometric measurements involves solving nonlinear, ill-posed and high-dimensional problems. Machine Learning approaches allow solving such highly nonlinear problems, offering an alternative way to deal with complex and dynamic systems with good flexibility. In this study, we present a machine learning approach for retrieving temperatures and species concentrations from spectral infrared emission measurements in combustion systems. The training spectra for the machine learning model were synthesized through calculations from HITEMP 2010 for gas mixtures of CO2, H2O, and CO. The method was tested for different line-of-sight temperature and concentration distributions, different gas path lengths and different spectral intervals. Experimental validation was carried out by measuring spectral emission from a Hencken flat flame burner with a Fourier-transform infrared spectrometer with different spectral resolutions. The temperature fields above the burner for combustion with equivalence ratios of ϕ = 1, ϕ = 0.8, and ϕ = 1.4 were retrieved and were in excellent agreement with temperatures deduced from Rayleigh scattering thermometry.

Cover page: Machine learning applied to retrieval of temperature and concentration distributions from infrared emission measurements

Article
Peer Reviewed

Reconfiguring crystal and electronic structures of MoS2 by substitutional doping

UC Berkeley Previously Published Works (2018)

Doping of traditional semiconductors has enabled technological applications in modern electronics by tailoring their chemical, optical and electronic properties. However, substitutional doping in two-dimensional semiconductors is at a comparatively early stage, and the resultant effects are less explored. In this work, we report unusual effects of degenerate doping with Nb on structural, electronic and optical characteristics of MoS₂ crystals. The doping readily induces a structural transformation from naturally occurring 2H stacking to 3R stacking. Electronically, a strong interaction of the Nb impurity states with the host valence bands drastically and nonlinearly modifies the electronic band structure with the valence band maximum of multilayer MoS₂ at the Γ point pushed upward by hybridization with the Nb states. When thinned down to monolayers, in stark contrast, such significant nonlinear effect vanishes, instead resulting in strong and broadband photoluminescence via the formation of exciton complexes tightly bound to neutral acceptors.

Cover page: Reconfiguring crystal and electronic structures of MoS2 by substitutional doping

Article
Peer Reviewed

iPTF Archival Search for Fast Optical Transients

UC Merced Previously Published Works (2018)

There has been speculation about a class of relativistic explosions with an initial Lorentz factor Γinit smaller than that of classical gamma-ray bursts (GRBs). These "dirty fireballs" would lack prompt GRB emission but could be pursued via their optical afterglow, appearing as transients that fade overnight. Here we report a search for such transients (that fade by 5-σ in magnitude overnight) in four years of archival photometric data from the intermediate Palomar Transient Factory (iPTF). Our search criteria yielded 50 candidates. Of these, two were afterglows to GRBs that had been found in dedicated follow-up observations to triggers from the Fermi GRB Monitor. Another (iPTF14yb) was a GRB afterglow discovered serendipitously. Eight were spurious artifacts of reference image subtraction, and one was an asteroid. The remaining 38 candidates have red stellar counterparts in external catalogs. The photometric and spectroscopic properties of the counterparts identify these transients as strong flares from M dwarfs of spectral type M3-M7 at distances of d ≈ 0.15-2.1 kpc; three counterparts were already spectroscopically classified as late-type M stars. With iPTF14yb as the only confirmed relativistic outflow discovered independently of a high-energy trigger, we constrain the all-sky rate of transients that peak at m = 18 and fade by Δm = 2 mag in Δt = 3 hr to be , with a 68% confidence interval of . This implies that the rate of visible dirty fireballs is at most comparable to that of the known population of long-duration GRBs.