Evaluation of a scientific data search infrastructure
eScholarship
Open Access Publications from the University of California

Published Web Location

https://doi.org/10.1002/cpe.7261
Creative Commons 'BY' version 4.0 license
Abstract

Summary: The ability to search over large scientific datasets has become crucial to next‐generation scientific discoveries as the data generated by scientific facilities grow dramatically. In previous work, we developed and deployed ScienceSearch, a search infrastructure for scientific data that uses machine learning to automate metadata creation. Our current deployment runs atop a container-based platform at an HPC center. In this article, we evaluate the ScienceSearch infrastructure and discuss our experiences with it. Specifically, we present a performance evaluation focused on scalability trends. The results show that ScienceSearch can serve up to 130 queries/min with latency under 3 s. We discuss our infrastructure setup and evaluation results to share our experiences and offer a perspective on the opportunities and challenges of our search infrastructure.
