Summary:
The ability to search over large scientific datasets has become crucial to next-generation scientific discoveries as the volume of data generated by scientific facilities grows dramatically. In previous work, we developed and deployed ScienceSearch, a search infrastructure for scientific data that uses machine learning to automate metadata creation. Our current deployment runs atop a container-based platform at an HPC center. In this article, we evaluate the ScienceSearch infrastructure and discuss our experiences with it. Specifically, we present a performance evaluation focusing on scalability trends. The results show that ScienceSearch is able to serve up to 130 queries/min with latency under 3 s. We discuss our infrastructure setup and evaluation results to share our experiences and offer a perspective on the opportunities and challenges of our search infrastructure.