Asynchronous I/O Strategy for Large-Scale Deep Learning Applications
Published Web Location: https://sdm.lbl.gov/oapapers/hipc2021-lee.pdf

Abstract
Many scientific applications have started using deep learning methods for their classification or regression problems. For data-intensive scientific applications, however, I/O performance can be the major performance bottleneck. In order to effectively solve important real-world problems using deep learning methods on High-Performance Computing (HPC) systems, it is essential to address the poor I/O performance of large-scale neural network training. In this paper, we propose an asynchronous I/O strategy that can be generally applied to deep learning applications. Our I/O strategy employs an I/O-dedicated thread per process that performs I/O operations independently of the training progress. The I/O thread reads many training samples at once to reduce the total number of I/O operations per epoch. Given a fixed amount of training data, the fewer the I/O operations per epoch, the shorter the overall I/O time. The I/O operations are also overlapped with the computations using the double-buffering method. We evaluate our I/O strategy using two real-world scientific applications, CosmoFlow and Neuron-Inverter. Our experimental results demonstrate that the proposed I/O strategy significantly improves the scaling performance without affecting the regression performance.
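The core idea of the abstract — a dedicated I/O thread that reads large chunks of samples ahead of training and hands them off through a two-slot buffer — can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation; the chunk reader `read_chunk` and the chunk count are hypothetical stand-ins for the application's actual data-loading routine.

```python
import threading
import queue


def make_prefetcher(read_chunk, num_chunks):
    """Start a dedicated I/O thread that reads large chunks of training
    samples ahead of the consumer, overlapping I/O with computation via
    double buffering. `read_chunk(i)` is a hypothetical callable that
    returns the i-th chunk of samples."""
    # Two slots: while training consumes one chunk, the I/O thread fills
    # the other -- the essence of the double-buffering method.
    buf = queue.Queue(maxsize=2)

    def io_worker():
        for i in range(num_chunks):
            buf.put(read_chunk(i))  # blocks when both buffers are full
        buf.put(None)  # sentinel: no more chunks

    threading.Thread(target=io_worker, daemon=True).start()

    def chunks():
        while True:
            chunk = buf.get()
            if chunk is None:
                return
            yield chunk  # training loop computes on this chunk here

    return chunks()
```

Because the queue is bounded at two entries, the I/O thread can run at most one chunk ahead of the consumer, so reads proceed independently of training progress without unbounded memory growth; reading many samples per chunk likewise reduces the number of I/O operations per epoch, as the abstract describes.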