Modeling Data Transfers: Change Point and Anomaly Detection
Published Web Location
https://sdm.lbl.gov/oapapers/snta18-dao.pdf
Abstract
To support the operations and resource planning of a large experimental facility, we model the time needed to transfer the data files produced by the facility to a computer center, with the goals of predicting the expected file transfer time and identifying unusually slow transfers that might require attention from human operators. The file transfer time can be thought of as having two parts: a base time that depends on the hardware and software involved, and a congestion part due to uncontrollable interference from other operations on the shared resources, including the network links, disk storage systems, and CPUs involved in the transfers. Since many parameters important to the transfer time are not available to us, we employ a change point detection algorithm to separate the data records into time periods (called segments) with relatively stable behavior. Within each segment, we apply a non-parametric model to describe the congestion time. When the observed file transfer time is significantly longer than the expected time, we declare that file transfer to be unusually slow. When many of these unusually slow file transfers are observed, it is worthwhile to notify the human operators so they can investigate the abnormal behavior of the system.
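The two-stage idea described above (segment the records, then flag slow transfers within each segment) can be illustrated with a minimal sketch. The abstract does not name the specific change point algorithm or non-parametric model, so everything here is an assumption made for illustration: binary segmentation on mean shifts stands in for the change point step, a median plus `k` times the median absolute deviation (MAD) rule stands in for the non-parametric model, and the names `change_points`, `slow_transfers`, `min_size`, `min_gain`, and `k`, as well as the sample data, are hypothetical.

```python
from statistics import median

def sse(xs):
    """Sum of squared deviations from the mean of xs."""
    if not xs:
        return 0.0
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def best_split(xs, min_size):
    """Return (gain, index) for the split that most reduces squared error."""
    total = sse(xs)
    best_gain, best_idx = 0.0, None
    for i in range(min_size, len(xs) - min_size + 1):
        gain = total - sse(xs[:i]) - sse(xs[i:])
        if gain > best_gain:
            best_gain, best_idx = gain, i
    return best_gain, best_idx

def change_points(xs, min_size=5, min_gain=100.0, offset=0):
    """Binary segmentation (assumed stand-in): split while the gain is large."""
    gain, i = best_split(xs, min_size)
    if i is None or gain < min_gain:
        return []
    left = change_points(xs[:i], min_size, min_gain, offset)
    right = change_points(xs[i:], min_size, min_gain, offset + i)
    return left + [offset + i] + right

def slow_transfers(xs, cps, k=5.0):
    """Flag indices whose time exceeds median + k * MAD of their segment."""
    flagged = []
    bounds = [0] + list(cps) + [len(xs)]
    for start, end in zip(bounds, bounds[1:]):
        seg = xs[start:end]
        med = median(seg)
        mad = median([abs(x - med) for x in seg])
        threshold = med + k * max(mad, 1e-9)  # guard against a zero MAD
        flagged += [start + j for j, x in enumerate(seg) if x > threshold]
    return flagged

# Hypothetical transfer times (seconds): the baseline shifts from about 10
# to about 20 at index 10, and index 15 holds one unusually slow transfer.
times = [10, 11, 9, 10, 11, 9, 10, 11, 9, 10,
         20, 21, 19, 20, 21, 40, 20, 21, 19, 20]
cps = change_points(times)
slow = slow_transfers(times, cps)
print("change points:", cps)
print("slow transfers:", slow)
```

Thresholding within each segment rather than globally is what keeps the shifted baseline itself from being reported as a long run of slow transfers. The `min_gain` and `k` values are hand-picked for this toy data and would need tuning on real transfer logs.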