Expanding Statistical Similarity Based Data Reduction to Capture Diverse Patterns
Published Web Location
https://sdm.lbl.gov/oapapers/dcc2017-lee-summary.pdf
Abstract
We propose a new class of lossy compression based on a locally exchangeable measure that captures the distribution of repeating data blocks while preserving unique patterns. The technique has been demonstrated to reduce data volume by more than 100-fold on power grid monitoring data, where a large number of data blocks can be characterized as following stationary probability distributions. To capture data with more diverse patterns, we propose two techniques to transform non-stationary time series into locally stationary blocks. We also propose a strategy to work with values in bounded ranges, such as phase angles of alternating current. These new ideas are incorporated into a software package named IDEALEM. In experiments, IDEALEM reduces non-stationary data volume by up to 100-fold. Compared with state-of-the-art lossy compression methods such as SZ, IDEALEM can produce more compact output overall.
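The general idea of storing one representative for statistically similar blocks can be illustrated with a minimal sketch. It is not IDEALEM's implementation: the abstract does not name the similarity measure or the two transformation techniques, so the sketch assumes a two-sample Kolmogorov-Smirnov distance as the similarity test and first-order differencing as one possible way to make a drifting series locally stationary. The names `ks_statistic` and `reduce_blocks`, the block size, and the threshold are all hypothetical.

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of a and b (assumed stand-in similarity measure)."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, so ties are handled together.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def reduce_blocks(series, block_size=32, threshold=0.3, use_deltas=False):
    """Split series into fixed-size blocks and keep only blocks that are not
    statistically similar to one already kept.

    Returns (stored, refs), where refs[k] is the index into stored of the
    block that represents block k. A trailing partial block is dropped in
    this sketch.
    """
    if use_deltas:
        # Assumed transformation: differencing removes a steady trend.
        series = [y - x for x, y in zip(series, series[1:])]
    stored, refs = [], []
    for start in range(0, len(series) - block_size + 1, block_size):
        block = series[start:start + block_size]
        for idx, kept in enumerate(stored):
            if ks_statistic(block, kept) <= threshold:
                refs.append(idx)  # similar enough: reference the kept block
                break
        else:
            stored.append(block)  # unique pattern: keep it verbatim
            refs.append(len(stored) - 1)
    return stored, refs


# Eight repeats of one pattern followed by one unique block.
repeating = [0, 1, 2, 3] * 8 + [100, 101, 102, 103]
stored, refs = reduce_blocks(repeating, block_size=4)
print(len(stored), refs)  # 2 [0, 0, 0, 0, 0, 0, 0, 0, 1]
```

In this sketch a decoder would emit the stored block in place of every block that references it, so the reconstruction matches the original in distribution rather than value by value. On a linear ramp, no two raw blocks are similar, but with `use_deltas=True` every block of differences is identical and a single stored block suffices.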