Performance testing is a standard practice in evolving systems to detect performance issues proactively. It samples various performance metrics that are then compared with a stable baseline to judge whether the measured data is abnormal. This type of comparative analysis requires domain expertise and can take experienced performance analysts days to conduct. In an effort to build an automated solution for a leading data warehousing company and improve the efficiency of comparative performance analysis, we implemented machine learning approaches proposed by existing research. However, the initial result had an 86% false negative rate on average, meaning that the majority of performance defects would be missed. To investigate the causes of this unsatisfactory result, we took a step back and revisited the performance data itself, finding several important data-related issues that are overlooked by existing work. In this paper, we discuss these issues in detail and share the insights we gained in addressing them. With the new learning scheme we devise, we reduce the false negative rate to as low as 16% and achieve a balanced accuracy of 0.91, which makes the analysis engine practical to adopt.
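As a note on the reported metrics: assuming the standard binary-classification definitions, balanced accuracy is the mean of the true positive rate (sensitivity) and the true negative rate (specificity), so the 16% false negative rate corresponds to a sensitivity of 0.84, and the reported balanced accuracy of 0.91 implies a specificity of roughly 0.98:

$$\mathrm{BA} = \frac{\mathrm{TPR} + \mathrm{TNR}}{2}, \qquad \mathrm{TPR} = 1 - \mathrm{FNR} = 1 - 0.16 = 0.84, \qquad \mathrm{TNR} = 2 \times 0.91 - 0.84 = 0.98.$$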