Flash-based storage devices such as solid-state drives (SSDs) are replacing traditional spinning hard disk drives in an increasing number of applications. User-facing cloud applications benefit from the low, sub-millisecond access latency of SSDs. Virtually all smartphones use flash memory as their storage medium because of its low power consumption, high storage density, small footprint, and shock resistance. Compared with spinning disks, SSDs also provide faster boot times, higher read and write bandwidth, and greater physical durability. Nevertheless, flash-based storage devices have several drawbacks. Technology scaling, 3D integration, and multi-level cells have steadily increased storage density and capacity, but they have also reduced the reliability of flash. Flash memory also suffers from overheads such as garbage collection, which can reduce write bandwidth and cause high tail latency. Furthermore, although NAND flash devices offer significantly lower latency than spinning disks, their latency is still orders of magnitude higher than that of DRAM.
This work leverages machine learning techniques to improve the performance of flash-based storage systems along three major directions: the response time, the reliability, and the lifetime of flash-based storage devices. To improve response time, we apply sequence-to-sequence learning to capture spatial I/O access patterns, thereby improving prefetching performance. To make prefetching effective, we address two challenges: prefetching in very large, sparse address spaces, and prefetching in a timely manner by predicting accesses sufficiently far ahead of time. To improve reliability, we propose an approach that automatically predicts future drive failures and interprets these predictions. Finally, to extend device lifetime, we present a machine-learning-based approach that reduces the number of rewrites required to store data in log-structured file systems by predicting the death time of logical block addresses, i.e., when the data written to an address will be overwritten or invalidated. We use the predicted death times to design \sysML, a near-optimal data placement technique that minimizes the number of extra writes required to store data in log-structured storage systems, thereby improving device lifetime.