eScholarship
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Electronic Theses and Dissertations

The Management of Context in the Machine Learning Lifecycle

Abstract

We present novel techniques and systems for managing data context within the machine learning (ML) lifecycle. Drawing from a vision laid out in 2018, we present Flor and its evolutions, FlorDB and FlorDB with Build extensions, designed for comprehensive metadata capture and version control in the ML lifecycle. A cornerstone of our approach is an interview study to understand what the ML lifecycle is and how engineers operationalize machine learning, focusing on MLOps and the iterative model development process. Through the implementation of these systems and their use in real-world applications for lawyers and journalists, we demonstrate the tangible benefits of rich data context in agile model development. In sum, we show how the integration of Application, Build, and Change contexts (the ABCs of Context) enables machine learning engineers (MLEs) to close the loop in the ML lifecycle.
