eScholarship
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Electronic Theses and Dissertations

An Extensible Architecture for Distributed Heterogeneous Processing

Abstract

The exponential growth in artificial intelligence (AI) compute demands has significantly outpaced the advancement in single-node processing power. This widening gap has made distributed heterogeneous processing essential for modern AI applications. However, existing distributed data processing systems struggle to effectively handle the complexities of heterogeneous execution.

This dissertation argues for extensibility as the key principle in designing systems for distributed heterogeneous processing. We build two libraries on top of Ray, a distributed execution system. First, we develop the streaming batch model, which enables efficient heterogeneous execution and dynamic adaptability to varying workloads. Second, we introduce Exoshuffle, a distributed shuffle library that enables flexible control of data semantics without sacrificing performance, demonstrating that complex data operations can be implemented efficiently as application libraries rather than requiring purpose-built systems. Both libraries are integrated into the open-source framework Ray Data, which has been adopted by thousands of companies in the industry. Finally, we validate the effectiveness of this architecture through the CloudSort benchmark, in which Exoshuffle-CloudSort set a new world record for the most cost-effective sorting of data on a public cloud. These results demonstrate that this extensible architecture can deliver both high performance and scalability while providing the flexibility required for heterogeneous workloads. This work provides a foundation for building efficient distributed heterogeneous processing systems capable of meeting the continuously growing computational demands of AI applications.
