An increasing number of workloads are moving to cloud data centers, including large-scale machine learning, big data analytics and back-ends for the Internet of Things. Many of these workloads are written in managed languages such as Java, Python or Scala. The performance and efficiency of managed-language workloads are therefore crucial to the hardware cost, energy efficiency and quality of service of these data centers.
While managed-language issues such as garbage collection (GC) and JIT compilation have seen a significant amount of research on single-node deployments, data center workloads run across a large number of independent language virtual machines and face new systems challenges that were not previously addressed. At the same time, there has been a large amount of work on specialized systems software and custom hardware for data centers, but most of this work does not fundamentally address managed languages and does not modify the language runtime system, effectively treating it as a black box.
In this thesis, we argue that we can substantially improve the performance, efficiency and responsiveness of managed applications in cloud data centers by treating the language runtime system as a fundamental part of the data center stack and co-designing it with both the software systems layer and the hardware layer. In particular, we argue that the cloud operators' full control over the software and hardware stack enables them to co-design these different layers to a degree that would be difficult to achieve in other settings. To support this thesis, we investigate two examples of co-designing the language runtime system with the remainder of the stack, spanning both the hardware and software layers.
On the software side, we show how to better support distributed managed-language applications through a "Holistic" Language Runtime System, which treats the runtimes underpinning a distributed application as a distributed system itself. We first introduce the concept of a Holistic Runtime System. We then present Taurus, a prototype implementation of such a system, based on the OpenJDK Hotspot JVM. By applying Taurus to two representative real-world workloads, we show that it is effective both in reducing overall runtime and resource consumption and in improving tail latencies.
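One kind of policy a Holistic Runtime System can express is coordinating GC pauses across the JVMs of a distributed application, so that independently triggered collections do not become stragglers that inflate tail latency. The following is a minimal, self-contained Java sketch of that idea only; the class and structure are illustrative and do not reflect Taurus's actual API or mechanisms. Threads standing in for individual runtimes rendezvous at a barrier before collecting, so all nodes pause in the same window instead of at arbitrary times.

import java.util.concurrent.CyclicBarrier;

/**
 * Illustrative sketch (not Taurus's real interface): several "runtimes"
 * agree to perform GC at the same time by meeting at a barrier, so a
 * request fanned out across all of them observes one coordinated pause
 * rather than many unsynchronized ones.
 */
public class CoordinatedGcSketch {
    static final int RUNTIMES = 4;
    // All runtimes rendezvous here before collecting.
    static final CyclicBarrier gcBarrier = new CyclicBarrier(
            RUNTIMES, () -> System.out.println("--- coordinated GC window ---"));

    public static void main(String[] args) throws InterruptedException {
        Thread[] nodes = new Thread[RUNTIMES];
        for (int i = 0; i < RUNTIMES; i++) {
            final int id = i;
            nodes[i] = new Thread(() -> runNode(id));
            nodes[i].start();
        }
        for (Thread t : nodes) t.join();
    }

    static void runNode(int id) {
        try {
            for (int round = 0; round < 3; round++) {
                // Stand-in for application work that fills the heap at
                // different rates on different nodes.
                Thread.sleep(50 + 30L * id);
                // Instead of collecting as soon as the local heap fills up,
                // wait until every node is ready, then pause together.
                gcBarrier.await();
                System.out.printf("node %d: GC in round %d%n", id, round);
            }
        } catch (Exception e) {
            Thread.currentThread().interrupt();
        }
    }
}

In a real deployment the barrier would be replaced by coordination among the runtimes over the network, and the policy would weigh the cost of delaying a collection against the benefit of aligning pauses.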
On the hardware side, we describe how custom data center SoCs provide an opportunity to revisit the old idea of hardware support for garbage collection. We first show that garbage collection is a suitable workload to offload from the CPU to data-parallel accelerators, by demonstrating how integrated GPUs can perform garbage collection for applications running on the CPU. We then generalize these ideas into a custom hardware accelerator that performs GC more efficiently than a traditional CPU core. We present this design in the context of a stop-the-world garbage collector, and describe how it could be extended to a fully concurrent, pause-free GC.
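The reason GC maps well to data-parallel hardware is that its core operation, tracing the object graph from a set of roots, processes every object on the current frontier independently, with only an atomic mark per object for synchronization. The sketch below is a plain Java illustration of that structure on a toy heap; it is not the GPU or accelerator implementation from the thesis, and the heap encoding is hypothetical.

import java.util.*;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicIntegerArray;

/**
 * Illustrative sketch of why tracing is data-parallel: objects on the
 * current frontier are marked and expanded independently; the only
 * synchronization is an atomic test-and-set per object.
 */
public class ParallelMarkSketch {
    public static void main(String[] args) {
        // Toy heap: object i references the objects listed in heap[i].
        int[][] heap = {
            {1, 2},   // object 0
            {3},      // object 1
            {3, 4},   // object 2
            {},       // object 3
            {0},      // object 4 (cycle back to object 0)
            {},       // object 5 (unreachable from the roots)
        };
        int[] roots = {0};

        AtomicIntegerArray marked = new AtomicIntegerArray(heap.length);
        List<Integer> frontier = new ArrayList<>();
        for (int r : roots) {
            if (marked.compareAndSet(r, 0, 1)) frontier.add(r);
        }

        // Breadth-first trace: every object on the frontier is processed
        // in parallel; this per-level step is what a SIMD GPU or a
        // dedicated tracing unit can execute as one wide operation.
        while (!frontier.isEmpty()) {
            ConcurrentLinkedQueue<Integer> next = new ConcurrentLinkedQueue<>();
            frontier.parallelStream().forEach(obj -> {
                for (int ref : heap[obj]) {
                    // Atomically claim the object so it is traced only once.
                    if (marked.compareAndSet(ref, 0, 1)) next.add(ref);
                }
            });
            frontier = new ArrayList<>(next);
        }

        for (int i = 0; i < heap.length; i++) {
            System.out.printf("object %d: %s%n", i,
                    marked.get(i) == 1 ? "live" : "garbage");
        }
    }
}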
Finally, we discuss how hardware-software research on managed languages requires new infrastructure to achieve greater realism and industry adoption. We then present the foundation of a new research platform for this type of work, using open-source hardware based on the free and open RISC-V ISA combined with the Jikes Research Virtual Machine. Using this infrastructure, we evaluate the performance and efficiency of our proposed hardware-assisted garbage collector design.