Serverless computing is a new paradigm for developing cloud applications, popularized by Amazon's AWS Lambda service. In serverless computing, applications are decomposed into fine-grained serverless functions written in high-level languages such as Python and JavaScript. Developers submit these functions to a serverless provider, which deploys and executes them. Unlike VM-based platforms, serverless applications run inside containerized execution environments called "lambdas" or "lambda functions". Because lambdas are lightweight and fine-grained, serverless computing can provide higher elasticity and higher resource utilization. Furthermore, the infrastructure and operational aspects of running applications in the cloud are delegated to the cloud provider, relieving developers of many complex and onerous tasks. This paradigm shift is poised to radically change the way developers build cloud applications.
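To make the programming model concrete, a serverless function is typically just a small handler in a high-level language that the provider invokes on demand. The sketch below assumes an AWS Lambda-style Python handler; the handler name and event fields are illustrative, not a specific application from this thesis.

```python
import json

def handler(event, context):
    # The provider invokes this function inside a containerized
    # execution environment (a "lambda"), passing the request payload
    # as `event`; `context` carries runtime metadata.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }

# Outside the provider, the handler can be exercised directly:
response = handler({"name": "serverless"}, None)
```

The developer writes only this function; provisioning, scaling, and scheduling of the execution environments are left entirely to the provider.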
We identify two main challenges in leveraging serverless computing for highly-distributed applications, such as big data analytics and machine learning. The first is the automatic management of resources through higher-level abstractions for serverless applications. The second is the performance and scalability of distributed network communication on serverless platforms. In this thesis we present two systems that tackle these challenges. We address the first with Cirrus, a system for end-to-end serverless ML workflows with automatic resource management. We address the second with Zip, a system that provides high-performance, scalable distributed primitives for inter-lambda communication.
In this thesis we show that it is possible to offer developers simple APIs while achieving significantly better performance than today's approaches. For instance, Cirrus achieves two orders of magnitude more model updates per second during training than PyWren, a serverless MapReduce framework, because it pairs a high-level API with a backend highly optimized for ML tasks. Similarly, Zip provides 1.3-12x speedups across different communication patterns over the next-best alternative, a memory-backed store for inter-lambda communication.