High-Performance Serverless Computing

Re-architecting serverless platforms with lightweight data planes, elastic control planes, and secure runtimes

Serverless computing promises an event-driven and resource-efficient way to build cloud applications, where developers only write application logic while the platform dynamically manages execution, scaling, and resource allocation. However, existing serverless platforms often inherit heavyweight components from container orchestration systems, including kernel-based networking, sidecar proxies, message brokers, and reactive autoscalers. These components introduce substantial overheads in function chains, increase tail latency, and waste CPU resources. They also make it difficult to support resource-constrained environments such as edge clouds and latency-sensitive workloads such as large-scale distributed ML.

This project explores how to redesign serverless systems across both the data plane and the control plane. The central goal is to make serverless computing truly lightweight, high-performance, elastic, and secure, while preserving its programmability and operational benefits.

Research Overview

My work on serverless computing started from understanding the networking overheads in Kubernetes-based cloud platforms (Qi et al., 2020; Qi et al., 2021). We first characterized the performance of different Container Network Interface (CNI) plugins, showing how network models, overlay tunneling, iptables rules, eBPF processing, and host networking stack interactions affect throughput, latency, scalability, and pod startup latency. This measurement-driven study provided a foundation for understanding why existing cloud-native datapaths can become a major bottleneck for microservices and serverless workloads.

Building on this understanding, we designed a series of serverless platforms that optimize different aspects of the system stack:

Mu (Mittal et al., 2021) focuses on the serverless control plane for resource-constrained edge clouds. It integrates SLO-aware autoscaling, load-aware request routing, and fairness-aware placement into Knative. Mu uses lightweight workload prediction and piggybacked runtime metrics to proactively scale functions, improve tail latency, reduce resource consumption, and ensure fairness under limited edge resources.
SPRIGHT (Qi et al., 2022; Qi et al., 2024) focuses on the serverless data plane for function chains. It replaces heavyweight sidecar proxies and repeated kernel networking with an event-driven, eBPF-based shared-memory datapath. SPRIGHT enables direct function routing, zero-copy communication within a function chain, lightweight protocol adaptation, and load-proportional resource usage. In its evaluation, this design achieves an order-of-magnitude improvement in throughput and latency over Knative while substantially reducing CPU usage.
LIFL (Qi et al., 2024) extends these ideas to federated learning aggregation, where model updates are large, clients are dynamic, and aggregation must be both elastic and efficient. LIFL uses shared memory, eBPF-based sidecars, in-place message queuing, locality-aware placement, hierarchy-aware autoscaling, and aggregator reuse to support scalable serverless FL aggregation with lower CPU cost and faster time-to-accuracy.
SURE (Parola et al., 2024) revisits serverless runtime design through unikernels. It combines fast-startup unikernel-based function execution with a secure high-performance datapath. SURE uses distributed zero-copy communication, a library-based sidecar, a zero-copy TCP/IP stack, and MPK-based memory protection to provide both efficiency and isolation. This work explores how serverless platforms can achieve rapid startup, high throughput, low latency, and stronger isolation at the same time.

Together, these systems form a coherent research direction: re-architecting serverless platforms by removing unnecessary kernel, networking, sidecar, and orchestration overheads, while adding principled support for elasticity, locality, fairness, and isolation.

Key Ideas

Lightweight and Load-Proportional Data Planes

A recurring theme in this project is that serverless datapaths should consume resources only when useful work arrives. Existing platforms often rely on always-running sidecars, message brokers, and kernel networking paths. In contrast, our designs use shared memory, eBPF, event-driven processing, and library-based sidecars to make communication between functions more direct and efficient.

This enables serverless function chains to avoid repeated protocol processing, serialization/deserialization, context switches, interrupts, and data copies.
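The descriptor-passing idea behind such a datapath can be illustrated with a minimal sketch. It assumes a single shared-memory pool and a hypothetical `Descriptor` handle (offset and length); the stage functions and `run_chain` are illustrative names, not part of SPRIGHT, LIFL, or SURE. For brevity the stages run in one process and are invoked by a plain function call, whereas a real platform would run them as separate processes attached to the same segment and would use an event mechanism (eBPF in SPRIGHT) to hand over the descriptor.

```python
from dataclasses import dataclass
from multiprocessing import shared_memory


@dataclass(frozen=True)
class Descriptor:
    """Small handle passed between functions instead of the payload itself."""
    offset: int
    length: int


def to_upper(buf, d):
    """Stage 1: uppercase ASCII letters in place inside the shared pool."""
    for i in range(d.offset, d.offset + d.length):
        if 97 <= buf[i] <= 122:  # 'a'..'z'
            buf[i] -= 32
    return d


def strip_trailing_newline(buf, d):
    """Stage 2: shrink the descriptor; no payload bytes are moved."""
    if d.length and buf[d.offset + d.length - 1] == 10:  # '\n'
        return Descriptor(d.offset, d.length - 1)
    return d


def run_chain(payload: bytes, stages) -> bytes:
    """Write the payload once, then pass only a descriptor along the chain."""
    shm = shared_memory.SharedMemory(create=True, size=max(len(payload), 1))
    try:
        buf = shm.buf
        buf[:len(payload)] = payload        # single copy into the pool
        d = Descriptor(0, len(payload))
        for stage in stages:                # only the descriptor is handed over
            d = stage(buf, d)
        # Copy out once to build the response.
        return bytes(buf[d.offset:d.offset + d.length])
    finally:
        shm.close()
        shm.unlink()
```

Calling `run_chain(b"hello\n", [to_upper, strip_trailing_newline])` walks the payload through both stages while it stays in the same buffer; the per-hop cost is the size of the descriptor rather than the size of the payload.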

Control Planes for Elasticity, Locality, and Fairness

Serverless platforms must make fast and accurate control decisions: how many function instances to run, where to place them, and how to route traffic. This is especially important in edge clouds and distributed ML workloads, where resources are limited and demand changes over time.

Our work designs control-plane mechanisms that are aware of SLOs, workload dynamics, resource heterogeneity, function-chain structure, communication locality, and fairness among competing functions.
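As a rough sketch of how such decisions might be computed, the example below routes to the instance with the smallest estimated queueing delay and sizes the instance pool from a forecast request rate. The queue-length and service-rate metrics follow the description of Mu's piggybacked metrics; everything else is an assumption made for illustration: the field and function names are hypothetical, the forecast is a simple exponentially weighted moving average (the source does not name the predictor), and the 0.7 target utilization is an arbitrary default.

```python
import math
from dataclasses import dataclass


@dataclass
class InstanceMetrics:
    """Most recent metrics reported by one function instance (hypothetical)."""
    queue_length: int    # requests currently waiting at the instance
    service_rate: float  # requests per second the instance completes


def pick_instance(metrics: dict) -> str:
    """Load-aware routing: choose the smallest estimated wait for a new request."""
    return min(
        metrics,
        key=lambda name: (metrics[name].queue_length + 1) / metrics[name].service_rate,
    )


def ewma_forecast(rates: list, alpha: float = 0.5) -> float:
    """Assumed predictor: exponentially weighted moving average of request rates."""
    estimate = rates[0]
    for rate in rates[1:]:
        estimate = alpha * rate + (1 - alpha) * estimate
    return estimate


def desired_instances(predicted_rate: float, service_rate: float,
                      target_utilization: float = 0.7) -> int:
    """Proactive scaling: enough instances to serve the forecast with headroom."""
    return max(1, math.ceil(predicted_rate / (service_rate * target_utilization)))
```

For example, with recent request rates of 100, 200, and 400 requests per second, `ewma_forecast` yields 275; at a per-instance service rate of 50 requests per second, `desired_instances` then asks for 8 instances. Placement and fairness across competing functions are outside the scope of this sketch.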

Secure High-Performance Serverless Runtime

High performance alone is insufficient for multi-tenant serverless clouds. SURE explores how to combine zero-copy shared memory and unikernel-based execution with fine-grained memory protection. By using MPK-based call gates and protecting trusted runtime components, SURE shows how serverless systems can provide both efficient communication and stronger isolation.
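The permission logic of a call gate can be modeled in a few lines. The sketch below is a toy model only: real MPK is a hardware feature in which pages carry a protection key and each thread holds a rights register (PKRU on Intel), whereas here a Python set stands in for that register. All class and function names are hypothetical and do not reflect SURE's implementation.

```python
from contextlib import contextmanager


class ProtectionFault(Exception):
    """Stands in for the fault hardware raises on a denied access."""


class ThreadRights:
    """Toy stand-in for a thread's per-key rights register."""

    def __init__(self, denied=()):
        self.denied = set(denied)  # protection keys this thread cannot touch


class TaggedRegion:
    """A memory region tagged with one protection key."""

    def __init__(self, pkey: int, size: int):
        self.pkey = pkey
        self.data = bytearray(size)

    def write(self, rights: ThreadRights, offset: int, payload: bytes) -> None:
        if self.pkey in rights.denied:
            raise ProtectionFault(f"key {self.pkey} is access-disabled")
        if offset < 0 or offset + len(payload) > len(self.data):
            raise ValueError("write outside region")
        self.data[offset:offset + len(payload)] = payload


@contextmanager
def call_gate(rights: ThreadRights, pkey: int):
    """Grant access to `pkey` only while trusted code runs, then restore."""
    saved = set(rights.denied)
    rights.denied.discard(pkey)
    try:
        yield
    finally:
        rights.denied = saved


def trusted_send(pool: TaggedRegion, rights: ThreadRights, payload: bytes) -> None:
    """Hypothetical trusted-runtime entry point, e.g. a library-based sidecar."""
    with call_gate(rights, pool.pkey):
        pool.write(rights, 0, payload)
```

In this model, user code that calls `pool.write` directly while the pool's key is denied gets a `ProtectionFault`, while the same write succeeds through `trusted_send`, and the thread's rights are restored when the gate exits. On real hardware the instruction that updates the rights register is unprivileged, so the gate itself also has to be protected from misuse by untrusted code; the model does not capture that.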

References

2024

IEEE/ACM ToN

SPRIGHT: High-Performance eBPF-Based Event-Driven, Shared-Memory Processing for Serverless Computing

Shixiong Qi, Leslie Monis, Ziteng Zeng, Ian-Chin Wang, and K. K. Ramakrishnan

MLSys 24

LIFL: A Lightweight, Event-driven Serverless Platform for Federated Learning

Shixiong Qi, K. K. Ramakrishnan, and Myungjin Lee

ACM SoCC 24

SURE: Secure Unikernels Make Serverless Computing Rapid and Efficient

Federico Parola, Shixiong Qi, Anvaya B. Narappa, K. K. Ramakrishnan, and Fulvio Risso

Current serverless platforms introduce non-trivial overheads when chaining and orchestrating loosely-coupled microservices. Containerized function runtimes are also constrained by insufficient isolation and excessive startup time. This motivates our exploration of a more efficient, secure, and rapid serverless design. We describe SURE, a unikernel-based serverless framework for fast function startup, equipped with a high-performance and secure data plane. SURE’s data plane supports distributed zero-copy communication via the seamless interaction between a zero-copy protocol stack (Z-stack) and local shared memory processing. To establish a lightweight service mesh, SURE uses library-based sidecars instead of individual userspace sidecars. We leverage Intel’s Memory Protection Keys (MPK) as a lightweight capability to ensure safe access to the shared memory data plane. It also isolates the Trusted Computing Base (TCB) components in SURE’s function runtime (e.g., the library-based sidecar and the scheduler) from untrusted user code, while preserving the efficient single-address-space nature of unikernels. In particular, SURE prevents unintended privilege escalation involving MPK with an enhanced TCB. These combined efforts create a more secure and robust data plane while improving throughput up to 79X over Knative, a representative open-source serverless platform.

2022

SIGCOMM 22

SPRIGHT: extracting the server from serverless computing! high-performance eBPF-based event-driven, shared-memory processing

Shixiong Qi, Leslie Monis, Ziteng Zeng, Ian-Chin Wang, and K. K. Ramakrishnan

Serverless computing promises an efficient, low-cost compute capability in cloud environments. However, existing solutions, epitomized by open-source platforms such as Knative, include heavyweight components that undermine this goal of serverless computing. Additionally, such serverless platforms lack dataplane optimizations to achieve efficient, high-performance function chains that facilitate the popular microservices development paradigm. Their use of unnecessarily complex and duplicate capabilities for building function chains severely degrades performance. ‘Cold-start’ latency is another deterrent. We describe SPRIGHT, a lightweight, high-performance, responsive serverless framework. SPRIGHT exploits shared memory processing and dramatically improves the scalability of the dataplane by avoiding unnecessary protocol processing and serialization-deserialization overheads. SPRIGHT extensively leverages event-driven processing with the extended Berkeley Packet Filter (eBPF). We creatively use eBPF’s socket message mechanism to support shared memory processing, with overheads being strictly load-proportional. Compared to constantly-running, polling-based DPDK, SPRIGHT achieves the same dataplane performance with 10× less CPU usage under realistic workloads. Additionally, eBPF benefits SPRIGHT, by replacing heavyweight serverless components, allowing us to keep functions ‘warm’ with negligible penalty. Our preliminary experimental results show that SPRIGHT achieves an order of magnitude improvement in throughput and latency compared to Knative, while substantially reducing CPU usage, and obviates the need for ‘cold-start’.

2021

IEEE TNSM

Assessing Container Network Interface Plugins: Functionality, Performance, and Scalability

Shixiong Qi, Sameer G. Kulkarni, and K. K. Ramakrishnan

ACM SoCC 21

Mu: An Efficient, Fair and Responsive Serverless Framework for Resource-Constrained Edge Clouds

Viyom Mittal, Shixiong Qi, Ratnadeep Bhattacharya, Xiaosu Lyu, Junfeng Li, Sameer G. Kulkarni, Dan Li, Jinho Hwang, K. K. Ramakrishnan, and Timothy Wood

Serverless computing platforms simplify development, deployment, and automated management of modular software functions. However, existing serverless platforms typically assume an over-provisioned cloud, making them a poor fit for Edge Computing environments where resources are scarce. In this paper we propose a redesigned serverless platform that comprehensively tackles the key challenges for serverless functions in a resource-constrained Edge Cloud. Our Mu platform cleanly integrates the core resource management components of a serverless platform: autoscaling, load balancing, and placement. Each worker node in Mu transparently propagates metrics such as service rate and queue length in response headers, feeding this information to the load balancing system so that it can better route requests, and to our autoscaler to anticipate workload fluctuations and proactively meet SLOs. Data from the autoscaler is then used by the placement engine to account for heterogeneity and fairness across competing functions, ensuring overall resource efficiency and minimizing resource fragmentation. We implement our design as a set of extensions to the Knative serverless platform and demonstrate its improvements in terms of resource efficiency, fairness, and response time. Our evaluation shows that Mu improves fairness by more than 2x over the default Kubernetes placement engine, improves 99th percentile response times by 62% through better load balancing, and reduces SLO violations and resource consumption through proactive and precise autoscaling. Mu reduces the average number of pods required by more than 15% for a set of real Azure workloads.

2020

LANMAN 20

Understanding Container Network Interface Plugins: Design Considerations and Performance

Shixiong Qi, Sameer G. Kulkarni, and K. K. Ramakrishnan
