Kernel-Bypass Data Planes

Building kernel-bypass, zero-copy, and shared-memory I/O systems for NFV, middleboxes, and microservices

Modern cloud and network infrastructures increasingly rely on software-based packet processing. Traditional network appliances such as firewalls, NATs, caches, proxies, load balancers, and cellular core functions are now implemented as virtualized network functions, middleboxes, and cloud-native microservices running on commodity servers. This shift improves flexibility, programmability, and deployment agility, but it also exposes a fundamental performance challenge: the conventional kernel networking stack and loosely coupled service chains introduce significant overhead from data copies, interrupts, context switches, protocol processing, serialization/deserialization, and repeated packet traversal across software layers.

This project explores how to design high-performance I/O systems for network function virtualization (NFV), middleboxes, and cloud-native microservices. The goal is to combine the performance benefits of kernel-bypass networking, zero-copy shared memory, and hardware-aware packet processing with the usability and flexibility of modern software systems.

Research Overview

A central theme of this project is that different network-resident functions have different I/O requirements. Layer-2/Layer-3 network functions, such as forwarding, routing, and filtering, often need to operate as transparent “bump-in-the-wire” functions. They require high throughput, low latency, and predictable overload behavior. Kernel-bypass frameworks such as DPDK are well suited for this case because they avoid kernel protocol processing and use polling-based packet I/O to achieve line-rate performance.

Layer-4/Layer-7 middleboxes, such as HTTP proxies, caches, and load balancers, have a different requirement. They need a full TCP/IP protocol stack and often perform application-level processing. A robust kernel protocol stack is attractive for correctness and functionality, but repeatedly traversing the kernel stack across a chain of middlebox functions is expensive.

My work in this area studies this tension and develops systems that selectively combine different I/O mechanisms:

  • MiddleNet (Zeng et al., 2022; Qi et al., 2023) provides a unified NFV and middlebox framework. It uses DPDK-based kernel bypass for L2/L3 NFV functions and eBPF-based event-driven shared-memory communication for L4/L7 middlebox chains. It further uses SR-IOV to let both paths coexist on the same physical host and to dynamically direct flows to the right processing layer.

  • X-IO (Qi et al., 2023) generalizes shared-memory communication into a high-performance I/O interface for cloud-native microservices. It supports both zero-copy APIs and POSIX-like socket APIs over lock-free shared memory, enabling both asynchronous and synchronous communication while reducing kernel networking overhead.

  • Z-Stack (Narappa et al., 2024) extends this line of work to TCP/IP processing. It builds a userspace DPDK-based TCP/IP stack with zero-copy socket APIs, eliminating data copies between the application, protocol stack, and NIC. It also integrates naturally with shared-memory processing inside a node.

Together, these systems form a coherent research direction: building a high-performance I/O substrate that supports virtualized network functions, middleboxes, and microservices with low latency, high throughput, load-proportional resource usage, and practical programmability.

Key Ideas

Kernel Bypass for High-Performance L2/L3 Packet Processing

L2/L3 network functions require fast packet forwarding and predictable performance under high load. For these functions, kernel-bypass packet I/O is essential. DPDK enables packets to be moved directly between the NIC and userspace shared memory, bypassing kernel protocol processing and avoiding interrupt-driven receive livelock.

In MiddleNet, we show that DPDK is a better fit for L2/L3 NFV than interrupt-driven kernel packet I/O because it can deliver near line-rate packet processing and lower latency for transparent packet-processing chains. Although polling consumes dedicated CPU cores, the performance and overload behavior make it attractive for high-rate packet processing.
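The polling model can be illustrated with a small simulation. This is only a sketch of the control flow, not DPDK code: `RxQueue`, `rx_burst`, `poll_loop`, and `BURST_SIZE` are hypothetical names, and the queue is an in-memory stand-in for a NIC receive ring. The `rx_burst` method plays the role that a burst receive call such as DPDK's `rte_eth_rx_burst` plays in a real data plane.

```python
from collections import deque

BURST_SIZE = 32  # assumed burst size; real deployments tune this value


class RxQueue:
    """In-memory stand-in for a NIC receive queue (hypothetical)."""

    def __init__(self):
        self._packets = deque()

    def enqueue(self, pkt):
        self._packets.append(pkt)

    def rx_burst(self, max_pkts):
        """Return up to max_pkts packets without blocking; may return none."""
        burst = []
        while self._packets and len(burst) < max_pkts:
            burst.append(self._packets.popleft())
        return burst


def poll_loop(rxq, handle, max_idle_polls=3):
    """Poll the queue in bursts; stop after several empty polls (demo only)."""
    processed = 0
    idle = 0
    while idle < max_idle_polls:
        burst = rxq.rx_burst(BURST_SIZE)
        if not burst:
            idle += 1  # a real poll-mode core would simply keep spinning
            continue
        idle = 0
        for pkt in burst:
            handle(pkt)
            processed += 1
    return processed
```

No interrupt is taken per packet: the cost of checking the queue is paid once per burst, which is why the dedicated core keeps up under overload but stays busy even when the queue is empty.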

Event-Driven Shared Memory for L4/L7 Middleboxes

L4/L7 middleboxes need protocol processing, so completely bypassing the kernel is often not the best choice. Instead, MiddleNet uses the kernel stack once for TCP/IP processing and then avoids repeated protocol processing inside the middlebox chain.

The key idea is to use shared memory for communication among middlebox functions after protocol processing has completed. MiddleNet leverages eBPF SK_MSG programs and socket maps to pass descriptors between functions in an event-driven manner. This makes the overhead load-proportional: when traffic is low, middlebox functions do not waste CPU cycles polling; when traffic increases, batching helps amortize interrupt and context-switch overhead.
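The descriptor-passing pattern can be sketched in userspace as follows. This is an analogy under stated assumptions, not MiddleNet's eBPF mechanism: a socket pair stands in for the in-kernel descriptor delivery, and `SLOT_SIZE`, `DESC`, `send_descriptor`, `wait_and_read`, and `demo` are hypothetical names. The payload is written to shared memory once, only an 8-byte descriptor crosses the socket, and the receiver sleeps until that descriptor arrives. It requires Python 3.8 or later for `multiprocessing.shared_memory`.

```python
import selectors
import socket
import struct
from multiprocessing import shared_memory

SLOT_SIZE = 2048             # assumed fixed buffer size per message
DESC = struct.Struct("!II")  # hypothetical descriptor: (slot index, length)


def send_descriptor(sock, pool, slot, payload):
    """Write the payload into a shared slot once, then send only a descriptor."""
    start = slot * SLOT_SIZE
    pool.buf[start:start + len(payload)] = payload
    sock.send(DESC.pack(slot, len(payload)))


def wait_and_read(sel, pool, timeout=1.0):
    """Sleep until a descriptor arrives, then locate the payload in shared memory."""
    results = []
    for key, _ in sel.select(timeout):
        slot, length = DESC.unpack(key.fileobj.recv(DESC.size))
        start = slot * SLOT_SIZE
        # Copied to bytes only so the demo can return it after the pool is freed.
        results.append(bytes(pool.buf[start:start + length]))
    return results


def demo():
    pool = shared_memory.SharedMemory(create=True, size=4 * SLOT_SIZE)
    tx, rx = socket.socketpair()
    sel = selectors.DefaultSelector()
    sel.register(rx, selectors.EVENT_READ)
    try:
        send_descriptor(tx, pool, 2, b"GET /index.html")
        return wait_and_read(sel, pool)
    finally:
        sel.close()
        tx.close()
        rx.close()
        pool.close()
        pool.unlink()
```

Because the receiver blocks in `select` rather than spinning, an idle function consumes no CPU, which is the load-proportional behavior described above.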

Unified NFV and Middlebox Support

Traditional NFV and middlebox systems are often deployed on different platforms. MiddleNet unifies them by using SR-IOV and flow bifurcation. Packets that require L2/L3 processing are steered to the DPDK-based NFV path, while packets that require L4/L7 processing are steered to the kernel/eBPF-based middlebox path.

This design allows a single physical server to support both high-speed packet forwarding and full-stack middlebox processing. It also allows flows to be dynamically processed at different layers depending on their needs.
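The steering decision can be pictured as a match table that maps each flow to one of the two paths. The sketch below is purely illustrative: the path labels, the `FlowKey` fields, the example addresses, and the rule set are all assumptions, and it does not describe how MiddleNet programs the NIC.

```python
from dataclasses import dataclass

NFV_PATH = "dpdk-nfv"           # hypothetical label for the L2/L3 path
MIDDLEBOX_PATH = "kernel-ebpf"  # hypothetical label for the L4/L7 path


@dataclass(frozen=True)
class FlowKey:
    proto: str     # "tcp", "udp", ...
    dst_ip: str
    dst_port: int


# Assumed rule set: flows addressed to these local services need L4/L7 handling.
L7_SERVICES = {("tcp", "10.0.0.10", 80), ("tcp", "10.0.0.10", 443)}


def steer(flow, l7_services=L7_SERVICES):
    """Pick the processing path for a flow from the match table."""
    if (flow.proto, flow.dst_ip, flow.dst_port) in l7_services:
        return MIDDLEBOX_PATH
    return NFV_PATH  # everything else is forwarded transparently
```

Adding or removing an entry in the rule set changes where subsequent packets of that flow are processed, which mirrors the dynamic steering described above.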

Lock-Free Shared-Memory I/O

X-IO focuses on the communication substrate itself. Many cloud-native systems need both asynchronous communication, such as service chains or DAGs, and synchronous communication, such as RPC or HTTP request-response patterns. Existing systems usually implement both over kernel networking, which incurs unnecessary overhead when services are co-located on the same node.

X-IO builds a unified I/O framework over shared memory and lock-free producer/consumer rings. It provides a raw zero-copy interface for high-performance data exchange and a POSIX-like socket interface for easier application porting. By maintaining connection state and supporting concurrent connections, X-IO can support realistic microservice workloads, including 5G control-plane communication.
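A single-producer/single-consumer ring is one common way to build such a lock-free channel. The sketch below shows the index discipline only and is not X-IO's implementation; `SpscRing`, `try_push`, and `try_pop` are hypothetical names. Python does not expose memory ordering, so an equivalent ring in C would use atomic loads and stores with acquire/release semantics on the two indices.

```python
class SpscRing:
    """Single-producer/single-consumer ring; capacity must be a power of two."""

    def __init__(self, capacity):
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self._slots = [None] * capacity
        self._mask = capacity - 1
        self._head = 0  # written only by the producer
        self._tail = 0  # written only by the consumer

    def try_push(self, item):
        """Enqueue without blocking; return False if the ring is full."""
        if self._head - self._tail == len(self._slots):
            return False
        self._slots[self._head & self._mask] = item
        self._head += 1  # publish only after the slot is written
        return True

    def try_pop(self):
        """Dequeue without blocking; return None if the ring is empty."""
        if self._tail == self._head:
            return None
        item = self._slots[self._tail & self._mask]
        self._tail += 1  # hand the slot back to the producer
        return item
```

Each index has exactly one writer, so neither side needs a lock. The items would typically be small descriptors that point into shared memory rather than the payloads themselves, and they must not be `None`, since `try_pop` uses `None` to signal an empty ring.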

Zero-Copy Userspace TCP/IP

Shared memory is highly effective for intra-node communication, but distributed systems still need efficient inter-node TCP/IP communication. Z-Stack addresses this problem by building a userspace TCP/IP stack on top of DPDK and FreeBSD’s proven TCP/IP implementation.

Unlike existing userspace TCP/IP stacks that still copy data between the application and protocol stack, Z-Stack introduces zero-copy socket APIs. Applications pass data buffers directly to the stack, and the stack prepends TCP/IP headers without copying the payload. This reduces latency and improves throughput, especially for large messages.
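The header-prepend idea can be sketched as a two-segment chain: a freshly built header followed by a reference to the application's buffer. This is an illustration only, not the Z-Stack API; `HDR` is a toy header layout rather than real TCP/IP, and `build_segments` is a hypothetical name. A real stack would hand such a chain to the NIC as a scatter-gather list, while the sketch stops at building it.

```python
import struct

# Toy header: (source port, destination port, payload length). Not real TCP/IP.
HDR = struct.Struct("!HHI")


def build_segments(payload, src_port, dst_port):
    """Return [header, payload_view]; the payload is referenced, not copied."""
    view = memoryview(payload)
    header = HDR.pack(src_port, dst_port, len(view))
    return [header, view]
```

The second segment is a `memoryview` over the caller's buffer, so a later change to that buffer is visible through the segment. That shared view is what shows no copy was made, and it also means the application must not reuse the buffer until transmission has completed.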

References

2024

  1. LANMAN 24
    Z-Stack: A High-Performance DPDK-Based Zero-Copy TCP/IP Protocol Stack
    Anvaya B. Narappa, Federico Parola, Shixiong Qi, and K. K. Ramakrishnan

2023

  1. IEEE TNSM
    MiddleNet: A Unified, High-Performance NFV and Middlebox Framework With eBPF and DPDK
    Shixiong Qi, Ziteng Zeng, Leslie Monis, and K. K. Ramakrishnan
  2. NetSoft 23
    X-IO: A High-performance Unified I/O Interface using Lock-free Shared Memory Processing
    Shixiong Qi, Han-Sing Tsai, Yu-Sheng Liu, K. K. Ramakrishnan, and Jyh-Cheng Chen

2022

  1. NetSoft 22
    MiddleNet: A High-Performance, Lightweight, Unified NFV and Middlebox Framework
    Ziteng Zeng, Leslie Monis, Shixiong Qi, and K. K. Ramakrishnan