Venue: Independence Ballroom
Monday, April 28
 

9:10am EDT

Enabling Silent Telemetry Data Transmission with InvisiFlow
Monday April 28, 2025 9:10am - 9:30am EDT
Yinda Zhang, University of Pennsylvania; Liangcheng Yu, University of Pennsylvania and Microsoft Research; Gianni Antichi, Politecnico di Milano and Queen Mary University of London; Ran Ben Basat, University College London; Vincent Liu, University of Pennsylvania


Network applications from traffic engineering to path tracing often rely on the ability to transmit fine-grained telemetry data from network devices to a set of collectors. Unfortunately, prior work has observed—and we validate—that existing transmission methods for such data can result in significant overhead to user traffic and/or loss of telemetry data, particularly when the network is heavily loaded.

In this paper, we introduce InvisiFlow, a novel communication substrate to collect network telemetry data, silently. In contrast to previous systems that always push telemetry packets to collectors based on the shortest path, InvisiFlow dynamically seeks out spare network capacity by leveraging opportunistic sending and congestion gradients, thus minimizing both the loss rate of telemetry data and overheads on user traffic. In a FatTree topology, InvisiFlow can achieve near-zero loss rate even under high-load scenarios (around 33.8× lower loss compared to the state-of-the-art transmission methods used by systems like Everflow and Planck).


https://www.usenix.org/conference/nsdi25/presentation/zhang-yinda
Independence Ballroom

9:30am EDT

Unlocking ECMP Programmability for Precise Traffic Control
Monday April 28, 2025 9:30am - 9:50am EDT
Yadong Liu, Tencent; Yunming Xiao, University of Michigan; Xuan Zhang, Weizhen Dang, Huihui Liu, Xiang Li, and Zekun He, Tencent; Jilong Wang, Tsinghua University; Aleksandar Kuzmanovic, Northwestern University; Ang Chen, University of Michigan; Congcong Miao, Tencent


ECMP (equal-cost multi-path) has become a fundamental mechanism in data centers, which distributes flows along multiple equivalent paths based on their hash values. Randomized distribution optimizes for the aggregate case, spreading load across flows over time. However, there exists a class of important Precise Traffic Control (PTC) tasks that are at odds with ECMP randomness. For instance, if an end host perceives that its flows are traversing a problematic switch/link, it often needs to change their paths before a fix can be rolled out. With randomized hashing, existing solutions resort to modifying flow tuples; since hashing mechanisms are unknown and they vary across switches/vendors, it may take many trials before yielding a new path. Many other similar cases exist where precise and timely response is critical to the network.


We propose programmable ECMP (P-ECMP), a programming model, compiler, and runtime that provides precise traffic control. P-ECMP leverages an oft-ignored feature, ECMP groups, which allows for a constrained set of capabilities that are nonetheless sufficiently expressive for our tasks. An operator supplies high-level descriptions of their topology and policies, and our compiler generates PTC configurations for each switch. End hosts can reconfigure specific flows to use different PTC policies precisely and quickly, addressing a range of important use cases. We have evaluated P-ECMP using simulation at scale, and deployed one use case to a real-world data center that serves live user traffic.
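
The trial-and-error cost of tuple modification described above can be illustrated with a minimal sketch. It assumes a single switch whose ECMP hash is CRC32 over the 5-tuple; real switch hash functions are vendor-specific and unknown to the end host, which is the point of the abstract. The function names `ecmp_next_hop` and `trials_to_change_path` are hypothetical and are not part of P-ECMP.

```python
import zlib


def ecmp_next_hop(flow, num_paths, seed=0):
    """Pick a path index by hashing the flow 5-tuple.

    CRC32 is only a stand-in; actual switch hashes are vendor-specific.
    """
    key = "|".join(str(part) for part in flow).encode()
    return zlib.crc32(key, seed) % num_paths


def trials_to_change_path(flow, num_paths, seed=0, max_trials=1000):
    """Bump the source port until the flow hashes to a different path.

    Returns (number_of_trials, new_flow), or (None, None) if no change
    was found within max_trials.
    """
    src_ip, dst_ip, src_port, dst_port, proto = flow
    old_path = ecmp_next_hop(flow, num_paths, seed)
    for trial in range(1, max_trials + 1):
        candidate = (src_ip, dst_ip, src_port + trial, dst_port, proto)
        if ecmp_next_hop(candidate, num_paths, seed) != old_path:
            return trial, candidate
    return None, None


if __name__ == "__main__":
    flow = ("10.0.0.1", "10.0.0.2", 40000, 443, 6)
    print("original path:", ecmp_next_hop(flow, 8))
    print("trials, new flow:", trials_to_change_path(flow, 8))
```

With one switch a different path usually turns up after a trial or two. Across several hops with different, unknown hash functions, steering around one specific switch or link can take many more attempts.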


https://www.usenix.org/conference/nsdi25/presentation/liu-yadong
Independence Ballroom

9:50am EDT

Enabling Portable and High-Performance SmartNIC Programs with Alkali
Monday April 28, 2025 9:50am - 10:10am EDT
Jiaxin Lin, UT Austin; Zhiyuan Guo, UCSD; Mihir Shah, NVIDIA; Tao Ji, Microsoft; Yiying Zhang, UCSD; Daehyeok Kim and Aditya Akella, UT Austin


Trends indicate that emerging SmartNICs, either from different vendors or generations from the same vendor, exhibit substantial differences in hardware parallelism and memory interconnects. These variations make porting programs across NICs highly complex and time-consuming, requiring programmers to significantly refactor code for performance based on each target NIC’s hardware characteristics.


We argue that an ideal SmartNIC compilation framework should allow developers to write target-independent programs, with the compiler automatically managing cross-NIC porting and performance optimization. We present such a framework, Alkali, that achieves this by (1) proposing a new intermediate representation for building flexible compiler infrastructure for multiple NIC targets and (2) developing a new iterative parallelism optimization algorithm that automatically ports and parallelizes the input programs based on the target NIC’s hardware characteristics.


Experiments across a wide range of NIC applications demonstrate that Alkali enables developers to easily write portable, high-performance NIC programs. Our compiler optimization passes can automatically port these programs and make them run efficiently across all targets, achieving performance within 9.8% of hand-tuned expert implementations.


https://www.usenix.org/conference/nsdi25/presentation/lin-jiaxin
Independence Ballroom

10:10am EDT

Scaling IP Lookup to Large Databases using the CRAM Lens
Monday April 28, 2025 10:10am - 10:30am EDT
Robert Chang and Pradeep Dogga, University of California, Los Angeles; Andy Fingerhut, Cisco Systems; Victor Rios and George Varghese, University of California, Los Angeles


Wide-area scaling trends require new approaches to Internet Protocol (IP) lookup, enabled by modern networking chips such as Intel Tofino, AMD Pensando, and Nvidia BlueField, which provide substantial ternary content-addressable memory (TCAM) and static random-access memory (SRAM). However, designing and evaluating scalable algorithms for these chips is challenging due to hardware-level constraints. To address this, we introduce the CRAM (CAM+RAM) lens, a framework that combines a formal model for evaluating algorithms on modern network processors with a set of optimization idioms. We demonstrate the effectiveness of CRAM by designing and evaluating three new IP lookup schemes: RESAIL, BSIC, and MashUp. RESAIL enables Tofino-2 to scale to 2.25 million IPv4 prefixes—likely sufficient for the next decade—while a pure TCAM approach supports only 250k prefixes, just 27% of the current global IPv4 routing table. Similarly, BSIC scales to 390k IPv6 prefixes on Tofino-2, supporting 3.2 times as many prefixes as a pure TCAM implementation. In contrast, existing state-of-the-art algorithms, SAIL for IPv4 and Hi-BST for IPv6, scale to considerably smaller sizes on Tofino-2.
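
For readers unfamiliar with the underlying problem, IP lookup means longest-prefix match: among all routing-table prefixes that cover a destination address, the most specific one wins. The sketch below is a plain software illustration of that problem for IPv4 only. It is not RESAIL, BSIC, or MashUp, and the `LpmTable` class is a hypothetical name introduced here.

```python
import ipaddress


class LpmTable:
    """IPv4 longest-prefix match using one exact-match table per prefix length."""

    def __init__(self):
        # Index i holds the prefixes of length i (0..32).
        self.by_len = [dict() for _ in range(33)]

    def insert(self, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        self.by_len[net.prefixlen][int(net.network_address)] = next_hop

    def lookup(self, addr):
        value = int(ipaddress.ip_address(addr))
        # Probe from the most specific length down to the default route.
        for length in range(32, -1, -1):
            mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
            hop = self.by_len[length].get(value & mask)
            if hop is not None:
                return hop
        return None


if __name__ == "__main__":
    table = LpmTable()
    table.insert("0.0.0.0/0", "default")
    table.insert("10.0.0.0/8", "A")
    table.insert("10.1.0.0/16", "B")
    print(table.lookup("10.1.2.3"))  # matches the /16 entry
```

Probing up to 33 tables per packet is far too slow and memory-hungry for a switch pipeline, which is why hardware designs combine TCAM and SRAM stages instead.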


https://www.usenix.org/conference/nsdi25/presentation/chang
Independence Ballroom

11:00am EDT

On Temporal Verification of Stateful P4 Programs
Monday April 28, 2025 11:00am - 11:20am EDT
Delong Zhang, Chong Ye, and Fei He, School of Software, BNRist, Tsinghua University, Beijing 100084, China; Key Laboratory for Information System Security, MoE, China


Stateful P4 programs offload network states from the control plane to the data plane, enabling unprecedented network programmability. However, existing P4 verifiers overapproximate the stateful nature of P4 programs and are inherently inadequate for verifying network functions that require stateful decision-making.

To overcome this limitation, this paper introduces an innovative approach to verify P4 programs while accounting for their stateful feature. We propose a specification language named P4LTL, tailored for describing temporal properties of stateful P4 programs at the packet processing level. Additionally, we introduce a novel concept called the Büchi transaction, representing the product of the P4 program and the P4LTL specification. The P4 program verification problem can be reduced to determining the existence of any fair and feasible trace within the Büchi transaction. To the best of our knowledge, our approach represents the first endeavor in temporal verification of stateful P4 programs at the packet processing level. We implemented a prototype tool called p4tv. Evaluation results demonstrate p4tv’s effectiveness and efficiency in temporal verification of stateful P4 programs.


https://www.usenix.org/conference/nsdi25/presentation/zhang-delong
Independence Ballroom

11:20am EDT

NDD: A Decision Diagram for Network Verification
Monday April 28, 2025 11:20am - 11:40am EDT
Zechun Li, Peng Zhang, and Yichi Zhang, Xi'an Jiaotong University; Hongkun Yang, Google


State-of-the-art network verifiers extensively use the Binary Decision Diagram (BDD) as the underlying data structure to represent the network state and equivalence classes. Despite its wide usage, we find BDD is not ideal for network verification: verifiers need to handle the low-level computation of equivalence classes, and still face scalability issues when the network state has a lot of bits.


To this end, this paper introduces Network Decision Diagram (NDD), a new decision diagram customized for network verification. In a nutshell, NDD wraps BDD with another layer of decision diagram, such that each NDD node represents a field of the network, and each edge is labeled with a BDD encoding the values of that field. We designed and implemented a library for NDD, which features native support for equivalence classes and higher efficiency in memory and computation. Using the NDD library, we re-implemented five BDD-based verifiers with minor modifications to their original code, and observed a 100× reduction in memory cost and 100× speedup. This indicates that NDD provides a drop-in replacement for BDD in network verifiers.
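
The two-level structure described above (one node per field, edges labeled with a predicate over that field's values) can be pictured with a toy sketch. As a loud simplification, Python frozensets stand in for the per-edge BDDs, and the `Node` class and `matches` function are hypothetical names, not the API of the authors' library.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One decision node per packet field.

    edges is a list of (label, child) pairs. Each label is a frozenset of
    field values (a stand-in for a BDD); child is another Node or a bool.
    """
    field_name: str
    edges: list


def matches(node, packet):
    """Walk the diagram, following the edge whose label contains the field value."""
    while isinstance(node, Node):
        value = packet[node.field_name]
        for label, child in node.edges:
            if value in label:
                node = child
                break
        else:
            # No edge covers this value, so the packet is not in the set.
            return False
    return node


if __name__ == "__main__":
    port_check = Node("port", [(frozenset({80, 443}), True)])
    root = Node("dst", [
        (frozenset({10, 11}), port_check),
        (frozenset({12}), True),
    ])
    print(matches(root, {"dst": 10, "port": 80}))
    print(matches(root, {"dst": 10, "port": 22}))
```

The diagram branches once per field rather than once per bit. That is one way to read the abstract's point about network states with many bits.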


https://www.usenix.org/conference/nsdi25/presentation/li-zechun
Independence Ballroom

11:40am EDT

Smart Casual Verification of the Confidential Consortium Framework
Monday April 28, 2025 11:40am - 12:00pm EDT
Heidi Howard, Markus A. Kuppe, Edward Ashton, and Amaury Chamayou, Azure Research, Microsoft; Natacha Crooks, Azure Research, Microsoft and UC Berkeley


The Confidential Consortium Framework (CCF) is an open-source platform for developing trustworthy and reliable cloud applications. CCF powers Microsoft's Azure Confidential Ledger service and as such it is vital to build confidence in the correctness of CCF's design and implementation. This paper reports our experiences applying smart casual verification to validate the correctness of CCF's novel distributed protocols, focusing on its unique distributed consensus protocol and its custom client consistency model. We use the term smart casual verification to describe our hybrid approach, which combines the rigor of formal specification and model checking with the pragmatism of automated testing, in our case binding the formal specification in TLA+ to the C++ implementation. While traditional formal methods approaches require substantial buy-in and are often one-off efforts by domain experts, we have integrated our smart casual verification approach into CCF's CI pipeline, allowing contributors to continuously validate CCF as it evolves. We describe the challenges we faced in applying smart casual verification to a complex existing codebase and how we overcame them to find six subtle bugs in the design and implementation before they could impact production.


https://www.usenix.org/conference/nsdi25/presentation/howard
Independence Ballroom

12:00pm EDT

VEP: A Two-stage Verification Toolchain for Full eBPF Programmability
Monday April 28, 2025 12:00pm - 12:20pm EDT
Xiwei Wu, Yueyang Feng, Tianyi Huang, Xiaoyang Lu, Shengkai Lin, Lihan Xie, Shizhen Zhao, and Qinxiang Cao, Shanghai Jiao Tong University


Extended Berkeley Packet Filter (eBPF) is a revolutionary technology that can safely and efficiently extend kernel capabilities. It has been widely used in networking, tracing, security, and more. However, existing eBPF verifiers impose strict constraints, often requiring repeated modifications to eBPF programs to pass verification. To enhance programmability, we introduce VEP, an annotation-guided eBPF program verification toolchain. VEP consists of three components: VEP-C, a verifier for annotated eBPF-C programs; VEP-compiler, a compiler targeting annotated eBPF bytecode; and VEP-eBPF, a lightweight bytecode-level proof checker. VEP allows users to verify the correctness of their programs with appropriate annotations, thus enabling full programmability. Our experimental results demonstrate that VEP addresses the limitations of existing verifiers, i.e., the Linux verifier and PREVAIL, and provides a more flexible and automated approach to kernel security.


https://www.usenix.org/conference/nsdi25/presentation/wu-xiwei
Independence Ballroom

2:00pm EDT

Pyrrha: Congestion-Root-Based Flow Control to Eliminate Head-of-Line Blocking in Datacenter
Monday April 28, 2025 2:00pm - 2:20pm EDT
Kexin Liu, Zhaochen Zhang, Chang Liu, and Yizhi Wang, Nanjing University; Vamsi Addanki and Stefan Schmid, TU Berlin; Qingyue Wang, Wei Chen, Xiaoliang Wang, and Jiaqi Zheng, Nanjing University; Wenhao Sun, Tao Wu, Ke Meng, Fei Chen, Weiguang Wang, and Bingyang Liu, Huawei, China; Wanchun Dou, Guihai Chen, and Chen Tian, Nanjing University


In modern datacenters, the effectiveness of end-to-end congestion control (CC) is quickly diminishing with the rapid bandwidth evolution. Per-hop flow control (FC) can react to congestion more promptly. However, a coarse-grained FC can result in Head-Of-Line (HOL) blocking. A fine-grained, per-flow FC can eliminate HOL blocking caused by flow control; however, it does not scale well. This paper presents Pyrrha, a scalable flow control approach that provably eliminates HOL blocking while using a minimum number of queues. In Pyrrha, flow control first takes effect on the root of the congestion, i.e., the port where congestion occurs. Flows are then controlled according to the congestion roots they contribute to. A prototype of Pyrrha is implemented on Tofino2 switches. Compared with state-of-the-art approaches, the average FCT of uncongested flows is reduced by 42%-98%, and 99th-percentile tail latency can be 1.6×-215× lower, without compromising the performance of congested flows.


https://www.usenix.org/conference/nsdi25/presentation/liu-kexin
Independence Ballroom

2:20pm EDT

eTran: Extensible Kernel Transport with eBPF
Monday April 28, 2025 2:20pm - 2:40pm EDT
Zhongjie Chen, Tsinghua University; Qingkai Meng, Nanjing University; ChonLam Lao, Harvard University; Yifan Liu and Fengyuan Ren, Tsinghua University; Minlan Yu, Harvard University; Yang Zhou, UC Berkeley and UC Davis


Evolving datacenters with diverse application demands are driving network transport designs. However, few have successfully landed in the widely used kernel networking stack to benefit broader users, and those that do take multiple years. We present eTran, a system that makes kernel transport extensible to implement and customize diverse transport designs agilely. To achieve this, eTran leverages and extends eBPF-based techniques to customize the kernel to support complex transport functionalities safely. Meanwhile, eTran carefully absorbs user-space transport techniques for performance gain without sacrificing robust protection. We implement TCP (with DCTCP congestion control) and Homa under eTran, and achieve up to 4.8×/1.8× higher throughput with 3.7×/7.5× lower latency compared to existing kernel implementations.


https://www.usenix.org/conference/nsdi25/presentation/chen-zhongjie
Independence Ballroom

2:40pm EDT

White-Boxing RDMA with Packet-Granular Software Control
Monday April 28, 2025 2:40pm - 3:00pm EDT
Chenxingyu Zhao and Jaehong Min, University of Washington; Ming Liu, University of Wisconsin-Madison; Arvind Krishnamurthy, University of Washington


Driven by diverse workloads and deployments, numerous innovations emerge to customize RDMA transport, spanning congestion control, multi-tenant isolation, routing, and more. However, RDMA's hardware-offloading nature poses significant rigidity when landing these innovations. Prior workflows to deliver customizations have either waited for lengthy hardware iterations, developed bespoke hardware, or applied coarse-grained control over the black-box RDMA NIC. Despite considerable efforts, current customization workflows still lack flexibility, raw performance, and broad availability.

In this work, we advocate for White-Boxing RDMA, which provides control of the hardware transport to general-purpose software while preserving raw data path performance. To facilitate the white-boxing methodology, we design and implement Software-Controlled RDMA (SCR), a framework enabling packet-granular software control over the hardware transport. To address challenges stemming from granular control over high-speed line rates, SCR employs effective control models, boosts the efficiency of subsystems within the framework, and leverages emerging hardware capabilities. We implement SCR on the latest Nvidia BlueField-3 equipped with Datapath Accelerators, delivering a spectrum of new customizations not present in legacy RDMA transport, such as Multi-Tenant Fair Scheduler, User-Defined Congestion Control, Receiver-Driven Flow Control, and Multi-Path Routing Selection. Furthermore, we demonstrate SCR's applicability for GPU-Direct and NVMe-oF RDMA with zero modifications to machine learning or storage code.


https://www.usenix.org/conference/nsdi25/presentation/zhao-chenxingyu
Independence Ballroom

3:00pm EDT

SIRD: A Sender-Informed, Receiver-Driven Datacenter Transport Protocol
Monday April 28, 2025 3:00pm - 3:20pm EDT
Konstantinos Prasopoulos, EPFL; Ryan Kosta, UCSD; Edouard Bugnion, EPFL; Marios Kogias, Imperial College London


Datacenter congestion control protocols are challenged to navigate the throughput-buffering trade-off while relative packet buffer capacity is trending lower year-over-year. In this context, receiver-driven protocols — which schedule packet transmissions instead of reacting to congestion — excel when the bottleneck lies at the ToR-to-receiver link. However, when multiple receivers must use a shared link (e.g., ToR to Spine), their independent schedules can conflict.

We present SIRD, a receiver-driven congestion control protocol designed around the simple insight that single-owner links should be scheduled, while shared links should be managed with reactive control algorithms. The approach allows receivers to both precisely schedule their downlinks and to coordinate over shared bottlenecks. Critically, SIRD also treats sender uplinks as shared links, enabling the flow of congestion feedback from senders to receivers, which then adapt their scheduling to each sender’s real time capacity. This results in tight scheduling, enabling high bandwidth utilization with little contention, and thus minimal latency-inducing buffering in the fabric.

We implement SIRD on top of the Caladan stack and show that SIRD’s asymmetric design can deliver 100Gbps in software while keeping network queuing minimal. We further compare SIRD to state-of-the-art receiver-driven protocols (Homa, dcPIM, and ExpressPass) and production-grade reactive protocols (Swift and DCTCP) and show that SIRD is uniquely able to simultaneously maximize link utilization, minimize queuing, and obtain near-optimal latency.


https://www.usenix.org/conference/nsdi25/presentation/prasopoulos
Independence Ballroom

3:50pm EDT

Mowgli: Passively Learned Rate Control for Real-Time Video
Monday April 28, 2025 3:50pm - 4:10pm EDT
Neil Agarwal and Rui Pan, Princeton University; Francis Y. Yan, University of Illinois Urbana-Champaign; Ravi Netravali, Princeton University


Rate control algorithms are at the heart of video conferencing platforms, determining target bitrates that match dynamic network characteristics for high quality. Despite the promise that recent data-driven strategies have shown for this challenging task, the performance degradation that they introduce during training has been a nonstarter for many production services, precluding adoption. This paper aims to bolster the practicality of data-driven rate control by presenting an alternate avenue for experiential learning: using purely existing telemetry logs that we surprisingly observe embed performant decisions but often at the wrong times or in the wrong order. To realize this approach despite the inherent uncertainty that log-based learning brings (i.e., lack of feedback for new decisions), our system, Mowgli, combines a variety of robust learning techniques (i.e., conservatively reasoning about alternate behavior to minimize risk and using a richer model formulation to account for environmental noise). Across diverse networks (emulated and real-world), Mowgli outperforms the widely deployed GCC algorithm, increasing average video bitrates by 15–39% while reducing freeze rates by 60–100%.


https://www.usenix.org/conference/nsdi25/presentation/agarwal
Independence Ballroom

4:10pm EDT

Dissecting and Streamlining the Interactive Loop of Mobile Cloud Gaming
Monday April 28, 2025 4:10pm - 4:30pm EDT
Yang Li, Jiaxing Qiu, Hongyi Wang, and Zhenhua Li, Tsinghua University; Feng Qian, University of Southern California; Jing Yang, Tsinghua University; Hao Lin, Tsinghua University and University of Illinois Urbana-Champaign; Yunhao Liu, Tsinghua University; Bo Xiao and Xiaokang Qin, Ant Group; Tianyin Xu, University of Illinois Urbana-Champaign


With cloud-side computing and rendering, mobile cloud gaming (MCG) is expected to deliver high-quality gaming experiences to budget mobile devices. However, our measurement on representative MCG platforms reveals that even under good network conditions, all platforms exhibit high interactive latency of 112–403 ms, from a user-input action to its display response, that critically affects users’ quality of experience. Moreover, jitters in network latency often lead to significant fluctuations in interactive latency.

In this work, we collaborate with a commercial MCG platform to conduct the first in-depth analysis of the interactive latency of cloud gaming. We identify VSync, the synchronization primitive of the Android graphics pipeline, to be a key contributor to the excessive interactive latency; as many as five VSync events are intricately invoked, which serialize the complex graphics processing logic on both the client and cloud sides. To address this, we design an end-to-end VSync regulator, dubbed LoopTailor, which minimizes VSync events by decoupling game rendering from the lengthy cloud-side graphics pipeline and coordinating cloud game rendering directly with the client. We implement LoopTailor on the collaborating platform and commodity Android devices, reducing the interactive latency (by ∼34%) to stably below 100 ms.


https://www.usenix.org/conference/nsdi25/presentation/li-yang
Independence Ballroom

4:30pm EDT

Region-based Content Enhancement for Efficient Video Analytics at the Edge
Monday April 28, 2025 4:30pm - 4:50pm EDT
Weijun Wang, Institute for AI Industry Research (AIR), Tsinghua University; Liang Mi, Shaowei Cen, and Haipeng Dai, State Key Laboratory for Novel Software Technology, Nanjing University; Yuanchun Li, Institute for AI Industry Research (AIR), Tsinghua University; Xiaoming Fu, University of Göttingen; Yunxin Liu, Institute for AI Industry Research (AIR), Tsinghua University


Video analytics is widespread in various applications serving our society. Recent advances in content enhancement for video analytics offer significant benefits for bandwidth saving and accuracy improvement. However, existing content-enhanced video analytics systems are excessively computationally expensive and provide extremely low throughput. In this paper, we present region-based content enhancement, which enhances only the important regions in videos, to improve analytical accuracy. Our system, RegenHance, enables high-accuracy and high-throughput video analytics at the edge by 1) a macroblock-based region importance predictor that identifies the important regions fast and precisely, 2) a region-aware enhancer that stitches sparsely distributed regions into dense tensors and enhances them efficiently, and 3) a profile-based execution planner that allocates appropriate resources for enhancement and analytics components. We prototype RegenHance on five heterogeneous edge devices. Experiments on two analytical tasks reveal that region-based enhancement improves the overall accuracy by 10-19% and achieves 2-3× throughput compared to the state-of-the-art frame-based enhancement methods.


https://www.usenix.org/conference/nsdi25/presentation/wang-weijun
Independence Ballroom

4:50pm EDT

Tooth: Toward Optimal Balance of Video QoE and Redundancy Cost by Fine-Grained FEC in Cloud Gaming Streaming
Monday April 28, 2025 4:50pm - 5:10pm EDT
Congkai An, Huanhuan Zhang, Shibo Wang, Jingyang Kang, Anfu Zhou, Liang Liu, and Huadong Ma, Beijing University of Posts and Telecommunications; Zili Meng, Hong Kong University of Science and Technology; Delei Ma, Yusheng Dong, and Xiaogang Lei, Well-Link Times Inc.


Despite the rapid rise of cloud gaming, real-world evaluations of its quality of experience (QoE) remain scarce. To fill this gap, we conduct a large-scale measurement campaign, analyzing over 60,000 sessions on an operational cloud gaming platform. We find that current cloud gaming streaming suffers from substantial bandwidth wastage and severe interaction stalls simultaneously. In-depth investigation reveals the underlying reason, i.e., existing streaming adopts coarse-grained Forward Error Correction (FEC) encoding, without considering the adverse impact of frame length variation, which results in over-protection of large frames (i.e., bandwidth waste) and under-protection of smaller ones (i.e., interaction stalls). To remedy the problem, we propose Tooth, a per-frame adaptive FEC that aims to achieve the optimal balance between satisfactory QoE and efficient bandwidth usage. To build Tooth, we design a dual-module FEC encoding strategy, which takes full consideration of both frame length variation and network dynamics, and hence determines an appropriate FEC redundancy rate for each frame. Moreover, we also circumvent the formidable per-frame FEC computational overhead by designing a lightweight Tooth, so as to meet the rigid latency bound of real-time cloud gaming. We implement, deploy, and evaluate Tooth in the operational cloud gaming system. Extensive field tests demonstrate that Tooth significantly outperforms existing state-of-the-art FEC methods, reducing stall rates by 40.2% to 85.2%, enhancing video bitrates by 11.4% to 29.2%, and lowering bandwidth costs by 54.9% to 75.0%.
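
The under-protection of small frames can be made concrete with a short calculation. The sketch assumes a fixed redundancy rate applied to every frame, independent packet loss, and an erasure code that recovers a frame whenever the number of lost packets does not exceed the number of parity packets (as with Reed-Solomon). These assumptions and the function name `frame_failure_prob` are illustrative only and do not describe Tooth's actual encoder.

```python
from math import ceil, comb


def frame_failure_prob(data_packets, redundancy_rate, loss_prob):
    """Probability that a frame cannot be recovered.

    The frame carries data_packets packets plus ceil(redundancy_rate * data_packets)
    parity packets and survives if at most that many packets are lost.
    """
    parity = ceil(redundancy_rate * data_packets)
    total = data_packets + parity
    recoverable = sum(
        comb(total, lost) * loss_prob**lost * (1 - loss_prob) ** (total - lost)
        for lost in range(parity + 1)
    )
    return 1.0 - recoverable


if __name__ == "__main__":
    # Same redundancy rate and loss rate, different frame sizes.
    for size in (5, 20, 50):
        print(size, frame_failure_prob(size, 0.2, 0.05))
```

At the same redundancy rate, the small frame is noticeably more likely to be unrecoverable than the large one. A single rate chosen to protect small frames would therefore spend more parity than necessary on large frames.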


https://www.usenix.org/conference/nsdi25/presentation/an
Independence Ballroom

5:10pm EDT

AsTree: An Audio Subscription Architecture Enabling Massive-Scale Multi-Party Conferencing
Monday April 28, 2025 5:10pm - 5:30pm EDT
Tong Meng, Wenfeng Li, Chao Yuan, Changqing Yan, and Le Zhang, ByteDance Inc.


While operating a multi-party video conferencing system (Lark) globally, we find that audio subscription alone may pose considerable challenges to the network, especially when scaling to massive conference sizes. The traditional strategy of subscribing to all remote participants suffers from issues such as signaling storms and excessive bandwidth and resource consumption on both the server and client sides. Aiming at enhanced scalability, we share our design of AsTree, an audio subscription architecture. Through a cascading tree topology and media plane-based audio selection, AsTree dramatically reduces the number of signaling messages and audio streams to forward. Practical deployment in Lark reduces audio and video stall ratios by more than 30% and 50%, respectively. We also receive 40% fewer negative client reviews, strongly supporting the value of AsTree.


https://www.usenix.org/conference/nsdi25/presentation/meng
Independence Ballroom
 
Tuesday, April 29
 

9:00am EDT

Pineapple: Unifying Multi-Paxos and Atomic Shared Registers
Tuesday April 29, 2025 9:00am - 9:20am EDT
Tigran Bantikyan, Northwestern; Jonathan Zarnstorff, unaffiliated; Te-Yen Chou, CMU; Lewis Tseng, UMass Lowell; Roberto Palmieri, Lehigh University


Linearizable storage systems reduce the complexity of developing correct large-scale customer-facing applications, in the presence of concurrent operations and failures. A common approach for providing linearizability is to use consensus to order operations invoked by applications. This paper explores designs that offload operations (from the consensus component) to improve overall performance.

This paper presents Pineapple, which uses logical timestamps to unify Multi-Paxos and atomic shared registers so that any node in the system can serve read and write operations. Compared to Multi-Paxos (or leader-based consensus), Pineapple reduces bottlenecks at the leader. Compared to Gryff, which unifies EPaxos and atomic shared registers, Pineapple has better performance because Pineapple has “non-blocking operation execution.”

Our evaluation shows that Pineapple improves both throughput and tail latency, compared to state-of-the-art systems (e.g., Gryff, Multi-Paxos, EPaxos), in both wide-area networks and local-area networks. We also integrate Pineapple with etcd. In a balanced workload, Pineapple reduces median latency by more than 50%, compared to the original system that uses an optimized version of Raft.


https://www.usenix.org/conference/nsdi25/presentation/bantikyan
Independence Ballroom

9:20am EDT

Ladder: A Convergence-based Structured DAG Blockchain for High Throughput and Low Latency
Tuesday April 29, 2025 9:20am - 9:40am EDT
Dengcheng Hu, Jianrong Wang, Xiulong Liu, and Hao Xu, Tianjin University; Xujing Wu, Jd.Com, Inc; Muhammad Shahzad, North Carolina State University; Guyue Liu, Peking University; Keqiu Li, Tianjin University


Recent literature proposes the use of Directed Acyclic Graphs (DAG) to enhance blockchain performance. However, current block-DAG designs face three important limitations when fully utilizing parallel block processing: high computational overhead due to costly block sorting, complex transaction confirmation process, and vulnerability to balance attacks when determining the pivot chain. To this end, we propose Ladder, a structured twin-chain DAG blockchain with a convergence mechanism that efficiently optimizes parallel block processing strategy and enhances overall performance and security. In each round, a designated convergence node generates a lower-chain block, sorting the forked blocks from the upper-chain, reducing computational overhead and simplifying transaction confirmation. To counter potential adversarial disruptions, a dynamic committee is selected to generate special blocks when faulty blocks are detected. We implemented and evaluated Ladder in a distributed network environment against several state-of-the-art methods. Our results show that Ladder achieves a 59.6% increase in throughput and a 20.9% reduction in latency.


https://www.usenix.org/conference/nsdi25/presentation/hu
Independence Ballroom

9:40am EDT

Vegeta: Enabling Parallel Smart Contract Execution in Leaderless Blockchains
Tuesday April 29, 2025 9:40am - 10:00am EDT
Tianjing Xu and Yongqi Zhong, Shanghai Jiao Tong University; Yiming Zhang, Shanghai Jiao Tong University and Shanghai Key Laboratory of Trusted Data Circulation, Governance and Web3; Ruofan Xiong, Xiamen University; Jingjing Zhang, Fudan University; Guangtao Xue and Shengyun Liu, Shanghai Jiao Tong University and Shanghai Key Laboratory of Trusted Data Circulation, Governance and Web3


Consensus and smart contract execution play complementary roles in blockchain systems. Leaderless consensus, as a promising direction in the blockchain context, can better utilize the resources of each node and/or avoid incurring the extra burden of timing assumptions. As modern Byzantine-Fault Tolerant (BFT) consensus protocols can order several hundred thousand transactions per second, contract execution is becoming the performance bottleneck. Adding concurrency to contract execution is a natural way to boost its performance, but none of the existing frameworks is a perfect fit for leaderless consensus.

We propose speculate-order-replay, a generic framework tailored to leaderless consensus protocols. Our framework allows each proposer to (pre-)process transactions prior to consensus, better utilizing its computing resources. We instantiate the framework with a concrete concurrency control protocol Vegeta. Vegeta speculatively executes a series of transactions and analyzes their dependencies before consensus, and later deterministically replays the schedule. We ran experiments under the real-world Ethereum workload on 16vCPU virtual machines. Our evaluation results show that Vegeta achieved up to 7.8× speedup compared to serial execution. When deployed on top of a leaderless consensus protocol with 10 nodes, Vegeta still achieved 6.9× speedup.
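As a rough illustration of the speculate-order-replay idea described above (not Vegeta's actual protocol), the toy sketch below analyzes dependencies between transactions before ordering and then replays them in conflict-free batches. Everything in it is an assumption made for illustration: transactions are simple transfers whose read/write sets are known statically, so the speculation phase reduces to dependency analysis, and the names `analyze`, `schedule`, and `replay` are hypothetical.

```python
def analyze(txs):
    """Pre-consensus step: tx j depends on every earlier tx that touches the same account."""
    deps = {j: set() for j in range(len(txs))}
    for j, (src_j, dst_j, _) in enumerate(txs):
        for i in range(j):
            src_i, dst_i, _ = txs[i]
            if {src_i, dst_i} & {src_j, dst_j}:
                deps[j].add(i)
    return deps

def schedule(deps):
    """Place each tx one level after its latest dependency; a level is conflict-free."""
    level = {}
    for j in sorted(deps):
        level[j] = 1 + max((level[i] for i in deps[j]), default=-1)
    batches = {}
    for j, lvl in level.items():
        batches.setdefault(lvl, []).append(j)
    return [batches[lvl] for lvl in sorted(batches)]

def apply_tx(state, tx):
    """Toy contract: transfer `amount` if the sender has enough balance."""
    src, dst, amount = tx
    if state.get(src, 0) >= amount:
        state[src] = state.get(src, 0) - amount
        state[dst] = state.get(dst, 0) + amount

def replay(txs, batches, state):
    """Post-consensus step: run batches in order; txs inside a batch touch disjoint accounts."""
    state = dict(state)
    for batch in batches:
        for j in batch:  # these could run in parallel; run sequentially here for simplicity
            apply_tx(state, txs[j])
    return state

def serial(txs, state):
    """Reference: plain serial execution in proposal order."""
    state = dict(state)
    for tx in txs:
        apply_tx(state, tx)
    return state

txs = [("a", "b", 5), ("c", "d", 3), ("b", "c", 4), ("e", "f", 1)]
initial = {"a": 10, "c": 3, "e": 1}
batches = schedule(analyze(txs))
final_state = replay(txs, batches, initial)
```

Because conflicting transactions always land in different batches and keep their original relative order, the replayed result matches serial execution in this toy setting.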


https://www.usenix.org/conference/nsdi25/presentation/xu-tianjing
Independence Ballroom

10:00am EDT

Shoal++: High Throughput DAG BFT Can Be Fast and Robust!
Tuesday April 29, 2025 10:00am - 10:20am EDT
Balaji Arun and Zekun Li, Aptos Labs; Florian Suri-Payer, Cornell University; Sourav Das, UIUC; Alexander Spiegelman, Aptos Labs


Today's practical partially synchronous Byzantine Fault Tolerant consensus protocols trade off low latency and high throughput. On the one end, traditional BFT protocols such as PBFT and its derivatives optimize for latency. They require, in fault-free executions, only 3 message delays to commit, the optimum for BFT consensus. However, this class of protocols typically relies on a single leader, hampering throughput scalability. On the other end, a new class of so-called DAG-BFT protocols demonstrates how to achieve highly scalable throughput by separating data dissemination from consensus, and using every replica as proposer. Unfortunately, existing DAG-BFT protocols pay a steep latency premium, requiring on average 10.5 message delays to commit transactions.

This work aims to soften this tension, and proposes Shoal++, a novel DAG-based BFT consensus system that offers the throughput of DAGs while reducing end-to-end consensus commit latency to an average of 4.5 message delays. Our empirical findings are encouraging, showing that Shoal++ achieves throughput comparable to state-of-the-art DAG BFT solutions while reducing latency by up to 60%, even under less favorable network and failure conditions.


https://www.usenix.org/conference/nsdi25/presentation/arun
Independence Ballroom

10:50am EDT

HA/TCP: A Reliable and Scalable Framework for TCP Network Functions
Tuesday April 29, 2025 10:50am - 11:10am EDT
Haoyu Gu, Ali José Mashtizadeh, and Bernard Wong, University of Waterloo


Layer 7 network functions (NFs) are a critical piece of modern network infrastructure. As a result, the scalability and reliability of these NFs are important but challenging because of the complexity of layer 7 NFs. This paper presents HA/TCP, a framework that enables migration and failover of layer 7 NFs. HA/TCP uses a novel replication mechanism to synchronize the state between replicas with low overhead, enabling seamless migration and failover of TCP connections. HA/TCP encapsulates the implementation details into our replicated socket interface to allow developers to easily add high availability to their layer 7 NFs such as WAN accelerators, load balancers, and proxies. Our benchmarks show that HA/TCP provides reliability for a 100 Gbps NF with as little as 0.2% decrease in client throughput. HA/TCP transparently migrates a connection between replicas in 38 µs, including the network latency. We provide reliability to a SOCKS proxy and a WAN accelerator with less than 2% decrease in throughput and a modest increase in CPU usage.


https://www.usenix.org/conference/nsdi25/presentation/gu
Independence Ballroom

11:10am EDT

High-level Programming for Application Networks
Tuesday April 29, 2025 11:10am - 11:30am EDT
Xiangfeng Zhu, Yuyao Wang, Banruo Liu, Yongtong Wu, and Nikola Bojanic, University of Washington; Jingrong Chen, Duke University; Gilbert Louis Bernstein and Arvind Krishnamurthy, University of Washington; Sam Kumar, University of Washington and UCLA; Ratul Mahajan, University of Washington; Danyang Zhuo, Duke University


Application networks facilitate communication between the microservices of cloud applications. They are built today using service meshes with low-level specifications that make it difficult to express application-specific functionality (e.g., access control based on RPC fields), and they can more than double the RPC latency. We develop AppNet, a framework that makes it easy to build expressive and high-performance application networks. Developers specify rich RPC processing in a high-level language with generalized match-action rules and built-in state management. We compile the specifications to high-performance code after optimizing where (e.g., client, server) and how (e.g., RPC library, proxy) each RPC processing element runs. The optimization uses symbolic abstraction and execution to judge if different runtime configurations of possibly-stateful RPC processing elements are semantically equivalent for arbitrary RPC streams. Our experiments show that AppNet can express common application network functions in only 7-28 lines of code. Its optimizations lower RPC processing latency by up to 82%.


https://www.usenix.org/conference/nsdi25/presentation/zhu
Independence Ballroom

11:30am EDT

State-Compute Replication: Parallelizing High-Speed Stateful Packet Processing
Tuesday April 29, 2025 11:30am - 11:50am EDT
Qiongwen Xu, Rutgers University; Sebastiano Miano, Politecnico di Milano; Xiangyu Gao and Tao Wang, New York University; Adithya Murugadass and Songyuan Zhang, Rutgers University; Anirudh Sivaraman, New York University; Gianni Antichi, Queen Mary University of London and Politecnico di Milano; Srinivas Narayana, Rutgers University


With the slowdown of Moore’s law, CPU-oriented packet processing in software will be significantly outpaced by emerging line speeds of network interface cards (NICs). Single-core packet-processing throughput has saturated.

We consider the problem of high-speed packet processing with multiple CPU cores. The key challenge is state—memory that multiple packets must read and update. The prevailing method to scale throughput with multiple cores involves state sharding, processing all packets that update the same state, e.g., flow, at the same core. However, given the skewed nature of realistic flow size distributions, this method is untenable, since total throughput is limited by single-core performance.

This paper introduces state-compute replication, a principle to scale the throughput of a single stateful flow across multiple cores using replication. Our design leverages a packet history sequencer running on a NIC or top-of-the-rack switch to enable multiple cores to update state without explicit synchronization. Our experiments with realistic data center and wide-area Internet traces show that state-compute replication can scale total packet-processing throughput linearly with cores, independent of flow size distributions, across a range of realistic packet-processing programs.
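The idea of updating per-flow state on several cores without synchronization can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the sequencer sprays packets round-robin, attaches the last `num_cores - 1` packets as history, and the state is a simple per-flow packet counter. The names `sequencer`, `Core`, and `run` are hypothetical.

```python
from collections import deque

def sequencer(packets, num_cores):
    """Spray packets round-robin and attach the packets the target core has not seen."""
    history = deque(maxlen=num_cores - 1)  # the most recent num_cores - 1 packets
    for i, pkt in enumerate(packets):
        yield i % num_cores, list(history), pkt
        history.append(pkt)

class Core:
    """A core holding a private replica of the flow state (flow -> packet count)."""

    def __init__(self):
        self.state = {}

    def update(self, flow):
        self.state[flow] = self.state.get(flow, 0) + 1

    def process(self, history, pkt):
        # Catch up on packets handled by other cores, then handle our own packet.
        for missed in history:
            self.update(missed)
        self.update(pkt)
        return self.state[pkt]  # count for this flow, as seen when pkt arrives

def run(packets, num_cores):
    """Return the per-packet flow count computed by whichever core got each packet."""
    cores = [Core() for _ in range(num_cores)]
    return [cores[c].process(h, p) for c, h, p in sequencer(packets, num_cores)]

# Packets are represented by their flow identifier; flow "A" dominates the trace.
trace = ["A", "A", "B", "A", "A", "B"]
replicated = run(trace, num_cores=3)
single_core = run(trace, num_cores=1)
```

In this sketch every core reaches the same answer a single core would, even though one flow's packets are spread over all cores and no locks or shared memory are used; the cost is that each core repeats the state updates for the packets in its history.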


https://www.usenix.org/conference/nsdi25/presentation/xu-qiongwen
Independence Ballroom

11:50am EDT

MTP: Transport for In-Network Computing
Tuesday April 29, 2025 11:50am - 12:10pm EDT
Tao Ji, UT Austin; Rohan Vardekar and Balajee Vamanan, University of Illinois Chicago; Brent E. Stephens, Google and University of Utah; Aditya Akella, UT Austin


In-network computing (INC) is being increasingly adopted to accelerate applications by offloading part of the applications’ computation to network devices. Such application-specific (L7) offloads have several attributes that the transport protocol must work with: they may mutate, intercept, reorder and delay application messages that span multiple packets. At the same time, the transport must also work with the buffering and computation constraints of network devices hosting the L7 offloads. Existing transports and alternative approaches fall short in these regards. Therefore, we present MTP, the first transport to natively support INC. MTP is built around two major components: 1) a novel message-oriented reliability protocol and 2) a resource-specific congestion control framework. We implement a full-fledged prototype of MTP based on DPDK. We show the efficacy of MTP in a testbed with a real INC application as well as with comprehensive microbenchmarks and large-scale simulations.


https://www.usenix.org/conference/nsdi25/presentation/ji
Independence Ballroom

2:00pm EDT

Mitigating Scalability Walls of RDMA-based Container Networks
Tuesday April 29, 2025 2:00pm - 2:20pm EDT
Wei Liu, Tsinghua University and Alibaba Cloud; Kun Qian, Alibaba Cloud; Zhenhua Li, Tsinghua University; Feng Qian, University of Southern California; Tianyin Xu, UIUC; Yunhao Liu, Tsinghua University; Yu Guan, Shuhong Zhu, Hongfei Xu, Lanlan Xi, Chao Qin, and Ennan Zhai, Alibaba Cloud


As a state-of-the-art technique, RDMA-offloaded container networks (RCNs) can provide high-performance data communications among containers. Nevertheless, this seems to be subject to the RCN scale—when there are millions of containers simultaneously running in a data center, the performance decreases sharply and unexpectedly. In particular, we observe that most performance issues are related to RDMA NICs (RNICs), whose design and implementation defects might constitute the "scalability wall" of the RCN. To validate the conjecture, however, we are challenged by the limited visibility into the internals of today's RNICs. To address the dilemma, a more pragmatic approach is to infer the most likely causes of the performance issues according to the common abstractions of an RNIC's components and functionalities.

Specifically, we conduct combinatorial causal testing to efficiently reason about an RNIC's architecture model, effectively approximate its performance model, and thereby proactively optimize the NF (network function) offloading schedule. We embody the design into a practical system dubbed ScalaCN. Evaluation on production workloads shows that the end-to-end network bandwidth increases by 1.4× and the packet forwarding latency decreases by 31%, after resolving 82% of the causes inferred by ScalaCN. We report the performance issues of RNICs and the most likely causes to relevant vendors, all of which have been encouragingly confirmed; we are now closely working with the vendors to fix them.


https://www.usenix.org/conference/nsdi25/presentation/liu-wei
Independence Ballroom

2:20pm EDT

Eden: Developer-Friendly Application-Integrated Far Memory
Tuesday April 29, 2025 2:20pm - 2:40pm EDT
Anil Yelam, Stewart Grant, and Saarth Deshpande, UC San Diego; Nadav Amit, Technion, Israel Institute of Technology; Radhika Niranjan Mysore, VMware Research Group; Amy Ousterhout, UC San Diego; Marcos K. Aguilera, VMware Research Group; Alex C. Snoeren, UC San Diego


Far memory systems are a promising way to address the resource stranding problem in datacenters. Far memory systems broadly fall into two categories. On one hand, paging-based systems use hardware guards at the granularity of pages to intercept remote accesses, which require no application changes but incur significant faulting overhead. On the other hand, app-integrated systems use software guards on data objects and apply application-specific optimizations to avoid faulting overheads, but these systems require significant application redesign and/or suffer from overhead on local accesses. We propose Eden, a new approach to far memory that combines hardware guards with a small number of software guards in the form of programmer annotations, to achieve performance similar to app-integrated systems with minimal developer effort. Eden is based on the insight that applications generate most of their page faults at a small number of code locations, and those locations are easy to find programmatically. By adding hints to such locations, Eden can notify the pager about upcoming memory accesses to customize read-ahead and memory reclamation. We show that Eden achieves 19.4–178% higher performance than Fastswap for memory-intensive applications including DataFrame and memcached. Eden achieves performance comparable to AIFM with almost 100× fewer code changes.


https://www.usenix.org/conference/nsdi25/presentation/yelam
Independence Ballroom

2:40pm EDT

Achieving Wire-Latency Storage Systems by Exploiting Hardware ACKs
Tuesday April 29, 2025 2:40pm - 3:00pm EDT
Qing Wang, Jiwu Shu, Jing Wang, and Yuhao Zhang, Tsinghua University


We present Juneberry, a low-latency communication framework for storage systems. Different from existing RPC frameworks, Juneberry provides a fast path for storage requests: they can be committed with a single round trip and server CPU bypass, thus delivering extremely low latency; the execution of these requests is performed asynchronously on the server CPU. Juneberry achieves it by relying on our proposed Ordered Queue abstraction, which exploits NICs’ hardware ACKs as commit signals of requests while ensuring linearizability of the whole system. Juneberry also supports durability by placing requests in persistent memory (PM). We implement Juneberry using commodity RDMA NICs and integrate it into two storage systems: Memcached (a widely used in-memory caching system) and PMemKV (a PM-based persistent key-value store). Evaluation shows that compared with RPC, Juneberry can significantly lower their latency under write-intensive workloads.


https://www.usenix.org/conference/nsdi25/presentation/wang-qing
Independence Ballroom

3:00pm EDT

ODRP: On-Demand Remote Paging with Programmable RDMA
Tuesday April 29, 2025 3:00pm - 3:20pm EDT
Zixuan Wang, Xingda Wei, Jinyu Gu, Hongrui Xie, Rong Chen, and Haibo Chen, Institute of Parallel and Distributed Systems, SEIEE, Shanghai Jiao Tong University


Memory disaggregation with OS swapping is becoming popular for next-generation datacenters. RDMA is a promising technique for achieving this. However, RDMA does not support dynamic memory management in the data path. Current systems rely on RDMA’s control path operations, which are designed for coarse-grained memory management. This results in a trade-off between performance and memory utilization and also requires significant CPU usage, which is a limited resource on memory nodes.

This paper introduces On-Demand Remote Paging (ODRP), the first system that smartly chains native RDMA data path primitives to offload all memory access and management operations onto the RDMA-capable NIC (RNIC). However, efficiently implementing these operations is challenging due to the limited capability of the RNIC. ODRP leverages the semantics of OS swapping and adopts a client-assisted principle to address the efficiency and functionality challenges. Compared to the state-of-the-art system, ODRP achieves significantly better memory utilization and no CPU usage, while introducing only a 0.8% to 14.6% performance overhead in real-world applications.


https://www.usenix.org/conference/nsdi25/presentation/wang-zixuan
Independence Ballroom

3:50pm EDT

CellReplay: Towards accurate record-and-replay for cellular networks
Tuesday April 29, 2025 3:50pm - 4:10pm EDT
William Sentosa, University of Illinois Urbana-Champaign; Balakrishnan Chandrasekaran, VU Amsterdam; P. Brighten Godfrey, University of Illinois Urbana-Champaign and Broadcom; Haitham Hassanieh, EPFL


The inherent variability of real-world cellular networks makes it hard to evaluate, reproduce, and debug the performance of networked applications running on these networks. A common approach is to record and replay a trace of observed cellular network performance. However, we show that the state-of-the-art record-and-replay technique produces empirically inaccurate results that can cause evaluation bias. This paper presents the design and implementation of CellReplay, a tool that records the time-varying performance of a live cellular network into traces using preset workloads and faithfully replays the observed performance for other workloads through an emulated network interface. The key challenge in achieving high accuracy is to replay varying network behavior in a way that captures its sensitivity to the workload. CellReplay records network behavior under two predefined workloads simultaneously and interpolates upon replay for other workloads. Across various challenging network conditions, our evaluation shows that real-world networked applications (e.g., web browsing or video streaming) running on CellReplay achieve similar performance (e.g., page load time or bitrate selection) to their live network counterparts, with significantly reduced error compared to the prior method.


https://www.usenix.org/conference/nsdi25/presentation/sentosa
Independence Ballroom

4:10pm EDT

Large Network UWB Localization: Algorithms and Implementation
Tuesday April 29, 2025 4:10pm - 4:30pm EDT
Nakul Garg and Irtaza Shahid, University of Maryland, College Park; Ramanujan K Sheshadri, Nokia Bell Labs; Karthikeyan Sundaresan, Georgia Institute of Technology; Nirupam Roy, University of Maryland, College Park


Localization of networked nodes is an essential problem in emerging applications, including first-responder navigation, automated manufacturing lines, vehicular and drone navigation, asset tracking, Internet of Things, and 5G communication networks. In this paper, we present Locate3D, a novel system for peer-to-peer node localization and orientation estimation in large networks. Unlike traditional range-only methods, Locate3D introduces angle-of-arrival (AoA) data as an added network topology constraint. The system solves three key challenges: it uses angles to reduce the number of measurements required by 4× and jointly uses range and angle data for location estimation. We develop a spanning-tree approach for fast location updates, and to ensure the output graphs are rigid and uniquely realizable, even in occluded or weakly connected areas. Locate3D cuts down latency by up to 75% without compromising accuracy, surpassing standard range-only solutions. It has a 0.86 meter median localization error for building-scale multi-floor networks (32 nodes, 0 anchors) and 12.09 meters for large-scale networks (100,000 nodes, 15 anchors).


https://www.usenix.org/conference/nsdi25/presentation/garg
Independence Ballroom

4:30pm EDT

Towards Energy Efficient 5G vRAN Servers
Tuesday April 29, 2025 4:30pm - 4:50pm EDT
Anuj Kalia, Microsoft; Nikita Lazarev, MIT; Leyang Xue, The University of Edinburgh; Xenofon Foukas and Bozidar Radunovic, Microsoft; Francis Y. Yan, Microsoft Research and UIUC


We study the problem of improving energy efficiency in virtualized radio access network (vRAN) servers, focusing on CPUs. Two distinct characteristics of vRAN software (strict real-time sub-millisecond deadlines and its proprietary black-box nature) preclude the use of existing general-purpose CPU energy management techniques. This paper presents RENC, a system that saves energy by adjusting CPU frequency in response to sub-second variations in cellular workloads, using the following techniques. First, despite large fluctuations in vRAN CPU load at sub-ms timescales, RENC establishes safe low-load intervals, e.g., by coupling Media Access Control (MAC) layer rate limiting with CPU frequency changes. This prevents high traffic during low-power operation, which would otherwise cause deadline misses. Second, we design techniques to compute CPU frequencies that are safe for these low-load intervals, achieved by measuring the slack in vRAN threads' deadlines using Linux eBPF hooks, or minor binary rewriting of the vRAN software. Third, we demonstrate the need to handle CPU load spikes triggered by control operations, such as new users attaching to the network. Our evaluation in a state-of-the-art vRAN testbed shows that our techniques reduce a vRAN server's CPU power consumption by up to 45% (29% server-wide).


https://www.usenix.org/conference/nsdi25/presentation/kalia
Independence Ballroom

4:50pm EDT

Building Massive MIMO Baseband Processing on a Single-Node Supercomputer
Tuesday April 29, 2025 4:50pm - 5:10pm EDT
Xincheng Xie, Wentao Hou, Zerui Guo, and Ming Liu, University of Wisconsin-Madison


The rising deployment of massive MIMO coupled with the wide adoption of virtualized radio access networks (vRAN) poses an unprecedented computational demand on the baseband processing, hardly met by existing vRAN hardware substrates. The single-node supercomputer, an emerging computing platform, offers scalable computation and communication capabilities, making it a promising target to hold and run the baseband pipeline. However, realizing this is non-trivial due to the mismatch between (a) the diverse execution granularities and incongruent parallel degrees of different stages along the software processing pipeline and (b) the underlying evolving irregular hardware parallelism at runtime.

This paper closes the gap by designing and implementing MegaStation, an application-platform co-designed system that effectively harnesses the computing power of a single-node supercomputer for processing massive MIMO baseband. Our key insight is that one can adjust the execution granularity and reconstruct the baseband processing pipeline on the fly based on the monitored hardware parallelism status. Inspired by dynamic instruction scheduling, MegaStation models the single-node supercomputer as a tightly coupled microprocessor and employs a scoreboarding-like algorithm to orchestrate "baseband processing" instructions over GPU-instantiated executors. Our evaluations using the GigaIO FabreX demonstrate that MegaStation achieves up to 66.2% lower tail frame processing latency and 4× higher throughput than state-of-the-art solutions. MegaStation is a scalable and adaptive solution that can meet today’s vRAN requirements.


https://www.usenix.org/conference/nsdi25/presentation/xie
Independence Ballroom

5:10pm EDT

Efficient Multi-WAN Transport for 5G with OTTER
Tuesday April 29, 2025 5:10pm - 5:30pm EDT
Mary Hogan, Oberlin College; Gerry Wan, Google; Yiming Qiu, University of Michigan; Sharad Agarwal and Ryan Beckett, Microsoft; Rachee Singh, Cornell University; Paramvir Bahl, Microsoft


In the ongoing cloudification of 5G, software network functions (NFs) are replacing fixed-function network hardware, allowing 5G network operators to leverage the benefits of cloud computing. The migration of NFs and their management to the cloud causes 5G traffic to traverse an operator’s wide-area network (WAN) to the cloud WAN that hosts the datacenters (DCs) running 5G NFs and applications. However, achieving end-to-end (E2E) performance for 5G traffic across two WANs is hard. Placing 5G flows across two WANs with different performance and reliability characteristics, edge and DC resource constraints, and interference from other flows is different and more challenging than single-WAN traffic engineering. We address this challenge and show that orchestrating E2E paths across a multi-WAN overlay allows us to achieve on average 13% more throughput, 15% less RTT, 45% less jitter, or reduce average loss from 0.06% to under 0.001%. We implement our multi-WAN 5G flow placement in a scalable optimization prototype that allocates 26%–45% more bytes on the network than greedy baselines while also satisfying the service demands of more flows.


https://www.usenix.org/conference/nsdi25/presentation/hogan
Independence Ballroom
 