Quanxi Li, Hong Huang, Ying Liu, and Yanwen Xia, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Jie Zhang, Peking University; Mosong Zhou, Huawei Cloud; Xiaobing Feng and Huimin Cui, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Quan Chen, Shanghai Jiao Tong University; Yizhou Shan, Huawei Cloud; Chenxi Wang, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Microsecond (µs)-scale I/O fabrics create a tension between programming productivity and performance, especially in disaggregated memory systems. The multithreaded synchronous programming model is popular for developing memory-disaggregated applications because of its intuitive program logic. However, our key insight is that although thread switching can effectively hide µs-scale latency, it leads to poor data locality and non-trivial scheduling overhead, leaving significant room for further performance improvement. This paper proposes Beehive, a memory-disaggregated framework that improves remote access throughput by exploiting the asynchrony within each thread. To preserve programming usability, Beehive lets programmers develop applications in the conventional multithreaded synchronous model and automatically transforms the code, via the Rust compiler, into asynchronous code based on pararoutines, a newly proposed computation and scheduling unit. Beehive outperforms the state-of-the-art memory-disaggregated frameworks Fastswap, Hermit, and AIFM by 4.26×, 3.05×, and 1.58× on average, respectively.
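To give a rough intuition for the synchronous-to-asynchronous rewrite described above, the sketch below contrasts the blocking style a programmer would write with the asynchronous form such a transformation could produce. All names here (RemoteRef, deref_sync, deref_async, block_on) are hypothetical illustrations, not Beehive's actual API, and the remote fetch is simulated locally.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Hypothetical handle to an object in remote memory. A real disaggregated-memory
// system would issue a microsecond-scale network fetch; here it is simulated.
struct RemoteRef<T> {
    cached: Option<T>,
}

impl<T: Clone> RemoteRef<T> {
    fn new(value: T) -> Self {
        // Pretend the value lives on a remote memory server.
        RemoteRef { cached: Some(value) }
    }

    // Synchronous model: the programmer writes a plain blocking dereference.
    fn deref_sync(&self) -> T {
        // A real implementation would block the calling thread on the fetch here.
        self.cached.clone().unwrap()
    }

    // Asynchronous form: what a pararoutine-style rewrite could produce,
    // yielding to a scheduler instead of blocking the thread.
    async fn deref_async(&self) -> T {
        // A real implementation would await the in-flight remote fetch here.
        self.cached.clone().unwrap()
    }
}

// Programmer-facing code, written in the familiar synchronous style.
fn sum_sync(xs: &[RemoteRef<u64>]) -> u64 {
    xs.iter().map(|r| r.deref_sync()).sum()
}

// The same logic after an illustrative sync-to-async rewrite: each remote
// access becomes an await point, so one thread can overlap many fetches.
async fn sum_async(xs: &[RemoteRef<u64>]) -> u64 {
    let mut total = 0;
    for r in xs {
        total += r.deref_async().await;
    }
    total
}

// Minimal single-future executor so the example runs with std alone.
fn block_on<F: Future>(mut fut: F) -> F::Output {
    fn noop_raw_waker() -> RawWaker {
        fn clone(_: *const ()) -> RawWaker { noop_raw_waker() }
        fn noop(_: *const ()) {}
        RawWaker::new(std::ptr::null(), &RawWakerVTable::new(clone, noop, noop, noop))
    }
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = unsafe { Pin::new_unchecked(&mut fut) };
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    let data: Vec<RemoteRef<u64>> = (1..=4).map(RemoteRef::new).collect();
    assert_eq!(sum_sync(&data), 10);
    assert_eq!(block_on(sum_async(&data)), 10);
    println!("both versions compute the same result");
}
```

The point of the contrast is that the programmer only ever writes the `sum_sync` style; turning each remote dereference into an await point is the framework's job, which is what lets a single thread keep many µs-scale fetches in flight without the locality and scheduling costs of switching OS threads.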