FaaSnap: FaaS made fast using snapshot-based VMs

- Hide Paper Summary

Paper Title: FaaSnap: FaaS made fast using snapshot-based VMs
Link: https://dl.acm.org/doi/10.1145/3492321.3524270
Year: EuroSys 2022
Keyword: Serverless; Cold-Start Latency; Snapshotting; Virtual Machine; Firecracker

Back

Highlight:

The existing Firecracker VMM can capture the snapshot of a VM by mapping the physical address space of the VM to a disk file on the host system. Memory modifications made by the guest VM instance can then be committed back to the file which is then loaded back later on future invocations of the same function.
The problem with the above approach is that the memory contents of the guest VM is served by expensive major page faults on-demand, especially for anonymous pages whose contents are zero.
To address the problem, the paper proposes to record the working set and save them separately into a snapshot file. Pages are grouped into consecutive address regions and written into the file following the chronological order. Zero anonymous pages do not need to be saved, and instead, we only log their address range.
When the snapshot is loaded back to the main memory, FaaSnap performs a three-level memory mapping. The first level is to map the regular memory snapshot file as a whole. The second level is to map zero anonymous pages as anonymous pages on the host. The last level, which runs concurrently with VM execution, maps each region in the FaaSnap’s snapshot file with parameters indicating that the pages should be eagerly fetched into the page cache. Neither zero anonymous pages nor data pages recorded in FaaSnap would need major page faults.

This paper presents FaaSnap, a serverless (Function-as-a-Service, FaaS) framework that enables fast function invocation by using efficient snapshotting. The paper is motivated by the high cold-start latency of function invocation in production environments. While prior works have already explored the idea of leveraging memory snapshots for fast startups, FaaSnap further optimizes the snapshotting algorithm such that both the I/O cost and the latency penalty for loading the snapshot is minimized. When evaluated on a variety of serverless workloads, FaaSnap achieves nearly ideal performance compared with warm startups which also outperforms prior works in general.

The paper begins with a review of prior techniques for reducing the cold-start latency, which is a phenomenon observed in serverless functions that inhibits efficient function invocation if the function is “cold”, i.e., its working set has not been loaded into the main memory yet. These techniques can be classified into three different types. The first is called Keep-Alive policy, which caches an existing function instance in the main memory for a while after it has completed execution. Both the process image and the accessed files are hence “warm” in the main memory or the page cache, enabling future execution to be started with low latency. However, the paper noted that function Keep-Alive has limited benefits because it consumes main memory resources for an extended period of time which may prevent useful work from being done on the host. The situation is further aggravated by the fact that most functions are not repeatedly called, meaning that future invocations will likely still be cold starts.

The second technique is to leverage unikernels and lightweight virtual machines to reduce the amount of work during function startup, hence enabling millisecond-level function invocation latency, as exemplified by Firecracker. Furthermore, Firecracker also supports taking the memory snapshot of a virtual machine by mapping the guest memory into a disk file on the host system. In this scenario, when a page fault is raised to the host kernel, if the page belongs to a VM instance whose memory state is mapped to a file, the page fault will be handled as a major page fault, and the corresponding disk file is mapped to the guest physical memory by the kernel. Changes made to the guest physical address space can then be committed back to the file. While this approach lowers startup latency compared with the regular VMs, the paper observes that Firecracker still suffers unnecessarily higher latency due to small and random DISK I/Os on the snapshot file due to demand paging. In addition, anonymous memory in the guest system is mapped to the file as part of the memory snapshot and must be loaded back on snapshot recovery. However, unused anonymous memory, in practice, can be directly initialized on the host as anonymous memory rather than being read from the snapshot file, which wastes processor cycles and I/O bandwidth to fetch it from the snapshot file.

The last technique is REAP, which adopts prefetching to accelerate the startup process. REAP records the working set of functions by hooking on the page fault handler function in the user space (or by checking the “Access” bit in the guest page table entries). Pages that are accessed during function execution will then be saved to a disk file as a prefetch set and reloaded back to the main memory the next time the function is invoked. While this approach can effectively turn major page faults incurred by cold starts into minor page faults that only require searching the page cache and establishing address mappings, the paper identifies two problems. First, REAP only records the first run of a function and uses the working set to generate the prefetch set. Consequently, the performance benefit of REAP is sensitive to the stability of the working set. If the working set changes from run to run, REAP still needs to read from the original memory file as in a cold start. Second, REAP loads the prefetch set into the main memory synchronously at the beginning of execution, which constitutes part of the startup latency.

To address the issues with prior works, FaaSnap adopts a snapshot approach that captures a more general and more precise working set of function execution and stores it as a disk file. The snapshot file can then be loaded back to the main memory in the background without blocking normal execution. FaaSnap is built on Firecracker and extends its snapshotting model. From a high level, the snapshot process in FaaSnap is controlled by the system daemon running on the host system. The daemon is very much similar to the existing daemon in Firecracker in that it serves as a gateway to the outsider world and coordinates VM’s execution and resource. When a function is first time invoked, the daemon tracks the working set of the function and saves it to a snapshot file. On later invocations of the same function, the daemon will load the snapshot file and start from the snapshot, hence reducing the startup latency.

We describe the operational details as follows. During function execution, the FaaSnap daemon monitors the dynamic memory consumption of the function instance. If the memory consumption increases by a certain threshold (1024 pages, i.e., 4MB), it then uses a system call “mincore” to obtain a list of virtual page addresses that are currently being mapped into the address space. Note that “mincore” is executed by the daemon and hence reflects the host-side view of the function’s working set. The paper also emphasizes that, compared with REAP’s approach for obtaining the working set via guest-side page fault hooks, the host generally has a more comprehensive view of the working set (e.g., via the kernel read-ahead mechanism that prefetches disk files into the page cache). As a result, the snapshot captured by FaaSnap suffers fewer major page faults when loaded back later because it is more likely to cover more use cases or parameter combinations than the one generated by REAP.

After the snapshot is captured, FaaSnap saves the snapshot data to a disk file. The paper also proposes a few optimizations in order to minimize the amount of data to be saved and maximize sequential reads on future invocations. First, anonymous pages whose contents are zero are not saved. Instead, the FaaSnap daemon merges adjacent zero pages into zero regions, and simply stores the address range of the region. These zero regions can be conveniently mapped back to the virtual address space as anonymous memory on future invocations, which eliminates the major page faults that are needed in the current Firecracker implementation to fetch these zero pages from the memory file. Second, data pages are also merged into regions and saved to the snapshot file as a “loading set”. The paper suggests that if the pages are scattered across the address space, then FaaSnap will attempt to merge as many as possible (e.g., by allowing a few zero pages in between) to minimize the number of regions. Lastly, regions are written into the snapshot file following roughly the chronological order that pages in the region are accessed during the recording phase, i.e., if some pages in a region are accessed early during the execution, then the entire region should also be written to a smaller offset of the file, such that the region under discussion will be loaded before other regions when the snapshot is restored.

On future invocations of the function, FaaSnap will load data contained in the snapshot file into the page cache instead of serving them from the existing Firecracker’s memory snapshot file upon major page faults. In order to restore the snapshot image, which now consists of three layers, i.e., pages not recorded and reside in the regular memory snapshot file, zero regions whose ranges are saved to FaaSnap’s snapshot file, and the regions that contain data being recorded, FaaSnap performs a hierarchy of memory maps which we describe as follows. First, it maps Firecracker’s snapshot file into the main memory using a single mmap just as in the current implementation. Then it also maps zero regions into the VM’s address space, but the parameters to mmap indicate that the mapping is fulfilled by anonymous memory. As a result, memory accesses to the zero regions will be handled with minor page faults rather than major ones. Lastly, FaaSnap kick-starts the VM instance and, in the meantime, maps data regions in the snapshot file using mmap. Since regions are stored in the file successively, it is sufficient to call mmap once per region. The parameters to mmap will also indicate that the contents of the map should be eagerly fetched to the page cache such that they can be served with minor page faults. The paper also emphasizes that the last step, called “concurrent paging”, does not block the VM from execution. If the VM accesses a page before the daemon can map it from the snapshot file, the page fault will then be captured by the VMM, which is handled by the daemon as a regular major fault that reads the corresponding region from the snapshot file.