SEUSS: Skip Redundant Paths to Make Serverless Fast



- Hide Paper Summary
Paper Title: SEUSS: Skip Redundant Paths to Make Serverless Fast
Link: https://dl.acm.org/doi/abs/10.1145/3342195.3392698
Year: EuroSys 2020
Keyword: Microservice; Serverless; OS; Process Template; SEUSS



Back

Highlight:

  1. Unikernels provide lightweight system management capabilities to applications, and can be conveniently linked into the application binary such that the application runs like a highly specialized OS with minimum modules for supporting the application. This enables an application to run like an OS on the bare metal machine, while also having millisecond level initialization time.

  2. Virtual machines (VMMs) with unikernels allows strong isolation between applications, since each of them execute within a separate virtualized environment, while preserving efficiency and minimizes VMM startup time, as the unikernel is super lightweight and is tightly coupled with the application.

  3. Snapshot of memory and execution states can be taken for unikernels running in VMMs and loaded back to the VMM to resume execution. This enables use to take snapshots of serverless processes at different stages of execution to provide different “short-cuts” to starting a new function instance. The paper proposes three levels of snapshots: After system and interpreter initialization; After library import and compilation have completed; During the common path of execution for function instances that perform similar tasks (and likely with similar arguments).

Comments:

  1. If you link the unikernel OS with only essential drivers, wouldn’t that partially invalidate one of the purposes of microservices, i.e., ease of deployment? Because now you need to tailor the set of device drivers that are necessary for running on different machines. This may not be a serious issue for two reasons. First, most serverless performs application-level tasks, and general purpose drivers are sufficient. Second, the unikernel is hosted by a QEMU VM, and the VM could abstract away part of the hardware differences on different machines.

  2. Starting a new process and setting up the interpreter for every new serverless request requires lots of initialization work such as OS-related process initialization and interpreter import and environment setup. This paper just proposes that we can start VMM with a pre-initialized image to reduce these overheads. While I agree that the interpreter part is definitely not needed anymore, shouldn’t the initialization and setup of VMMs, and probably loading these images from I/O devices (or you store them as memory objects?) also require some extra latency? What is the trade-off here? Do we simply assume that the interpreter part is the major source of overhead?

  3. According to the paper, there are three types of page faults. The first two types are demand-paging and CoW that is native to the unikernel OS, which are the conventional types of page faults. The third type is writing to a page that is perfectly writable in the unikernel OS, but must trigger CoW just because the page belongs to a snapshot image. I think the unikernel OS must maintain some extra metadata to distinguish the native CoW and the snapshot-related CoW. Maybe this can be tracked within the unused bits of PTEs, or using a shadow page table. The paper did not elaborate on this part, and just said a shallow copy of the page table is taken from the snapshot image to the VMM’s private space.

  4. Does new VMM instances need to mark all pages as non-readable, non-writable? This seems necessary, since the derived snapshots are stored as deltas.

This paper proposes SEUSS, a serverless framework that optimizes the long cold-start latency of processes using memory snapshots of initialized processes as templates. The paper is motivated by previous works that attempted to reduce cold-start latency of serverless processes by forking an existing process that has already been initialized. The paper argues that the applicability of these approaches are limited, as the support for forking existing processes together with the interpreter with all libraries imported is heavily implementation dependent, and only a few interpreters can achieve that. The paper addresses the issue by using an unikernel OS as the execution environment of serverless functions. Process templates are hence saved by taking snapshots of the full memory and execution states of the OS without involving the interpreter.

Unikernels are operating system kernels that minimize the number of modules that is included in the runtime image. Unikernels are typically tailored to the applications that will run on them, and it only contains the system services and stacks required by the application. In addition, unikernels only support a single address space, and therefore, it runs the best with only one process. From a different perspective other than the conventional OS-application division, unikernels are just applications linked with system software and device drivers that are able to execute both user-level and system-level tasks as a self-contained unit. Unikernels are perfect candidate for providing isolation to microservices and serverless, as the kernel image is usually small, and the time it takes for booting up a unikernel is usually within milliseconds due to its minimum number of services.

SEUSS uses EbbRT as the underlying unikernel OS. The EbbRT is hosted by a QEMU-based virtual machine, allowing the unikernel to run on the bare metal hardware with full system management capabilities (e.g., having its own page tables) while providing isolation at the virtual machine layer. The EbbRT is linked with common scripting environments such as Python and Node.js. Essential device drivers such as the NIC driver, and software stacks such as user-space TCP-IP are also integrated into the unikernel image. The paper notes that, although unikernels encourage the usage of application-specific environments, rather than the general purpose environment proposed in this paper, it helps to reduce the number of unikernel snapshot images and the storage requirement for hosting these images by providing a common base image.

One particularly good thing about combining unikernels and virtual machine is that the full-system image plus the contextual states (e.g., register files and other internal processor states) can be easily captured and dumped to persistent storage for later use. These snapshots can also be loaded into a fresh virtual machine together with the execution context to resume execution at the exact point where the snapshot was previously taken. SEUSS leverages this feature at three different levels. First, the base image snapshot is taken after the system completes initialization and the interpreter environment has been set up. This image is generally useful for avoiding overall initialization and startup overhead, and there is one such base image per interpreter environment. At the second level, per-function images are generated after importing the libraries and modules required by the function, and compiling the function implementation into byte codes, at which stage the function is already ready for execution. Function arguments are provides via a script running on the host machine that communicates with the gest VMM. At the third level, argument-specific images can also be generated by capturing the contextual states of a function half-way through the execution. The execution path that has been covered in the snapshot must be common for a set of argument values, such that computation need not be replicated for each instance of the function. In this case, only the arguments that will change the execution path following the snapshot need to be provided by the external script.

To avoid storing a full system image for each snapshot, SEUSS maintains a lineage of snapshot images like a tree structure, and only maintains the delta between the parent and the child snapshot. This is achieved with a special driver in the unikernel, which marks the pages after taking the parent snapshot as clean (page table dirty bits are largely useless in other scenarios because in serverless scenario the application never gets swapped out), and then let the MMU track pages that are modified since then using the per-page dirty bit. When the child snapshot is to be taken, the page table is walked by the driver program, and only dirty pages are saved. Saved snapshots are kept in the shared part of the main memory as binary blobs. New VMM instances can be started on these snapshots by directly using the image as the physical address space and loading the execution context. The page table of the guest OS running in the VMM, however, needs to be copied to the VMM instance’s private memory as the actual page table being used, and mark all pages as read-only (only a shallow copy is sufficient, as the fine-grained access control can be postponed to the point when a fault actually happens). The guest unikernel OS needs to handle three types of page faults: Demand paging, native CoW, and snapshot-related CoW. The first two types of page faults are conventional faults that can be simply handled just as in regular cases. The third type, however, indicates that a page that belongs to the read-only snapshot is to be modified by the VMM instance. In this case, the OS must react by allocating a new page from its process-private memory, and copying the snapshot from the corresponding image (recall that snapshot images form a lineage and derived images are stored only as deltas) to the newly allocated page.

When a request is received by the dispatching process running in the host OS, the dispatching process first determines, using the function name and arguments, which snapshot to use. In the most general case, the dispatching process will select the base snapshot image after system and interpreter initialization. If a function-specific snapshot, or even better, an argument-specific snapshot exists, the dispatching process will then start the VMM with one of the qualified snapshots as the main memory image. Inputs and outputs of functions are sent to and from the VMM with the help of a driver running within the VMM instance.