On-Line Memory Compression for Embedded Systems



- Hide Paper Summary
Paper Title: On-Line Memory Compression for Embedded Systems
Link: https://dl.acm.org/doi/10.1145/1698772.1698785
Year: ACM Transactions 2010
Keyword: Compression; DRAM Compression; CREAMES; Embedded compression



Back

Highlight:

  1. Treats part of the main memory as a disk for storing compressed pages. This does not change virtual memory system framework, but increases the effective size of memory.

  2. Using the same design for both virtual memory compression and file system compression. Reads and writes are intercepted by the OS module to transparently compress and decompress file data.

Questions

  1. I do not quite get how the direct mapped table works. Usually, the MMU allows the OS to store the disk address (or compressed partition address, in our case) in the PTE, which does not require a separate mapping, since the physical address can be directly retrieved. The paper suggests that a direct-mapped, small table be used, so I guess it is addressed using the physical frame number of the page where it was originally stored? This is incorrect, since multiple page instances can be mapped to the same physical page frame. On the other hand, if the mapping table is indexed with virtual addresses, the table would not be small, since virtual address space (plus process ID) is typically much larger than the physical address space, unless it is not the case for XScale (Intel’s ARM processor)?

  2. I also do not quite get the case where a physical page shared by two or more processes gets swapped out to the compressed region, and then swapped in by one of them. In that case, shouldn’t all references to that physical frames (i.e., PTEs) be reinstated? Why keeping track of the reference count? One possibility is that this can be done lazily, in which case, it is true that the compressed page must reside in the compressed region until all PTEs are reinstated. This is a huge waste of resource, though.

This paper proposes CREAMES, a main memory compression solution for embedded systems where memory is a scarce resource, while no disk is present as swap space. This paper points out that conventional solutions for main memory compression will not work on embedded platform, since these prior solutions usually assume disks as the backing store of the virtual memory system. The paper also lists five challenges for a main memory compression solution on embedded platforms. First, compressed and uncompressed memory region must be carefully maintained, such that the application will not be terminted due to lack of usable memory. There is a trade-off: On one hand, if too much storage is dedicated to storing compressed pages, application’s memory allocation requests might fail, while the compression region still has free space. On the other hand, if the compression region is too small, the overall compression ratio would be insignificant, limiting the effectiveness of compression. Second, the compression algorithm must achieve a balance between compression ratio, compression latency, and energy compsution. Although this is the general requirement for nearly all compression solutions, it is particularly important on embedded platforms, due to limited processing power and battery size. Third, the memory management module for compressed pages should utilize compressed memory region efficiently to reduce fragmentation, since compressed pages are variable sized. These pages should be stored as compactly as possible, while minimizing relocations when the compressed size changes. Fourth, the compressed memory region should be dynamically adjusted to fit dynamic program behavior, as different applications demonstrate different memory footprints. Lastly, the memory overhead for supporting compression should be minimized, since memory is a scarce resource on embedded platform.

The paper assumes an embedded platform with virtual memory, but no disk for swapping out pages when physical memory is over-committed. The main memory size is tens of MBs, in which some of them is dedicated to storing program code and the firmware image, which are copied from the embedded to embedded flash chips to the main memory at startup time. Although this paper only discusses data compression, meaning that only working data in the main memory is considered for compression, code compression can also be applied to the flash image, which is orthogonal to CREAMES, and can be applied individually. The Operating System of the embedded platform handles page faults when a non-existing page is accessed by applications. In normal use cases, this will raises an insufficient memory error, resulting in the termination of the application. In a compressed platform, the page fault will be handled by the memory compression routine, as we discuss later.

CREAMES works as follows. The main memory is divided into two partitions, one for storing working data, which are mapped into application’s address space, and can be accessed directly by program code. The other partition is dedicated to storing compressed pages, which are not accessed directly, and must be retrieved by the decompression routine on a page fault. Compression and decompression are driven by page faults. The OS will over-commit the main memory by allowing more physical pages than the size of the first partition to be allocated to applications, and mark some of them as “not present”, which will trigger page faults when they are accessed. For example, the OS can choose to maintain all available physical pages in the first partition as a linked list. When the application requests a new page, the OS will select one from the list, and populate the page table using the physical address of the selected page. If, however, the linked list is currently empty, the OS must swap out an existing physical page (which is maintained by another list structure, possibly with LRU or other access information maintained) to the compressed region, and then populate the page table entry for the requested address using the physical address of the swapped out page frame. The swapped out page is then compressed, in the background, by an OS thread, and then stored in the compressed page partition. The page table entry of the virtual address whose physical page was just swapped out is also marked as “not present”, such that when the page is accessed later, the page fault handler will search the compressed area, decompress the compressed page, and load it back to the main memory. During this process, another page might be selected by the OS for swapping, and compressed before stored into the compressed page partition.

The compressed page partition is maintained differently than working data, since compressed pages are of different sizes, and cannot be mapped to physical frames using one-to-one correspondence. To maximize the efficiency of the compressed page partition, the paper suggests that a dedicated allocator be used to manage storage for the variably sized compressed pages. Compressed pages are searched using a 256KB, direct-mapped partition header. The header maps the identifier of the swapped out page (virtual address + process ID) to the physical address of the compressed page, which allows the page to be retrieved when a page fault is raised on the virtual address where the page used to be mapped.

The size of the compressed area is also maintained dynamically. Although the paper does not give a full description of the storage management policy, it is suggested that the compression engine will request for more memory when the compressed partition has exhausted all pages, and combines and frees pages when compressed pages are freed. A minimum size greater than zero must always be reserved, however, to avoid applications depleting all available pages and causing compression to fail without even using it.

File systems can also be compressed similarly. On embedded platforms, the file system storage is usually emulated using main memory, with part of the main memory allocated as a ram disk. In this configuration, CREAMES can intercept read and write system calls to the file system, and serve the request from a third partition dedicated to the ram disk. The internal mapping and request handling is similar to compressed pages.