EXCITE-VM: Extending the Virtual Memory System to Support Snapshot Isolation Transactions



- Hide Paper Summary
Paper Title: EXCITE-VM: Extending the Virtual Memory System to Support Snapshot Isolation Transactions
Link: https://dl.acm.org/doi/10.1145/2967938.2967955
Year: PACT 2016
Keyword: Virtual Memory; STM; SI-TM; EXCITE-VM



Back

Highlight:

  1. Using per-thread page mapping (rather than the default shared page table paradigm of Linux threads) to allow each thread to access its own snapshot of the address space. In this case, the thread must use an empty page table at the start of a transaction, and demand-paging all data it accesses by invoking the page fault handler and letting the handler construct the proper memory image.

  2. A multiversioning address space can be built without special hardware by maintaining a global write log and replaying log entries on each demand-paging page faults.

  3. A page cache can be added to save pages that have been constructed and to avoid invalidating entries on transaction commit. Also, the page cache can help reducing log replay when the requested version is larger than the cached version by just replaying the log entries between the current and the request versions.

Questions

  1. Why do log entries store the numerical delta rather than binary after-value? The paper claims that it is good for both replay and roll back. But in reality, SI only requires replay but not roll back, since on transaction aborts, dirty pages can be directly discarded from the thread’s address space without affecting the global base image.

This paper presents EXCITE-VM, a software-based virtual memory extension that enables snapshot isolation semantics for individual threads. The paper is motivated by two observations. First, multicore programming for non-regular data structures, such as graphs, are non-trivial as frequent thread synchronization is required to avoid data corruption. Besides, reliability may also be a concern in a multi-threaded environment if threads may fail due to invalid operations or accesses. Second, although Software Transactional Memory (STM) is a suitable programming paradigm for multicore synchronization, they often require a software indirection layer for multiversioning semantics. The indirection layer needs to be invoked on every load and store instructions, which adds extra cycle and data overhead to the original application.

EXCITE-VM addresses the above two issues using Snapshot Isolation (SI) abstraction provided by virtual memory systems. To simplify thread synchronization, each thread in EXCITE-VM reads from a consistent snapshot that is only determined by the begin timestamp of the thread’s transaction, which remains unaffected by concurrent threads that commit onto the actual memory image. In addition, the virtual memory system, which is a built-in address translation mechanism, is taken advantage of to provide a per-thread execution context in which data pages would reflect the memory image of a certain time point, rather than the most up-to-date per-process image.

EXCITE-VM relies on the SI semantics between threads. SI semantics mandates that threads read a consistent image of memory which is fixed at some time during the execution. Here “consistency” means that the memory image should be the result of some legal execution order of committed transactions at the time of transaction begin. After transaction begin, the consistent image will not be affected by concurrent committing transactions writing into the global image. Besides, SI also requires that the committing transaction must not have write-write conflicts with an interleaving transaction, meaning that the write set should either be constantly monitored against concurrent commits, or the write set should be validated against the write set of committed transactions before the commit point. EXCITE-VM adopts the latter approach for less inter-thread communication.

In a conventional Linux process, threads are maintained just like regular processes, with each thread having a standalone control block. In a normal application, all threads share the same page table (the pointer of which is stored in CR3 register), enabling them to have the same view of memory. EXCITE-VM breaks this invariant by assigning each thread an individual page table, such that different threads may see different values at the same memory address, depending on the snapshot these threads are accessing. All threads in an application starts from a common base image serving as the starting point of execution. The image each thread accesses deviates from each other as transactions commit and new transactions begin. Before a transaction begins on a thread, the thread’s page table is re-initialized such that all entries are invalid. During thread execution, when a page is accessed for the first time, a page fault would occur as a result of accessing an address with invalid page table entry. The EXCITE-VM page fault handler then constructs the proper view of memory using the timestamps of the requesting transaction and the committed values by earlier transactions that the current transaction is supposed to see. At commit time, the write set of the current transaction is appended to a global write log, and all pages in the transaction’s view of memory will be discarded (certain optimizations exist to cache these pages across transaction boundaries, as we will see later).

The operation details are presented as follows. At transaction begin, EXCITE-VM assigns the transaction a unique begin timestamp using a global counter. The begin timestamp indicates the version of memory image the transaction will access, and EXCITE-VM satisfies the access requests by “synthesizing” a data page from a base image and the modification logs of transactions that have committed the begin timestamp. Each committed transaction is also assigned a commit timestamp from the same global counter as begin timestamps. The commit timestamp is used to tag all write log entries of the committed transaction, such that they will appear in the memory image of all transactions whose begin timestamp is larger than the commit timestamp. EXCITE-VM runtime tracks the write set of the transaction, and appends them to the end of the global write log if commit succeeds. Validation is also performed by comparing the current write set with write sets of committed transactions whose commit timestamp is between the begin and commit timestamp of the current transaction. These committed write sets can be obtained by traversing the commit log for entries between the two mentioned time stamps. The paper also suggests that the set intersection test can be optimized using two bloom filters, with each containing hashed addresses from one write set. If the bloom filter tests negative by AND’ing all addresses together, which should be the majority of cases, conflict detection succeeds and the transaction commits. The commit process fails otherwise. In either case, the transaction discards all data pages in its local page table. The invalidation of page table entries can be further delayed to the next transaction begin. Note that when the page table entries are invalidiated, TLB should also be flushed for the sake of coherence. If the transaction has been scheduled on another core, a TLB shootdown should also be invoked to globally clear the translation entries.

To prevent the global write log from growing indefinitely, and also to upper bound the number of log entries to be replayed for every access, EXCITE-VM also has background “checkpointing” threads that keep replaying log entries before truncating the log. The checkpointing thread reads the global write log in old-to-new order, and apply the delta to the shared base image which is mapped by the process’s page table. Replay should stop at the earliest visible commit timestamp (i.e., there exists no log entry with a larger timestamp between the log entry’s commit timestamp and some uncommitted transaction’s begin timestamp).

The paper also proposes maintaining a page cache that stores the most recently accessed pages and their versions. The page cache can be considered as a cache of partially materialized pages, which can serve as the starting point of log replay instead of the process’s current base image. When a transaction successfully commits, instead of discarding all its accessed pages, it stores them in the page cache with the version of these pages (if the page is dirty, then the version of the page is the commit timestamp of the transaction). The page table may also stay valid, optimistically assuming that no transaction will ever commit on the page before the next time the same page is requested. The page fault handler is then modified as follows. First, when a page fault occurs, instead of constructing the page from the process’s base image, the handler first checks the page cache. If the address hits the cache, then the handler compares the page version against the requested version, and constructs the requested page by replaying log entries that are needed to bring the cached version to the requested version. Additionally, if the TLB optimization is also adopted, when a transaction commits, it should also perform page table invalidation for threads whose page table entries are still valid for the page, and issue TLB shootdown for these pages. The paper claims that such lazy invalidation of TLB entries can save some TLB shootdown overhead if overwrites are relatively rare. The page cache is maintained by EXCITE-VM software runtime as a software cache, which follows a simple rule for eviction. On eviction, the page table and TLB entries should also be invalidated to avoid accessing invalid pages.