Kiln: Closing the Performance Gap Between Systems With and Without Persistence Support
Link: https://dl.acm.org/citation.cfm?doid=2540708.2540744
Year: MICRO 2013
Keyword: Kiln; NVM; Cache; LLC
Comments:
- Kiln uses a redo-logging-based reasoning method (actually closer to shadow mapping, since there is no data migration back to the main copy) for ensuring persistence.
- With an NV-cache, the reasoning behind redo logging does not change; we can just treat the non-volatile LLC as NVM and L1/L2 as regular caches. Data must not be released to the LLC before the transaction commits (and the paper does not even spell out how transactions commit atomically, i.e., a half-committed transaction may be unrecoverable if a power failure happens in the middle of the commit flush).
- The txn ID field is largely unused (it only identifies the transaction that committed), yet it takes up tag storage in the L1, L2, and LLC.
- What if a transaction migrates from one cache to another? How do you flush a remote cache (e.g., via a TLB-shootdown-style interrupt, which is expensive and cannot happen very frequently)?
This paper proposes Kiln, a novel design that achieves atomic persistence using a persistent LLC. In classical designs there are two ways of achieving the same effect: Write-Ahead Logging (WAL) or Copy-On-Write (COW). WAL requires the programmer or hardware to generate log entries on every data modification. In addition, persist barriers must be inserted at certain points in the program to enforce the correct write-ordering dependency between log entries and dirty cache lines. COW requires an extra level of indirection. On modification, data items are copied to a new location on the NVM, and the modification is then performed on the new copy. The mapping is switched to point to the new copy after the modification. New copies of data items must be persisted to the NVM before the transaction commits.
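To make the WAL ordering requirement concrete, here is a minimal C++ sketch of a single logged update; pwb() and pfence() are hypothetical placeholders standing in for cache-line write-back and persist-fence instructions (e.g., clwb plus sfence on x86), and the LogEntry layout is invented for illustration rather than taken from the paper.

```cpp
#include <cstdint>

// Hypothetical persistence primitives; on real hardware these would map to a
// cache-line write-back plus an ordering fence (e.g., clwb + sfence on x86).
inline void pwb(const void* /*addr*/) { /* placeholder: write line back to NVM */ }
inline void pfence()                  { /* placeholder: persist ordering fence */ }

struct LogEntry {
    uint64_t addr;     // home address of the data item being modified
    uint64_t new_val;  // redo image (the new value)
    bool     valid;    // set once the entry is fully written
};

// WAL-style update: the log entry must be durable before the in-place write.
void wal_update(LogEntry* log, uint64_t* data, uint64_t new_val) {
    log->addr    = reinterpret_cast<uint64_t>(data);
    log->new_val = new_val;
    log->valid   = true;
    pwb(log);          // push the log entry toward NVM
    pfence();          // persist barrier: log before data, on the critical path
    *data = new_val;   // in-place update; may reach NVM lazily afterwards
    pwb(data);
}
```

Note that the pfence() between the log write-back and the in-place store is exactly the kind of persist barrier whose latency Kiln wants to remove from the critical path.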
Neither WAL nor COW is ideal for efficient persistent atomic regions. Both schemes induce extra memory traffic. In WAL, every data item is written twice (once to generate the log entry, using either the old or the new value depending on the logging type, and once to update the data item itself), so up to three copies of the data exist at a time: the original, the log entry, and the updated item. In COW, extra NVM storage is consumed by multiple versions of data items as well as by the mapping table. In addition, if WAL is used, the long latency of persist barriers cannot easily be removed from the critical path, which is another source of slowdown.
This paper adopts a third approach, which extends the persistence domain to the last-level cache (LLC) using techniques such as battery-powered SRAM. No expensive data movement or persist barriers are needed, because in order to achieve persistence it is sufficient to evict data to the LLC, which has lower latency and higher bandwidth than NVM.
Kiln is motivated by redo logging. Generally speaking, the goal of achieving persistence can be divided into two slightly different sub-goals: consistency and progress. The consistency requirement states that uncommitted changes must not be observable after a crash, even if some uncommitted modifications have already been carried out. Since the system can crash at any moment during execution, the exact state of memory at the time of the crash is hard to know. To solve this problem, redo logging does not allow any modification made by the operation to be persisted before the transaction commits. The data structure can therefore stay consistent, as if it had never been modified by the application. If the system crashes before commit, all volatile state is lost, including the dirty data generated by the operation, which naturally rolls back all of its changes. The progress requirement states that committed operations must not be rolled back by a crash. This is achieved by flushing all redo log entries to the NVM before the transaction commits. If the system crashes after this point, all valid redo log entries are replayed, as if the committed operation were executed in exactly the same manner as during normal operation. Kiln follows a similar approach to guarantee consistency: dirty cache lines are not allowed to be written back to the NVM before the transaction commits. In terms of progress, however, instead of relying on a sequential redo log that is flushed on each transaction commit, Kiln leverages a persistent LLC and uses metadata in the cache tags to track log entries, which are stored as LLC data. There are two obvious advantages of merging redo logs with LLC data entries. First, no redundant data is generated, since the LLC entry serves as both the redo log entry and the data (this has limitations, as we will see below). Second, on recovery, data is instantly available without any form of log replay. The recovery routine only walks the cache tags and makes sure that committed state is preserved while uncommitted data is invalidated.
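For contrast with Kiln's log-free recovery, here is a minimal sketch of the conventional redo-log commit and replay path that motivates Kiln; the RedoLog layout and the pwb()/pfence() placeholders are assumptions carried over from the sketch above, not anything the paper specifies.

```cpp
#include <cstdint>
#include <vector>

// Same placeholder primitives as in the WAL sketch above.
inline void pwb(const void* /*addr*/) { /* write line back to NVM */ }
inline void pfence()                  { /* persist ordering fence */ }

struct RedoEntry { uint64_t addr; uint64_t new_val; };

struct RedoLog {
    std::vector<RedoEntry> entries;  // appended as the transaction runs
    bool committed = false;          // persisted last, at commit time
};

// Conventional redo-log commit: persist every entry, then the commit record.
void redo_commit(RedoLog& log) {
    for (const RedoEntry& e : log.entries) pwb(&e);
    pfence();                // entries must be durable before the commit record
    log.committed = true;
    pwb(&log.committed);
    pfence();                // commit record durable => transaction is durable
}

// Conventional recovery: replay committed logs into their home locations.
// Kiln avoids this pass entirely: the committed NV-LLC line *is* the data.
void redo_recover(RedoLog& log) {
    if (!log.committed) return;      // uncommitted writes were never persisted
    for (const RedoEntry& e : log.entries)
        *reinterpret_cast<uint64_t*>(e.addr) = e.new_val;
}
```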
The operation of Kiln is described as follows. During normal operation, dirty cache lines written by different transactions are tracked by the caches at all levels. On transaction begin, a globally unique transaction ID is assigned, so that every transaction can be uniquely identified by its ID. Dirty cache lines generated during the transaction are tagged with the transaction ID. Kiln extends the tag arrays with an array of transaction IDs, which are set during a transaction when a cache line is written for the first time. Each cache level is also extended with a FIFO queue, which stores the metadata (transaction ID, line location, etc.) of dirty cache lines. When a cache line is first written, the metadata of the store operation is pushed into the queue. To help identify whether a store operation is the first store of the transaction, three more states are added to a cache line, which co-exist with the ordinary coherence states: Clean, meaning the cache line has not been written by any transaction, committed or uncommitted; Pending, meaning the cache line has been written by an uncommitted transaction; and Persistent, meaning the cache line has been committed. Note that at all cache levels the FIFO queues are volatile and will be reset on a power failure. The design of Kiln guarantees that, after a crash, the recovery routine can identify cache lines in different states using only the metadata and state bits.
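A rough data-layout sketch of the per-line metadata and per-level FIFO described above; the field names, widths, and the CommitFifo type are my own guesses for illustration and do not reflect the paper's exact encoding.

```cpp
#include <cstdint>
#include <queue>

// Persistence state kept alongside (not replacing) the coherence state.
enum class PState : uint8_t {
    Clean,       // not written by any transaction, committed or uncommitted
    Pending,     // written by an in-flight (uncommitted) transaction
    Persistent,  // written by a committed transaction (meaningful in the NV-LLC)
};

// Extra tag-array fields added to every cache line (widths are illustrative).
struct LineMeta {
    uint64_t txn_id;  // transaction that last wrote the line (0 = none)
    PState   pstate;  // Clean / Pending / Persistent
};

// One entry of the per-cache FIFO, pushed on the first write to a line within
// a transaction; consulted at commit time to find the lines to flush downward.
struct FifoEntry {
    uint64_t txn_id;
    uint32_t set;     // set index of the dirty line
    uint32_t way;     // way within the set
};

// Each cache level keeps a volatile FIFO; it is simply lost on power failure.
using CommitFifo = std::queue<FifoEntry>;
```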
During normal operation, the LLC eviction algorithm must not evict an uncommitted cache line, just as in redo-logging schemes. This is not always possible: all lines in a cache set may be uncommitted while a miss needs to load a new line into the set. There are two possible solutions. In the first, the transaction causing the miss simply waits for existing transactions to commit and release LLC slots. This, however, is not always possible, and can deadlock if a series of transactions form a dependency cycle. In the second, the cache controller raises an interrupt to request a chunk of memory from the OS as an overflow buffer (the authors should think more carefully here, since the interrupt routine itself may load another line into the same set, potentially triggering an unbounded chain of interrupts). The LLC cache controller then performs normal redo logging by evicting one of the uncommitted lines into the buffer. Although the paper does not mention how the line is accessed if it is requested by upper-level caches before the transaction commits, this could be done by adding the information to the FIFO queue and indicating that the line has overflowed into the buffer. On commit, the overflowed line must be written back to the NVM as an in-place update.
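A sketch of the eviction constraint and the overflow fallback discussed above, under a toy model of an LLC set; request_overflow_buffer() is an invented stand-in for the interrupt-based OS request, not an interface the paper defines.

```cpp
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <vector>

enum class PState : uint8_t { Clean, Pending, Persistent };
struct LineMeta { uint64_t txn_id; PState pstate; };
struct LlcSet   { std::vector<LineMeta> ways; };

// Hypothetical OS hook: on set overflow the controller raises an interrupt and
// the OS hands back a pinned region to be used as the overflow buffer.
void* request_overflow_buffer(std::size_t bytes) { return std::malloc(bytes); }

// Victim selection: a Pending line must never be evicted to NVM, because that
// would persist uncommitted data. If every way is Pending, the controller must
// either stall the missing transaction or spill one line to the overflow buffer.
std::optional<uint32_t> pick_victim(const LlcSet& set) {
    for (uint32_t w = 0; w < set.ways.size(); ++w)
        if (set.ways[w].pstate != PState::Pending)
            return w;          // Clean or Persistent lines are safe to replace
    return std::nullopt;       // all ways uncommitted: stall or spill
}
```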
On transaction commit, as described above, the cache controllers evict the transaction's dirty lines in the higher-level caches down to the LLC, after which the transaction becomes durable. The FIFO queue at each level is consulted on commit using the committing transaction's ID, and entries are removed from the queue after the corresponding dirty lines have been evicted. Note that a line does not have to be removed entirely from the higher-level caches: it can remain there in a clean state, which reduces future cache misses. After transaction commit, no further modification is allowed on persistent cache lines, since they also act as redo log entries. To see why, assume that another transaction later overwrites the persistent line, and then the system crashes. During recovery, the original content of the persistent line has been lost; what remains is data generated by an uncommitted transaction. For the same reason, if a committed line in the LLC is about to be overwritten by a line evicted from a higher level, the committed line must first be written back to the NVM to retain the durability of committed transactions.
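A control-flow sketch of the commit path at one cache level as described above; flush_to_llc() and the std::deque-based FIFO are placeholders for what would really be cache-controller logic, not the paper's interface.

```cpp
#include <cstdint>
#include <deque>
#include <iostream>

struct FifoEntry { uint64_t txn_id; uint32_t set; uint32_t way; };

// Placeholder for the hardware action of writing a dirty L1/L2 line down into
// the non-volatile LLC and tagging it Persistent; the upper-level copy may be
// kept around in a clean state to avoid future misses.
void flush_to_llc(const FifoEntry& e) {
    std::cout << "flush set=" << e.set << " way=" << e.way << " to NV-LLC\n";
}

// Commit at one cache level: drain the FIFO entries of the committing
// transaction. Once all of its dirty lines sit in the NV-LLC, the transaction
// is durable; no separate log is written and nothing is replayed later.
void commit(uint64_t txn_id, std::deque<FifoEntry>& fifo) {
    for (auto it = fifo.begin(); it != fifo.end();) {
        if (it->txn_id == txn_id) {
            flush_to_llc(*it);    // push the dirty line down to the NV-LLC
            it = fifo.erase(it);  // drop the bookkeeping entry
        } else {
            ++it;
        }
    }
}
```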
On recovery, no time-consuming log replay or analysis is performed. The LLC cache controller walks the tag array and invalidates cache lines whose state bits indicate that they are uncommitted; committed lines are not affected. In addition, if the overflow region is non-empty, the cache controller also walks the overflow region and identifies uncommitted lines that had overflowed into it. These lines are also invalidated, because they belong to uncommitted transactions.
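Finally, a sketch of the recovery walk using the same toy tag-array representation as the earlier sketches; the treatment of the overflow region follows the description above and is otherwise an assumption on my part.

```cpp
#include <cstdint>
#include <vector>

enum class PState : uint8_t { Clean, Pending, Persistent };
struct LineMeta { uint64_t txn_id; PState pstate; bool valid = true; };

// Recovery: a single walk over the NV-LLC tag array (and the overflow region,
// if any). Persistent lines are kept; Pending lines belonged to uncommitted
// transactions and are invalidated. No log replay is needed.
void recover(std::vector<std::vector<LineMeta>>& llc_tags,
             std::vector<LineMeta>& overflow_region) {
    for (auto& set : llc_tags)
        for (auto& line : set)
            if (line.pstate == PState::Pending)
                line.valid = false;          // drop uncommitted data

    // Lines still sitting in the overflow region were spilled before their
    // transactions committed, so they are uncommitted and also discarded.
    for (auto& line : overflow_region)
        if (line.pstate == PState::Pending)
            line.valid = false;
}
```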