NvMR: Non-Volatile Memory Renaming for Intermittent Computing



- Hide Paper Summary
Paper Title: NvMR: Non-Volatile Memory Renaming for Intermittent Computing
Link: https://dl.acm.org/doi/10.1145/3470496.3527413
Year: ISCA 2022
Keyword: NVM; Intermittent Computing; NVMR; Idempotent Execution



Back

Highlights:

  1. Atomic snapshots can be achieved by redirecting all writes from read-write sequences to a new location, and only updating the mapping table atomically during the snapshot.

  2. Writes in write-read sequences do not need to be redirected, because these writes are blind writes, and are guaranteed to generate the same value on every re-execution.

  3. Idempotent regions can be delimited as a region that is free of read-write and write-read sequences on the same address. Such a region can be re-executed and interrupted for arbitrary number of times, given the same initial states, and always reach the same final state.

  4. The algorithm in this paper is similar to the one in HOOP, except that the mapping table update and the register dump do not need to be performed atomically. This is due to the battery-backed nature of intermittent computing devices – you can get a notification before battery depletion. It is therefore guaranteed that the battery can provide energy for the snapshotting operation before it completely drains.

Comments:

  1. The paper applies a simple concept to intermittent computing: On redo-logging, all dirty blocks must be held from being written back from the cache, before the transaction commits (or before the snapshot is taken), because otherwise the NVM image will be corrupted. This paper seeks to loosen this requirement a little, with the observation that if a word is blindly written before being read since the last snapshot, then the dirty word does not need to be held back in the cache. This is because on an re-execution, the word will still be blindly written anyway, so its contents do not matter, indicating that the word can be polluted on NVM without corrupting program state. This technique works with all redo-logging based design, and is not particularly related to intermittent computing. Of course, the algorithm proposed by the paper is not pure redo-logging. In fact, it is shadow paging + redo-logging, which has lower write amplification (similar to HOOP).

  2. The mapping table update and register dumping should be atomic, i.e., there should be no power interrupt between these two processes, because otherwise, the mapping table will be in an inconsistent state, which makes the system unrecoverable. The paper does not explicitly state this requirement, and also it is non-trivial to make mapping table update atomic (one way is to have two mapping tables. Updating the table is merely a single pointer swing). – On a second thought, NVMR is applied to intermittent computing devices, in which case you have a battery to ensure that the snapshotting can always be successfully taken without power interrupt. This is fundamentally different from conventional NVM-based snapshotting where interrupts can occur anytime. In other words, NVMR is more of a planned shutdown, rather than failure recovery.

This paper proposes NVMR, a NVM-based intermittent computing framework that supports snapshotting (called “backups” in the paper). Snapshotting is an indispensable feature for intermittent devices, due to the lack of reliable power source and the resulting intermittent nature of computing. With snapshotting, the device periodically checkpoints its execution state and memory state to the NVM. When the power source completely drains, or when an unexpected power interrupt occurs, the device loses all volatile states and the progress since the last snapshot. When power supply resumes, the device will restart from the last snapshot by loading the execution state and memory state and resumes execution from the snapshot. This way, some computing progress can always be made as long as the device makes at least one snapshot between power cycles.

The snapshotting model the paper assumes is as follows. The system is equipped with a write-back cache hierarchy, which can evict dirty cache blocks back to the NVM at any moment during execution. Snapshots are taken periodically at some points during execution as a measure of preserving progress. The snapshot is logically atomic with regard to execution, and it reflects the system state at the logical time point the snapshot is taken. A snapshot consists of a register dump of the processor, and the memory image that is consistent with the register dump. When execution is interrupted due to power interrupts or system failures, the most recent snapshot is restored by loading the register dump back to the processor, and reverting the memory state to the one recorded in the snapshot. Execution can then resume from the last snapshot as if the interrupt had never occurred.

There are two challenges with the snapshotting model described above. First, during a snapshot operation, dirty data should be written back from the cache, together with the register dump. These writes must be conducted as an atomic unit, because otherwise, the system will be in an inconsistent state during the operation. If power interrupts occur on this window, the snapshot would be unfinished, while the system is unrecoverable. Second, due to write back caching, dirty data may might be evicted from the cache before the snapshot operation takes place, which pollutes the memory image on the NVM device. If power interrupt happens after the eviction and before the snapshot, the system might be in an inconsistent state, because the memory image is no longer the one from the previous snapshot, making the system unrecoverable.

One way of dealing with the challenges is to divide the execution into idempotent regions. Each idempotent region represent a part of the execution that, given the same initial register dump and memory state, can always result in the same final system state, regardless of the location and the number of times the execution has been interrupted and restarted. This way, the snapshot can be taken as two separate, non-atomic parts, by writing dirty data from the cache, and then taking the register dump. If a power interrupt happens between these two, or during either of the two steps, execution can always resume from the previous snapshot, and reach the same system state, according to the definition of idempotent regions. In practice, an idempotent region can be delimited as a part of the execution where each memory location is either read-only or write-only, but not read-write and write-read. A snapshot is taken when a read-write on the same memory location is detected to avoid forming a non-idempotent region.

The paper observes that, however, the above idempotent-region will generate lots of unnecessary traffic due to frequent read-write/write-read on the same address. The paper argues that idempotent regions can still be preserved in the presence of read-write and write-read sequences, if read-writes are handled by redirecting the write to another location when the write is to be evicted from the cache, with some extra mapping information. This way, the system has two copies of the data on the address to be written, an old copy, which belongs to the previous snapshot, and a new copy, which belongs to the current execution and not yet committed to the snapshot. The new copy is committed to the next snapshot by atomically updating the mapping information to point to the new copy for all cache blocks that have seen the read-write sequence. Write-read, on the other hand, does not need any special treatment. The writes can be performed and evicted freely without any remapping. This is because even in the presence of an re-execution, the write will still be performed correctly with the same value, and the following read will always return the same value.

NVMR implements this idea by tracking both the read and the write set since the last snapshot using bloom filters. If a write is performed after a read, and the dirty block containing the written value is to be evicted, then the dirty block is redirected to a new location. To support this, NVMR adds a mapping table and a free list which are both allocated on the NVM. The mapping table remembers the location of the old version only, and the free list is simply an allocator that maintains cache block sized slots that can be allocated to evicted dirty data. To avoid updating the mapping table every time an eviction happens, NVMR also adds a mapping cache to the cache hierarchy. The mapping cache tracks both the new and the old copies of evicted dirty blocks since the last snapshot. The new location field is updated with the allocated slot address from the free list when an eviction occurs. The old location field is updated with the mapping table entry fetched from the NVM when a read access needs to access the NVM and misses the mapping cache. The read access itself is also redirected to the old location of data.

Cache blocks that only see write-read sequences are not remapped on eviction, and the dirty block is always written back to the old location stored in the mapping table (and cached by the mapping cache).

The snapshot is taken when the mapping cache is about to evict an entry that tracks both the new and the old copy. The snapshot operation consists of three steps. In the first step, all dirty data in the cache hierarchy is written back to the NVM, using either the new location stored in the mapping cache (if the new location exists), or using the old location stored in the mapping table and cached by the mapping cache (if the new location does not exist). Then the mapping table is updated atomically by writing mapping cache entries that have a new location back to the NVM. The old location field of the mapping table is updated with the new location stored in the mapping cache entries. This process should be performed atomically, but the paper does not elaborate how atomicity is enforced. Then in the last step, the register dump is taken, and the snapshot completes. This step should also be atomic with the mapping table update, and again the paper does not mention this requirement. – Correction: It does not need to be atomic, because the paper assumes that the device is battery-backed.

To recover from a snapshot, the system simply loads the register dump from the NVM, and resumes execution. The mapping table will be consistent with the register dump, since the two are persisted atomically. The mapping table entries will also point to data that is consistent with the snapshot, since if the power interrupt happens before the second step of snapshotting, then all entries of the table point to the old location, and the register dump is also from the last snapshot. Otherwise, if the interrupt happens after the second and the third step, then the mapping table entries would point to the new locations, and the register dump is from the current snapshot, which is also consistent. Power interrupts will not happen between the second and the third steps, since we assume that they are conducted atomically.

The mapping table stored on the NVM should also be constantly garbage collected (GC’ed) to avoid having too many entries. In fact, without GC, the mapping table will keep expanding to cover the full write working set. GC on the mapping table works by copying data, in the background, from its remapped location back to the home location, and removing the mapping table entry after the copy completes. The mapping table cache should also be invalidated during the process (although the paper did not cover what if the entry to be invalidated is currently being redirected to a new location). The location of data stored in the mapping table entry is returned to the free list as well. This process is similar to the log replay in redo logging, and will introduce slight write amplification.