Year: FAST 2020 Preprint
Keyword: NVM; Vilamb; Checksum; Redundancy
This paper proposes Vilamb, a software solution to the redundancy problem of NVM DAX storage. Redundancy here refers to the mechanisms that ensure data corruption caused by firmware bugs or random bit flips is detected, or even repaired, before it causes further damage. In conventional file systems, the redundancy information of a page can be verified and updated when the page is fetched into or evicted from the page buffer, which the file system controls. For NVDIMM DAX-mapped pages, however, the OS has no control over the fetch and eviction of cache-line-sized blocks, since the cache controller determines when a block enters and leaves the cache hierarchy.
Previous research has proposed solutions that integrate the redundancy controller into the last-level cache (LLC). Blocks fetched from the NVM are verified for integrity against a page-level or cache-line-level checksum stored elsewhere. Evicting a dirty block causes the checksums and parities to be updated using the diff between the dirty and the old content. A hardware solution, however, requires long and tedious verification at the IC level, and is unlikely to be adopted by commercial servers in the near future.
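The diff-based update mentioned above can be illustrated with a minimal sketch. It assumes simple XOR parity across the blocks of a stripe, which is a choice made here for illustration and not something the summary specifies; the names `xor_bytes` and `update_parity` are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def update_parity(parity: bytes, old_block: bytes, new_block: bytes) -> bytes:
    """Fold the diff between old and new content into the parity.

    Assumes XOR parity, so only the evicted block and the parity
    are touched; the other blocks of the stripe are not read.
    """
    diff = xor_bytes(old_block, new_block)
    return xor_bytes(parity, diff)


# A hypothetical stripe of four 64-byte blocks.
stripe = [bytes([i]) * 64 for i in range(4)]
parity = reduce(xor_bytes, stripe)

# Block 2 is evicted with new content.
new_block = bytes([0xAB]) * 64
parity = update_parity(parity, stripe[2], new_block)
stripe[2] = new_block

# The incrementally updated parity matches a full recomputation.
full_parity = reduce(xor_bytes, stripe)
```

The point of the diff is that the cost of an update does not depend on how many blocks share the parity.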
Vilamb is a software solution to the redundancy problem that features an easy-to-use interface and a non-intrusive detection mechanism. Previous software solutions often require a special interface to intercept memory accesses, which are otherwise transparent to the library, and to mark the beginning and the end of the redundancy-protected region. For example, Pangolin, a software library for writing redundancy-protected software, mandates that programmers use its transactional interface and wrap each load and store in a library call. This requires re-compiling the application, which is sometimes not easily achievable. Besides, Pangolin verifies and updates the redundancy information of a page as soon as the page is read or written by a memory instruction. This is necessary if we want a guarantee that every data corruption is detected, but given that actual hardware failures are relatively rare events, we may want to relax this guarantee a little in exchange for better performance.
Vilamb starts two background threads, one for redundancy verification and one for redundancy update. The verification thread scrubs the NVM device by reading clean pages in the DAX address space, computing their checksums, and comparing each checksum with the one stored on a separate device. If the two checksums mismatch, the thread has found a potential corruption, which is then reported to the OS. Note that only clean pages are verified, since Vilamb tolerates temporary inconsistency between page content and checksum for dirty pages.
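A scrubbing pass of this kind can be sketched as follows. This is a user-space simulation under stated assumptions, not Vilamb's implementation: pages are plain byte strings, CRC32 from Python's `zlib` stands in for whatever checksum is actually used, and `scrub` and its parameters are hypothetical names.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size in bytes


def scrub(pages, checksums, dirty):
    """Return the numbers of clean pages whose content fails verification.

    pages     -- list of page contents (bytes), indexed by page number
    checksums -- list of stored checksums, indexed by page number
    dirty     -- set of page numbers currently marked dirty (skipped)
    """
    corrupted = []
    for pno, data in enumerate(pages):
        if pno in dirty:
            continue  # dirty pages may legitimately disagree with their checksum
        if zlib.crc32(data) != checksums[pno]:
            corrupted.append(pno)
    return corrupted


# Four pages with matching checksums.
pages = [bytes([i]) * PAGE_SIZE for i in range(4)]
checksums = [zlib.crc32(p) for p in pages]

# Page 2 is silently corrupted while clean: one bit flips.
pages[2] = bytes([pages[2][0] ^ 0x01]) + pages[2][1:]

# Page 3 is legitimately modified and marked dirty; its checksum is stale.
pages[3] = b"\xff" * PAGE_SIZE
dirty = {3}

report = scrub(pages, checksums, dirty)
```

Only page 2 is reported. Page 3 also disagrees with its stored checksum, but it is skipped because it is dirty.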
The updating thread updates checksums in the background without affecting normal loads and stores. The checksum updating process in Vilamb leverages two important observations. First, updating a page-level checksum on every write is sub-optimal due to the read amplification it causes, as the entire page must be read before the new checksum can be computed. Vilamb instead amortizes the cost of reading the entire page over several small updates to the page. The background thread checks the dirty status of a page only periodically instead of on every access. Because most pages exhibit access locality, it is likely that more than one write has updated the page between two checks, which makes the cost of computing the page checksum relatively small per write. The second observation is that the “dirty” bit in the page table is not used by the OS if the page is mapped to a physical page, but is still set by the MMU when a store instruction accesses the page. In the case of DAX, since all virtual pages are mapped to physical pages on the NVDIMM, the dirty bits in the page table entries for this address range are largely unused. Vilamb takes advantage of these dirty bits to identify pages that have been modified since the last scan, as discussed in the following.
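The amortization argument can be made concrete with a small back-of-the-envelope calculation. The 4 KB page size and the function name `bytes_read_per_write` are assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def bytes_read_per_write(writes_per_scan: int, batched: bool) -> float:
    """Average number of page bytes read for checksumming, per write.

    batched=False models recomputing the page checksum on every write.
    batched=True models one recomputation per scan period, shared by
    all writes that hit the page during that period.
    """
    if writes_per_scan <= 0:
        raise ValueError("writes_per_scan must be positive")
    if batched:
        return PAGE_SIZE / writes_per_scan
    return float(PAGE_SIZE)


# Eight writes land on the same page between two scans.
per_write_sync = bytes_read_per_write(8, batched=False)
per_write_batched = bytes_read_per_write(8, batched=True)
```

With eight writes per scan period, the batched scheme reads an eighth of the bytes per write that the per-write scheme does. With a single write per period the two costs are equal.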
At the beginning of each scan, Vilamb walks the page table entries for pages mapped to the DAX region and notes down all pages whose dirty bit is set. Then, for each dirty page found, it first makes a persistent copy of the dirty bit on NVM, which tells the scrubbing thread that the page is potentially inconsistent with its checksum. It then clears the dirty bit in the page table and reads the page content to compute the checksum. A racing store may set the dirty bit of the page again in the meantime, but Vilamb tolerates such stores, since dirty pages are assumed to be inconsistent with their checksums. The persistent dirty bit is cleared only after the checksum update completes. The background scrubbing thread should not verify a page if either the page table dirty bit or the “shadow” dirty bit is set. Note that the dirty bit must be cleared before the checksum computation starts, since clearing it at the end could erase the mark left by a racing store, and that store would then be ignored. The persistent copy of the dirty bit is also required: without it, the scrubbing thread could treat a page whose dirty bit has been reset but whose checksum has not yet been updated as clean, and raise spurious faults.
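The ordering of the steps above can be sketched as a small single-threaded simulation. Everything here is an illustrative assumption: the `Page` class, the `racing_store` hook that injects a store in the middle of an update, and the use of CRC32 are not taken from the paper.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Page:
    data: bytearray
    checksum: int
    pte_dirty: bool = False     # models the dirty bit set by the MMU
    shadow_dirty: bool = False  # models the persistent copy kept on NVM


def store(page: Page, offset: int, value: int) -> None:
    """Model a store instruction: write a byte and set the dirty bit."""
    page.data[offset] = value
    page.pte_dirty = True


def update_checksum(page: Page, racing_store=None) -> None:
    """One update step for a single page, in the order described above."""
    if not page.pte_dirty:
        return
    page.shadow_dirty = True       # 1. persist a copy of the dirty bit
    page.pte_dirty = False         # 2. clear the page table dirty bit
    snapshot = bytes(page.data)    # 3. read the page content
    if racing_store is not None:
        racing_store(page)         #    a concurrent store lands here
    page.checksum = zlib.crc32(snapshot)
    page.shadow_dirty = False      # 4. clear the shadow bit last


def scrubbable(page: Page) -> bool:
    """The scrubber may verify a page only if both bits are clear."""
    return not (page.pte_dirty or page.shadow_dirty)


page = Page(bytearray(4096), zlib.crc32(bytes(4096)))
store(page, 0, 1)
dirty_before_update = not scrubbable(page)

# A store races with the update; it re-sets the page table dirty bit.
update_checksum(page, racing_store=lambda p: store(p, 1, 2))
stale_but_protected = (
    page.checksum != zlib.crc32(bytes(page.data)) and not scrubbable(page)
)

# The next pass picks the page up again and repairs the checksum.
update_checksum(page)
consistent = scrubbable(page) and page.checksum == zlib.crc32(bytes(page.data))
```

Because the dirty bit is cleared before the page is read, the racing store leaves the page marked dirty, so the scrubber keeps skipping it until the next update pass brings the checksum up to date.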
Vilamb’s asynchronous update scheme opens a window of vulnerability in which errors can be missed. For example, assume a store is erroneously discarded by the firmware. This will not be identified by the background scrubbing thread, since it only checks clean pages. Later on, when the updating thread refreshes the checksums of dirty pages, the page’s checksum will be recomputed from the current content of the page, potentially reading blocks that are not already cached from the NVM device. In this case, the problem remains undetected, since the checksum is computed from already corrupted data.
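The lost-write scenario can be walked through with a short simulation. It is a sketch under simplifying assumptions: a 64-byte buffer stands in for the page on the device, CRC32 stands in for the checksum, and the function name `simulate_lost_write` is hypothetical. The final comparison against a checksum derived from the intended data is an assumed synchronous scheme, included only for contrast.

```python
import zlib


def crc(data) -> int:
    return zlib.crc32(bytes(data))


def simulate_lost_write():
    media = bytearray(64)            # page content as stored on the device
    checksum = crc(media)            # checksum is consistent at the start

    # The application stores 0xFF at offset 0 ...
    intended = bytearray(media)
    intended[0] = 0xFF
    # ... but the firmware drops the write, so the media is unchanged.
    # The page is still marked dirty because the store was issued.
    dirty = True

    # Scrubber: skips the page while it is dirty, so nothing is reported.
    scrub_reports_error = (not dirty) and crc(media) != checksum

    # Updating thread: recomputes the checksum from what the device returns.
    checksum = crc(media)
    dirty = False

    # Scrubber, after the update: stale content matches its new checksum.
    scrub_reports_error = scrub_reports_error or crc(media) != checksum

    # Contrast (assumption): a checksum taken from the intended data
    # at store time would disagree with the media.
    sync_would_detect = crc(media) != crc(intended)

    data_is_wrong = media != intended
    return scrub_reports_error, data_is_wrong, sync_would_detect


detected, data_is_wrong, sync_would_detect = simulate_lost_write()
```

The page content differs from what the application intended, yet the scrubber never reports it, because the checksum was regenerated from the stale data.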