COP: To Compress and Protect Main Memory



- Hide Paper Summary
Paper Title: COP: To Compress and Protect Main Memory
Link: https://dl.acm.org/doi/10.1145/2872887.2750377
Year: ISCA 2015
Keyword: COP; Memory Compression; ECC



Back

Questions

  1. I can’t see how this scheme provides SECDED. If two errors occur on the same compressed line and are on
    different segments, the number of valid code words will drop to two, which causes the line to be mistakenly recognized as uncompressed. This, however, should not silently fail, since SECDED code should report data corruption for two bit error.

This paper proposes COP, a memory ECC design which delivers protection without the extra hardware overhead. Conventional ECC-protected memory module often has an extra chip on a rank, enabling an extra 8 bytes of data to be read in parallel with 64 byte cache line data. Such dedicated ECC hardware not only cost more than a regular non-ECC memory, but also consumes substantially more power during operation, due to the extra read on the ECC chip. Prior publications also seek to implement hardware ECC check for non-ECC memory modules. A dedicated ECC region is allocated from the physical memory, which stores the ECC bits for each memory block in the rest of the address space. These schemes, however, suffer from various problems. First, the extra ECC region significantly reduces the amount of usable memory, which can take up to 12.5% of total storage. Second, even if ECC is only sparsely maintained as in some designs, the mapping structures needed for locating the ECC data given a line address is also a non-negligible cost in both performance and storage. The last problem is that ECC data is accessed for each memory request, which adds to DRAM latency, which degrades performance.

COP, on the other hand, conbines compression with ECC such that ECC is stored in-line with compressed data, given that the compression ratio is high enough to allow small ECC to be fitted in. By using compression, COP has three obvious advantages compared with previous schemes. First, no extra storage is required for maintaining ECC, since DRAM data is compressed in cache line granularity. Second, one DRAM access can fetch both data and ECC, which will not affect performance as in designs in which data and ECC accesses are serialized. The last advantage is that no extra indirections or mapping tables are maintained in order to access ECC, featuring zero metadata overhead.

We next discuss COP design as follows. The paper assumes a low-cost compression and decompression algorithm that can reduce the size of a 64 byte line by at least 34 bits. We will see below how these 34 bits are arranged within the line to provide single-erro correction and double-error detection (SECDEC) capabilities. COP does not pursue high compression ratio, and the design trades off compression ratio with possibilities of applying compression. In other works, compression algorithms that provide more than 34 bits of storage saving makes little sense in COP, but it is important that most cache lines should be compressible to at least 478 (512 - 34) bits to allow ECC in-lining. To this end, the paper suggests that multiple compression algorithms be attempted on a cache line evicted from the LLC. The one with the best outcome is selected as the compression algorithm for the specific line. A 2-bit compression type field is stored with the line as well, in order for the line to be decompressed. This 2-bit overhead is already included in the 34 bit overhead of ECC.

In the common case that a cache line can be compressed to less than 478 bits, the DRAM controller will compute ECC for each of the 15 bytes (120 bits) segment in the compressed block, padded with zero if the compressed size is less than 478 bits. Note that the 2 bit compression type field is also ECC-protected, summing to a total of 480 bits. These 480 bits are then divided into four equally sized segments, with 120 bit for each segment. This paper assumes (120, 128) ECC coding, in which each 120 bit segment can be protected by an 8 bit code with SECDEC property. Each ECC code is stored right after the 120 bit segment, forming a 128 bit code word. The four 128 bit code word are then written into the DRAM row, as if it is regular data.

In the less common case, the incoming cache line cannot be compressed to under 478 bits by any of the four compression schemes. The DRAM controller simply disables ECC on these lines, and just store them as-is, with the exception of code word conflicts, called aliases, as we will see below. One of the most important contribution of this paper is the identification of uncompressible lines, since COP is metadata-free, which implies no metadata could assist such identification. To solve issue, the paper proposes that hardware check the validity of the four code words. If more than two of them are valid code words, i.e. the ECC bits is the correct code for the rest of the code word, the block is treated as compressed, and will be sent to the decompression engine. On the other hand, if less than or equal to two code words are valid, the line is deemed as uncompressible, which will be read out as-is.

There are rare cases, however, that an uncompressible cache line actually contains more than two valid code words by coincidence. These lines are called “aliases”, and should never be written back to the DRAM. The paper suggests that the LLC controller should be responsible for detecting aliases, and avoid evicting them (although this may require adding extra ECC hardware to LLC controller, the paper seems not to mention this). The paper argues that this would be extremely rare, since only uncompressible lines will be considered, and aliases only happen when at least three conflicts occur. Given the probablity of a single conflict being 1/2^8 (for any 120 bit segment, assuming that the rest 8 bits are independent from the segment’s bit pattern, only one in all possible 2^8 values can be the valid ECC), and that conflicts are independent from each other, the chance that an alias would happen is lower than 0.00002%. In realistic workloads, however, bit patterns within and between segments are often not independent from each other. In the worst case, due to either malicious attack or some really unfortunate computation sequence, the content of segments can be highly correlated, which increases the chance of collision by several magnitudes. To deal with such pessimistic cases, the paper recommends that the DRAM controller first XOR the evicted block with a randomly generated string to break the potential correlation between bits, before compression is applied.

Recall that the SECDED code can detect at most two errors in the same block without silent data corruption. The scheme described above is, in fact, consistent with the SECDEC property. Given a compressible line, it will be mistakenly recognized as uncompressible only if at least two errors occur on different segments, reducing the number of valid code words to below three. Such corruption would be unable to be detected even if regular ECC technique were used. For uncompressed lines, since COP does not protect them from the beginning, only one error, though very rare, can transform it into a valid compressed block given that the uncompressible block already contains two valid code block, and that the error happens to transform a third block to be valid.