DEUCE: Write-Efficient Encryption for Non-Volatile Memories



Paper Title: DEUCE: Write-Efficient Encryption for Non-Volatile Memories
Link: https://dl.acm.org/citation.cfm?id=2694387
Year: ASPLOS 2015
Keyword: NVM; Counter Mode Encryption; DEUCE




This paper proposes DEUCE, a framework that provides cache line encryption for NVM via counter mode encryption. The paper identifies that in a standard counter mode encryption scheme, approximately half of the bits in a line are updated on every write-back, resulting in more power consumption and wear. The problem can be mitigated if only the modified data is re-encrypted, which changes a smaller set of bits per cache line. In addition, the paper also identifies that when performing writes, some bits are flipped far more frequently than others, resulting in uneven wear within a cache line. Such uneven wear cannot be alleviated by regular wear leveling schemes, since these schemes operate at cache line granularity (or coarser). If a single bit in the cache line wears out, the entire wear leveling unit must be marked as “bad”.

DEUCE is based on counter mode encryption, which protects the content stored on the NVM from being leaked to a third party. Counter mode encryption works as follows. At initialization time, every cache line is assigned a counter value. These counter values are stored in a separate region on the NVM, and must be kept consistent with the data area. When a line is evicted back to the NVM, the memory controller encrypts the cache line by first generating a one-time pad (OTP) of cache line size, and then XOR’ing the content of the line with the OTP. To generate the OTP, the line’s address, the line’s counter value, and a secret key (which is assumed to be stored securely) are fed into an AES engine. The properties of AES guarantee that it is practically impossible to guess the value of the OTP without knowing the secret key, which makes the NVM device secure. The counter is incremented every time it is used to generate an OTP, ensuring that each write uses a fresh OTP. The counter value on the NVM must also be updated when writing back the line, because otherwise it would be impossible to decrypt the line when it is fetched. The hardware must guarantee in some way that the counter and data are updated atomically. On reading a line from the NVM, the device generates the OTP in the same manner as in the case of a write. The generation of the OTP can also be overlapped with the NVM read, given that the counter is read first. Cache line data is decrypted by simply XOR’ing the line with the generated OTP.
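
To make this data path concrete, the following is a minimal software sketch of counter mode encryption for a cache line. It is not the paper's hardware design: SHAKE-256 from Python's standard library stands in for the AES engine, and the 64-byte line size, the 8-byte address/counter encoding, and the function names are illustrative assumptions.

```python
import hashlib

LINE_SIZE = 64   # cache line size in bytes (assumed)

def make_otp(key: bytes, addr: int, counter: int) -> bytes:
    """Generate a line-sized one-time pad from (address, counter, secret key).

    The real hardware feeds these into an AES engine; SHAKE-256 merely stands
    in here as a keyed pseudo-random function so the sketch runs stand-alone.
    """
    seed = key + addr.to_bytes(8, "little") + counter.to_bytes(8, "little")
    return hashlib.shake_256(seed).digest(LINE_SIZE)

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt_line(key, addr, counter, plaintext):
    """Write-back path: bump the counter, then XOR the line with a fresh OTP."""
    counter += 1                               # fresh counter => fresh OTP for every write
    ciphertext = xor_bytes(plaintext, make_otp(key, addr, counter))
    return ciphertext, counter                 # both must be persisted together (atomically)

def decrypt_line(key, addr, counter, ciphertext):
    """Read path: regenerate the same OTP from the stored counter and XOR it away."""
    return xor_bytes(ciphertext, make_otp(key, addr, counter))

# Round trip: encrypt on eviction, decrypt on fetch using the stored counter.
key = b"\x00" * 16
line = bytes(range(LINE_SIZE))                 # dummy 64-byte cache line
ct, ctr = encrypt_line(key, 0x1000, 0, line)
assert decrypt_line(key, 0x1000, ctr, ct) == line
```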

The paper observes that even if only a single bit in the line is changed between a read and a write (from/to the device), this encryption mode will on average flip half of the bits in the cache line (which is the exact purpose of encryption). Compared with writing back without encryption, this causes much more power to be spent on changing bits in the physical cells.
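
Reusing the helpers from the sketch above, a quick experiment illustrates this: flip one plaintext bit, re-encrypt with the incremented counter, and roughly half of the 512 stored bits change.

```python
def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

key, addr = b"\x00" * 16, 0x1000
old_plain = bytes(LINE_SIZE)                   # all-zero 64-byte line
old_ct, ctr = encrypt_line(key, addr, 0, old_plain)

new_plain = bytearray(old_plain)
new_plain[0] ^= 1                              # modify a single bit of the line
new_ct, ctr = encrypt_line(key, addr, ctr, bytes(new_plain))

# Only one plaintext bit changed, but the OTP changed with the counter,
# so about half of the 512 ciphertext bits stored on the NVM differ.
print(hamming(old_ct, new_ct))                 # typically close to 256
```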

DEUCE reduces the number of bits that have to be flipped by using two counters: a trailing counter for encrypting data that has not been changed, and a leading counter for encrypting data that has been modified. The trailing counter can be derived from the leading counter, and changes rather infrequently. When a write-back happens, we first check whether the trailing counter will change from its previous value. If not, then only dirty data needs to be encrypted, since neither the trailing counter nor the address of the cache line changes, and hence the OTP for the unmodified data stays the same.
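
The counter derivation is just a bit mask. A small sketch, where the epoch length parameter K (EPOCH_BITS below) is an assumed value:

```python
EPOCH_BITS = 4   # K: the trailing counter only advances once every 2**K writes (assumed value)

def leading_counter(line_counter: int) -> int:
    # The leading counter is simply the current per-line write counter.
    return line_counter

def trailing_counter(line_counter: int) -> int:
    # Mask off the K least significant bits; the result only changes at
    # epoch boundaries, i.e. once every 2**K write-backs.
    return line_counter & ~((1 << EPOCH_BITS) - 1)
```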

DEUCE works as follows. Just like single-counter encryption schemes, one counter per cache line is maintained as the metadata of the line. In addition, we also add a bit vector to each line that records dirty status at a smaller granularity than the entire cache line (e.g. 4 bytes). On every write to the line, the affected bits in the bit vector are set. When the line is to be written out, the following happens. First, the counter of the line is incremented. Then, the leading counter and trailing counter described in the previous paragraph are derived as follows. The leading counter is simply the current line counter. The trailing counter is derived by masking off the least significant K bits of the line counter. Masking off K bits means that the trailing counter will only change once every 2^K writes. In the case that it does not change, the memory controller simply reads the dirty bit vector, and encrypts the dirty data only. The encrypted dirty data is then merged with the rest of the line, and written back to the NVM together with the updated counter value. If, on the other hand, the trailing counter value changes, the current non-dirty data must be re-encrypted, as we do not save the trailing counter, but only derive it from the line counter the next time the line is read. Not re-encrypting the cache line would lead to corrupted data, since when the line is read back in, the derived trailing counter would differ from the one used to encrypt it. After this full re-encryption the dirty bit vector can be cleared, since the entire line is now encrypted with the same counter value. The re-encryption consists of one decryption and one encryption, which has longer latency. Fortunately, as the number of masked bits K increases, this happens less and less frequently. In order for the decryptor to know which bytes to decrypt with which counter, the bit vector is also persisted to the NVM atomically with the write. The next time this line is accessed, we read both the data and the bit vector. The hardware decryptor generates two OTPs, one using the leading counter, and one using the trailing counter. The decryptor then uses the bit vector to decide which OTP should be used to decrypt which bytes, which is the exact reverse of the encryption process.
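
Putting the pieces together, here is a minimal software model of the DEUCE write-back and read paths, reusing make_otp(), xor_bytes() and LINE_SIZE from the counter mode sketch and leading_counter()/trailing_counter() from above. The 4-byte word granularity, the Python-level data layout and the function names are assumptions for illustration, not the paper's exact hardware interface.

```python
WORD_SIZE = 4                         # dirty bits are tracked per 4-byte word (assumed granularity)
NUM_WORDS = LINE_SIZE // WORD_SIZE

def split_words(line: bytes) -> list:
    return [line[i * WORD_SIZE:(i + 1) * WORD_SIZE] for i in range(NUM_WORDS)]

def deuce_writeback(key, addr, line_counter, plaintext, dirty_bits):
    """Encrypt a line on eviction.

    dirty_bits marks the words modified since the start of the current epoch.
    Returns (ciphertext, new line counter, dirty bit vector to persist).
    """
    line_counter += 1
    lead = leading_counter(line_counter)
    trail = trailing_counter(line_counter)
    if lead == trail:                     # epoch boundary: the trailing counter just changed;
        dirty_bits = [False] * NUM_WORDS  # the whole line is re-encrypted and the bits reset
    otp_lead = split_words(make_otp(key, addr, lead))
    otp_trail = split_words(make_otp(key, addr, trail))   # identical to otp_lead at a boundary
    cipher_words = [
        xor_bytes(w, otp_lead[i] if dirty_bits[i] else otp_trail[i])
        for i, w in enumerate(split_words(plaintext))
    ]
    # Within an epoch, clean words XOR'ed with the unchanged trailing OTP reproduce
    # the bits already stored on the NVM, so only dirty words cause bit flips.
    return b"".join(cipher_words), line_counter, dirty_bits

def deuce_read(key, addr, line_counter, ciphertext, dirty_bits):
    """Decrypt a line on fetch: two OTPs, selected per word by the dirty bit vector."""
    otp_lead = split_words(make_otp(key, addr, leading_counter(line_counter)))
    otp_trail = split_words(make_otp(key, addr, trailing_counter(line_counter)))
    plain_words = [
        xor_bytes(w, otp_lead[i] if dirty_bits[i] else otp_trail[i])
        for i, w in enumerate(split_words(ciphertext))
    ]
    return b"".join(plain_words)
```

A round trip through these two functions (a write-back followed by a read using the persisted counter and bit vector) recovers the original plaintext, and between consecutive write-backs within an epoch only the words flagged in the bit vector contribute ciphertext bit flips.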