Data Compression Transformations for Dynamically Allocated Data Structures



- Hide Paper Summary
Paper Title: Data Compression Transformations for Dynamically Allocated Data Structures
Link: https://link.springer.com/chapter/10.1007/3-540-45937-5_4
Year: LNCS 2002
Keyword: Compression;



Back

Highlight:

  1. Only using simple compression algorithm, i.e., eliminating higher-order bits. This algorithm can be performed independently from other compressed data, and is therefore super light-weight, which is applicable for compiler instrumentation / L1 compression.

  2. Storage saving is achieved by merging two compressed words together into a 32-bit word. This is only performed selectingly on certain fields that are next to each other, which is different from block-oriented compression schemes where either an entire block is compressed, or none of the word is compressed.

  3. Uncompressable data is treated as an exception in a heap-allocated block, and stored with an extra level of indirection re-using the code word as a pointer to the overflow block.

  4. NULL pointer needs special treatment, since it it a very commonly seen pointer value, and yet does not share a prefix with heap pointers. The special treatment directly compresses NULL pointer to zero, which introduces aliases with all pointers with low 15 bits being zero. To avoid this problem, the allocator is prohibited to return an address with low 15-bits being zero.

Questions

  1. The paper proposes that pointer values can be compressed with a common prefix. But it never identified how the common prefix is maintained and stored, and what if the common prefix changes. Do we keep only one such prefix for all pointer values, or we can use multiple prefixes? Obviously the runtime should provide storage for keeping the prefix, such that the compression/decompression instructions have a way of using them in the instructions.

  2. The overflow block should also be freed when the object is freed. This requires the compiler to insert instructions to check the highest bit and call free(). This, however, is totally ignored by the paper.

  3. The bneh17 R1, R2, L1 instruction jumps to a function that allocates memory for the overflow block. This, however, does not sound right to me: (1) How does the function return the address of the allocated block to the caller? (2) How does the function know the address of the current compressed field (if the function performs value copy from the word to the overflow block)? (3) How does the function return to the caller? I guess you are not inserting one code chunk for each occurrance of the write operation?

This paper proposes Data Compression Extension (DCX) instruction set extension for performing simple data compression on heap-allocated data. Compared with conventional data compression approaches, where all cache lines in a certain address range or all objects are compressed without distinction, the DCX proposal allows software to select certain fields as compression candidates, and only compresses these fields in the run-time, avoiding unnecessary overhead of compression if applied to uncompressible data.

DCX is based on two observations. First, most pointer values to heap memory share common higher-order bits, since these memory blocks are typically created by an allocator that tend to group addresses that are close to each other for better access locality. Besides, some systems have a hard constraint on higher-order bits of pointers, further consolidating this observation. Second, most integers values used in real-world workloads do not exploit all bits in a 32-bit integer (the paper assumes 32-bit integers and pointers). These integers can be compressed with less number of bits, as long as the higher-order bits are identical to the sign bit after compression. The original value can be easily compressed restored without any external information just by eliminating and replicating the sign bit respectively.

The compression algorithm proposed by this paper is simple, which aims at compressing pointer values and integers into 15-bit words. If two adjacent fields are both compressed in an object, these two fields can be represented with one 32-bit integer word, with each of the half words storing a compressed result. The highest bit of the compressed word indicate whether the word stores two compressed words (set to “1”, if true). The object is always allocated with the compressed word, hence consuming less memory. If, in the run-time, one or both of the compressed fields become uncompressable, then both values will be decompressed, and stored in a extra block of memory consisting of only two fields. This block of memory will be linked to the main object by using the compressed word as a pointer to the block. In this case, the highest bit of the compressed word must be set to zero, indicating that the value should be used as a pointer to a block holding uncompressable words. The paper notes that on MIPS architecture, a pointer value always has its highest bit clear, therefore fulfilling this requirement automatically.

Initially, the compressed field is allocated as a single word in the object, and the highest bit is set to indicate that it contains compressed values. On write operations of any of the original value, compression will first be attempted on the value to be written using special instructions. If compression succeeds, the compressed value is written into the upper or lower half of the word by bit shifting and bit-wise OR’ing. Otherwise, the control flow is redirected to a piece of code that allocates a block of memory from the heap for the two uncompressed fields, decompress the old values from the compressed field, copy them into the overflow block, and then update the uncompressed value with the new value. Once the values are allocated with an extra level of indirection, any future update will not result in the deallocation of the overflow block. This is to avoid thrashing, which causes frequent allocation and deallocation.

The DCX proposal adds six RISC-style instructions into the ISA, three of them for pointer values, and the other three for integer values. For each given type, one instruction performs compression, while the other two performs decompression from upper and lower half-words of the compressed word, respectively. The pointer value compression instruction, bneh17 R1, R2, L1, compares the highest 17 bits of R1 and R2, and jumps to address label L1 if they are not equal. In practice, R2 should hold the pointer value to be compressed, and R1 holds the common prefix. L1 is supposed to be a function that allocates the overflow block, copy the existing values to the block, and then update the uncompressed value with value of R2 in the block. If compression succeeds, this instruction naturally falls through, and the compiler should insert instructions to shift R2 value into either lower or upper half word, and then OR the value into the compressed word.

Note that NULL pointer value requires special treatment, since NULL is an extremely common pointer value in nearly all real-world workloads, and yet it does not share a prefix with heap allocated pointers. To efficiently compress NULL pointers, the bneh17 instruction always asserts success regardless of the value in R1, if R2 stores NULL pointer. The resulting compressed word should also be all-zero. Storing all-zero as compressed NULL pointer, however, introduces aliasing problems, since a pointer whose lower 15 bits are also zero while sharing a 17-bit prefix with R1 is also compressed to an all-zero half-word. To solve this problem, the paper suggests that the memory allocator should avoid returning pointers whose lower 15 bits are all-zero.

The ineteger compression instruction, bneh18 R1, L1, simply checks whether the higher 18 bits of R1 are all identical, and jumps to L1 if otherwise. Note that integer value compression involves the high 18 bits, rather than 17 bits, unlike pointer compression, since integer compression uses the sign bit of the compressed half-word as the reference bit.

The pointer value decompression instruction from lower word, xtrhl R1, R2, R3, just copies the lower 15 bits in R3 and higher 17 bits in R2 into R1, restoring the original pointer value. The compiler is responsible for loading R2 with the prefix value before executing this instruction. Integer decompression and pointer decompression from the upper word works similarly.