Mostly Order Preserving Dictionaries
Link: https://ieeexplore.ieee.org/document/8731521
Year: ICDE 2019
Keyword: Compression; Database Compression; MOP; Order-Preserving Dictionary
Highlight:
- Using multiple independent, small order preserving dictionaries plus one non-ordered dictionary to approximate a full, monolithic order preserving dictionary. The novelty lies in the design decision that a range query can be decomposed into several smaller queries on subsets of the table, based on the value partition implied by the dictionaries. Queries over the order preserving dictionaries require very little work, while the last, non-ordered dictionary requires decoding the values stored in the table.
- Using a floating point dictionary for sorting. This works because dictionaries are expected to be much smaller than the table (otherwise compression would not pay off). The time complexity is then linear in the size of the dictionary rather than the size of the table, which is faster than decoding all values and then sorting them.
Questions
- Although it is easy to infer, the paper should explicitly mention that a decompression dictionary must be maintained in parallel with the compression dictionary.
This paper proposes the mostly order preserving dictionary (MOP), a database compression technique for fast dictionary encoding and mostly decoding-free range queries. MOP is motivated by order preserving dictionaries, one of the dictionary encoding techniques that map large, repeated values to smaller code words, reducing the number of bits required to represent each value. The mapping relation is stored as a separate dictionary for later encoding and decoding.
Before MOP, prior work proposed order preserving dictionary designs to achieve both compression and decoding-free query execution. An order preserving dictionary guarantees that the numeric order of the encoded values is consistent with the total ordering of the unencoded values. This property accelerates range queries on field values, since the query does not need to decode the compressed field in order to select the correct tuples. Instead, the query engine rewrites the condition using encoded values whose range covers exactly what the original condition requests.
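As a concrete illustration, here is a minimal sketch of such a rewrite, assuming a fully order preserving dictionary stored as a sorted array (the names and the Python layout are mine, not the paper's):

```python
import bisect

# Hypothetical fully order preserving dictionary: the sorted distinct values,
# where the code word of a value is simply its index in this sorted array.
values = ["apple", "banana", "cherry", "grape", "melon"]

def rewrite_range(lo, hi):
    """Rewrite `lo <= field <= hi` into an equivalent range over code words."""
    lo_code = bisect.bisect_left(values, lo)        # first code word whose value >= lo
    hi_code = bisect.bisect_right(values, hi) - 1   # last code word whose value <= hi
    return lo_code, hi_code

# `field BETWEEN 'b' AND 'd'` becomes `code BETWEEN 1 AND 2`
# (i.e. "banana" and "cherry"), selecting tuples without decoding them.
print(rewrite_range("b", "d"))   # -> (1, 2)
```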
Full order preserving dictionaries are not always practical. Two real-world restrictions hinder their adoption. The first is that an order preserving dictionary requires knowing the entire data set, or at least the distribution of the field values, before the dictionary is generated. This poses a challenge for real-time data analytics systems, which must digest information as it arrives and have little buffering capacity for gathering statistics over the full data set. The second is that even when the full data set is available, building the dictionary involves scanning it and generating an ordered set of possible values, which is resource consuming.
MOP avoids these two challenges by allowing unordered code word assignments at the end of the code word domain, partially violating the property that the order of code words must match that of the input values. This violation, however, only marginally affects query performance, since the query executor can run a multi-pass scan on the queried table, each pass selecting tuples using only part of the dictionary, and combine the results afterwards.
A MOP dictionary consists of multiple sections. Let the number of sections be N. Each of the first (N - 1) sections contains order preserving codes, but ordering between sections is not guaranteed. In other words, within each of the first (N - 1) sections, the ordering of code words fully matches the ordering of the actual values they encode. In the last section, called the DIS section, there is no ordering guarantee at all, and the ordering between code words can be arbitrary. The sectioned design is a consequence of incremental dictionary generation. Since the dictionary generator can only see partial input at any stage of execution, it can only make a best effort to assign code words to input values in the current working section, leaving gaps for future insertions. When this becomes impossible, the generator starts a new section and uses it, together with all previous sections, as the current working section. The last section is treated as an “overflow” section, where values are simply inserted in FIFO order and assigned monotonically increasing code words.
The dictionary is logically organized as an array: each slot stores an input value, and the index of the slot is the output code word used for compression. Decompression is performed with an inverse dictionary, which is maintained in parallel with the compression dictionary. The paper does not specify the decompression dictionary, though.
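A minimal sketch of this layout, assuming the compression side is a value-to-code hash map kept in sync with the slot array (class and field names are illustrative, not from the paper):

```python
class MopDictionary:
    """Array-backed dictionary: slot index = code word, slot content = value."""

    def __init__(self, num_slots):
        self.slots = [None] * num_slots   # decompression side: code word -> value
        self.code_of = {}                 # compression side: value -> code word

    def assign(self, code, value):
        self.slots[code] = value
        self.code_of[value] = code

    def encode(self, value):
        return self.code_of.get(value)    # None if the value was never inserted

    def decode(self, code):
        return self.slots[code]
```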
The MOP dictionary is generated as follows. In the first stage, all worker threads cooperatively read a prefix of the input stream and estimate the cardinality of the input from the sampled results. Cardinality is estimated by counting the number of distinct values in the samples. In the second stage, the cardinality estimate is reported to the generator, and the generator allocates the first section based on the reported number. The paper defines a configurable value, the “pitch”, which controls how many extra slots are reserved in the first section to handle values that do not occur in the samples: the actual size of the first section is the estimated cardinality multiplied by the pitch (which is always greater than one). The worker threads then start processing the input stream in real time and send values to the dictionary. On receiving a value, the dictionary performs a binary search and locates the position where the value should be inserted, namely a free slot whose closest preceding value is smaller and whose closest following value is larger. If there is more than one free slot between the two values, the generator picks the middle slot, assuming that future values are equally likely to fall into either of the two resulting gaps. In fact, the paper suggests that worker threads aggregate a batch of values before sending them to the dictionary generator. This way, the generator can spread the values that fall into the same gap evenly over the gap, which distributes values better than the single-value approach.
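A minimal sketch of the single-value insertion path, building on the MopDictionary sketch above, assuming the assigned slots of the working section are tracked in a value-sorted side index and ignoring batching (all helper names are mine):

```python
import bisect

def insert_value(d, occupied, value):
    """Insert `value` into the working section of MopDictionary `d`.

    `occupied` is a value-sorted list of (value, code) pairs for slots that are
    already assigned. Returns the assigned code word, or None when the value's
    gap has no free slot left (the caller must then open a new section).
    """
    if value in d.code_of:
        return d.code_of[value]

    # Binary search for the assigned neighbors that bracket the new value.
    i = bisect.bisect_left(occupied, (value,))
    left_code = occupied[i - 1][1] if i > 0 else -1
    right_code = occupied[i][1] if i < len(occupied) else len(d.slots)

    if right_code - left_code <= 1:
        return None   # no free slot between the neighbors

    # Pick the middle free slot, assuming future values are equally likely to
    # fall into either of the two resulting gaps.
    code = (left_code + right_code) // 2
    d.assign(code, value)
    bisect.insort(occupied, (value, code))
    return code
```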
When a gap can no longer be found, the dictionary generator must allocate a new section at the end of the current dictionary. The size of the new section is either twice or one fifth of the size of the previous one, depending on how the estimated cardinality compares with the actual cardinality seen so far. If the former is higher, meaning the estimate was a good approximation, the next section need not be large, since not many new distinct values are expected to appear. On the other hand, if the latter is larger, indicating that the estimate severely under-estimated the cardinality, the next section should be larger than the previous one to absorb more values before it overflows.
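Under the assumption that the factors are exactly 2 and 1/5 as stated above, the sizing rule could be as simple as the following sketch:

```python
def next_section_size(prev_size, estimated_cardinality, actual_cardinality):
    """Grow aggressively when the estimate under-shot, otherwise shrink."""
    if actual_cardinality > estimated_cardinality:
        return 2 * prev_size           # estimate was too low: absorb more values
    return max(1, prev_size // 5)      # estimate was good: few new values expected
```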
When the number of sections reaches an upper bound, the dictionary stops growing and simply appends all incoming values to the end of the array, forming the last DIS section, which is not order preserving. One critical observation is that the code words of values in the DIS section are larger than all code words in the previous sections. Each section, DIS or not, also maintains a value range, which is the tight bound of all input values stored in the section. Note that the value ranges of different sections will likely overlap with each other, since code words in different sections are not guaranteed to be order preserving.
With MOP, point queries can still be executed by rewriting the point value with the code word found in the compression dictionary. If the value does not exist, the query returns an empty result. Otherwise, the query just uses the code word without decompressing any of the tuples. For range queries, however, the bounds of the range cannot be directly rewritten, since there are several sections plus the DIS section. To address this challenge, the paper proposes that range queries be executed in several iterations, one per section. In each iteration, the range query is only executed over the code words mapped by the corresponding section. Since code word ranges never overlap, i.e., every valid code word is mapped by exactly one section, the results of the iterations can be concatenated to form the final result.
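Putting this together, the decomposed execution might look like the sketch below, where `dictionary.sections`, `row.code`, and the two per-section helpers (sketched after the next two paragraphs) are my own assumptions rather than the paper's API:

```python
def range_query(table, dictionary, lo, hi):
    """Select rows whose encoded field value decodes into [lo, hi]."""
    result = []
    for section in dictionary.sections[:-1]:          # order preserving sections
        result += query_ordered_section(table, section, lo, hi)
    # The last section is the non-ordered DIS section.
    result += query_dis_section(table, dictionary.sections[-1], dictionary, lo, hi)
    return result
```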
We next describe range query operations on individual sections. If the section is ordered, then a range query is no more than translating the bounds of the uncompressed values into code words. Thanks to the ordered property, the translation can be done with two binary searches on the array. The query bounds are rewritten with the slot indices at which the binary searches terminate.
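A sketch of the ordered-section case, assuming each section exposes its assigned (value, code) pairs sorted by value and each table row carries its encoded field in `row.code` (illustrative names):

```python
import bisect

def query_ordered_section(table, section, lo, hi):
    """Rewrite [lo, hi] into a code word range local to one ordered section."""
    values = [v for v, _ in section.assigned]     # sorted by value (and thus by code)
    codes = [c for _, c in section.assigned]
    lo_idx = bisect.bisect_left(values, lo)       # first assigned value >= lo
    hi_idx = bisect.bisect_right(values, hi) - 1  # last assigned value <= hi
    if lo_idx > hi_idx:
        return []                                 # section holds no value in range
    lo_code, hi_code = codes[lo_idx], codes[hi_idx]
    return [row for row in table if lo_code <= row.code <= hi_code]
```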
Queries on the DIS section are more complicated. The executor first compares the value range of the original query with the value range of the DIS section. If the two do not overlap, the section can be skipped. If the former is a superset of the latter, meaning all values mapped by the DIS section are included, all tuples in the DIS section are selected. In the final case, the two ranges partially overlap, or the former is a subset of the latter. Here, the executor scans the table, and for each tuple mapped by the DIS section, indicated by a code word no smaller than the first code word of the DIS section, it decodes the tuple and tests it against the original range. Only tuples that fall into the given range are selected.
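A sketch of the DIS-section case, assuming each section records the tight value range of its stored values in `min_value`/`max_value` and its first code word in `first_code` (again, my own naming):

```python
def query_dis_section(table, dis, dictionary, lo, hi):
    """Handle the non-ordered DIS section of a decomposed range query."""
    # Tuples mapped by the DIS section carry the largest code words.
    dis_rows = [row for row in table if row.code >= dis.first_code]

    # Case 1: query range and DIS value range do not overlap -> skip the section.
    if hi < dis.min_value or lo > dis.max_value:
        return []

    # Case 2: query range covers the whole DIS value range -> take every DIS tuple.
    if lo <= dis.min_value and dis.max_value <= hi:
        return dis_rows

    # Case 3: partial overlap -> decode each DIS tuple and test it against [lo, hi].
    return [row for row in dis_rows if lo <= dictionary.decode(row.code) <= hi]
```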
Sort operations cannot be executed directly on the code words when the DIS section is present. To solve this problem, the paper proposes building a temporary order preserving dictionary on the fly. The new dictionary only requires re-assigning the code word values of inputs in the DIS section. This is done with floating point numbers, whose gaps can in theory be subdivided indefinitely. All values in the non-DIS sections are assigned floating point code words equal to their current integer code words. DIS values are then inserted into the order preserving gaps between these values, each being assigned a fractional value computed as the average of its two neighbors. Once this is completed, a mapping from the canonical dictionary to the temporary dictionary is generated. Sorting is performed on compressed code words, with the comparison operator using the floating point dictionary for value comparison. Since the floating point dictionary is order preserving, the results are sorted properly.
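A minimal sketch of constructing such a floating point dictionary, assuming access to the value-sorted (value, code) pairs of the non-DIS sections and the (value, code) pairs of the DIS section (all names are illustrative, and edge cases such as an empty dictionary are ignored):

```python
import bisect

def build_float_dictionary(ordered_pairs, dis_pairs):
    """Map every code word to a float whose order matches the value order.

    `ordered_pairs`: (value, code) pairs of non-DIS sections, sorted by value.
    `dis_pairs`: (value, code) pairs of the DIS section, in arbitrary order.
    """
    # Non-DIS code words keep their integer value, reinterpreted as a float.
    float_code = {code: float(code) for _, code in ordered_pairs}

    # Work over a value-sorted list so neighbors can be found by binary search.
    merged = sorted(ordered_pairs)                  # (value, code), sorted by value
    for value, code in sorted(dis_pairs):
        i = bisect.bisect_left(merged, (value,))
        left = float_code[merged[i - 1][1]] if i > 0 else float_code[merged[0][1]] - 1.0
        right = float_code[merged[i][1]] if i < len(merged) else left + 2.0
        float_code[code] = (left + right) / 2.0     # average of the two neighbors
        merged.insert(i, (value, code))
    return float_code

# Sorting rows by their float code yields value order without decoding tuples:
#   rows.sort(key=lambda row: float_code[row.code])
```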