BCC: Reducing False Aborts in Optimistic Concurrency Control with Low Cost for In-Memory Databases
Link: https://dl.acm.org/citation.cfm?id=2904126
Year: VLDB Jan 2016
Keyword: BCC; OCC
Optimistic Concurrency Control (OCC) is favored on modern hardware over traditional Two-Phase Locking (2PL) for its high degree of parallelism and lower communication overhead between processors. OCC features a three-phase transaction execution. In the first phase, the read phase, transactions optimistically read data items without modifying global state, and buffer writes in their local storage. Buffered write operations are called “pre-writes”, and they are not made public until the transaction commits. Transactions may read inconsistent data items should other transactions commit during their read phases. The original OCC algorithm does not try to address this problem, and implicitly assumes that inconsistent data will only affect the write set, but never the control flow. After the read phase, transactions enter the second phase, the validation phase. Before entering the validation phase, a transaction acquires a global commit mutex to serialize commit operations. The actual serialization order is the order in which transactions acquire this commit mutex. During the validation phase, a transaction compares its read set with the write sets of transactions that committed while it was reading. This is achieved via a global timestamp counter. A transaction reads the counter when it begins, and fetch-and-increments the counter after acquiring the commit mutex to obtain its commit timestamp. Transactions whose commit timestamps fall between these two values may have written into the read set of the committing transaction, and are hence subject to serializability checking. We call these transactions the “overlapping transactions” in the following text. The serializability check is performed by intersecting the read set of the committing transaction with the write sets of the overlapping transactions. This requires O(n) set operations, where n is the number of overlapping transactions. If any intersection is non-empty, a Write-After-Read (WAR) violation has happened, and the committing transaction must abort. The WAR dependency is considered a violation here because it disagrees with the serialization order established by acquiring the commit mutex. If transaction T1 writes into the read set of transaction T2, then the dependency dictates that T1 be serialized after T2. Since OCC assumes that the direction of dependencies must agree with the direction of commit order, it is implied that T1 should commit after T2. In the above example, however, if T1 commits before T2, then the commit order is violated, and T2 must abort. On the other hand, if the validating transaction passes validation, it enters the last phase, the write phase, in which all locally buffered writes are flushed to the shared storage. The write set of the transaction is also archived with the commit timestamp of the transaction. Later transactions will use this archived write set to perform read validation. The write set can only be garbage collected after all concurrent reading transactions have exited. After the write phase, the commit mutex is released.
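To make the control flow concrete, here is a minimal C++ sketch of the classical OCC commit path just described. All names (`occ_validate_and_commit`, `committed_write_sets`, and so on) are illustrative rather than taken from any real system, and index access, durable storage and garbage collection of archived write sets are elided.

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <set>
#include <string>

using Key = std::string;
using KeySet = std::set<Key>;

// Archived write sets of committed transactions, keyed by commit timestamp.
// Entries are garbage collected once no concurrent reader remains (elided).
std::map<uint64_t, KeySet> committed_write_sets;
std::mutex commit_mutex;          // the global commit mutex
uint64_t timestamp_counter = 0;   // the global timestamp counter

// Backward validation: intersect our read set with the write set of every
// transaction that committed during our read phase, then flush pre-writes.
bool occ_validate_and_commit(uint64_t begin_ts, const KeySet& read_set,
                             const std::map<Key, std::string>& pre_writes) {
    std::lock_guard<std::mutex> guard(commit_mutex);   // serializes commits
    uint64_t commit_ts = ++timestamp_counter;          // fetch-and-increment

    // Overlapping transactions: begin_ts < their commit timestamp < commit_ts.
    for (auto it = committed_write_sets.upper_bound(begin_ts);
         it != committed_write_sets.end(); ++it)
        for (const Key& k : it->second)
            if (read_set.count(k) != 0)
                return false;                          // WAR violation: abort

    // Write phase: install pre-writes into shared storage (elided), then
    // archive our write set for later validators.
    KeySet write_set;
    for (const auto& kv : pre_writes) write_set.insert(kv.first);
    committed_write_sets.emplace(commit_ts, std::move(write_set));
    return true;
}
```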
OCC increases the degree of parallelism and reduces the amount of communication between processors by reading optimistically, without locking, in the read phase. There are, however, still problems that can hurt OCC’s performance. The first problem is that OCC still validates pessimistically despite the optimistic read phase: a transaction that cannot commit will eventually abort under OCC, whereas under 2PL it would merely be blocked. The second problem is that OCC may introduce many false aborts when contention is high. For example, OCC always aborts a transaction if its read set is overwritten by another transaction during the read phase. This, however, is only a necessary condition for dependency cycles, not a sufficient one. When the degree of contention is high, many WAR dependencies will be identified as potential sources of violations, and OCC will suffer from performance degradation.
In order to increase the robustness of OCC under high contention, as well as to improve OCC’s flexibility, this paper proposes Balanced Concurrency Control (BCC) as an alternative to OCC under high contention. BCC extends the execution model of OCC by allowing transactions with WAR dependencies to commit under certain circumstances. Accordingly, the overhead of dependency management increases as the concurrency control manager becomes more complicated. The paper strives to find a balance point between concurrency control overhead and efficiency by identifying easy-to-check situations in which a transaction must abort in the presence of WAR dependencies. We cover the design of the system in detail in the next few paragraphs.
As stated in the previous paragraph, OCC aborts a transaction if a WAR dependency is detected at commit time. A WAR dependency alone, however, is not sufficient to form a dependency cycle, which is the necessary and sufficient condition for non-serializable execution. In fact, we can identify a fixed pattern in an OCC dependency cycle. Recall that in the execution model of OCC, WAW and RAW dependencies both imply that the source transaction commits before the destination transaction, as a transaction’s write set is only made public on commit. WAR is the only exception that allows the destination transaction to commit before the source transaction. Now, imagine there is a dependency cycle. We assume that T3 is the transaction that committed the earliest in real time, and that the cycle has at least three transactions (the case of two transactions is trivial). Since T3 is part of a cycle, there must be two other transactions, T1 and T2, such that T1 → T2 and T2 → T3. Note that T3 is the earliest committed transaction, but it is serialized after T2. The dependency between T2 and T3 must therefore be WAR, which is exactly what OCC algorithms can detect. The dependency T1 → T2 can be any of RAW, WAR and WAW, because the only constraint is that T1 commits after T3. We call the structure T1 → T2 → T3 where T2 → T3 is WAR a “dangerous structure” (“essential structure” in the paper). BCC recognizes this dangerous structure dynamically at runtime, and aborts transactions only when necessary to avoid dependency cycles. We argue that BCC aborts only a subset of the transactions that OCC aborts, and hence exposes better parallelism than OCC under high contention. Proving the argument is not difficult. In one direction, every dangerous structure contains a WAR dependency, which is precisely what OCC aborts on; hence every transaction aborted by BCC would also be aborted by OCC. In the other direction, we can construct an instance of T1 → T2 → T3 where T2 → T3 is WAR but T1 commits before T2 starts, so no cycle exists. In this case BCC does not abort T2, while OCC does.
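The second direction of the argument can be made concrete with commit-time arithmetic. The snippet below encodes the counterexample from the text with made-up numeric timestamps: a WAR edge T2 → T3 exists, so OCC aborts T2, but the only candidate T1 commits before T3 (indeed before T2 even begins), so no dangerous structure, and hence no cycle, can form, and BCC may let T2 commit.

```cpp
#include <cassert>

// Begin/commit times for the counterexample; all values are illustrative.
struct Interval { int begin, commit; };

int main() {
    Interval t1{0, 1};   // T1 commits before T2 begins (edge T1 -> T2 is RAW)
    Interval t2{2, 6};   // T2 reads y during its read phase
    Interval t3{3, 4};   // T3 overwrites y and commits first: WAR T2 -> T3

    bool war_t2_t3 = true;                             // both OCC and BCC see this
    bool t1_commits_after_t3 = t1.commit > t3.commit;  // required for the structure

    assert(war_t2_t3);             // OCC aborts T2 at validation
    assert(!t1_commits_after_t3);  // no dangerous structure: BCC can commit T2
    return 0;
}
```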
Taking advantage of this observation, BCC performs the following checks during the validation phase of any transaction: (1) whether the current transaction's read set has been written by another committed transaction (this is the WAR check also done by OCC); (2) whether the current transaction would be the destination of a dependency if it commits, the source being one or more concurrent transactions that commit after the current transaction begins. The transaction is aborted only when both conditions hold, since only then can its commit complete a dangerous structure. Note that condition (2) is slightly broader than necessary: the dangerous structure only requires that T1 commits after T3, while condition (2) accepts any T1 that commits after T2 begins. Since T3 must commit after T2 begins, condition (2) covers all dangerous structures, but also introduces false positives.
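Stated as a predicate, the abort rule is simply the conjunction of the two conditions; names here are ours, not the paper's.

```cpp
// BCC abort rule at validation time, as a predicate over the two conditions.
// cond1: our read set was overwritten by a committed transaction (WAR out of us).
// cond2: a concurrent transaction committing after our begin has a dependency
//        edge into us. cond2 over-approximates "T1 commits after T3", so some
//        aborts are false positives, but no dangerous structure is missed.
bool bcc_must_abort(bool cond1, bool cond2) {
    return cond1 && cond2;
}
```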
The implementation of BCC is similar to Silo. Each data item (tuple) is accompanied by a timestamp, or TID, which records the TID of the most recent transaction that wrote to it. The TID consists of three fields: global epoch, local counter and processor ID, from the highest bits to the lowest. The processor ID encodes the identity of the processor that allocated the TID. In BCC, transactions are dispatched to processors for execution and do not migrate. At any given time, only one transaction executes on a processor. The global epoch is a global counter that is advanced slowly by a background thread; in Silo, the interval is set to 40 ms. Transactions read the global epoch before they commit. This does not cause much contention, as the majority of reads should be satisfied by the local cache. Each processor also has a local commit ID counter, which is incremented for every committed transaction. In Silo, TID allocation happens when a transaction enters the validation phase. The thread that executes the transaction reads the global epoch, the local counter, and the processor ID, and concatenates them together. The local counter is incremented after every allocation. The most recently allocated TID is also stored in a globally accessible location as LAST_TID. Since each thread only executes one transaction at a time, the LAST_TID of a processor is either the TID of the currently executing transaction or that of the most recently committed transaction. BCC makes use of this invariant to determine the concurrent transactions of a committing transaction.
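A sketch of this TID scheme in C++; the field widths are assumptions (the summary does not fix them), and `Processor`, `make_tid` and friends are illustrative names.

```cpp
#include <atomic>
#include <cstdint>

// Assumed layout: | epoch | local counter | processor ID |, high to low bits.
constexpr unsigned kProcBits    = 8;    // assumption: up to 256 processors
constexpr unsigned kCounterBits = 24;   // assumption: counter width per epoch

std::atomic<uint64_t> global_epoch{1};  // advanced ~every 40 ms by a background thread

uint64_t make_tid(uint64_t epoch, uint64_t counter, uint64_t proc_id) {
    return (epoch << (kCounterBits + kProcBits)) | (counter << kProcBits) | proc_id;
}

uint64_t proc_of(uint64_t tid) {        // recover the allocating processor
    return tid & ((uint64_t{1} << kProcBits) - 1);
}

// One transaction runs on a processor at a time, so allocation needs no locking.
struct Processor {
    uint64_t proc_id = 0;
    uint64_t local_counter = 0;
    uint64_t last_epoch = 0;
    std::atomic<uint64_t> last_tid{0};  // the globally readable LAST_TID

    uint64_t allocate_tid() {
        uint64_t e = global_epoch.load();
        if (e != last_epoch) {          // counter resets on each epoch change
            last_epoch = e;
            local_counter = 0;
        }
        uint64_t tid = make_tid(e, ++local_counter, proc_id);
        last_tid.store(tid);            // publish as this processor's LAST_TID
        return tid;
    }
};
```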
Silo-style TID allocation has two distinguishing features. First, TIDs allocated by different threads are globally unique, thanks to the processor ID field. Second, the numerical values of TIDs allocated by one thread are monotonically increasing: the epoch and the local counter are both monotonically increasing, and although the local counter is reset on each global epoch change, it occupies lower bits than the epoch, so monotonicity is preserved. The second property is crucial for determining which transactions have executed on a certain processor. If the value of a TID is numerically smaller than the processor’s LAST_TID, then we know the corresponding transaction must have committed.
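Continuing the sketch above, the committed-or-not test is then a plain numeric comparison against the owning processor's LAST_TID.

```cpp
// Because TIDs on one processor are monotonic and at most one transaction runs
// there at a time, any TID strictly below the processor's LAST_TID belongs to
// a transaction that has already finished.
bool has_committed(uint64_t tid, const Processor& p) {
    return tid < p.last_tid.load();
}
```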
In order for transactions to perform backward validation against concurrent and committed transactions, transactions must expose their read sets from the moment they begin. BCC transactions are allocated a TID at transaction begin. Note that this is different from most lazy OCC algorithms, where the TID is allocated only after the read phase. For OCC algorithms, allocating the TID at transaction begin is undesirable, because transactions would then be forced to commit in TID allocation order. In BCC this is no longer an issue, because BCC does not assume that the serialization order is the order in which TIDs are acquired. After transaction begin, the transaction inserts every data item it reads into a hash table, which is tagged with the TID of the transaction and is publicly accessible. The implementation of the hash table is assumed to be multicore-efficient and linearizable. A race condition can occur if a validating transaction probes for a data item while a reading transaction is inserting it: the WAR dependency may go unnoticed while the reading transaction also fails to read the updated value. The simple solution would be to lock the hash table whenever it is probed or inserted into. To reduce the locking overhead, the reading transaction instead samples the data item before and after inserting it into the hash table. If the two samples disagree, an update must have happened in between, and the read must be retried after removing the item from the hash table.
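Below is a sketch of the published read-set table and the double-sampling read protocol. A real implementation would use a linearizable lock-free table rather than this mutex-guarded stand-in, and all names are illustrative.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>
#include <unordered_set>

struct Tuple {
    std::atomic<uint64_t> tid{0};   // TID of the most recent writer
    // ... value payload elided ...
};

// Publicly visible read set of one transaction, tagged with its TID.
struct ReadSetTable {
    uint64_t owner_tid = 0;
    std::mutex m;                   // stand-in for a lock-free linearizable table
    std::unordered_set<const Tuple*> items;

    void insert(const Tuple* t) { std::lock_guard<std::mutex> g(m); items.insert(t); }
    void erase(const Tuple* t)  { std::lock_guard<std::mutex> g(m); items.erase(t); }
    bool contains(const Tuple* t) { std::lock_guard<std::mutex> g(m); return items.count(t) != 0; }
};

// Double-sampling read: sample the tuple's TID, publish the read, sample again.
// If the samples differ, a writer raced us; drop the entry and retry, so that
// either the writer sees our read or we see the writer's value.
uint64_t read_tuple(const Tuple* t, ReadSetTable& my_reads) {
    while (true) {
        uint64_t before = t->tid.load();
        my_reads.insert(t);
        uint64_t after = t->tid.load();
        if (before == after) return after;   // consistent: keep the entry
        my_reads.erase(t);                   // raced: retry the read
    }
}
```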
During validation, BCC first acquires the global commit mutex, as in classical OCC. It then checks condition (1), the classical WAR check, by comparing the TIDs of the items in its read set against its own TID: a newer tuple TID means a committed transaction has overwritten the read set. It also checks condition (2), whether any RAW, WAW or WAR dependency from a concurrent transaction points at it. Concurrent transactions are identified by reading the LAST_TID fields of all processors at transaction begin and again at validation. The validating transaction scans the published read-set hash tables and checks each one whose TID lies between the two LAST_TID samples of the corresponding processor (recall that the processor ID can be recovered from the TID). If any element of its write set is found in such a read-set hash table, an incoming WAR dependency has been detected. Incoming RAW and WAW dependencies are detected by comparing the TIDs of the tuples in its read set and write set, respectively, against its begin TID; any tuple TID greater than the begin TID indicates such a dependency. If both condition (1) and condition (2) hold, committing would complete a dangerous structure, so validation fails and the transaction aborts. Otherwise the transaction enters the write phase: it flushes all elements of the write set back to shared storage and updates their tuple TIDs, then releases the commit mutex.
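Putting the pieces together, here is a sketch of BCC validation built on the structures above (it reuses `commit_mutex` from the first sketch; `Value` and `BccTxn` are made-up names, and real systems avoid the global scans shown here). One Silo-style detail the summary leaves implicit is assumed: the read set records the TID observed at read time, which lets the sketch tell a read-set overwrite (condition 1) apart from having read a concurrent commit's write (an incoming RAW).

```cpp
#include <string>
#include <unordered_map>
#include <vector>

using Value = std::string;                       // illustrative payload type

struct BccTxn {
    uint64_t begin_tid;                          // TID allocated at transaction begin
    std::vector<uint64_t> last_tid_at_begin;     // LAST_TID sampled per processor at begin
    // Tuple -> TID observed when it was read (assumed Silo-style bookkeeping).
    std::unordered_map<const Tuple*, uint64_t> read_set;
    std::unordered_map<Tuple*, Value> pre_writes;
};

bool bcc_validate_and_commit(BccTxn& t, std::vector<Processor>& procs,
                             const std::vector<ReadSetTable*>& published_reads) {
    std::lock_guard<std::mutex> g(commit_mutex);    // global commit mutex, as in OCC

    // Condition (1): classical WAR check. A tuple whose current TID differs
    // from the TID we observed at read time was overwritten after we read it.
    bool cond1 = false;
    for (const auto& r : t.read_set)
        if (r.first->tid.load() != r.second) { cond1 = true; break; }

    // Condition (2): an incoming dependency from a concurrent transaction.
    bool cond2 = false;

    // (2a) Incoming WAR: a concurrent transaction read a tuple we will write.
    // "Concurrent" means its TID falls between the LAST_TID sampled at our
    // begin and the LAST_TID sampled now, on its own processor.
    for (ReadSetTable* rs : published_reads) {
        uint64_t pid = proc_of(rs->owner_tid);
        uint64_t last_now = procs[pid].last_tid.load();
        if (rs->owner_tid <= t.last_tid_at_begin[pid] || rs->owner_tid > last_now)
            continue;                               // not concurrent with us
        for (const auto& w : t.pre_writes)
            if (rs->contains(w.first)) { cond2 = true; break; }
        if (cond2) break;
    }

    // (2b) Incoming RAW/WAW: we read, or are about to overwrite, a version
    // written by a transaction that committed after we began. (Cross-processor
    // numeric TID comparison is a simplification of this sketch.)
    if (!cond2)
        for (const auto& r : t.read_set)
            if (r.second > t.begin_tid) { cond2 = true; break; }
    if (!cond2)
        for (const auto& w : t.pre_writes)
            if (w.first->tid.load() > t.begin_tid) { cond2 = true; break; }

    // Abort only if committing would complete a dangerous structure.
    if (cond1 && cond2) return false;

    // Write phase: install buffered writes and stamp tuples with our TID.
    for (auto& w : t.pre_writes) {
        // ... install w.second into shared storage (elided) ...
        w.first->tid.store(t.begin_tid);
    }
    return true;
}
```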