STM in the Small: Trading Generality for Performance in Software Transactional Memory

- Hide Paper Summary

Paper Title: STM in the Small: Trading Generality for Performance in Software Transactional Memory
Link: https://dl.acm.org/citation.cfm?id=2168836.2168838
Year: EuroSys 2012
Keyword: SpecTM; STM

- Hide HTM Summary

Conflict Detection: Adaptive
Conflict Resolution: Adaptive
Version Management: N/A
R/W Bookkeeping: N/A

Back

Software Transactional Memory (STM) has long been rumored as being slow and useless as only a “research toy”. The slowness of STM for general purpose code inhibits its ability to serve as a building block for larger systems such as efficient parallel data structures and database transaction processing engine. The major factors that make STM slow can be summarized as follows. First, unlike Hardware Transactional Memory (HTM) where the data cache and on-chip buffers serve as a limited storage of transaction metadata, STM must keep all metadata by itself with pure software approach. To work with potentialy unbounded transactions, STM must support metadata of arbitrary size, which usually requires an auxiliary lookup structure such as a hash table or per-item annotation for every transactional object. Neither of these is fast and scalable. Second, STM essentially creates a separate name space for contents of the main memory. Transactions bring in data items into the separate name space on read, and buffers dirty data in the name space on write. The STM implementation must ensure proper synchronization between the private name space and the main memory name space. For example, most STM systems guarantee that if a memory word is written by a transaction, read operations on the word performed by the same transaction must observe the value it has written, rather than committed values currently residing in the main memory. This feature requires the STM run-time to check the read address in the current write set, which can be time consuming if performed for every read. Third, special cases and patterns, such as read-only, read-one update-many, compare-many swap-one, or all-read-all-write, are typically not the focus of optimization in existing STM systems (although some might have optimizations for read-only transaction because it is straightforward). These special patterns, however, are common in data structure implementations. Not being able to take advantage of the extra semantics information offered by the special patterns can make STM users pay extra overhead they do not have to. All of these three problems contribute to the reason that STM may not be a practical approach for efficient parallel programming.

This paper proposes SpecTM, an STM design that addresses all above three problems by positioning itself properly between the simplest Compare-and-Swap (CAS) paradigm and complicated fully-fledged STM systems. The philosophy of SpecTM is to find a balancing point between user’s resonsibility and STM’s responsibility, such that the STM only controls how transactions are serialized, and the rest is shifted to the user. To achieve its design goal, SpecTM optimizes handling of short transactions, special data types, and special access patterns. We summarize the design decisions featured by SpecTM as follows. First, SpecTM does not keep read log and write log of arbitrary size. Instead, read logs are kept only for a constant number of transactional reads, the sequence number of which is specified by users using different API calls. Fixed-sized read log makes it possible to allocate the transaction object as a stack variable with the transaction’s read set also stored on the stack, without an extra layer of indirection. Write logs are entirely eliminated, and SpecTM does not allow users to perform transactional writes until commit time. The commit API of SpecTM takes the addresses and values of transactional write operations as arguments. Second, SpecTM continues to use timestamps and lock bits as per-item metadata, but they are crafted in a careful way such that the metadata and actual data item are stored in the same cache line, and hence can be fetched with at most one cache miss. To further reduce the metadata overhead, SpecTM specializes for data types such as aligned pointers and small integers by “borrowing” one bit from the data item and using it as the lock bit. The timestamp in this case is also eliminated from the design, and validation of reads proceeds by directly comparing values of data items. Third, SpecTM optimizes special patterns in different ways. For example, the base STM that SpecTM is built on makes use of incremental validation and adjustment of the begin timestamp. For every read operation where the commit timestamp of the data item is greater than the begin timestamp, a validation is triggered, and if it is successful, the begin timestamp of the current transaction is adjusted to be the current logical time. For read-only transactions, if the read set does not change after the last incremental validation, SpecTM does not require the transaction to perform validation at commit time. Instead, read-only transactions can directly commit. Another example is for value validation. SpecTM does not require transactions to validate data items if: (1) the transactions only reads one data item, in which case the validation only consists of one value-based comparison without worrying about reading an inconsistent snapshot; (2) the transaction writes into every item it has read. In this case the read set is locked, and hence value-based validation can always read consistent snapshot; (3) All transactions in the system write distinct values into data items. In this case value-based validation is equivalent to timestamp based validation. This case is not too uncommon, because some memory reclamation algorithms actually can meet this requirement. By performing value validation when it can, SpecTM avoids metadata overhead on data items and the validation can complete faster. The last feature that SpecTM has is the fall back path using a normal STM which is a derivation of TL2 with begin timestamp adjustment. The normal STM suffers from performance problems mentioned earlier, but it can commit transactions of arbitraty size and control flow. The normal STM and SpecTM must be able to cooperate and observe each other’s committed states. Although the paper does not mention how this is achieved, it is suggested to be a combination of per-item metadata and value validation.