Adaptive Line Placement with Set Balancing Cache



- Hide Paper Summary
Paper Title: Adaptive Line Placement with Set Balancing Cache
Link: https://dl.acm.org/doi/10.1145/1669112.1669178
Year: MICRO 2009
Keyword: Cache; SBC; Set Balancing



Back

This paper proposes Set Balancing Cache (SBC), a technique that conditionally merges sets within a cache for better overall performance. The paper begins by pointing out that in current set-associative designs, sets are often not accessed with uniform probablity. In other words, some sets in the cache will receive significantly more activity than the rest. Unfortunately, in set-associative caches, tags and data slots are statically allocated for each set, and there is no way to allieviate such imbalance by moving tags and data slots around.

This paper seeks a new approach of balancing traffic between sets without introducing an extra level of indirection. In a conventional cache, similar effect can be achieved by doubling the associativity of the cache while keeping total number of data slots unchanged. This approach, however, has two major flaws. The first flaw is that for each cache access, no matter a hit or not, the number of tags that have to be read from the SRAM file also doubles, which consumes 2x the power than it used to. The second flaw is that by doubling the number of ways, we effectively merge two sets in the previous cache whose index only differs by the highest bit, i.e. one less bit is used to index the set in the new cache, and one more lower bit is used in the address tag. Such static mapping does not always fulfill our needs, since both sets can be busy, and merging them together does not really help.

This paper proposes two set balancing models. The first model, Static Set Balancing Cache (SSBC), is similar to doubling the number of ways in terms of the static merging scheme, but still, it does not require wider tag read circuit. The second model, Dynamic Set Balancing Cache (DSBC), dynamically matches an under-utilized set with an over-utilized set, which enables potentially better resource scheduling.

Both models rely on the concept of saturation level of a set, which is measured by the number of cache misses observed by the set. The intuition is that if a set has seen many cache misses, accesses on the set must be of low locality, and providing more ways in this case can help performance. On the other hand, if a set barely sees any miss, then either locality on the set is good, or few accesses actually hit the set. In both cases, some ways of the set can be spared for use by another over-utilized set by evicting existing lines at the end of the LRU stack.

The paper proposes using a saturating counter to measure the saturation level of a set. Given an N way cache, the saturation counter only needs to support a maximum value of (2N - 1) (so it is still a power of two). The per-set counter is incremented for each miss in the set, and decremented for each hit. A value of (2N - 1) indicates that the current set is over-utlized, and should borrow a few ways from an under-utilized set. A value below N, on the other hand, indicates that the way is under-utilized, and can spare a few ways to other sets.

Static Set Balancing Cache works by always merging sets whose index only differ by the highest bit. This results in the same design as doubling the number of tags, but does not incur wider tag reads per access. Each tag in the cache is added with a “d” bit to signal whether the line was migrated from another set, or it is local. Each set also has a “sc” bit indicating whether the set has migrated any lines to another set. The “sc” bit is set on the first line migration, and cleared by the cache controller when the other set has evicted all lines with “d” bit set. This can be detected using a per-set OR gate with the “d” bit from the tag.

On a cache access, the controller first computes the regular set index, and probes the set as usual. If there is a miss, then the “sc” bit is checked. If “sc” bit is 1, indicating one or more migrated line, the other set is also probed. If the result is still a miss, then a miss is signaled. Otherwise, a hit is signaled. Note that the line being hit is not miraged back to reduce design complexity.

On an eviction, the cache controller first checks whether the current cache set and the other set is above and below the migration threshold. If both requirements are met, the evicted line is migrated to the other set without being written back to the lower level cache. The migrated line is inserted to the MRU location of the destination set, while the LRU line is evicted. There are two reasons for this. First, by inserting into the MRU location, we can ensure that the line can at least be used a few times before it is evicted. Second, if we insert the line into LRU location, then a second migration will evict the first line, which makes no sense, since this just unnecessarily capped maximum line migration to one.

Dynamic Set Balancing Cache does not force static migration between sets. Any set can migrate to any other set. In the paper, the over-utilized set is called the source, and the under-utilized set is called destination. Theoratically speaking, shifting from static to dynamic scheme would be as easy as comparing all sets’ saturating counter with the threshold, when the current set is already saturated. This, however, requires a large associative comparator for every cache access, which offsets the purpose of reducing associativity.

The paper proposes using a small cache to track cache lines that are below the threshold. An associative buffer is added to the cache, which records cache set indices whose saturation level is below the threshold N, where N is the number of ways in the cache. Two comparator trees compute the minimum element and the maximum element (and their indices) in the array respectively. The minimum element is used when a set is needed as the destination for migration, while the maximum element is needed to minitor the high-water mark in the saturation cache. Whenever a set not in the cache whose saturation counter drops the maximum, the maximum element is removed from the saturation cache, and the set with a lower value is added. Note that the saturation cache only stores sets whose saturation counter is below the threshold. If a set’s counter exceeds this threshold, it will be removed from the cache, and the corresponding slot is reset, such that the value is set to the threshold to make sure they will always be prioritized over other valid lines. The cache is accessed for each saturation counter value change.

The access and eviction protocol barely change for dynamic set balanced cache scheme, except that each set can now become a source or destination of another set. To this end, the paper proposes adding a per-set “s/d” bit, where a “0” bit indicates the set is destination, and “1” indicates source. In addition, a per-set migration destination index is also added to explicitly record the destination. If the “s/d” bit is 1, and an access just misses in the set, the destination set is checked using the same address.