The Direct-to-Data (D2D) Cache: Navigating the Cache Hierarchy with a Single Lookup
Link: https://dl.acm.org/citation.cfm?id=2665694
Year: 2014
Keyword: D2D Cache; TLB
This paper presents an elegant solution to the problem of having to probe every level of the cache hierarchy before an access that ultimately misses to DRAM can be issued. Those probes are unnecessary and waste cycles. The underlying problem is that an L1 miss alone does not tell whether the line will also miss in the next levels, L2 or L3; the hit or miss status in lower levels remains unknown until those caches are probed. Instead of having the processor probe the L1, L2 and L3 caches one after another (which is unnecessarily serialized), D2D stores the exact location of each line in the TLB. Every time a virtual address is translated, the TLB returns not only the physical address but also the location of the line: its cache level and set ID (the position of the line within its set, i.e. the way). The set index itself is still extracted from address bits, virtual or physical depending on whether the cache is virtually or physically indexed.
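The single-lookup idea can be sketched in a few lines of Python. This is only an illustration, not the paper's hardware design: the 64-byte line size, the names `ETLBEntry` and `lookup`, and the use of `None` for "not cached" are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

PAGE_SIZE = 4096                      # 4KB pages, as in the summary
LINE_SIZE = 64                        # assumed line size (not stated in the summary)
LINES_PER_PAGE = PAGE_SIZE // LINE_SIZE

# (cache level, set ID); None means the line is in no cache and must come from DRAM.
Location = Optional[Tuple[int, int]]


@dataclass
class ETLBEntry:
    """Hypothetical eTLB entry: a translation plus one location per line in the page."""
    ppn: int
    locations: List[Location] = field(
        default_factory=lambda: [None] * LINES_PER_PAGE
    )


def lookup(etlb: Dict[int, ETLBEntry], vaddr: int) -> Tuple[int, Location]:
    """Translate vaddr and report where its line lives, in one eTLB access."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    entry = etlb[vpn]                 # a KeyError here models an eTLB miss
    paddr = entry.ppn * PAGE_SIZE + offset
    return paddr, entry.locations[offset // LINE_SIZE]
```

With the location in hand, the access can go straight to the named cache level, or straight to DRAM when the location is `None`, without probing the levels in sequence.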
The D2D design adds two components. First, the TLB is extended (and called the “eTLB”) to hold location information for every cache line in the 4KB page. Two bits are needed to represent the cache identity, assuming three levels of cache. The number of bits for the set ID depends on the maximum associativity among all levels; the paper uses 4 bits to accommodate the 16-way set-associative L3. The second component is called the “Hub”, and it tracks the identity of all cached data in all levels.
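To get a feel for the storage cost, the bit counts above can be worked through per page. The calculation below assumes 64-byte cache lines, which the summary does not state; the helper names are made up for the example.

```python
PAGE_SIZE = 4096   # 4KB page, as in the summary
LINE_SIZE = 64     # assumed line size; not stated in the summary


def bits_for(n_values: int) -> int:
    """Smallest number of bits that can encode n_values distinct values."""
    return max(1, (n_values - 1).bit_length())


def location_bits_per_page(levels: int = 3, max_ways: int = 16) -> int:
    """Location bits one eTLB entry needs to cover every line of a page."""
    per_line = bits_for(levels) + bits_for(max_ways)
    return (PAGE_SIZE // LINE_SIZE) * per_line


print(bits_for(3), bits_for(16))     # 2 4
print(location_bits_per_page())      # 384
```

Under that assumption each eTLB entry carries 64 × 6 = 384 bits of location information on top of the translation itself.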
The Hub is a physically indexed, physically tagged lookup structure private to each processor. It maintains the invariant that if a cache line’s information is not in the Hub, then the line is not present in any cache. The converse does not hold: a Hub entry can exist purely for address translation, without any line of its page being cached. The paper does not elaborate on how the Hub is organized, but this deserves attention, because the Hub must be designed so that the caches can be fully utilized.
The D2D design removes the regular cache line tags and replaces each with a pointer to a Hub entry. This also shrinks the tag array, because a Hub pointer is shorter than an address tag. Each Hub entry in turn holds a pointer to the eTLB, indicating that the entry is currently cached in the eTLB; the pointer is set to none if the eTLB does not have a copy. An eTLB entry points to the cache location using the cache identity and set ID described earlier. Together the three pointers form a cycle (cache line to Hub, Hub to eTLB, eTLB to cache line), which makes event handling very flexible: no matter which component generates an event that requires all three to collaborate, it can be handled by traversing the cycle and synchronizing them.
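The pointer cycle can be modeled with three small record types. This is a rough sketch under simplifying assumptions: every class and field name is hypothetical, a slot is named by `(level, set index, way)`, and each cache line records its line-in-page number explicitly, which real hardware may derive differently.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Slot = Tuple[int, int, int]   # (cache level, set index, way); hypothetical naming


@dataclass
class CacheLine:
    hub_index: int            # pointer to the Hub entry; stands in for the tag
    line_in_page: int         # kept explicit here for simplicity


@dataclass
class HubEntry:
    ppn: int
    etlb_index: Optional[int] = None   # None: the eTLB holds no copy


@dataclass
class ETLBEntry:
    vpn: int
    locations: Dict[int, Slot] = field(default_factory=dict)  # line in page -> slot


def cycle_closes(slot: Slot,
                 cache: Dict[Slot, CacheLine],
                 hub: Dict[int, HubEntry],
                 etlb: Dict[int, ETLBEntry]) -> bool:
    """Walk cache line -> Hub -> eTLB and check the eTLB points back at slot."""
    line = cache[slot]
    hub_entry = hub[line.hub_index]
    if hub_entry.etlb_index is None:
        return False          # no eTLB copy, so the cycle is open at the Hub
    etlb_entry = etlb[hub_entry.etlb_index]
    return etlb_entry.locations.get(line.line_in_page) == slot
```

Starting from any one of the three structures, the same walk reaches the other two, which is what lets an event raised in one place be propagated to the rest.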
Cache coherence is handled by the Hub instead of by an inclusive L3 cache. Because the Hub flattens the cache address space, the inclusion property is no longer needed. Dropping inclusion also increases the amount of storage effectively available in the caches.
Evicting a cache line causes the Hub to update its entry. If the entry is also present in the eTLB, the eTLB is updated to stay consistent. The cache line is then evicted to a lower level of the hierarchy. The new cache line’s pointer must then be set to point to the correct Hub entry. In contrast, if a Hub entry itself must be evicted because of a translation miss, all cache lines whose location information is held by that Hub entry must also be evicted from the cache.
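The two eviction cases can be sketched with plain dictionaries. The layout and function names are assumptions made for illustration. Write-back of dirty data is left out, and dropping the eTLB copy together with its Hub entry is an assumption of this sketch, not something the summary states.

```python
from typing import Dict, Optional, Tuple

Slot = Tuple[int, int, int]   # (cache level, set index, way); hypothetical naming

# Assumed layout for this sketch:
#   hub[i]      = {"etlb": eTLB index or None, "loc": {line_in_page: slot}}
#   etlb[j]     = {"loc": {line_in_page: slot}}
#   cache[slot] = (hub index, line_in_page)   # the Hub pointer replaces the tag


def evict_line(hub: Dict, etlb: Dict, cache: Dict,
               slot: Slot, new_slot: Optional[Slot]) -> None:
    """Move the line in slot to new_slot in a lower level (None: leaves the caches)."""
    hub_idx, line = cache.pop(slot)
    entry = hub[hub_idx]
    if new_slot is None:
        entry["loc"].pop(line)
    else:
        entry["loc"][line] = new_slot
        cache[new_slot] = (hub_idx, line)     # pointer back to the Hub entry
    if entry["etlb"] is not None:             # keep the eTLB copy consistent
        etlb_loc = etlb[entry["etlb"]]["loc"]
        if new_slot is None:
            etlb_loc.pop(line, None)
        else:
            etlb_loc[line] = new_slot


def evict_hub_entry(hub: Dict, etlb: Dict, cache: Dict, hub_idx: int) -> None:
    """Evict a Hub entry and every cache line it tracks."""
    entry = hub.pop(hub_idx)
    for slot in entry["loc"].values():
        cache.pop(slot)
    if entry["etlb"] is not None:             # assumed: the eTLB copy goes too
        etlb.pop(entry["etlb"])
```

Both functions rely only on the pointers already described: the cache slot names its Hub entry, and the Hub entry names its eTLB copy.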