Multilevel Organization of Cache Memory
- A multilevel cache hierarchy consists of n levels of caches: C1, C2, ..., Ci, ..., Cn.
- A processor reference is serviced by the cache closest to the processor that contains the data.
- At the same time, that cache supplies the data to the caches on the path between itself and the processor.
- A multilevel cache hierarchy for multiprocessors can use neither a purely local LRU nor a purely global LRU replacement policy.
- Instead, all references to a Ci cache are percolated to its parent so that the LRU stack at the Ci+1 level can be rearranged.
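The percolation idea above can be sketched as a chain of LRU caches in which every reference seen by a child is also forwarded to its parent, keeping the parent's LRU stack consistent with child activity. This is a minimal illustration, not a real coherence protocol; the capacities and tags are assumed for the example.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache holding block tags only (no data)."""
    def __init__(self, capacity, parent=None):
        self.capacity = capacity
        self.parent = parent          # next level C_{i+1}, if any
        self.stack = OrderedDict()    # most recently used = last entry

    def access(self, tag):
        """Return True on hit; percolate every reference to the parent
        so the parent's LRU stack reflects the child's activity."""
        hit = tag in self.stack
        if hit:
            self.stack.move_to_end(tag)
        else:
            if len(self.stack) >= self.capacity:
                self.stack.popitem(last=False)   # evict the LRU block
            self.stack[tag] = True
        if self.parent is not None:
            self.parent.access(tag)              # percolate upward
        return hit

# Two-level hierarchy: C1 (4 blocks) backed by C2 (16 blocks)
c2 = LRUCache(16)
c1 = LRUCache(4, parent=c2)
for tag in [1, 2, 3, 4, 5, 1]:
    c1.access(tag)
```

After this trace, block 2 has been evicted from the small C1, while the larger C2 still holds every referenced block in LRU order.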
Multilevel caches
- Another issue is the fundamental tradeoff between cache latency and hit rate.
- Larger caches have better hit rates but longer latency.
- To address this tradeoff, many computers use multiple levels of cache, with small fast caches backed up by larger, slower caches.
- Multi-level caches generally operate by checking the smallest level 1 (L1) cache first.
- If it hits, the processor proceeds at high speed.
- If the smaller cache misses, the next larger cache (L2) is checked, and so on, before external memory is checked.
- With increased logic density, caches can be placed on the same chip as the processor.
- This reduces external bus activity and speeds up execution.
- Most contemporary computers have at least 2 levels.
- Internal: Level 1 (L1).
- External: Level 2 (L2).
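The L1-then-L2-then-memory lookup described above can be sketched as follows. The latency figures are assumed for illustration only; real values vary widely by design.

```python
# Hypothetical latencies in cycles (assumed, not from any specific CPU).
L1_LATENCY, L2_LATENCY, MEM_LATENCY = 4, 12, 200

def load(addr, l1, l2, memory):
    """Check the small, fast L1 first, then L2, then external memory.
    Returns (value, total cycles spent)."""
    if addr in l1:
        return l1[addr], L1_LATENCY          # L1 hit: proceed at high speed
    cycles = L1_LATENCY                      # an L1 miss still costs its lookup
    if addr in l2:
        l1[addr] = l2[addr]                  # fill L1 on the way back
        return l1[addr], cycles + L2_LATENCY
    cycles += L2_LATENCY
    value = memory[addr]                     # finally, external memory
    l1[addr] = l2[addr] = value              # fill both levels
    return value, cycles + MEM_LATENCY

memory = {0x100: 42}
l1, l2 = {}, {}
v, t = load(0x100, l1, l2, memory)    # cold miss: 4 + 12 + 200 = 216 cycles
v2, t2 = load(0x100, l1, l2, memory)  # now an L1 hit: 4 cycles
```

Note how the miss penalty accumulates level by level, which is why hit rates at each level dominate overall performance.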
Small, fast Level 1 (L1) cache
- Often on-chip for speed and bandwidth
Larger, slower Level 2 (L2) cache
- Closely coupled to the CPU; may be on-chip, or “nearby” on the module
L2 and L3 Cache
- Performance improvements depend on hit rates.
- Additional levels complicate replacement algorithms and write policies.
- With an on-chip L2 cache, an L3 cache can improve performance further, just as L2 improves performance over L1 alone.
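The dependence on hit rates can be made concrete with the standard average-memory-access-time (AMAT) formula, AMAT = hit_time + miss_rate × miss_penalty, applied recursively per level. All numbers below are assumed for illustration.

```python
# Illustrative (assumed) hit times in cycles and per-level miss rates.
l1_hit, l2_hit, l3_hit, mem = 4, 12, 40, 200
m1, m2, m3 = 0.05, 0.40, 0.50

# Apply AMAT = hit_time + miss_rate * miss_penalty from the bottom up.
amat_l3 = l3_hit + m3 * mem          # L3 misses go to memory
amat_l2 = l2_hit + m2 * amat_l3      # L2 misses go to L3
amat_l1 = l1_hit + m1 * amat_l2      # L1 misses go to L2

# For comparison, without an L3: L2 misses go straight to memory.
amat_no_l3 = l1_hit + m1 * (l2_hit + m2 * mem)
```

With these assumed numbers the three-level hierarchy averages about 7.4 cycles per access versus about 8.6 without the L3, showing how even a slow L3 pays off when enough L2 misses hit in it.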
Unified Caches
- Higher hit rate for a given cache size, because the cache automatically balances between instructions and data.
- Only one cache needs to be implemented.
Split Caches
- Split caches have separate caches for instructions and data.
- These tend to be stored in different areas of memory.
- Current trend favors split caches.
- Useful for superscalar machines with parallel execution of instructions and prefetching of predicted instructions.
- Split cache eliminates contention for cache between instruction fetch/decode unit and the execution unit (when accessing data).
- This helps keep the pipeline full, because otherwise the execution unit (EU) would block the fetch/decode unit.
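The contention argument can be illustrated with a toy cycle count: a single-ported unified cache must serialize an instruction fetch and a data access issued in the same pipeline step, while split I/D caches service them in parallel. The trace and one-port-per-cycle model are assumptions for the sketch, not a real pipeline model.

```python
def cycles_needed(trace, split):
    """trace: list of (wants_ifetch, wants_data) per pipeline step.
    A split design uses its I-cache and D-cache ports in parallel;
    a single-ported unified cache serializes the two accesses."""
    total = 0
    for ifetch, data in trace:
        if split:
            total += 1 if (ifetch or data) else 0   # both served this cycle
        else:
            total += int(ifetch) + int(data)        # one port, serialized
    return total

# 10 pipeline steps, each needing both a fetch and a data access.
trace = [(True, True)] * 10
unified_cycles = cycles_needed(trace, split=False)  # accesses serialized
split_cycles = cycles_needed(trace, split=True)     # accesses in parallel
```

Under this model the unified cache takes twice as many cycles on a fetch-plus-data workload, which is the contention the split design eliminates.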
For example,
- The IBM POWER4 (2001) had off-chip L3 caches of 32 MB per processor, shared among several processors;
- The Itanium 2 (2003) had a 6 MB unified level 3 (L3) cache on-die;
- The AMD Phenom II (2008) had up to 6 MB of on-die unified L3 cache;
The benefits of an L3 cache depend on the application's access patterns.