Multilevel Organization of Cache Memory
- A multilevel cache hierarchy consists of n levels of caches: C1, C2, ..., Ci, ..., Cn.
- A processor reference is serviced by the cache closest to the processor that contains the data.
- At the same time, that cache supplies the data to the caches on the path between itself and the processor.
- A multilevel cache hierarchy for multiprocessors can use neither a purely local LRU nor a purely global LRU replacement policy.
- Instead, all references to a Ci cache are percolated to its parent so that the LRU stack at the Ci+1 level can be rearranged.
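The percolation idea above can be sketched as a chain of LRU caches in which every reference seen by a child is also forwarded to its parent, keeping the parent's LRU stack consistent with child activity. This is a minimal illustration, not a real coherence protocol; the capacities and tags are assumed for the example.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache holding block tags only (no data)."""
    def __init__(self, capacity, parent=None):
        self.capacity = capacity
        self.parent = parent          # next level C_{i+1}, if any
        self.stack = OrderedDict()    # most recently used = last entry

    def access(self, tag):
        """Return True on hit; percolate every reference to the parent
        so the parent's LRU stack reflects the child's activity."""
        hit = tag in self.stack
        if hit:
            self.stack.move_to_end(tag)
        else:
            if len(self.stack) >= self.capacity:
                self.stack.popitem(last=False)   # evict the LRU block
            self.stack[tag] = True
        if self.parent is not None:
            self.parent.access(tag)              # percolate upward
        return hit

# Two-level hierarchy: C1 (4 blocks) backed by C2 (16 blocks)
c2 = LRUCache(16)
c1 = LRUCache(4, parent=c2)
for tag in [1, 2, 3, 4, 5, 1]:
    c1.access(tag)
```

After this trace, block 2 has been evicted from the small C1, while the larger C2 still holds every referenced block in LRU order.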
Multilevel caches
- Another issue is the fundamental tradeoff between cache latency and hit rate.
- Larger caches have better hit rates but longer latency.
- To address this tradeoff, many computers use multiple levels of cache, with small fast caches backed up by larger, slower caches.
- Multi-level caches generally operate by checking the smallest level 1 (L1) cache first.
- If it hits, the processor proceeds at high speed.
- If the smaller cache misses, the next larger cache (L2) is checked, and so on, before external memory is checked.
- With increased logic density, caches can be placed on the same chip as the processor.
- This reduces external bus activity and speeds up execution.
- Most contemporary computers have at least 2 levels.
- Internal: Level 1 (L1).
- External: Level 2 (L2).
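The L1-then-L2-then-memory lookup described above can be sketched as follows. The latency figures are assumed for illustration only; real values vary widely by design.

```python
# Hypothetical latencies in cycles (assumed, not from any specific CPU).
L1_LATENCY, L2_LATENCY, MEM_LATENCY = 4, 12, 200

def load(addr, l1, l2, memory):
    """Check the small, fast L1 first, then L2, then external memory.
    Returns (value, total cycles spent)."""
    if addr in l1:
        return l1[addr], L1_LATENCY          # L1 hit: proceed at high speed
    cycles = L1_LATENCY                      # an L1 miss still costs its lookup
    if addr in l2:
        l1[addr] = l2[addr]                  # fill L1 on the way back
        return l1[addr], cycles + L2_LATENCY
    cycles += L2_LATENCY
    value = memory[addr]                     # finally, external memory
    l1[addr] = l2[addr] = value              # fill both levels
    return value, cycles + MEM_LATENCY

memory = {0x100: 42}
l1, l2 = {}, {}
v, t = load(0x100, l1, l2, memory)    # cold miss: 4 + 12 + 200 = 216 cycles
v2, t2 = load(0x100, l1, l2, memory)  # now an L1 hit: 4 cycles
```

Note how the miss penalty accumulates level by level, which is why hit rates at each level dominate overall performance.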
Small, fast Level 1 (L1) cache
- Often on-chip for speed and bandwidth
Larger, slower Level 2 (L2) cache
- Closely coupled to the CPU; may be on-chip, or “nearby” on the module
L2 and L3 Cache
- Performance improvements depend on hit rates.
- Additional levels complicate replacement algorithms and write policies.
- With an on-chip L2 cache, an L3 cache can improve performance further, just as L2 improves performance over L1 alone.
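The dependence on hit rates can be made concrete with the standard average-memory-access-time (AMAT) formula, AMAT = hit_time + miss_rate × miss_penalty, applied recursively per level. All numbers below are assumed for illustration.

```python
# Illustrative (assumed) hit times in cycles and per-level miss rates.
l1_hit, l2_hit, l3_hit, mem = 4, 12, 40, 200
m1, m2, m3 = 0.05, 0.40, 0.50

# Apply AMAT = hit_time + miss_rate * miss_penalty from the bottom up.
amat_l3 = l3_hit + m3 * mem          # L3 misses go to memory
amat_l2 = l2_hit + m2 * amat_l3      # L2 misses go to L3
amat_l1 = l1_hit + m1 * amat_l2      # L1 misses go to L2

# For comparison, without an L3: L2 misses go straight to memory.
amat_no_l3 = l1_hit + m1 * (l2_hit + m2 * mem)
```

With these assumed numbers the three-level hierarchy averages about 7.4 cycles per access versus about 8.6 without the L3, showing how even a slow L3 pays off when enough L2 misses hit in it.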
Unified Caches
- Higher hit rate for a given cache size, because the cache automatically balances between instructions and data.
- Only one cache needs to be implemented.
Split Caches
- Split caches have separate caches for instructions and data.
- These tend to be stored in different areas of memory.
- Current trend favors split caches.
- Useful for superscalar machines with parallel execution of instructions and prefetching of predicted instructions.
- Split cache eliminates contention for cache between instruction fetch/decode unit and the execution unit (when accessing data).
- This helps keep the pipeline full, because otherwise the execution unit (EU) would block the fetch/decode unit.
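The contention argument can be illustrated with a toy cycle count: a single-ported unified cache must serialize an instruction fetch and a data access issued in the same pipeline step, while split I/D caches service them in parallel. The trace and one-port-per-cycle model are assumptions for the sketch, not a real pipeline model.

```python
def cycles_needed(trace, split):
    """trace: list of (wants_ifetch, wants_data) per pipeline step.
    A split design uses its I-cache and D-cache ports in parallel;
    a single-ported unified cache serializes the two accesses."""
    total = 0
    for ifetch, data in trace:
        if split:
            total += 1 if (ifetch or data) else 0   # both served this cycle
        else:
            total += int(ifetch) + int(data)        # one port, serialized
    return total

# 10 pipeline steps, each needing both a fetch and a data access.
trace = [(True, True)] * 10
unified_cycles = cycles_needed(trace, split=False)  # accesses serialized
split_cycles = cycles_needed(trace, split=True)     # accesses in parallel
```

Under this model the unified cache takes twice as many cycles on a fetch-plus-data workload, which is the contention the split design eliminates.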
For example,
- The IBM POWER4 (2001) had off-chip L3 caches of 32 MB per processor, shared among several processors;
- The Itanium 2 (2003) had a 6 MB unified level 3 (L3) cache on-die;
- The AMD Phenom II (2008) had up to 6 MB of on-die unified L3 cache;
The benefits of an L3 cache depend on the application's access patterns.