Multilevel Organization of Cache Memory

  • A multilevel cache hierarchy consists of n levels of caches: C1, C2, ..., Ci, ..., Cn.
  • A processor reference is serviced by the cache closest to the processor that contains the data.
  • At the same time, that cache supplies the data to the caches on the path between itself and the processor.
  • A multilevel cache hierarchy for multiprocessors may use either a local LRU policy, in which each cache reorders only its own stack, or a global LRU policy,
  • where all references to a Ci cache are percolated to its parent for rearranging the LRU stack at the Ci+1 level (a sketch follows this list).
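
To make the percolation idea concrete, here is a minimal C sketch of a two-level hierarchy under a global LRU policy. The cache sizes, the reference trace, and the fully associative LRU-stack representation are all invented for illustration; they are not from the original text.

    /* Minimal sketch of "global LRU" in a two-level hierarchy: every
     * reference serviced at C1 is also percolated to C2, so that C2's
     * LRU stack stays ordered by the processor's true reference stream. */
    #include <stdio.h>

    #define L1_WAYS 2   /* assumed tiny, fully associative caches */
    #define L2_WAYS 4

    typedef struct { int blocks[8]; int size, capacity; } LruStack;

    /* Move `block` to the most-recently-used (front) position,
     * evicting the LRU entry if the stack is full; returns 1 on hit. */
    static int lru_touch(LruStack *s, int block) {
        int hit = 0, i, j;
        for (i = 0; i < s->size; i++)
            if (s->blocks[i] == block) { hit = 1; break; }
        if (!hit && s->size < s->capacity) i = s->size++;
        if (!hit && i == s->capacity) i = s->capacity - 1;  /* evict LRU */
        for (j = i; j > 0; j--) s->blocks[j] = s->blocks[j - 1];
        s->blocks[0] = block;
        return hit;
    }

    int main(void) {
        LruStack c1 = { {0}, 0, L1_WAYS };
        LruStack c2 = { {0}, 0, L2_WAYS };
        int trace[] = { 1, 2, 3, 1, 4, 2 };  /* assumed reference stream */
        for (int k = 0; k < 6; k++) {
            int b = trace[k];
            int hit1 = lru_touch(&c1, b);
            /* Percolate the reference to the parent even on a C1 hit,
             * so C2's stack reflects global recency (global LRU). */
            int hit2 = lru_touch(&c2, b);
            printf("ref %d: L1 %s, L2 %s\n", b,
                   hit1 ? "hit" : "miss", hit2 ? "hit" : "miss");
        }
        return 0;
    }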

Multilevel caches

  • Another issue is the fundamental tradeoff between cache latency and hit rate. 
  • Larger caches have better hit rates but longer latency.
  • To address this tradeoff, many computers use multiple levels of cache, with small fast caches backed up by larger, slower caches.
  • Multi-level caches generally operate by checking the smallest level 1 (L1) cache first.
  • If it hits, the processor proceeds at high speed. 
  • If the smaller cache misses, the next larger cache (L2) is checked, and so on, before external memory is checked (see the sketch after this list).
  • With increased logic density, caches can be placed on the same chip as the processor.
  • This reduces external bus activity and speeds up execution.
  • Most contemporary computers have at least 2 levels.
    1. Internal: Level 1 (L1).
    2. External: Level 2 (L2).
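
As a toy illustration of that lookup order, the C sketch below probes L1 first, then L2, then external memory. The probe functions and cycle counts are placeholders invented for this example, not real hardware behavior.

    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical probe functions standing in for real tag comparisons. */
    static bool l1_contains(unsigned addr) { return (addr % 4) == 0; }
    static bool l2_contains(unsigned addr) { return (addr % 2) == 0; }

    /* Return the access latency (in cycles) for one reference,
     * checking the levels closest to the processor first. */
    static int access_latency(unsigned addr) {
        if (l1_contains(addr)) return 4;    /* L1 hit: fastest      */
        if (l2_contains(addr)) return 12;   /* L1 miss, L2 hit      */
        return 100;                         /* both miss: go to RAM */
    }

    int main(void) {
        for (unsigned addr = 0; addr < 6; addr++)
            printf("addr %u -> %d cycles\n", addr, access_latency(addr));
        return 0;
    }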

Multilevel Cache

  • Small, fast Level 1 (L1) cache: often on-chip for speed and bandwidth.
  • Larger, slower Level 2 (L2) cache: closely coupled to the CPU; may be on-chip, or “nearby” on a module.

L2 and L3 Cache

  • Performance improvements depend on hit rates.
  • Additional levels complicate replacement algorithms and write policies.
  • With the L2 cache on-chip, an L3 cache can improve performance, just as L2 improves performance over L1 alone (a worked estimate follows this list).
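
A rough way to quantify this is the average memory access time, AMAT = t_L1 + m_L1 * (t_L2 + m_L2 * (t_L3 + m_L3 * t_mem)). The miss rates and latencies in this sketch are assumed values chosen only to show the shape of the calculation.

    #include <stdio.h>

    int main(void) {
        double t1 = 4, t2 = 12, t3 = 30, tmem = 100;  /* cycles (assumed) */
        double m1 = 0.10, m2 = 0.40, m3 = 0.50;       /* miss rates (assumed) */

        /* AMAT with two levels vs. three levels of cache. */
        double amat_two   = t1 + m1 * (t2 + m2 * tmem);
        double amat_three = t1 + m1 * (t2 + m2 * (t3 + m3 * tmem));

        printf("AMAT, L1+L2 only : %.2f cycles\n", amat_two);   /* 9.20 */
        printf("AMAT, with L3    : %.2f cycles\n", amat_three); /* 8.40 */
        return 0;
    }

With these numbers the L3 cuts the average access time from 9.2 to 8.4 cycles; whether a real L3 helps depends on the actual hit rates, as the text notes.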

Unified Caches 

  • Higher hit rate for a given cache size, because the cache is automatically balanced between instructions and data.
  • Only one cache needs to be implemented.

Split Caches

  • Split caches have separate caches for instructions and data.
    • These tend to be stored in different areas of memory.
  • The current trend favors split caches.
    • Useful for superscalar machines with parallel execution of instructions and prefetching of predicted instructions.
    • A split cache eliminates contention for the cache between the instruction fetch/decode unit and the execution unit (when accessing data).
    • Helps to keep the pipeline full, because the execution unit (EU) would otherwise block the fetch/decode unit (see the sketch below).
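
The following toy model of that contention argument assumes a single-ported unified cache and made-up instruction counts; it is a simplification, not a real pipeline model.

    #include <stdio.h>

    /* Cycles needed to run `n` instructions when a fraction `f` of them
     * access data. A unified single-ported cache must serialize each
     * instruction fetch with the data access; split I/D caches can
     * service both in parallel. */
    static double cycles_unified(int n, double f) { return n * (1.0 + f); }
    static double cycles_split(int n, double f)   { (void)f; return n; }

    int main(void) {
        int n = 1000;    /* assumed instruction count */
        double f = 0.3;  /* assumed: 30% of instructions are loads/stores */
        printf("unified, single-ported: %.0f cycles\n", cycles_unified(n, f));
        printf("split I/D caches      : %.0f cycles\n", cycles_split(n, f));
        return 0;
    }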
For example,
  • the IBM POWER4 (2001) had off-chip L3 caches of 32 MB per processor, shared among several processors;
  • the Itanium 2 (2003) had a 6 MB unified level 3 (L3) cache on-die;
  • the AMD Phenom II (2008) had up to 6 MB of on-die unified L3 cache.
The benefits of an L3 cache depend on the application's access patterns.
