CACHE MEMORY

  • A CPU cache is a cache used by the central processing unit of a computer to reduce the average time to access memory. 
  • The cache is a smaller, faster memory which stores copies of the data from frequently used main memory locations. 
  • As long as most memory accesses are to cached memory locations, the average latency of memory accesses will be closer to the cache latency than to the latency of main memory (a rough worked example follows this list).
  • An L1 cache is on the same chip as the microprocessor.
  • L2 is usually a separate static RAM (SRAM) chip. 
  • The main RAM is usually a dynamic RAM (DRAM) chip.
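
As a rough illustration of the latency point above (the figures are invented for the example, not taken from any real processor): if a cache hit costs 1 ns, a main memory access costs 100 ns, and 95% of accesses hit in the cache, the average access time is about 1 ns + 0.05 × 100 ns = 6 ns, far closer to the cache latency than to the main memory latency.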

Cache entries

  • Data is transferred between memory and cache in blocks of fixed size, called cache lines.
  • When a cache line is copied from memory into the cache, a cache entry is created. 
  • The cache entry will include the copied data as well as the requested memory location (now called a tag).
  • When the processor needs to read or write a location in main memory, it first checks for a corresponding entry in the cache. 
  • The cache checks for the contents of the requested memory location in any cache lines that might contain that address. 
  • If the processor finds that the memory location is in the cache, a cache hit has occurred. However, if the processor does not find the memory location in the cache, a cache miss has occurred.
    • Cache hit: The processor immediately reads or writes the data in the cache line. 
    • Cache miss: The cache allocates a new entry, and copies in data from main memory. Then, the request is fulfilled from the contents of the cache. 
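
A minimal sketch of this read path in C follows; the cache geometry, the toy main_memory array, and the fetch_line_from_memory helper are assumptions made for the example, and each line is given one fixed slot for simplicity (how placement is really chosen is covered under Associativity below).

    #include <stdint.h>
    #include <stdbool.h>
    #include <string.h>

    #define LINE_SIZE   64          /* bytes per cache line (assumed)    */
    #define NUM_ENTRIES 1024        /* number of cache entries (assumed) */

    struct cache_entry {
        bool     valid;             /* entry holds real data             */
        uint64_t tag;               /* which memory line is cached here  */
        uint8_t  data[LINE_SIZE];   /* copy of the memory line           */
    };

    static struct cache_entry cache[NUM_ENTRIES];
    static uint8_t main_memory[1 << 20];   /* toy backing store (assumed 1 MiB) */

    /* Stand-in for the slow read of one whole line from main memory. */
    static void fetch_line_from_memory(uint64_t line_addr, uint8_t *dst)
    {
        memcpy(dst, &main_memory[line_addr * LINE_SIZE], LINE_SIZE);
    }

    /* Read one byte through the cache, creating a cache entry on a miss.
       For simplicity the tag holds the whole line address; a real cache
       stores only the part of the address not implied by the index.      */
    uint8_t cache_read_byte(uint64_t addr)
    {
        uint64_t line_addr = addr / LINE_SIZE;
        struct cache_entry *e = &cache[line_addr % NUM_ENTRIES];

        if (e->valid && e->tag == line_addr)     /* cache hit */
            return e->data[addr % LINE_SIZE];

        /* Cache miss: allocate the entry, copy the line in from main
           memory, then fulfil the request from the cache contents.   */
        fetch_line_from_memory(line_addr, e->data);
        e->valid = true;
        e->tag   = line_addr;
        return e->data[addr % LINE_SIZE];
    }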

Cache performance

  • The proportion of accesses that result in a cache hit is known as the hit rate.
  • Read misses delay execution because they require data to be transferred from main memory, which is much slower than the cache itself.
  • Write misses may occur without such penalty, since the processor can continue execution while data is copied to main memory in the background.
  • If data is written to the cache, at some point it must also be written to main memory.
Write-through cache:
  • Every write to the cache causes a write to main memory.
Write-back or copy-back cache:
  • Writes are not immediately mirrored to the main memory. Instead, the cache tracks which locations have been written over (these locations are marked dirty).
  • The data in these locations is written back to the main memory only when that data is evicted from the cache. For this reason, a read miss in a write-back cache may sometimes require two memory accesses to service: one to write the dirty location back to memory first, and another to read the new location from memory.
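
The two write policies can be sketched in the same toy model; write_line_to_memory, the backing main_memory array, and the dirty-flag handling below are illustrative stand-ins, not a real cache controller interface, and the tag is again treated as the full line address.

    #include <stdint.h>
    #include <stdbool.h>
    #include <string.h>

    #define LINE_SIZE 64                 /* bytes per cache line (assumed) */

    struct cache_entry {
        bool     valid, dirty;           /* dirty is only used by write-back  */
        uint64_t tag;                    /* full line address, for simplicity */
        uint8_t  data[LINE_SIZE];
    };

    static uint8_t main_memory[1 << 20]; /* toy backing store (assumed) */

    /* Stand-in for writing one whole line back to main memory. */
    static void write_line_to_memory(uint64_t line_addr, const uint8_t *src)
    {
        memcpy(&main_memory[line_addr * LINE_SIZE], src, LINE_SIZE);
    }

    /* Write-through: every write to the cache also causes a write to memory
       (a real controller would write just the modified word, not the line). */
    void write_through(struct cache_entry *e, uint64_t addr, uint8_t value)
    {
        e->data[addr % LINE_SIZE] = value;
        write_line_to_memory(addr / LINE_SIZE, e->data);
    }

    /* Write-back: only mark the line dirty; memory is updated at eviction. */
    void write_back(struct cache_entry *e, uint64_t addr, uint8_t value)
    {
        e->data[addr % LINE_SIZE] = value;
        e->dirty = true;
    }

    /* Evicting a dirty line (e.g. to service a read miss) costs the extra
       memory access mentioned above: the old data must be written out
       before the entry can be reused for the new location.               */
    void evict(struct cache_entry *e)
    {
        if (e->valid && e->dirty)
            write_line_to_memory(e->tag, e->data);
        e->valid = false;
        e->dirty = false;
    }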

CPU stalls

  • The time taken to fetch one cache line from memory (read latency) matters because the CPU will run out of things to do while waiting for the cache line. When a CPU reaches this state, it is called a stall.
  • A stall is just a mechanism to let the CPU wait for all kinds of data and addresses to settle before continuing.
  • Modern CPUs can execute hundreds of instructions in the time taken to fetch a single cache line from main memory.
  • Various techniques have been employed to keep the CPU busy during this time.
    • Out-of-order CPUs attempt to execute independent instructions after the instruction that is waiting for the cache miss data.
    • Another technology, used by many processors, is simultaneous multithreading (SMT) or hyper-threading (HT), which allows an alternate thread to use the CPU core while a first thread waits for data to come from main memory.

Cache entry structure

Cache row entries usually have the following structure:
    • The data block (cache line) contains the actual data fetched from the main memory. 
    • The tag contains (part of) the address of the actual data fetched from the main memory.
  • The "size" of the cache is the amount of main memory data it can hold.
  • This size can be calculated as the number of bytes stored in each data block times the number of blocks stored in the cache.
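
For example, with illustrative numbers: a cache holding 512 blocks of 64 bytes each has a size of 512 × 64 = 32,768 bytes (32 KB); the tag and flag bits are not counted in this figure.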

Flag bits

  • An instruction cache requires only one flag bit per cache row entry: a valid bit.
  • The valid bit indicates whether or not a cache block has been loaded with valid data.
  • On power-up, the hardware sets all the valid bits in all the caches to "invalid".
  • A data cache typically requires two flag bits per cache row entry: a valid bit and also a dirty bit. 
  • The dirty bit indicates whether that block is unchanged since it was read from main memory ("clean") or whether the processor has written data to that block ("dirty").
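
Putting the entry structure and the flag bits together, a cache row can be modelled roughly as the C struct below; the field layout and LINE_SIZE are assumptions for illustration, and real hardware packs these bits quite differently.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define LINE_SIZE 64                /* bytes per data block (assumed) */

    struct cache_row {
        uint64_t tag;                   /* (part of) the address of the cached data */
        bool     valid;                 /* block has been loaded with valid data    */
        bool     dirty;                 /* block written since it was read; an
                                           instruction cache omits this flag        */
        uint8_t  data[LINE_SIZE];       /* the data block (cache line) itself       */
    };

    /* On power-up the hardware marks every row invalid; a software model
       does the same by clearing all the valid (and dirty) bits.           */
    void cache_reset(struct cache_row *rows, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            rows[i].valid = false;
            rows[i].dirty = false;
        }
    }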

Associativity

  • The replacement policy decides where in the cache a copy of a particular entry of main memory will go.
  • If the replacement policy is free to choose any entry in the cache to hold the copy, the cache is called fully associative.
  • If each entry in main memory can go in just one place in the cache, the cache is direct mapped.
  • Many caches implement a compromise in which each entry in main memory can go to any one of N places in the cache, and are described as N-way set associative.
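
One way to picture the difference is how the address is carved up: the low bits select a byte within the line, the middle bits select a set, and the remaining high bits form the tag. The helpers below are a sketch with assumed sizes (64-byte lines, 256 sets, 4 ways), not any particular processor's layout.

    #include <stdint.h>

    #define LINE_SIZE 64                /* bytes per line (assumed)        */
    #define NUM_SETS  256               /* sets in the cache (assumed)     */
    #define NUM_WAYS  4                 /* 4-way set associative (assumed) */

    /* Split a byte address into offset within the line, set index and tag. */
    static inline uint64_t offset_of(uint64_t addr) { return addr % LINE_SIZE; }
    static inline uint64_t set_of(uint64_t addr)    { return (addr / LINE_SIZE) % NUM_SETS; }
    static inline uint64_t tag_of(uint64_t addr)    { return (addr / LINE_SIZE) / NUM_SETS; }

    /* A line may be placed in any of the NUM_WAYS entries of its set.
       Direct mapped is the NUM_WAYS == 1 special case (no choice at all);
       fully associative is the NUM_SETS == 1 special case (the line may
       go anywhere and the whole line address becomes the tag).            */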

Direct-mapped cache

  • Here each location in main memory can only go in one entry in the cache.
  • It doesn't have a replacement policy.
    • This means that if two locations map to the same entry, they may continually knock each other out.
    • A direct-mapped cache needs to be much larger than an associative one to give comparable performance.

Two-way set associative cache

  • Each location in main memory can be cached in either of two locations in the cache.
  • One benefit of this scheme is that the tags stored in the cache do not have to include that part of the main memory address which is implied by the cache memory's index.
  • Since the cache tags have fewer bits, they take less area on the microprocessor chip and can be read and compared faster. Also, LRU is especially simple, since only one bit needs to be stored for each pair.
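
A sketch of that one-bit LRU bookkeeping for a single two-way set (the structure and helper names are invented for the example):

    #include <stdbool.h>
    #include <stdint.h>

    /* One set of a two-way set-associative cache. */
    struct two_way_set {
        uint64_t tag[2];                /* tags of the two ways                       */
        bool     valid[2];
        uint8_t  lru;                   /* which way (0 or 1) was used least recently */
    };

    /* On a hit in `way`, the other way becomes the least recently used one. */
    static void touch(struct two_way_set *s, int way)
    {
        s->lru = (uint8_t)(1 - way);
    }

    /* On a miss, the victim to replace is simply the way the LRU bit names. */
    static int choose_victim(const struct two_way_set *s)
    {
        return s->lru;
    }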

Speculative execution

  • One of the advantages of a direct mapped cache is that it allows simple and fast speculation.
  • Once the address has been computed, the one cache index which might have a copy of that location in memory is known. That cache entry can be read, and the processor can continue to work with that data before it finishes checking that the tag actually matches the requested address.
  • A subset of the tag, called a hint, can be used to pick just one of the possible cache entries mapping to the requested address. 
  • The entry selected by the hint can then be used in parallel with checking the full tag.

Two-way skewed associative cache

  • The index for way 0 is direct, as above, but the index for way 1 is formed with a hash function.
  • A good hash function has the property that addresses which conflict with the direct mapping tend not to conflict when mapped with the hash function.
  • The downside is extra latency from computing the hash function.
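
A sketch of the two index functions; the hash used below is just an illustrative bit mix, not the function of any real skewed-associative design.

    #include <stdint.h>

    #define LINE_SIZE    64             /* bytes per line (assumed)      */
    #define SETS_PER_WAY 256            /* entries in each way (assumed) */

    /* Way 0 uses the ordinary direct-mapped index. */
    static uint64_t index_way0(uint64_t addr)
    {
        return (addr / LINE_SIZE) % SETS_PER_WAY;
    }

    /* Way 1 uses a hashed index, so two lines that collide in way 0
       tend to land in different entries of way 1.                     */
    static uint64_t index_way1(uint64_t addr)
    {
        uint64_t line = addr / LINE_SIZE;
        line ^= line >> 8;              /* illustrative bit mixing only */
        line *= 0x9E3779B97F4A7C15ULL;  /* multiplicative scramble      */
        return (line >> 32) % SETS_PER_WAY;
    }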

Pseudo-associative cache

  • A true set-associative cache tests all the possible ways simultaneously, using something like a content addressable memory. 
  • A pseudo-associative cache tests each possible way one at a time. 
  • A hash-rehash cache and a column-associative cache are examples of pseudo-associative cache.
  • In the common case of finding a hit in the first way tested, a pseudo-associative cache is as fast as a direct-mapped cache. But it has a much lower conflict miss rate than a direct-mapped cache, closer to the miss rate of a fully associative cache.
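
A sketch of that sequential probing, in the hash-rehash style; probe_way is a hypothetical helper standing in for reading and tag-checking one candidate entry.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical helper: tag-check one candidate way; true on a hit. */
    bool probe_way(int way, uint64_t addr);

    /* Pseudo-associative lookup: try the possible ways one at a time.
       A hit in the first way probed is as fast as a direct-mapped hit;
       hits in later ways cost one extra probe each, and a miss means
       every way was probed before going to main memory.                */
    int pseudo_associative_lookup(uint64_t addr, int num_ways)
    {
        for (int way = 0; way < num_ways; way++)
            if (probe_way(way, addr))
                return way;             /* hit in this way   */
        return -1;                      /* miss in every way */
    }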

Cache miss

  • A cache miss refers to a failed attempt to read or write a piece of data in the cache, which results in a main memory access with much longer latency.
  • There are three kinds of cache misses: instruction read miss, data read miss, and data write miss.

Cache read miss

  • A cache read miss from an instruction cache generally causes the most delay, because the processor, or at least the thread of execution, has to wait (stall) until the instruction is fetched from main memory.
  • A cache read miss from a data cache usually causes less delay, because instructions not dependent on the cache read can be issued and continue execution until the data is returned from main memory, at which point the dependent instructions can resume execution.

Cache write miss

  • A cache write miss to a data cache generally causes the least delay, because the write can be queued and there are few limitations on the execution of subsequent instructions.
  • The processor can continue until the write queue is full.


