Each slot contains a valid bit, a dirty bit, and a slot tag. The slot tag contains some number of bits, which depends on the cache scheme. There is also a data block (also called a cache line). This data is a copy of the data in memory.
In our example, the data block has 32 bytes. Think of this as a byte array indexed 00000 up to 11111. Each byte of the data block is a copy of a byte in RAM.
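As a sketch (the field names are made up for illustration, not from any real hardware description), one slot of this cache might be modeled as:

```python
from dataclasses import dataclass, field

@dataclass
class CacheSlot:
    valid: bool = False   # valid bit: does this slot hold real data?
    dirty: bool = False   # dirty bit: written since it was loaded?
    tag: int = 0          # slot tag; its width depends on the cache scheme
    # 32-byte data block (cache line), a copy of 32 consecutive bytes of RAM
    data: bytearray = field(default_factory=lambda: bytearray(32))
```

A freshly constructed slot is invalid and clean, with an all-zero data block.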
Any of these three methods generates an address and causes data to be moved from memory to a register or from a register to memory. (Fetching an instruction copies the instruction from memory into the instruction register.)
Let's pick one of the three ways of generating an address: loading a word.
Suppose you have the following MIPS instruction:
lw $r4, 0($r3)
The address generated comes from R[3] (the offset is 0, so the address is just the contents of $r3).
Suppose the address stored in $r3 is:
0000 1111 0000 1111 0000 1111 0000 1111
We're going to see how to use this address to find the data stored at that address in the cache.
We're going to refer to an address tag and a slot tag. The address tag comes from an address generated by load/store, or the fetching of an instruction from the PC. The slot tag is located in the cache hardware, in each slot (not surprising!).
The slot index uses 6 bits because there are 64 slots, and you need 6 bits to give each slot a unique UB number (recall the formula ceil( lg( 64 ) ) = 6).
That leaves 21 bits for the tag. Thus, each slot in a direct mapped cache with the cache specs given earlier has 21 bits for the tag.
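As a sketch, here is the direct-mapped split (21-bit tag, 6-bit slot index, 5-bit offset) applied to the example address:

```python
addr = 0b0000_1111_0000_1111_0000_1111_0000_1111  # the example address

offset     = addr & 0b11111           # bits B4-0: byte within the 32-byte block
slot_index = (addr >> 5) & 0b111111   # bits B10-5: one of 64 slots
tag        = addr >> 11               # bits B31-11: remaining 21 tag bits

print(bin(slot_index))  # 0b111000 (slot 56)
print(offset)           # 15
```

Masking and shifting like this is exactly what the cache hardware does with wires: it simply routes different groups of address bits to different comparators.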
In this case, since we have 32 sets, we need 5 bits to specify the set. Thus, instead of a slot index, we have a set index.
The address we've been using in our example specifies set 1 1000 (i.e., set 24ten), taken from bits B9-5.
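A quick check of the set index extraction (a sketch; the bit positions are the ones from our example):

```python
addr = 0b0000_1111_0000_1111_0000_1111_0000_1111  # the example address

set_index = (addr >> 5) & 0b11111   # bits B9-5 select one of the 32 sets
print(bin(set_index))               # 0b11000, i.e. set 24
```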
There should be, at most, one slot whose tag matches the address tag and whose valid bit V = 1. If there is a match, this is called a cache hit. If there is no match (i.e., the address tag does not match any of the slot tags, or if it does, the valid bit is 0), then there is a cache miss.
The diagram below shows the comparison. Only 4 slots are shown to save space. The cache hardware compares the 27-bit tag from the address to the 27-bit tags in all 64 slots, in parallel.
If there's a hit, then the offset from the address is used to access the bytes in the data block (i.e., the cache line). In this case, the offset is 0 1111, which is 15ten. So, if there was a cache hit, you would look at element 15ten of the data block array. The array is indexed starting at 0.
A cache hit is defined as follows: the address tag matches a slot tag, and that slot's valid bit is 1.
The hardware is simpler for a direct mapped cache than a fully associative cache, because you only have to compare the address tag to one slot. The cache hardware compares only the one slot selected by the slot index bits of the address. In this case, those are bits B10-5.
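A direct-mapped lookup can be sketched like this (slots modeled as plain dicts; the 21/6/5 field split is the one from our example, and the helper name is made up):

```python
def direct_mapped_lookup(slots, addr):
    """Return the requested byte on a hit, or None on a miss.

    Assumes 64 slots and 32-byte data blocks, so a 32-bit address
    splits into tag (21 bits) | slot index (6 bits) | offset (5 bits).
    """
    offset = addr & 0b11111
    index  = (addr >> 5) & 0b111111   # bits B10-5 pick exactly one slot
    tag    = addr >> 11
    slot = slots[index]
    if slot['valid'] and slot['tag'] == tag:  # compare just one slot's tag
        return slot['data'][offset]           # cache hit
    return None                               # cache miss
```

Note that only a single tag comparator is needed, which is why the hardware is simpler than the fully associative case.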
However, direct mapped caches do not fully utilize the cache slots. Different memory addresses may collide by mapping to the same slot index, evicting each other even when other slots are empty.
The hardware is simpler for a set associative cache than a fully associative cache, because you only have to compare the address tag to the slots in one set (two, in our example). The cache hardware compares only the slots in the set selected by the set index bits of the address. In this case, those are bits B9-5.
Set associative caches are a reasonable compromise between fully associative caches (whose hardware is complex enough to actually slow down the access of the cache) and a direct mapped cache (which may suffer from problems of collisions to the same slot index).
We can index the data block from 00000 up to 11111 (as 5-bit UB numbers).
How can we find the address of the byte in a particular slot? For example, we may be in slot S, looking at byte B.
Here's how to compute the RAM address of a byte in a slot's data block.
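Under the direct-mapped split from our example, the computation is just concatenating the slot tag, the slot index, and the byte offset back together (a sketch; the helper name is made up):

```python
def block_byte_address(tag, slot_index, byte_index):
    """RAM address of byte `byte_index` in the data block of slot `slot_index`.

    Assumes the 21/6/5 split: tag (21 bits) | slot index (6 bits) | offset (5 bits).
    """
    return (tag << 11) | (slot_index << 5) | byte_index

# example: the tag, slot index, and offset from our running example address
addr = block_byte_address(0x1E1E1, 0b111000, 0b01111)
print(hex(addr))  # 0xf0f0f0f -- the original address, reconstructed
```

This works because the slot index and offset were taken directly from the address bits in the first place; that's why powers of 2 matter, as discussed next.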
It's critical that we use powers of 2 for the number of slots, the number of slots per set, and the number of bytes in a data block. That way, we can pick out bits from the address. If we didn't use powers of 2, then it would be more challenging to determine anything (think about why this is).
If the dirty bit is set, then we must copy the data block back to RAM (assuming a write-back policy). If not, then no copying is done.
Then the new data block is copied, the tag bits are updated, the valid bit set to 1, the dirty bit set to 0, and the desired bytes sent back to the CPU which requested the bytes to begin with.
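A toy sketch of these miss-handling steps under a write-back policy (the RAM array, the dict fields, and the 21/6/5 direct-mapped split are all assumptions for illustration):

```python
RAM = bytearray(2**20)  # toy RAM (1 MiB) standing in for main memory

def handle_miss(slot, addr):
    """Refill one direct-mapped slot on a miss (32-byte blocks, write-back)."""
    if slot['valid'] and slot['dirty']:
        # dirty: copy the old data block back to its RAM address first
        old_addr = (slot['tag'] << 11) | (addr & 0b111_1110_0000)  # same slot index
        RAM[old_addr:old_addr + 32] = slot['data']
    block_addr = addr & ~0b11111                     # start of the new block in RAM
    slot['data'] = bytearray(RAM[block_addr:block_addr + 32])  # copy new block in
    slot['tag'] = addr >> 11                         # update the tag bits
    slot['valid'], slot['dirty'] = True, False       # valid, not yet written
    return slot['data'][addr & 0b11111]              # byte sent back to the CPU
```

Note the order: the dirty block is written back before it is overwritten by the new block.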
Clearly, a cache miss can be slow to handle, and we hope that a program has sufficient temporal and spatial locality that cache misses are infrequent compared to cache hits.
An address is divided into tag bits, offset, and possibly slot or set index, depending on the caching scheme. The address tag bits are compared to slot tag bits. The slots picked for comparison vary according to the caching scheme.
For a fully associative cache, the address tag bits are compared to all slot tag bits, in parallel. This generally creates hardware so complex that it slows down cache access.
For a direct mapped cache, the slot index bits from the address are used to pick exactly one slot. The address tag bits are compared to the slot tag bits for that one slot. If it matches (and it's valid), there's a cache hit.
For a set associative cache, the set index bits pick exactly one set. Each set has 2^k slots (where k >= 1). The tag bits of the address are compared to each of the slots in the set, in parallel. In our example, we had two slots per set, so only the tag bits of two slots were compared to the tag bits of the address.
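A two-way set-associative lookup with 32 sets (so a 22-bit tag, 5-bit set index, and 5-bit offset for a 32-bit address) might be sketched as:

```python
def set_assoc_lookup(sets, addr):
    """2-way lookup: 32 sets of 2 slots each, 32-byte blocks.

    Returns the requested byte on a hit, or None on a miss.
    """
    offset    = addr & 0b11111
    set_index = (addr >> 5) & 0b11111   # bits B9-5 pick exactly one set
    tag       = addr >> 10              # remaining 22 bits
    for slot in sets[set_index]:        # hardware compares both tags in parallel
        if slot['valid'] and slot['tag'] == tag:
            return slot['data'][offset]  # cache hit
    return None                          # cache miss
```

The loop over two slots models what the hardware does with two comparators side by side; only the selected set is ever examined.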
Only slots have valid and dirty bits; addresses do not have such bits.