Interleaved Memory

Introduction

The Motorola 68000 was introduced in the late 1970's or the early 1980's. This was one of the first chips to use 32 bits for addresses. In principle, this meant the chip could access 2³² different memory locations.

However, 2³² is 4 gigabytes, which is extremely large, especially for the time. This brings about a very interesting point. Even though CPUs have enough address pins to address large amounts of memory, it's not always the case that there is any RAM at all memory locations!

You can create combinational logic circuits that determine whether an address is valid or not, and send a signal back to the CPU if the address is invalid. This is left for another set of class notes (which currently don't exist).

To create the memory you needed, you put together one or more chips of RAM.

Let's assume, for simplicity's sake that our computer can only access 4K of RAM. The amount memory is quite small even twenty years ago, but makes it easy to illustrate the point of interleaved memory.

However, we have chips that contain only 1K of RAM, thus we need four such chips to create a total of 4K.

Diagram of a RAM chip

Let's look at a diagram of a 1K RAM chip.

Let's look at the various pins of this chip.

V_cc This is 5 volts. To power the chip you need two voltages. The high voltage and ground. They do not add functionality to the chip, but without them, the chip wouldn't work.
V_dd This is ground. Again, for the powering of the chip.
A_9-0 Since there is 1K of memory in the chip, then you need lg 1024 = 10 bits to specify the address in RAM you want to access.
R/W' This means "read/not write". If R/W' = 1, you want to read a value from RAM. This value is place on the D_7-0 as output. If R/W' = 0, you want to write to RAM, in which case, D_7-0 serves as input. The address is specified by A_9-0.
CE Chip Enable (also called "Chip Select"). If CE = 1, then the chip is enabled. That is, you do either a read or write based on the value of R/W'. If CE = 0, the chip is not enabled (i.e., selected), so there is no attempt to read nor write.
D_7-0 This is the data to or from RAM. It is bidirectional, meaning you can read from RAM (in which case D_7-0 acts as the output) or write to RAM (in which case D_7-0 acts as input). Sometimes, there are 16 pins for data, with 8 pins being input and 8 pins being output. In our example, we use 8 pins that are bidirectional.

Where do the signals to the RAM come from? The address bits come from the CPU. The CPU has 32 bits of address, A_31-0. Clearly only 10 bits can go to this chip since the chip only has 10 address bits.

Does it have to be bits A_9-0? Not necessarily. Any 10 bits will do. In the diagram above we see A_19-10 fed from the CPU (via the address bus) to the RAM. It may not make any sense to wire the RAM in this way, but it is possible to do so.

R/W' is one of the control signals the CPU can send to RAM>

We'll need to develop the logic for the chip enable in a moment.

Memory as Abstraction

We would like to give the illusion that there is 4K of RAM. That is, users can access addresses M[0] up to M[4095]. This is an abstraction we want to present to the user of memory.

We can think of a software analogy. Suppose you want to allow a person using your program to access 4096 array elements. However, due to a strange programming language, arrays only have a maximum size of 1024 values.

How might you give the illusion that you have 4096 values? One way to do so is to declare four arrays of size 1024. Call these arrays arr0, arr1, arr2, and arr3. The user thinks they have an array called arr with indexes from 0 up to 4095.

In reality, when the user wants to access index 0 through 1023, you get the value from arr0. If they want to access values from index 1024 to 2047, then you subtract 1024 from the index, and access the array element at the new index from arr1.

If they want to access array elements at index 2048 to 3071, then you subtract 2048 from the index, and access the array element at the new index from arr2. You can do similar things for arr3.

Thus, you can create the illusion you have a much larger array even if the language imposes restrictions.

Implementing 4K of RAM

We can try to apply the same strategy to make it appear as if we have 4K of RAM, instead of 4 separate 1K RAM. Each chip can be thought of as an array which has elements C[0] up to C[1023].

Here's one way we can map memory to the chips.

Each chip is indexed by 10 bits which are the address bits. The index goes from 0_ten up to 1023_ten.

I've also numbered the chips (in binary) as chips 00, 01, 10, and 11.

Suppose someone asks you where M[1025] stored. Your answer would be in Chip 01 at index 0. If someone wanted to know where M[3071] is stored, you would say it's in Chip 10, at index 1023. M[1023] is like a virtual element, and the actual element is in Chip 01 at index 1023.

A Problem

When we access data from memory, we often access it 4 consecutive bytes at a time, at a word-aligned address. Let's see how long that takes us.

Suppose the amount of time to access one byte from a RAM chip is T. Then, it will take 4T to access 4 consecutive bytes at a word aligned address. That's because 4 consecutive bytes all reside on the same chip, and it takes time T to get each byte.

If we didn't care about getting word-aligned words, we might be able to do better, but we'd have to be very lucky. If you want M[1022], M[1023], M[1024], and M[1025], you can access this in 2T time.

That's because the first two bytes are in Chip 00 and the last two bytes are in Chip 01. So you can access one byte at a time for both chips in parallel. However, there's only a small number of addresses that work like this.

Can we do better? Can we access four bytes of memory in T time instead of 4T?

Interleaving Memory

Yes, we can do better. Instead of putting a range of memory on one chip, we can "stripe" memory across all four chips. It would look like this:

Thus M[0] is on Chip 00, M[1] is on Chip 01, M[2] is on Chip 10, M[3] is on Chip 11, and M[4] is back on Chip 00.

Notice that if we access a word aligned quantity, the first byte always appears on Chip 00. Furthermore, because we access four consecutive bytes, they are always at the same index on every chip.

For example, if we want a word starting at address M[4092], we access index 1023 of all four chips. This allows us to access words in T time.

In the example above, we have four way interleaving, since a chip contains every fourth byte.

If someone asks where is M[5], you can say it's on Chip 01 at index 1. If someone wants M[2], it's at Chip 10 at index 0.

In general, where is M[i]?

The chip is at i % 4
The index is at i / 4

For example, if you want M[103], then this is at chip 103 % 4 = 3 (i.e., Chip 10), and the index is at 103 / 4 = 25 (using integer division).

Equivalently, think of the address in binary. The chip is the low two bits. The index is bits B_11-2.

Designing the Logic for Chip Enable

Let's imagine that the CPU generates three control signals.

B which indicates that the CPU wants to access a byte. The address can be anything (within the 4K). Thus, B = 1 when the CPU accesses a byte and B = 0 when the CPU wants to access a halfword or word.
H which indicates that the CPU wants to access a halfword. The address must be halfword aligned. This means the address is divisible by 2, or equivalently B₀ = 0 (i.e., the LSb is 0). When H = 1 the CPU wants to access a halfword. When H = 0 when the CPU wants to access a byte or word.
W which indicates that the CPU wants to access a word. The address must be word aligned. This means the address is divisible by 4, or equivalently B₁B₀ = 00 (i.e., the low two bits are 00). When W = 1 the CPU wants to access a halfword. When W = 0 when the CPU wants to access a byte or halfword.

Let's think about what chips are enabled under various circumstances.

When the CPU wants to access a byte, only one chip is enabled.
When the CPU wants to access a word, all four chips are enabled.
When the CPU wants to access a halfword, two chips are enabled. Either chips 00 and 01 are enabled, or chips 10 and 11. That's because all halfwords are accessed at halfword aligned addresses.

An observation

What do you notice about the memory addresses in Chip 00? They are M[0], M[4], M[8], etc. All those addresses are divisible by 4. Equivalently, the low two bits of the binary address are 00.

What do you notice about the memory addresses in Chip 01? They are M[1], M[5], M[9], etc. All those addresses are congruent to 1 mod 4 (that is, dividing the index by 4 leaves a remainder of 1). Equivalently, the low two bits of the binary address are 01. This is why it's called Chip 01.

What do you notice about the memory addresses in Chip 10? They are M[2], M[6], M[10], etc. All those addresses are congruent to 2 mod 4 (that is, dividing the index by 4 leaves a remainder of 2). Equivalently, the low two bits of the binary address are 10. This is why it's called Chip 10.

Finally, what do you notice about the memory addresses in Chip 11? They are M[3], M[7], M[10], etc. All those addresses are congruent to 3 mod 4 (that is, dividing the index by 4 leaves a remainder of 3). Equivalently, the low two bits of the binary address are 11. This is why it's called Chip 11.

From this observation, you see only the addresses in Chip 00 are word-aligned, i.e., only Chip 00 can contain the first byte of a word. Also, Chips 00 and 10 are halfword aligned, so only those chips can contain the first byte of a halfword.

Logic for Chip 00

Let's come up with the logic for Chip 00. We consider what happens case by case.

If W = 1, then all four chips are enabled.
If B = 1, then we only select chip 00. This is the chip where all memory address (i.e., M[i]) ends in 00. That is, when A₁A₀ = 00.
If H = 1, then we know that A₀ = 0, so we don't need to test for that. However, A₁ can either be 0 or 1. Notice that in Chip 00 (and Chip 01) A₁ equals 0. Thus, we only select Chip 00 if A₁ = 0.

Putting all these facts together, we generate the following Boolean expression for chip enable for Chip 00.

  CE = W + (H * A₁') + (B * A₁'A₀')

The circuit looks like this:

In particular, pay attention to the input to A_9-0. Remember that when we wanted to get M[i], we accessed index i / 4. This is the same as shifting the bits right by 2, or equivalently, accessing bits A_11-2 from the address bus.

Thus, A_11-2 comes from the address bus, while R/W' is a control signal from the CPU.

Logic for Chip 10

To see if you follow how to design the chip enable logic, let's see if you can do it for chip 10.

If W = 1, then all four chips are enabled.
If B = 1, then we only select chip 10. This is the chip where all memory address (i.e., M[i]) ends in 10. That is, when A₁A₀ = 10.
If H = 1, then we know that A₀ = 0, so we don't need to test for that. However, A₁ can either be 0 or 1. Notice that in Chip 10 (and Chip 11) A₁ equals 1. Thus, we only select Chip 00 if A₁ = 1.

Here's the logic for the chip enable for Chip 10.

  CE = W + (H * A₁) + (B * A₁A₀')

This is how the chip enable logic looks for Chip 10.

To convince yourself it works, try writing the logic for all four chips, then pick some address like 10, and see what happens if, say, H = 1. See which chips are enabled. Make sure if H = 1 you pick a halfword aligned address, and if W = 1 you pick a word aligned address.

Invalid Addresses

Notice that we only use 12 bits of the address from the address bus. We use bits A_11-0. What about the other 20 bits?

One possibility is to make sure those 20 bits are 0 (using a very large NOR gate), and using the output of the NOR gate, then AND that result with the chip enable from above, and feed the result of that into the chip enable.

Otherwise, you are essentially allowing any address that has the correct low 12 bits to access the chips, and the upper 20 bits become irrelevant. If this isn't a problem, you don't have to deal with it, but you can assume that if the upper 20 bits are not all zeroes, then none of the chips are enabled.

Summary

Interleaving memory allows you to access several bytes of memory simulateneously. You can come up with the logic for the chip that enables the correct chips.

Admittedly, the CPU must still sort out the bytes it does get, but we assume it can do that.

Web Accessibility