Fast Addition: Carry Lookahead Adders

Introduction

In our discussion about ripple-carry adders, we said that adding two n-bit numbers requires O(n) time. The reason is the carry. As you perform the addition on the least significant bits, you may have a carry that "ripples" its way to the most significant bit. If we didn't have a carry, then we should be able to add in O(1), because adders can work in parallel.

Can we do better? Can we make addition faster? Clearly, the answer is yes, otherwise, we wouldn't have this discussion.

Looking at Carries

The following is a truth table for a full adder.

Row	x	y	c_in	c_out	s
0	0	0	0	0	0
1	0	0	1	0	1
2	0	1	0	0	1
3	0	1	1	1	0
4	1	0	0	0	1
5	1	0	1	1	0
6	1	1	0	1	0
7	1	1	1	1	1

We can write the carry-out of this adder using sum of products. However, there is an alternate way of writing the carry out.

Before we discuss how to write it, let's define a few terms first. Recall that we are adding two n-bit numbers: x_n-1...x₀ to y_n-1...y₀ which results in z_n-1...z₀. In a ripple carry adder, we assign a full adder (call it adder i) to add x_i to y_i. We'll use the ripple carry adders which only uses full adders.

Adder i also has a carry in. We'll call that c_i. It generates a carry out, which we'll call c_{i + 1}. When we add the least significant bits c₀ = 0. However, we'll simply leave it as c₀.

We can write the Boolean expression for the carry-out in an alternate way:

   c_i+1 = x_iy_i + x_ic_i + y_ic_i

This Boolean expression has the same truth table as above. It even makes intuitive sense. There's a carry out when any two of the three bits (i.e., x_i, y_i, or c_i) have a value of 1. Thus, x_iy_i is 1 only when both x_i and y_i are 1. And since this is inclusive OR, all three bits could be 1.

We use the distributive property to rewrite the carry out as:

   c_i+1 = x_iy_i + c_i(x_i + y_i)

Let's define two terms:

   g_i = x_iy_i 
   p_i = x_i + y_i

Then, we can rewrite c_{i + 1} using these two terms, as:

   c_i+1 = g_i + p_ic_i

g_i is called the generate term. This always generates a carry out. g_i = x_iy_i, which is 1 only when x_i and y_i are both 1. Clearly, if two bits are 1, there's a carry.

p_i is called the propagate term. p_i = x_i + y_i. This could create a carry, but it may not. We know that at least one of x_i or y_i must be 1 if p_i = 1.

This is called propagate for the following reason. If exactly one of x_i or y_i is 1, then the carry-in determines whether there is a carry-out of 1 or not. Thus, if c_i is 1, there is a carry-in of 1, which will cause a carry-out of 1. If c_i is 0, then the carry-out is 0.

To propagate, in this case, means to assist. For example, suppose you are standing in a line. Some one says to hand something forward, say, a piece of paper. Even though you did not bring the piece of paper, you can help move it along (i.e., help propagate it) by passing it to the next person. Similarly, propagating a carry means to move the carry from the previous adder to the next adder.

Mostly, we only care about propagating a 1. We're not that concerned about propagating 0's. That's why we AND p_i to c_i. By itself, p_i may not cause a carry. However, it can propagate a carry-in of 1, by causing a carry-out of 1, if p_i is 1.

Why Generate and Propagate?

Why generate and propagate. If you look at the Boolean expressions for p_i and g_i, you will see that they both use only x_i and y_i. Neither depend on the carry. Since x_i and y_i are available immediately, this gives us hope that we can avoid waiting for carries.

This is how we do it. First, let's write the c₁ which is the carry out for the adding bit 0 of x and y.

   c₁ = g₀ + p₀c₀

Now, we write it for c₂.

   c₂ = g₁ + p₁c₁

At this point, we've used c₁, which we'd rather avoid, because it means waiting for that carry to be computed. However, we just defined c₁ (see the previous formula) as g₀ + c₀p₀.

Let's plug that in:

   c₂ = g₁ + p₁(g₀ + p₀c₀)
      = g₁ + p₁g₀ + p₁p₀c₀

Notice that we no longer have c₁. That means we no longer have to wait for the carry!

Let's go one more step further:

   c₃ = g₂ + p₂c₂

Again, we would prefer to avoid using c₂ since this requires us to wait for the result to propagate across two adders. However, we already have a Boolean expression for c₂ (we just computed it a moment ago) that doesn't use any carries except c₀ which we have right away.

   c₃ = g₂ + p₂c₂
      = g₂ + p₂(g₁ + p₁g₀ + p₁p₀c₀)
      = g₂ + p₂g₁ + p₂p₁g₀ + p₂p₁p₀c₀

Already, you should be able to detect a pattern. By following the same pattern, you'd expect:

   c₄ = g₃ + p₃g₂ + p₃p₂g₁ + p₃p₂p₁g₀ + p₃p₂p₁p₀c₀

What Do We Do Now?

By now, you've noticed that we're able to compute the carries in terms of x_i's, y_i's, and c₀. These bits are available right away. We don't have to wait for any other carry to compute it.

You might wonder, why? Why are we computing carries, if the idea is to avoid using carries?

The idea is to construct a circuit for c_i that can be fed into the input of adder i. The carry-out of the adder is not used. In fact, each of the full-adders aren't directly connect to each other. We simply design the circuit for c_i and feed it in.

The following is an approximate circuit.

The diagram is incomplete. It's missing the wires from x_i and y_i to the appropriate adders. As an exercise, redraw the diagram, and add those wires in. Also, draw x_2-0 as 3 separate inputs, instead of the one circle drawn in the diagram above. Do the same for y_2-0. If you want, you can even draw in the logic inside the boxes that say things like "c₃ LOGIC".

What should you understand from this diagram? The key point to observe is the carry out of the full adder is not attached to the carry in of the next adder. Instead, there is logic (as defined by the Boolean expressions from the previous section) that has inputs from x_i's, y_i's and c₀.

Summary

By using these formulas, we can cut down the adders from O(n) to what appears to be O(1) time. After all, p_i and g_i only depend on x_i and y_i, which are the bits of x and y which are immediately available to us. They also only depend on c₀, which is also immediately available. We don't have to wait for carries to perform this computation.

Still, it's not quite O(1). Why not? Look at the formula, for say, c₄.

   c₄ = g₃ + p₃g₂ + p₃p₂g₁ + p₃p₂p₁g₀ + p₃p₂p₁p₀c₀

Given c_i, there are i terms to be ORed. The largest product term, (in the example above, it's p₃p₂p₁p₀c₀) has i+1 literals.

If you build everything out of 2-input AND gates or 2-input OR gates, then the best you can do is about O(lg N) where N is the number of bits in the addition. Still, that's far more efficient than O(N).

This is one lesson in hardware. Often, you can build faster circuits if you're willing to use more gates (of course, that has its limitations too---more hardware can also mean slower hardware, because it uses up a lot of space, however, the savings in computations usually wins out).

Web Accessibility