Signed Int: Two's Complement

© 2003, 2004 by Charles Lin. All rights reserved.

Older Version

Introduction

C supports two kinds of int types: unsigned and signed.

Most people agree that you use unsigned binary (UB) as the representation system for unsigned int. For machines with 32-bit words, you use 32-bit UB.

However, the intuitive choice for signed int is not the way machines stored signed int. Most people would think of using signed magnitude (SM). However, SM and 1C (one's complement) have problems. Both representation systems have two representations for 0. On the whole, this isn't such a big problem, but it does make hardware more complicated.

We're going to study two's complement (2C) representation. This is the representation system that is used to represent signed int on machines. If you declare an int variable, the computer stores the values using 2C.

At first, you're going to wonder why we use 2C. It's going to seem like a bizarre representation. Yet, it has several nice properties. In particular, it has a single zero, and you can add two 2C numbers using the same hardware as you do for adding two UB numbers, and get the correct 2C answer (provided the answer doesn't overflow, i.e., it lies within the range of permissible 32-bit 2C values). You can't do that with 1C or SM. You'd have to build specialized hardware to do addition for 1C or SM that's different from the hardware for UB.

Here's a great example of a useful representation. 2C has nice properties as a representation, which is why we use it.

Representation

Let B be an N bit bitstring. We label the bits as: B = bN-1 bN-2...b0, where bN-1, the most significant bit (MSb), is the sign bit. If this bit is 1, the value is negative. If it is 0, it is non-negative. 1C and SM also have a sign bit. We assume this bitstring is a representation in 2C.

How to Negate

B is a N bit number in the 2C representation system.

We want to negate B. Thus, if B is negative, then -B is non-negative. If B is non-negative, then -B is negative.

We are negating the 2C representation of B. That is, we're telling you how to manipulate the bits so the corresponding value is the negative of the value before changing the bits.

To negate in 2C, you flip the bits and add 1.

In symbols, -B = ~B + 1 where ~B = \bN-1 \bN-2...\b0. Thus, ~B is B with all bits flipped (thus, \bi means having bit bi flipped).

To add 1, we treat the ~B as if it were a UB number, and then add 1 as if both numbers were UB. If ~B is all 1's, then the result is an N+1 bit number, which is 1 followed by N zeroes. In this case, we ignore the 1, and assume the result is N zeroes.

Negating Twice

As you know from math, if you negate a number twice, you get back it's original value. That is, -(-B) = B.

Does this happen for 2C? Can we flip bits and add 1 to B, then do it again, and get back B? It's not obvious that flipping bits and adding 1 twice gets back the original bitstring. However, it does, and we'll see why.

Converting from Base 10 to 2C

Here's an algorithm for converting a number from base 10 to 2C. We assume that the number can be represented in 2C using N bits for some N.

  1. If the number is non-negative, convert it to N bit UB.
  2. If the number, X, is negative, then convert -X to UB. (Note: -X is non-negative since X is negative). Call the result Y. Negate Y, as explained in the previous section.

The most common mistake is flipping bits and adding 1 when converting a non-negative number to 2C. That's wrong! If you have a non-negative number, just write it in UB, and you're done! You only flip bits and add one if the number you're trying to convert is negative.

As an exercise, convert all numbers from -8 to 7 to 4-bit 2C.

Two's Complement (2C) to Base 10

Converting 2C to base 10 isn't that hard either. It's the reverse process.
  1. If the most significant bit is 1, flip all bits and add 1. If the MSb is 0, go to the next step.
  2. Convert the result of step 1 to base 10. This value should be non-negative.
  3. Add a negative sign if the sign bit was originally 1.

A Brief Detour: Inverses

A function maps from a domain to a codomain. Assume y = f(x). Thus, x is a value from the domain, which is mapped to y, a value in the codomain, by f.

A function is 1-1 if it maps a unique y for every x. In other words, no two x values map to the same y value. A function is onto if every y in the codomain comes from some x in the domain. It may be possible that two or more x values map to the same y for an onto function. A function can be both 1-1 and onto.

If a function is both 1-1 and onto, then it has an inverse. We write the inverse with a -1 superscript. Thus, the inverse of f is written as f-1.

What is an inverse? We know that f maps x to y. Is there a function that maps in the reverse direction? That is, is there a function that maps y back to x? The answer is yes, provided that f is 1-1 and onto. If it is, then x = f-1(y).

Why talk about functions with inverses? We want the ability to negate representations. For SM, we flip the sign bit. For 1C we flip all the bits. We also want the property that negating a bitstring (in a representation system) twice gives back the same bitstring. Clearly, negating in SM twice gives back the same bitstring (we flip the sign bit twice), and the same with 1C (we flip all bits twice).

Negation is a function that maps a bitstring to another bitstring that represents the negated value of the original bitstring's value. To get the original value back, we want to apply the same negation operation a second time. In effect, we want to map y back to x.

Thus, negation is its own inverse function. If neg is the negation function (in some representation system), then we want neg = neg-1. That is, we want the inverse of the negation function to be itself. That way, applying the negation function twice, gives back the original bitstring.

Alternate ways to negate in 2C

There are three other ways to get negate in 2C (other than flipping bits and adding 1). They all produce the same result (they had better!). But it's useful to see these 3 ways so you have alternate ways of getting to the same result. Often doing things in more than one way gives you insight into the whole process.

Method One

Find the rightmost 1, i.e., the least significant 1. That is, find i such that bi=1, but for j < i, bj=0. Flip all bits left of this. That is, flip bits bk where k>i. It turns out this operation is the same as negating in 2C.

Let's see an example of this. For example, consider 10100. The rightmost 1 is b2 (recall we number bits right to left starting at index 0). Flip all bits left of this. We get 01100. This gives you the same result as flipping the bits (01011) and adding 1 (01100).

Why do both methods produce the same result? It's not hard to understand, once you think about it. Let's look closely at what's happening. In particular, pay attention to the rightmost 1 and all bits to its right. I've highlighted them here 10100. I've highlighted bits 100. It's always a 1 followed by zero or more 0's.

Now see what happens when we flip bits. We get 01011. In words, we go from 1 followed by several zeroes to 0 followed by several 1s.

We add 1. This produces 01100. Thus, we're back to 1 followed by several zeroes. However, none of the bits left of these 3 changed! That is b4b3 didn't change values when we added 1. But they were flipped when we originally flipped bits. Thus, flipping bits and adding 1 has the effect of flipping all bits left of the rightmost 1. That's because when you flip the rightmost 1 and all 0's you get 0 followed by all 1's, but adding 1 to this gets you back to 1 followed by zeroes!

It makes sense once you understand how binary addition works.

We know that if we flip all bits of a number, then do it a second time, we get back the same bitstring (this is negation in 1C). It should be fairly obvious that if we flip a subset of those bits, and then flip it again, we get back the original bitstring. In this case, we flip all bits bk where k>i and i is the smallest index where bi=1. That is, we flip bits left of the rightmost 1. If you do that twice, you get back the same bitstring.

I find it's easier to convince people that negating twice gives you back the same 2C representation, if you just notice that you are flipping bits left of the rightmost 1, instead of saying that it's flipping bits and adding 1.

Practically speaking though, it's faster to flip bits and add 1 on most processors than to intentionally try to flip bits left of the rightmost 1. Even though both produce exactly the same result, you want to "code it up" using the flip bits, add 1 method.

Method Two

If B is the representation and you want to negate it, and B has N bits, then perform the following unsigned binary subtraction (this is just normal subtraction as you were taught in grade school, but using binary numbers instead):
  -B = 1(0N) - B
We write 1(0N) to mean 1 followed by N zeroes.

That is, to negate B, subtract from 1, followed by N zeroes. Do this subtraction as if both numbers were unsigned.

Why is subtracting from 1(0N) the same as flipping bits and adding 1?

Consider any number, e.g., X = 1011. Flip the bits, and you get ~X = 0100. Add the two together (X + ~X), and you get 1111. Thus adding a bitstring (say, X) to a negated bitstring (i.e., ~X) produces a bitstring that is all 1's.

More precisely, adding (X + ~X) equals 1N (assuming X is an N bit number) where 1N is our notation for N ones.

Now add 1 to (X + ~X) and you get 1(0N). For example 1111 + 1 = 10000. That is, 1 + 1N = 1(0N) (read: adding 1 to N ones gives you 1 followed by N zeroes).

Thus, if we have B + -B = 1(0N), where -B = ~B + 1, as defined earlier.

Let's see why negating twice gives us back the original value. That is, why --B = B.

Suppose Y = 1(0N). Thus, to negate B means to compute Y - B. Suppose that results in C. Thus, C = Y - B.

How do we negate C? We use the definition from before. Subtract from Y! That means, we need Y - C. However, we already know that C = Y - B. With a little algebra (add B to both sides, subtract C from both sides), you get B = Y - C. That is, negating C gives you back B again.

So, once you know that negation is subtracting from Y, then it's not hard to see, with a little algebra, that subtracting twice from the same number gives you the original number back.

Method Three

There's a third way to negate in 2C. Subtract 1. Then, flip bits. Why does this work? We know that we can flip bits and add 1 to negate. This can be written as addOne(flip(X)). This is the composition of two functions. If you compose two functions and the result has an inverse, then the inverse is the inverse of the two functions in reverse order. Thus, the inverse of addOne o flip (where o is the compose operator) is flip-1 o addOne-1. The inverse of flipping bits is, well, flipping bits. If you flip bits, then to get back the original number, you flip once again.

The inverse of adding one is subtracting one. That is, if you add 1 to a number, then to get back the original, you subtract one. Thus, the inverse is: flip-1 o addOne-1 = flip o subtractOne. Thus, you subtract 1, then flip bits. In this case, we subtract using modular arithmetic. Thus, subtract 1 from 0000 results in 1111.

Since negation is its own inverse, then this inverse function is equivalent to flipping bits and adding 1.

Yet Another View of 2C

2C is endlessly fascinating. There's yet another way to think about two's complement. Think about a binary odometer. An odometer is used to measure the distance travelled, say, in a car, or on a bike.

If you drive backwards in a car, most odometers will not change. However, if you pedal backwards on an exercise bike, some odometers will go in reverse. Let's imagine you have such a bike. Let's assume the odometer shows it's distance in binary.

You start of at 0000. If you pedal forward 3 km (let's pedal in metric), the odometer will read 0011. On the other hand, if you start of at 0000 and pedal backwards three kilometers, you would get 1111 (that's back one step), then 1110 (back two steps), and finally 1101 (back three steps). Thus, 1101 could be thought of as -3 since we went backwards 3 kilometers.

Is 1101 the representation for -3 in 2C? Let's see!

Try negating 3 in 2C. Start with 0011. Flip bits to get 1100. Add 1 to get 1101. That's the same number as we got by pedalling backwards 3 kilometers. It turns out this is true of any negative value. If you want to get, say, -7, pedal backwards 7 kilometers (starting at 0000) on a binary odometer, and that gives you the 2C representation.

(Of course, you might pick a number that can't be represented in the number of bits you give it. For example, with 4 bits 2C, you can't represent -20. If you pedal backwards 20 km, you'll end up with a representation that's not -20. This makes sense. After all, with only 4 bits, you can only represent 16 different values, and -20 isn't one of them).

Modular Arithmetic: The "Real" Math Behind 2C

Think about doing math with a normal base 10, 2-digit odometer. If you add 1 + 2, you get 03. If you add 25 + 25, you get 50.

What if you had 50 + 51? Normal decimal addition says you get 101. However, since you only have two digits, the answer has to be 01.

Modular arithmetic does computation modulo N, for some N. For example, with 2 decimal digits, we're going math modulo 100. Thus, 50 + 51 is really (50 + 51) % 100 = 1 where is the remainder operator. All operations in arithmetic modulo N are done with the remainder operator followed by N. That way, all answers fall within 0 and N-1.

Modular arithmetic is basically outside the scope of this course, but deals with doing math where numbers are confined within a range, and values wrap around. Thus, in modular arithmetic modulo 100, -1 is the same is 99, -2 is the same as 98, and so forth. All integers fall into equivalence classes. The number of equivalence classes is equal to N.

Courses in number theory and perhaps abstract algebra would cover this kind of arithmetic, and is the basis behind 2C.

Interesting Properties in 2C

Let's see what happens you negate 0 in 2C. For example, represent 0 using 4 bits. That's 0000. To negate, flip bits (resulting in 1111) and add 1 (resulting in 10000, but throw out the most significant bit to get 0000).

Notice that negating 0 in 2C returns back 0! That's very nice because that means we have a single representation for 0.

There's another number with the same property. That is, there's another N-bit number which you can write in 2C, where you flip its bits and add 1, and it gives you back the same representation. Do you know which representation it is?

It's 1, followed by N-1 0's (which we write as 1(0N - 1)).

Does it really work? We'll try it on a 4-bit 2C number.

Voila! We're back where we started. The negation of 1000 in 2C is 1000. Clearly, this is NOT true of all 4 bit 2C numbers! It's only true of 0N and of 1(0N - 1). Do this with any other 2C representation, and you get a different answer. For example, consider 0100. Flip bits gives you 1011, and add 1 gives you 1100. So the negation of 0100 in 2C is 1100. These aren't the same representation.

It makes sense for 0000 to stand for 0 in 4-bit 2C. But what value should the representation, 1000, be? First of all, it makes sense for this number to represent a negative value because the sign bit is 1. If you're given a representation of a negative number in 2C, how do you find out what number it is? Flip the bits and add 1, then convert to base 10, and add a negative sign.

For example, if you were asked what value is 1100, you'd flip the bits to get 0011, then add 1 to get 0100. Treat this as a UB number, and convert to base 10 to get 4. Add a negative sign to get -4. Thus, 1100 is the 4-bit 2C representation of -4.

We run into a problem with 1000. Flip the bits and adding 1 just gives us back 1000. What are we supposed to do?

But let's follow the rule, which says that we should convert this number to base 10. We do so by treating 1(0N - 1) as an unsigned int. So, with 4 bits, you have 1000, which is 8 in base 10. According to the algorithm, we add a negative sign, which results in -8.

That turns out to be the value we want representation, 1000, to map to. Again, think about the binary odometer. If you go back 8 kilometers from 0000, you will end up at 1000. To convince yourself of that, realize that pedalling backwards 1 kilometer gets you to 1111.

Now focus on the last three bits. It's 111 (or 7). It will take 7 more kilometers backwards to get that to 000. Thus, 1 km back (to get to 111), plus 7 more km (to get back to 1000) for a total of 8 steps, so 1000 is -8 (or 8 kilometers backward).

One nice thing about 2C, so far, is that we have a single zero representation. Recall that both SM and 1C have two representations for zero (positive and negative zero), and that two zero representations generally creates problems that a single zero representation doesn't.

Does 2C solve the addition problem? That is, can we add two values in 2C get the correct 2C representation for the sum?

The answer turns out to be yes.

Take two numbers like -1 + 2, and you will see normal addition works. For example, 1111 is -1. 0010 is +2. Add the two and you get 0001 (again, we always throw out the additional bit that would otherwise make it a 5 bit answer).

Now why does adding two numbers represented in 2C work? Let's consider adding X + Y. If both are non-negative, then it's unsigned addition, so that clearly works. If one is negative, and the other is positive, then just imagine the odometer set at the negative value, and adding a positive number. Intuitively, it still adds correctly.

The only one that may not intuitively make sense is when you add two negative numbers. Let X and Y both be negative numbers written in 2C.

Therefore, we can write X as [1(0N) - -X] and Y as [1(0N) - -Y]. Recall that one way to write 2C for negative values is to subtract the positive value from 1 followed by N zeroes (for an N bit 2C representation).

If you add the two, you get [1(0N+1) - (-X + -Y)]. This is almost 2C, except that you subtract from 1 followed by N+1 zeroes, instead of N zeroes. In other words, we're writing this value in 2C for N + 1 bits, rather than N bits.

The key for this to work is to realize that adding two negative values results in a negative value. In particular, it's the negation of (-X + -Y) (where -X and -Y are both positive because X and Y are both negative).

However, as long as -X + -Y doesn't get too large, this is equivalent to (1(0N) - (-X + -Y). That is, it's equivalent to writing the value in 2C for N bits.

To understand that, let's look at an example. Consider -1. In 4 bits, you write this as 1111. You can also write this as (10000 - 1). So, if you add -1 to -1, that's the same as (10000 - 1) + (10000 - 1), which is (100000 - 2). Now 100000 - 2, is 11110, which is 5 bits. As long as -X and -Y aren't too large, the most significant bit of the addition will always be 1. So, if we simply ignore that 1, then 100000 - 2 (the 2C representation for -2 using 5 bits) is the "same" as 10000 - 2 (the 2C representation for -2 using 4 bits), which is 1110.

It's not that unreasonable to ignore that additional, leftmost 1. Consider representing -2 using 4 bits. That's 1110. Consider representing -2 with 6 bits. That's 111110. The only difference between the two representations is the additional 1's that appear in the upper bits.

In fact, to convert any number in 2C from N bits to N + K bits, just add K bits to the left of the value, where the bit's value is taken from the sign bit, i.e., the MSb. This is called sign extension.

Clearly, sign extension works for positive/zero values since you merely add 0's to the left. Thus, 0010 maps to the same value as 000010. Adding more zeroes doesn't change what value it maps to.

It turns out it works when the sign bit is negative. We just saw an example where 1110 maps to -2, and 111110 also maps to -2 (using 2C in 4 and 6 bits respectively).

Thus, (10N+1 - (-X + -Y) is really 2C using N + 1 bits, which is fortunately equivalent to 2C using N bits provided (-X + -Y) isn't too large.

How Many Values?

Let S be the set of all N bit 2C representations for a fixed value of N. Create a new set T which are the values of those representations. How big is T? That is, how many values can we represent using N bit 2C?

To determine the number of values, realize that half the representations have 1 as the MSb, and the other half have 0. All numbers that have 1 as the MSb are indeed negative. In particular, there are no negative zeroes.

The smallest negative number is -1. Since there are 2N-1 negative representations with N bits (which is half the total 2N representations), then the largest magnitude negative number must be -2N-1. Notice there's not a "- 1" at the end.

Why not? Suppose I tell you the smallest number of K consecutive ints is 0. What's the largest number? It's K - 1. However, if the smallest number is 1, then the largest is K. Same thing applies here. All we've done is add a negative sign in front. We have 2N-1 values, so K = 2N-1. Thus, the range is from -1 to -2N-1, where a minus sign is placed in front of each number.

There are also 2N-1 representations with 0 as the MSb. However, one of those representations is zero. That leaves 2N-1 - 1 representations for positive values, starting at +1. The range of positive values is therefore 1 to 2N-1 - 1, inclusive.

That means that there's one more negative value than positive value. In particular, we can't negate -2N-1. It lies outside the range of positive values, i.e., it overflows. That's the price we pay for only having a single zero. However, this is a minor problem given the other nice properties of 2C.

2C Wins!

We've come up with three representations for signed int values: SM, 1C, and 2C (we'll discuss excess/bias representation next).

However, 2C has all of the nice properties. It has a single zero. You can add two 2C numbers using hardware for unsigned int addition.

This is the representation that's used when you declare:

  int x ;  // This is stored in 2C

Chart

Number of Bits Number of Values Min Value Max Value
16 216 -(215 - 1) 215 - 1
32 232 -231 231 - 1
N 2N 2N-1 2N-1 - 1

This chart differs from 1C and SM, because there's a 1-1 mapping of representations to values, i.e., there aren't two representations for the value 0. This affects the minimum and maximum value given N bits.

Common Mistake

There's a difference between representing a number in 2C, and negating a representation in 2C. To represent a number in 2C, you figure out whether the number is negative or non-negative. If it's non-negative, just write it in UB (unsigned binary). If it is negative, represent the absolute value of the number in UB, then flip bits and add 1.

The mistake that some people make is to take non-negative values and flip bits and add 1. Thus, when asked to represent +2 in 2C, some people write 0010 (which is +2), then flip bits to get 1101, then add 1 to 1110. They've succeeded in writing -2!

The act of negating a representated number in 2C requires you to flip bits and add 1. Equivalently, you can subtract 1, and flip bits, but most prefer to add. Equivalently, you can flip all bits left of the rightmost 1.

Conclusion

2C is the way that computers represent signed integers. Unlike 1C and SM, it has a single zero. Also, you can perform addition with 2C using a hardware adder meant for UB (unsigned binary). These two facts make it the preferred choice for representing signed integers.

Web Accessibility