Representation vs. Value

Older version: Feb 24 version

Introduction

Someone places three oranges on a table, and asks you how many oranges you see. You say "3". There is an abstract idea of what 3 means. When someone asks you to make three marks or three noises, you understand what three is.

However, when you write three down, you use a single symbol, namely 3. This mark is known as a representation. This single symbol represents the abstract idea of 3.

Other languages use different representations. For example, in Roman numerals, you write it as III. In Chinese, it's written with three horizontal lines. Interestingly, both representations use three real lines to represent the abstract idea of 3, whereas a single symbol is used in Arabic numerals.

However, you wouldn't want to represent numbers in unary (i.e., one mark for each thing counted). It would get very cumbersome very quickly. Imagine writing 100 using 100 separate marks. Even non-Arabic numbering systems recognize that unary is no good. In Roman numerals, you write X, which is a single symbol for 10, and in Chinese, you don't use ten separate marks to write down the representation for 10. (It's basically a cross, using two marks.)

Representations allow us to write values, which are abstract ideas, in a concrete and preferably compact way. Although there is a single abstract idea of "3", there are many different representations for that idea. We've just seen representations of 3 in three different languages.

However, each new representation doesn't require new symbols. We can use the same symbols (i.e., the same representation), yet its meaning (i.e., value) can be different. Does that confuse you? Let's look at an example.

What number is "10"? Did you say ten? What if I tell you that "10" isn't written in base ten? What if I say it's in base two?

There's a joke that goes "There are 10 kinds of people in the world. Those that understand binary, and those that don't". The joke is that 10 is not in base ten, but in base two. Thus 10, in base two, is the number two.

Notice we have the same symbols (e.g., "10") but the value corresponding to these symbols depends on the representation system. Thus, if "10" is in base ten, the value is 10. If "10" is in base two, the value is 2.
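Here is a small sketch of the same idea in Python (my choice of language; the notes don't depend on it). The built-in int takes a string and a base, so we can read the same symbols under two different representation systems. The variable names are just for illustration.

```python
# The same string of symbols, read under two different representation systems.
symbols = "10"

as_base_ten = int(symbols, 10)  # interpret "10" as a base ten numeral
as_base_two = int(symbols, 2)   # interpret "10" as a base two numeral

print(as_base_ten)  # 10
print(as_base_two)  # 2
```

The string never changes. Only the rule used to read it does.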

A Problem

Reread the penultimate (i.e., next-to-last) sentence of the previous section. It says:

   Thus, if "10" is in base ten, the value is 10.

We see a problem. We write "10" to indicate the representation, and 10 to represent the value. Thus, "10" and 10 look just alike.

This is a fundamental problem with representation and value. The value has to be represented. So, how can we deal with the representation and the value, when both of them need to be represented?

One way is to write the representation as a string (as we would in some programming language). We'll put double quotes to make it clear it's a representation. We'll leave the double quotes off when we want to indicate a value.

Also, we'll subscript using words. Thus, 10_ten means ten in base ten, and 10_hex means "one, zero" in base sixteen (we reserve "ten" to mean 10 in base ten). "hex" is short for hexadecimal, which is base sixteen. We'll write 10_two for "one, zero" in base two.

In general, if a number is subscripted, the subscript is in base ten. Also, if a number is subscripted, we think of it as a value, not as a representation.

Secondly, we'll have a "function" called value. value takes a string, which is a numeric representation, and returns the value, usually in base ten.

Sometimes we'll subscript value with the representation system. Thus, we might write value_UB to mean the value in the UB representation system.

Here are some examples:

  value_ten[ "10" ]  = 10_ten
  value_UB[ "10" ]   = 2_ten
  value_hex[ "10" ]  = 16_ten
  value_UB[ "1111" ] = 15_ten
  value_2C[ "1111" ] = -1_ten

Thus, value_UB[ "1111" ] = 15_ten means that the value of representation "1111" in unsigned binary is fifteen, written in base ten.

hex and ten really refer to unsigned base 16 and unsigned base 10 respectively.

UB stands for "unsigned binary". 2C stands for two's complement. Both of these are representation systems.

We'll explore different kinds of representation systems in upcoming lecture notes. So, for now, don't worry about how we came up with the values or what the representation systems mean. You can come back later to read it, and it'll make more sense. All you need to know is that, given a representation and a representation system, you can determine the value.
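If you'd like to experiment, here is a minimal sketch of value as a Python function. The function name, the argument order, and the system names passed as strings are my own assumptions for illustration. The rule used for 2C (if the leading bit is 1, subtract 2 to the power of the number of bits) is the standard two's complement rule, which these notes haven't covered yet.

```python
def value(system, rep):
    """Return the value (a Python int) of the string rep in the given system.

    Hypothetical helper: the system names mirror the subscripts in the notes.
    """
    if system == "ten":
        return int(rep, 10)
    if system == "hex":
        return int(rep, 16)
    if system == "UB":
        return int(rep, 2)
    if system == "2C":
        unsigned = int(rep, 2)
        # A leading 1 bit means the value is negative in two's complement.
        if rep[0] == "1":
            return unsigned - 2 ** len(rep)
        return unsigned
    raise ValueError(f"unknown representation system: {system}")


print(value("ten", "10"))   # 10
print(value("UB", "10"))    # 2
print(value("hex", "10"))   # 16
print(value("UB", "1111"))  # 15
print(value("2C", "1111"))  # -1
```

Notice that the printed results are themselves representations (in base ten), which is exactly the problem described above.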

repr, the inverse of value

It's useful to have a function that is the inverse of value. That is, we might have the value, but want to know what the representation is in a given representation system. In that case, we use the function repr, which takes a value, and produces a representation.

Here are some examples:

  repr_ten[ 10_ten ] = "10"
  repr_{UB, 2 bits}[ 2_ten ]  = "10"
  repr_{UB, 4 bits}[ 2_ten ]  = "0010"
  repr_hex[ 16_ten ]  = "10"
  repr_{UB, 4 bits}[ 15_ten ]  = "1111"
  repr_{2C, 4 bits}[ -1_ten ]  = "1111"
  repr_{UB, 3 bits}[ 15_ten ]  = overflow

Thus, repr takes in a value, usually written in base ten, and returns the string representation in the given representation system.

For most binary representation systems, it's also important to specify how many bits are used in the representation. Thus, you see repr_{UB, 2 bits}[ 2_ten ] and repr_{UB, 4 bits}[ 2_ten ] producing different results.

It may not be possible to represent a value given a representation system and a number of bits. See the last example above. In UB, 3 bits, you can't represent the value 15.
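The examples above can be sketched in Python as follows. I've called the function represent rather than repr, since Python already has a built-in named repr. The signature, the system names, and the choice to raise OverflowError (rather than return some special "overflow" marker) are all assumptions made for this sketch.

```python
def represent(value, system, bits=None):
    """Return the string representation of value in the given system.

    Hypothetical helper: raises OverflowError when the value doesn't fit.
    """
    if system in ("ten", "hex"):
        # In these notes, ten and hex are unsigned and have no bit limit.
        if value < 0:
            raise OverflowError(f"{value} is negative; {system} is unsigned")
        return str(value) if system == "ten" else format(value, "X")
    if bits is None:
        raise ValueError("binary systems need a number of bits")
    if system == "UB":
        if not 0 <= value < 2 ** bits:
            raise OverflowError(f"{value} doesn't fit in {bits} bits of UB")
        return format(value, f"0{bits}b")
    if system == "2C":
        if not -(2 ** (bits - 1)) <= value < 2 ** (bits - 1):
            raise OverflowError(f"{value} doesn't fit in {bits} bits of 2C")
        # Taking the value modulo 2**bits gives the two's complement pattern.
        return format(value % (2 ** bits), f"0{bits}b")
    raise ValueError(f"unknown representation system: {system}")


print(represent(10, "ten"))          # 10
print(represent(2, "UB", bits=2))    # 10
print(represent(2, "UB", bits=4))    # 0010
print(represent(16, "hex"))          # 10
print(represent(15, "UB", bits=4))   # 1111
print(represent(-1, "2C", bits=4))   # 1111
```

Calling represent(15, "UB", bits=3) raises OverflowError, matching the last example in the list.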

Arabic Numerals

Arabic numerals (more precisely, Hindu-Arabic numerals; this is the conventional numbering system we use) have an intriguing property. You can do math with them.

Think about adding some number like 111 to 123 (in base ten). To do this addition, you add in columns. You add 1+3 to get 4, 1+2 to get 3, and 1+1 to get 2. So, the answer is 234.

You may not have thought much about addition. After all, you learned it when you were very young. However, this kind of addition is a shortcut.

Here's how "true" addition works. Imagine you have a bowl full of beads. Lots and lots of beads. To add two numbers, you count out the number of beads equal to the first number, then continue counting out a second stack of beads. Then, recount the two stacks.

For example, you count 111 beads. Then, you start a second pile, and count out 123 beads. Then, you put both piles together, and count them to get a total of 234. If you're a little clever, you might be able to maintain two counts. Imagine this as two variables: totalCount, which is the total count, and localCount, which is the count of the second pile.

Before you start counting the second pile, totalCount is 111 (i.e., the number of beads you've counted from the first pile). localCount starts at 0.

For each bead you take out of the bowl and put in the second pile, you increment both the localCount and the totalCount. When the localCount reaches 123 (which is what you're trying to add), then you stop. The totalCount now has the sum of the two values you want.

Clearly, this way of adding numbers is very slow. The larger the number is, the more work it is. In fact, the amount of work is proportional to the sum of the two numbers you are adding. Adding two numbers that sum to 1000 is 100 times more work than adding two numbers that sum to 10.
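Here is the two-counter bead procedure as a short Python sketch. The function name is made up, and the variables are the totalCount and localCount from the text, written in Python's usual snake_case. It assumes both numbers are non-negative.

```python
def add_by_counting(first, second):
    """Add two non-negative integers one bead at a time."""
    total_count = first  # beads already counted in the first pile
    local_count = 0      # beads counted so far in the second pile

    # Move one bead at a time into the second pile, bumping both counts.
    while local_count < second:
        local_count += 1
        total_count += 1

    return total_count


print(add_by_counting(111, 123))  # 234
```

The loop body runs once per bead in the second pile, so the work grows with the size of the numbers themselves.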

Adding using Arabic numerals allows you to add two large numbers by adding the digits of the numbers column by column (plus carrying values). Since a number n has about log n digits, this usually means about O(log n) small additions, which is much better than O(n) steps of counting.
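As a rough sketch, here is column addition written to work directly on the digit strings, so the arithmetic really is done on the representation. The function name is hypothetical, and it assumes both inputs are strings of base ten digits with no sign.

```python
def add_columns(a, b):
    """Add two base ten digit strings column by column, with carries."""
    width = max(len(a), len(b))
    a = a.zfill(width)  # pad the shorter numeral with leading zeros
    b = b.zfill(width)

    carry = 0
    digits = []
    # Walk the columns from right (ones) to left.
    for digit_a, digit_b in zip(reversed(a), reversed(b)):
        column = int(digit_a) + int(digit_b) + carry
        digits.append(str(column % 10))  # digit that stays in this column
        carry = column // 10             # amount carried to the next column
    if carry:
        digits.append(str(carry))

    return "".join(reversed(digits))


print(add_columns("111", "123"))  # 234
print(add_columns("999", "1"))    # 1000
```

The loop runs once per column, that is, once per digit of the longer number, rather than once per bead.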

Similar tricks can be used to compare numbers, multiply numbers, and so forth. Good representations allow you to perform arithmetic conveniently.

This is an important observation. Just try doing math in Roman numerals, and you realize it's a mess. Add 21 to 29, and you have XXI added to XXIX. Doing the addition in Roman numerals involves lots of fancy tricks. How would you do it? Since you've added in Arabic numerals all your life, you'd probably convert the Roman numerals to base 10, add, then convert the result back to Roman numerals. It's much faster that way than figuring out how to add Roman numerals.
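The convert, add, convert back approach looks something like this in Python. This is only a sketch: the function names are invented, it assumes well-formed Roman numerals in the usual subtractive style (IV, IX, XL, and so on), and it does no validation.

```python
# Symbols ordered from largest to smallest, including subtractive pairs.
ROMAN_SYMBOLS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]
SYMBOL_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(roman):
    """Return the value of a Roman numeral string."""
    total = 0
    for i, symbol in enumerate(roman):
        current = SYMBOL_VALUES[symbol]
        # A smaller symbol before a larger one is subtracted (e.g., IX).
        if i + 1 < len(roman) and current < SYMBOL_VALUES[roman[i + 1]]:
            total -= current
        else:
            total += current
    return total


def int_to_roman(number):
    """Return the Roman numeral string for a positive integer."""
    parts = []
    for symbol_value, symbol in ROMAN_SYMBOLS:
        while number >= symbol_value:
            parts.append(symbol)
            number -= symbol_value
    return "".join(parts)


def add_roman(a, b):
    """Add two Roman numerals by switching representations and back."""
    return int_to_roman(roman_to_int(a) + roman_to_int(b))


print(add_roman("XXI", "XXIX"))  # L
```

All the actual arithmetic happens on Python integers. The Roman numerals are only used at the start and at the end.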

If doing arithmetic is so difficult in Roman numerals, how did Romans do arithmetic using Roman numerals? My guess is they used an abacus. If you use an abacus, then doing arithmetic is not based on the representation of Roman numerals, but based on representation of numbers on the abacus itself. Thus, it's pretty much switching representations from Roman numerals to an abacus setting, performing operations on the abacus, and switching it back to Roman numerals.

The convenience of Arabic numerals, and perhaps the main reason they're used, is that you can perform math on the representation itself.

Summary

To conclude, a representation is a way for us to write down a concept. A representation of a value is some symbols we use to convey the abstract notion of a value. Thus, we write "3", but when we see this representation, we visualize three things.

Although we have an idea of what "3" means, do you really have an idea of what 1,000,000 (a million) means? You can write it down. You know how to count to a million, though it would take a long, long time. You can do math with it. But it's really hard to truly conceive of the concept. Imagine 1,000,000 oranges, and it doesn't seem that much different from 999,999 oranges. Can you even imagine a million items?

A mathematician once said that 1,000,000 had no real meaning to him. 3 had a meaning, because he could visualize 3. He couldn't really visualize 1,000,000 of anything, and the only thing 1,000,000 meant to him was what the properties of math said it meant. That is, he could perform arithmetic on the representation and get correct answers in the representation, but in the end, they were merely representations.

That's a very perceptive view about representations. If the numbers are suitably large, pretty soon all you understand about the number is how to manipulate it. Being able to visualize what the number means abstractly becomes nearly impossible.

Throughout the notes, I will try to keep the notion of representation and value distinct. A representation means something (i.e., it has semantics) only in some representation system. Thus, the same representation can map to different values depending on the representation system.

The real problem with representation and the value the representation has is that values must also be represented. Thus, it's easy to confuse the value with representation. To make it clear which is which, we'll write the representation as a string, and then apply a function value on the representation to get the value.

This function value is really a family of functions. It really takes two arguments: the representation, and the representation system. Thus, we could write: value_2C["1111"] to mean the value of the representation "1111" in the 2C (two's complement) representation.

Similarly, it's nice to be able to compute the representation, given the value. Thus, we have a family of functions called repr. This function has two arguments: the value, and the representation system (usually specifying the number of bits). This function returns a string that is the representation in that number of bits, or overflow if the value can't be represented in that number of bits in that representation.

We've said that "10" (the representation) might map to the idea (value) of ten or two, depending on whether "10" is represented in base ten representation or in base two representation. This is an example of a specific representation (i.e., "10") mapping to two different values. In other words, given a representation like "10", there are many ways to determine the value. The representation system tells us how to map a representation to its value.

Conversely, given a value like -1_ten, there may be different representations for the same value, depending on the representation system.

  repr_{2C, 4 bits}[ -1_ten ]  = "1111"
  repr_{1C, 4 bits}[ -1_ten ]  = "1110"
  repr_{SM, 4 bits}[ -1_ten ]  = "1001"
  repr_{excess 8, 4 bits}[ -1_ten ]  = "0111"
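To see where these bit patterns come from, here is a sketch of the four systems in Python. The notes haven't defined these systems yet, so the rules below are the standard textbook definitions of two's complement (2C), one's complement (1C), signed magnitude (SM), and excess (biased) representation, stated here as assumptions. The function names are made up for illustration.

```python
def to_bits(pattern, bits):
    """Write a non-negative integer as a zero-padded binary string."""
    return format(pattern, f"0{bits}b")


def repr_2c(value, bits):
    """Two's complement: negative values wrap around modulo 2**bits."""
    if not -(2 ** (bits - 1)) <= value < 2 ** (bits - 1):
        raise OverflowError("value out of range for 2C")
    return to_bits(value % (2 ** bits), bits)


def repr_1c(value, bits):
    """One's complement: a negative value flips every bit of its magnitude."""
    if abs(value) > 2 ** (bits - 1) - 1:
        raise OverflowError("value out of range for 1C")
    if value >= 0:
        return to_bits(value, bits)
    all_ones = 2 ** bits - 1
    return to_bits(all_ones - (-value), bits)  # same as flipping each bit


def repr_sm(value, bits):
    """Signed magnitude: one sign bit followed by the magnitude."""
    if abs(value) > 2 ** (bits - 1) - 1:
        raise OverflowError("value out of range for SM")
    sign = "1" if value < 0 else "0"
    return sign + to_bits(abs(value), bits - 1)


def repr_excess(value, bits, bias):
    """Excess (biased): store value + bias as an unsigned number."""
    stored = value + bias
    if not 0 <= stored < 2 ** bits:
        raise OverflowError("value out of range for this excess system")
    return to_bits(stored, bits)


print(repr_2c(-1, 4))         # 1111
print(repr_1c(-1, 4))         # 1110
print(repr_sm(-1, 4))         # 1001
print(repr_excess(-1, 4, 8))  # 0111
```

One value, four different strings. Which string you get depends entirely on the representation system.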

(Notice that I had to resort to using words like "ten" and "two" to convey the idea of representation---I almost wrote base 10, in which case, it may have made you wonder, is that base ten or base two? "ten" and "two" are also representations, but at least you know "ten" means "ten" whereas 10 may mean "ten" or "two" depending on the representation).

Most people aren't used to binary. If you see 10010111, you're very aware this is a representation. You probably can't figure out what value it is without converting it to base ten! However, write a number in base ten, like 143, and you think that this is a value because of your intuition and experience using base ten, even though it's a representation of a value. Representations and values are two distinct concepts, but it can be easy to think of them as the same thing, when they aren't.

Keep the idea of representation, value, and representation systems in mind as you read about various representations of signed and unsigned ints.
