How Hard is Adding Integers for a Computer?


In our modern world, we often use high-level programming languages (Python, Ruby, etc.) without much thought about what is happening underneath. Even if we use a low-level language like C, we still probably think of operations like {1+1} yielding {2} or {3-2} yielding {1} as extremely basic. We have no appreciation for how subtle and clever people had to be to get those sorts of things working in the first place.

I don’t want to go into detail about how those actually work at the machine level, because that would be a pretty boring post. I do want to do a thought experiment that should bring up some of the issues. Suppose you want to make a new programming language to see how one does such a thing. You think to yourself that it will at least be able to add and subtract integers. How hard could that be?

To play around a little, you decide you will first make an integer take up 4 bits of memory. This means that when you declare an integer {x = 1}, it gets put into a space of size {4} bits: {0001}. Each bit can be either a {0} or a {1}. Things seem to be going great, because you are comfortable with binary notation. You think that you'll just take an integer and write its binary representation to memory.

Just for a quick refresher, for 4 bits this means that your integer type can only encode the numbers from {0} to {15}. Recall that you can go back to base {10} by taking each digit and using it as a coefficient on the appropriate power of {2}. Thus {0101} would be {0\cdot 2^3 + 1\cdot 2^2 + 0\cdot 2^1 + 1\cdot 2^0 = 5}.
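To make the refresher concrete, here is a minimal sketch in C (the language mentioned above) that converts a 4-character bit string to its base-10 value; the function name from_bits4 is invented for illustration, not anything from a real library.

    #include <stdio.h>

    /* Convert a 4-character bit string like "0101" to its base-10 value
       by shifting in one digit at a time. */
    unsigned from_bits4(const char *s) {
        unsigned value = 0;
        for (int i = 0; i < 4; i++)
            value = value * 2 + (s[i] - '0');   /* bring in the next bit */
        return value;
    }

    int main(void) {
        printf("%u\n", from_bits4("0101"));  /* 5  */
        printf("%u\n", from_bits4("1111"));  /* 15 */
        return 0;
    }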

Things are going well. You cleverly come up with an adding function merely by manipulating bits with the machine-level operations the hardware gives you. Then you test {15 + 1}. Whoops. You overflow. This is the first problem, but it isn't the interesting one I want to focus on. Even if you have a well-defined integer type and a working addition function, this doesn't mean that adding two integers will always result in an integer your type can represent! There is an easy rule you can think up to determine when this will happen, so you just throw an error message for now.
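Here is what such an adding function might look like if we sketch it in C under the 4-bit assumption, building addition out of nothing but bitwise operations; the name add4, the 0xF mask, and the overflow flag are invented for illustration, and the "easy rule" amounts to checking whether the true sum needs a fifth bit.

    #include <stdio.h>

    /* Add two 4-bit values using only bitwise operations (a ripple-carry
       adder in a loop). Sets *overflow if the true sum needs a 5th bit. */
    unsigned add4(unsigned a, unsigned b, int *overflow) {
        while (b != 0) {
            unsigned carry = (a & b) << 1;   /* positions that generate a carry  */
            a = a ^ b;                       /* sum of the bits, ignoring carries */
            b = carry;
        }
        *overflow = (a > 0xF);               /* the result did not fit in 4 bits */
        return a & 0xF;
    }

    int main(void) {
        int ovf;
        unsigned r = add4(7, 3, &ovf);
        printf("%u (overflow=%d)\n", r, ovf);   /* 10 (overflow=0) */
        r = add4(15, 1, &ovf);
        printf("%u (overflow=%d)\n", r, ovf);   /* 0 (overflow=1)  */
        return 0;
    }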

Now you move on to subtraction. Oops. You then realize that you have no way of representing negative numbers with your integer type. If you haven’t seen this before, then you should really take a few moments to think about how you would do this. The “most obvious” solution takes some thought, and turns out to be terrible to use. The one that people actually use is quite clever.

The first thing you might try is to just reserve either the first or last bit as a way to indicate whether you are positive or negative. Maybe you'll take {1xxx} to be negative and {0xxx} to be positive. For example, {0001} is {1} and {1001} is {-1}. First, notice that this cuts the number of positive integers you can represent in half, but there isn't a way around that. Second, there is a positive and a negative "0", because {1000} is supposedly {-0}. This will almost certainly cause a bigger headache than it solves. Lastly, that adding function you wrote is meaningless on these bit patterns now.
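To see this "sign bit plus magnitude" idea concretely, a decoder might look like the following sketch in C; the name from_sign_magnitude4 is invented for illustration, and the last line shows the awkward negative zero.

    #include <stdio.h>

    /* Decode the low 4 bits of 'bits' as sign-magnitude:
       top bit = sign, remaining 3 bits = magnitude. */
    int from_sign_magnitude4(unsigned bits) {
        int magnitude = bits & 0x7;
        return (bits & 0x8) ? -magnitude : magnitude;
    }

    int main(void) {
        printf("%d\n", from_sign_magnitude4(0x1));  /* 0001 ->  1 */
        printf("%d\n", from_sign_magnitude4(0x9));  /* 1001 -> -1 */
        printf("%d\n", from_sign_magnitude4(0x8));  /* 1000 -> "-0", just 0 again */
        return 0;
    }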

Fortunately, people came up with a much better solution. It is called two's complement notation. We just weight the most significant bit with a negative sign. This means that {1010} would convert to {-2^3 + 0\cdot 2^2 + 2^1 + 0\cdot 2^0 = -6}. This still makes all the numbers that start with a 1 negative, like in our earlier scheme, except there is only a single 0 now, because {1000} is {-8} (the most negative integer we can represent).
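Reading off a 4-bit pattern in two's complement is then a one-liner; here is a sketch under the same 4-bit assumption, with the helper name from_twos4 invented for illustration.

    #include <stdio.h>

    /* Interpret the low 4 bits of 'bits' in two's complement:
       the top bit carries weight -8 instead of +8. */
    int from_twos4(unsigned bits) {
        bits &= 0xF;
        return (bits & 0x8) ? (int)bits - 16 : (int)bits;
    }

    int main(void) {
        printf("%d\n", from_twos4(0xA));  /* 1010 -> -6 */
        printf("%d\n", from_twos4(0x8));  /* 1000 -> -8 */
        printf("%d\n", from_twos4(0x7));  /* 0111 ->  7 */
        return 0;
    }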

Moreover, {3-2 = 3 + (-2) = 0011 + 1110 = 0001 = 1} (if we chop off the overflowing carry bit, yikes). So plain old addition works and gives us subtraction for free. Except sometimes it doesn't. For example, take {0111 + 0001 = 1000}. This says that {7+1 = -8}. This is basically the same overflow error as before, because {8} is not an integer that can be represented by our 4-bit type. It just means we have to be careful about some edge cases. It is doable, and in fact this is essentially what C does (but typically with 32-bit integers).
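A sketch of that careful signed addition, still at 4 bits, might look like the following; add4_signed and its overflow flag are again invented names, and the edge-case check is the standard one: the operands agree in sign but the result does not.

    #include <stdio.h>

    /* 4-bit two's complement addition: add the bit patterns, chop to 4 bits,
       and flag signed overflow. Assumes a and b are already in [-8, 7]. */
    int add4_signed(int a, int b, int *overflow) {
        unsigned sum = ((unsigned)a + (unsigned)b) & 0xF;      /* chop off the carry   */
        int result = (sum & 0x8) ? (int)sum - 16 : (int)sum;   /* reinterpret as signed */
        *overflow = ((a < 0) == (b < 0)) && ((result < 0) != (a < 0));
        return result;
    }

    int main(void) {
        int ovf, r;
        r = add4_signed(3, -2, &ovf);
        printf("3 + (-2) = %d (overflow=%d)\n", r, ovf);   /*  1 (overflow=0) */
        r = add4_signed(7, 1, &ovf);
        printf("7 + 1    = %d (overflow=%d)\n", r, ovf);   /* -8 (overflow=1) */
        return 0;
    }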

Just to wrap up, it seems that to make this cobbled-together solution of merely representing and adding integers work, we want to make sure that our language is strongly typed (i.e. we know exactly how big an integer is, so that we know where to place that leading 1 indicating a negative, and the type isn't going to change on us).

Just consider if we tried to prevent overflow issues by making a "big integer" class that is {8} bits instead of {4}. We try to do {3-2} again, and upon overflow we switch the type to a big int. If we naively pad the bit patterns with zeros, {1110} stops meaning {-2} and starts meaning {14}, so we would get {3-2 = 0001\ 0001}, which is {17}. This means we have to be really careful when dealing with multiple types and recasting between them. It seems a minor miracle that in a language like Ruby you can throw around all sorts of different-looking types (without declaring any of them) with plus signs between them and get the answer you would expect.
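Here is a small C illustration of that recasting pitfall, widening the pattern {1110} (our {-2}) to a bigger type with and without copying the sign bit; the variable names are invented for the example.

    #include <stdio.h>

    int main(void) {
        unsigned neg2_bits = 0xE;   /* 1110, i.e. -2 in 4-bit two's complement */

        /* Naive widening: just pad with zeros on the left.
           1110 becomes 0000 1110, which is 14, so 3 - 2 "becomes" 17. */
        int zero_extended = (int)neg2_bits;
        printf("3 + zero-extended(-2) = %d\n", 3 + zero_extended);

        /* Correct widening: copy the sign bit (sign extension).
           1110 becomes 1111 1110, which is still -2, so we get 1. */
        int sign_extended = (neg2_bits & 0x8) ? (int)neg2_bits - 16 : (int)neg2_bits;
        printf("3 + sign-extended(-2) = %d\n", 3 + sign_extended);
        return 0;
    }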

That brings us to the main point of this post. It is a really, really good thing we don’t have to worry about these technicalities when writing programs. The whole point of a good high level language is to not be aware of all the tedious machine level computations going on. But this also means that most people have no appreciation for just how complicated something as simple as adding two integers can be (of course, this is all standardized now, so you probably wouldn’t even worry about it if you were writing your own language from scratch).
