ASCII, EBCDIC, ISO, and Other Computer Codes (Part 1)

 

Digital computers represent and manipulate all forms of data as a collection of numerical values. In the case of text, alphanumeric and punctuation characters are each assigned a numerical equivalent, and this collection of character-to-number mappings is referred to as a "code." In this, the first installment of our mini-series, we take a look at codes in general.

In Part 2 we will take a look at the ASCII convention; in Part 3 we will ponder EBCDIC character codes; and in Part 4 we will consider "Chunky Graphics Codes," ISO and Unicode, and also provide links to some suggested further reading.

A Morass of Confusion
There's an old engineering joke that says: "Standards are great ... everyone should have one!" The problem is that – very often – everyone does. Consider the case of storing textual data inside a computer, where the computer regards everything as being a collection of numbers. In this case, someone has to (a) decide which characters need to be represented in the first place and (b) decide which numeric values are going to be associated with the various characters.

Suppose, for example, that we wish to represent only the uppercase letters 'A' through 'Z', in which case we will require only 26 different codes. Now, five binary bits can represent 2^5 = 32 different patterns of 0s and 1s, so a 5-bit code would be sufficient to represent our 26 uppercase letters with some combinations left over. And which number-to-letter mapping would we employ? Well, in this case we would probably go with the obvious choice of assigning 'A' to %00000, 'B' to %00001, 'C' to %00010, and so forth up to 'Z' being assigned to %11001 as illustrated in Figure 1-1 (where the '%' character is used to indicate a binary number).


Figure 1-1. Mapping only 26 uppercase letters to binary codes is simple.
('%' indicates a binary value; values shown in parentheses are decimal equivalents)
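The 5-bit mapping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a historical code; the names UPPER_TO_CODE, encode_5bit, and decode_5bit are made up for this example.

```python
import string

# Hypothetical 5-bit code: 'A' -> 0, 'B' -> 1, ..., 'Z' -> 25.
UPPER_TO_CODE = {letter: index for index, letter in enumerate(string.ascii_uppercase)}
CODE_TO_UPPER = {code: letter for letter, code in UPPER_TO_CODE.items()}


def encode_5bit(text):
    """Return each uppercase letter as a '%'-prefixed 5-bit binary string."""
    return ["%" + format(UPPER_TO_CODE[ch], "05b") for ch in text]


def decode_5bit(codes):
    """Turn a list of '%'-prefixed binary strings back into letters."""
    return "".join(CODE_TO_UPPER[int(code[1:], 2)] for code in codes)


print(encode_5bit("ABZ"))           # ['%00000', '%00001', '%11001']
print(2 ** 5 - len(UPPER_TO_CODE))  # 6 bit patterns are left unused
```

Anything other than 'A' through 'Z' raises a KeyError here, which mirrors the point that such a code can represent only the characters its designer chose to include.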

However, what if we decide that we also wish to represent the lowercase characters 'a' through 'z'? In this case we have 26 + 26 = 52 different characters, so we'd have to use a 6-bit code, because a 5-bit code offers only 2^5 = 32 different patterns of 0s and 1s whereas a 6-bit code offers 2^6 = 64. But now we have a problem: should the codes %000000 through %011001 be used to represent 'A' through 'Z' and the codes %011010 through %110011 be used to represent 'a' through 'z', or vice versa, as illustrated in Figure 1-2?


Figure 1-2. Two alternative mappings for uppercase and lowercase letters.
('%' indicates a binary value; values shown in parentheses are decimal equivalents)
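As a rough sketch of why the choice between these two orderings matters, the following Python snippet encodes a word with one ordering and decodes it with the other. The helper names (bits_needed, UPPER_FIRST, LOWER_FIRST, encode, decode) are invented for this illustration.

```python
import string


def bits_needed(symbol_count):
    """Smallest number of bits whose 2**bits patterns cover symbol_count symbols."""
    return max(1, (symbol_count - 1).bit_length())


# Two hypothetical 6-bit codes that differ only in which case comes first.
UPPER_FIRST = string.ascii_uppercase + string.ascii_lowercase
LOWER_FIRST = string.ascii_lowercase + string.ascii_uppercase


def encode(text, alphabet):
    """Map each character to its position in the chosen alphabet."""
    return [alphabet.index(ch) for ch in text]


def decode(codes, alphabet):
    """Map each numeric code back to a character in the chosen alphabet."""
    return "".join(alphabet[code] for code in codes)


codes = encode("Hello", UPPER_FIRST)
print(bits_needed(26), bits_needed(52))  # 5 6
print(decode(codes, UPPER_FIRST))        # Hello
print(decode(codes, LOWER_FIRST))        # hELLO
```

The same numbers yield different text depending on which mapping the receiver assumes, so both ends have to agree on the code in advance.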

Alternatively, would it make more sense to use %000000 to represent 'a', %000001 to represent 'A', %000010 to represent 'b', %000011 to represent 'B', and so forth? In this scenario, even numbers represent lowercase characters and odd numbers represent their uppercase counterparts as illustrated in Figure 1-3.


Figure 1-3. Another mapping possibility for uppercase and lowercase letters.
('%' indicates a binary value; values shown in parentheses are decimal equivalents)
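The interleaved scheme can be expressed arithmetically: a letter's position in the alphabet is doubled, and 1 is added for uppercase. Here is a minimal Python sketch under that reading; the function names are hypothetical and only letters are handled.

```python
import string


def encode_interleaved(ch):
    """Even codes are lowercase, odd codes are the matching uppercase letter."""
    index = string.ascii_lowercase.index(ch.lower())
    return 2 * index + (1 if ch.isupper() else 0)


def decode_interleaved(code):
    """Recover the letter: code // 2 picks the letter, code % 2 picks the case."""
    letter = string.ascii_lowercase[code // 2]
    return letter.upper() if code % 2 else letter


for ch in "aAbBzZ":
    print(ch, "%" + format(encode_interleaved(ch), "06b"))

# Sorting by code keeps each letter next to its other-case twin.
print(sorted("BbAa", key=encode_interleaved))  # ['a', 'A', 'b', 'B']
```

One consequence of this layout is that sorting by numeric code places 'a' and 'A' together ahead of 'b' and 'B', which the two block-style orderings do not do.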

And there are many more options open to us; for example, %000000 through %011001 could be used to represent 'a' through 'z', while %100000 through %111001 could be used to represent 'A' through 'Z' as illustrated in Figure 1-4.


Figure 1-4. Yet another mapping alternative for uppercase and lowercase letters.
('%' indicates a binary value; values shown in parentheses are decimal equivalents)

In this case, the most-significant (left-hand) bit tells us if we have a lowercase (0) or uppercase (1) character, while the remaining bits are used to inform us what the character is; the only difference between the codes for 'a' and 'A', for example, is a 0 or 1 in the most-significant bit.
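A scheme like this lends itself to simple bit operations. The Python sketch below assumes the layout just described (most-significant bit for case, remaining five bits for the letter); the names CASE_BIT, LETTER_MASK, and the three functions are made up for illustration.

```python
import string

CASE_BIT = 0b100000     # most-significant bit: 0 = lowercase, 1 = uppercase
LETTER_MASK = 0b011111  # remaining five bits: which letter (0 = a ... 25 = z)


def encode_case_bit(ch):
    """Encode a letter as a 6-bit value with the case held in the top bit."""
    index = string.ascii_lowercase.index(ch.lower())
    return (index | CASE_BIT) if ch.isupper() else index


def decode_case_bit(code):
    """Decode a 6-bit value produced by encode_case_bit."""
    letter = string.ascii_lowercase[code & LETTER_MASK]
    return letter.upper() if code & CASE_BIT else letter


def swap_case(code):
    """Flip between uppercase and lowercase by toggling the top bit."""
    return code ^ CASE_BIT


for ch in "aAzZ":
    print(ch, "%" + format(encode_case_bit(ch), "06b"))

print(decode_case_bit(swap_case(encode_case_bit("q"))))  # Q
```

Changing case becomes a single XOR, at the cost of leaving the codes %011010 through %011111 and %111010 through %111111 unassigned to any letter.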

Of course, in addition to letters, we will almost certainly wish to represent the decimal digit characters '0' through '9', along with some punctuation characters such as commas, periods ("full stops"), parentheses ("brackets"), single and double quotes, question and exclamation marks ("bangs" or "shrieks"), and so forth.

So, who actually gets to decide which characters are going to be represented and which numeric values are going to represent which characters? Well, in the early days of computing, almost anyone (or any company) who designed a computer got to define their own character code mapping.

It's easy to see how this could lead to problems; for example, consider peripheral devices such as printers. These days we can wander down to our local computer store, purchase a cheap-and-cheerful printer, return home, connect it to our computer (often via a wireless connection), and instruct the computer to print out a document ... and everything works (hurray)! And one of the reasons it works is that both the computer and the printer "talk the same language" (understand the same character code mapping).

But in the days of yore, when computer designers defined their own character code mappings, they also had to construct all of their own peripheral devices such as printers, which was a time-consuming and expensive hobby. And, of course, things only became more complicated when folks started to connect disparate computers from different manufacturers. Something had to be done, and the obvious solution was for everyone to adopt a standard code...

...which leads us to Part 2 of this mini-series...

 
 
 
Written by: Clive Maxfield
 
 





