ASCII, EBCDIC, ISO, and Other Computer Codes (Part 2)

 

Digital computers represent and manipulate all forms of data as a collection of numerical values. In the case of text, alphanumerical and punctuation characters are each assigned a numerical equivalent, and this collection of character-to-number mappings is referred to as a "code." In this, the second installment of our mini-series, we take a look at the ASCII convention.

Just to refresh our memories: In Part 1 we considered the concept of codes in general; in Part 3 we will ponder EBCDIC character codes; and in Part 4 we will consider "Chunky Graphics Codes," ISO and Unicode, and also provide links to some suggested further reading.

The ASCII Code
Towards the end of the 1950s, the American Standards Association (ASA) began to consider the problem of defining a standard character code mapping that could be used to facilitate the representation, storage, and interchange of textual data between different computers and peripheral devices. In 1963, the ASA (which was renamed the United States of America Standards Institute in 1966 and then the American National Standards Institute, or ANSI, in 1969) announced the first version of the American Standard Code for Information Interchange (ASCII).

However, this first version of the ASCII code (which is pronounced "ask-key") left many things, such as the lowercase Latin letters, undefined, and it wasn't until 1968 that the currently used ASCII standard of 95 printing characters (counting the space) and 33 control characters (counting DEL) was defined as illustrated in Figure 2-1.


Figure 2-1. The 1968 version of the ASCII code.
(Dollar ‘$’ characters indicate hexadecimal values.)

Note that code $20 (which is annotated "SP") is equivalent to a space. Also, as an aside, the terms uppercase and lowercase were handed down to us by the printing industry, from the compositors' practice of storing the type for capital letters and small letters in two separate trays, or cases. When working at the type-setting table, the compositors invariably kept the capital letters and small letters in the upper and lower cases, respectively; hence, "uppercase" and "lowercase." Prior to this, scholars referred to capital letters as majuscules and small letters as minuscules, while everyone else simply called them capital letters and small letters.

We should also note that one of the really nice things about ASCII is that all of the alpha characters are numbered sequentially; that is, 65 ($41 in hexadecimal) = 'A', 66 = 'B', 67 = 'C', and so on until the end of the alphabet. Similarly, 97 ($61 in hexadecimal) = 'a', 98 = 'b', 99 = 'c', and so forth. This means that we can perform cunning programming tricks like saying "char = 'A' + 23" and have a reasonable expectation of ending up with the letter 'X'. Alternatively, if we wish to test to see if a character (called "char") is lowercase and – if so – convert it into its uppercase counterpart, we could use a piece of code similar to the following:

if (char >= 'a') and (char <= 'z') then char = char - 32;

Don't worry about which computer language this is; the important point here is that the left-hand portion of this statement determines whether or not we have a lowercase character and, if we do, subtracting 32 ($20 in hexadecimal) from that character's code will convert it into its uppercase counterpart.
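For readers who would like to try these tricks for real, here is a minimal sketch in Python. The function names `letter_at_offset` and `to_upper_ascii` are invented for this illustration, and the sketch assumes we are only ever handed a single character at a time.

```python
def letter_at_offset(offset: int) -> str:
    """Return the uppercase letter 'offset' places after 'A'."""
    # ord() gives a character's numeric code; chr() goes the other way.
    return chr(ord('A') + offset)


def to_upper_ascii(ch: str) -> str:
    """Convert one lowercase ASCII letter to uppercase; leave anything else alone."""
    if 'a' <= ch <= 'z':
        # 32 ($20) is the fixed gap between 'a' (97) and 'A' (65).
        return chr(ord(ch) - 32)
    return ch


print(letter_at_offset(23))                                  # X
print("".join(to_upper_ascii(c) for c in "Hello, world!"))   # HELLO, WORLD!
```

Note that the range test matters: subtracting 32 from a character that is not a lowercase letter would turn it into something unrelated.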

As can be seen in Figure 2-1, in addition to the standard alphanumeric characters ('a'...'z', 'A'...'Z' and '0'...'9'), punctuation characters (comma, period, semi-colon, ...) and special characters ('*', '#', '%', ...), there are an awful lot of strange mnemonics such as EOT, ACK, NAK, and BEL. The point is that, in addition to representing textual data, ASCII was intended for a number of purposes such as communications; hence the presence of such codes as EOT, meaning "End of transmission," and BEL, which was used to ring a physical bell on old-fashioned printers.

Some of these codes are still in use today, while others are, generally speaking, of historical interest only. For those who are interested, a more detailed breakdown of these special codes is presented in Figure 2-2.


Figure 2-2. ASCII control characters.
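As a small illustration of the control codes mentioned above, the following Python sketch lists the four mnemonics named in the text alongside their numeric values. The names `CONTROL_CODES` and `is_control` are made up for this example; the test simply treats codes $00 through $1F, plus DEL ($7F), as control characters.

```python
# A few of the control codes named in the text, with their ASCII values.
CONTROL_CODES = {
    "EOT": 0x04,  # End of transmission
    "ACK": 0x06,  # Acknowledge
    "BEL": 0x07,  # Bell
    "NAK": 0x15,  # Negative acknowledge
}


def is_control(code: int) -> bool:
    """True for the control codes $00..$1F and for DEL ($7F)."""
    return 0 <= code < 0x20 or code == 0x7F


# Python's '\a' ("alert") escape is the BEL character.
print(ord('\a') == CONTROL_CODES["BEL"])   # True
print(is_control(ord(' ')))                # False: space ($20) is not a control code
```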

One final point is that ASCII is a 7-bit code, which means that it only uses the binary values %00000000 through %01111111 (that is, 0 through 127 in decimal or $00 through $7F in hexadecimal; percent '%' characters indicate binary values). In some systems, the unused, most-significant bit of an 8-bit byte representing an ASCII character is simply set to logic 0. Alternatively, this bit might be used to implement a form of error checking known as a parity check, in which case it would be referred to as the parity bit.
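To make the 7-bit range a little more concrete, here is a short Python sketch. The helper names `is_seven_bit` and `clear_top_bit` are assumptions introduced purely for illustration.

```python
def is_seven_bit(byte: int) -> bool:
    """True if the value fits in 7 bits, that is, $00 through $7F."""
    return 0 <= byte <= 0x7F


def clear_top_bit(byte: int) -> int:
    """Force the most-significant bit of an 8-bit byte to logic 0."""
    return byte & 0x7F


# Show 'A' (65, or $41) as an 8-bit binary pattern; the top bit is 0.
print(format(ord('A'), '08b'))       # 01000001
print(is_seven_bit(0x7F))            # True
print(is_seven_bit(0x80))            # False
print(hex(clear_top_bit(0xC1)))      # 0x41
```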

There are two forms of simple parity checking, which are known as even parity and odd parity. Let's suppose that one computer system is transmitting a series of ASCII characters to another system. In the case of even parity, the transmitting system counts the number of logic 1s in the ASCII code for the first character. If there are an even number of logic 1s, the most-significant bit of the 8-bit byte to be transmitted will be set to logic 0; but if there are an odd number of logic 1s, the most-significant bit will be set to logic 1.

This process will be repeated for each character as it is being transmitted. The end result of an even parity check is to ensure that there are always an even number of logic 1s in the transmitted value; by comparison, an odd parity check ensures an odd number of logic 1s in the transmitted value.
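The transmitting side of this scheme might be sketched in Python as follows. The function name `add_parity` and its `odd` flag are hypothetical, chosen only for this example; the sketch assumes the parity bit occupies the most-significant bit of the byte, as described above.

```python
def add_parity(code: int, odd: bool = False) -> int:
    """Return an 8-bit byte: a 7-bit ASCII code plus a parity bit in bit 7."""
    if not 0 <= code <= 0x7F:
        raise ValueError("expected a 7-bit ASCII code")
    ones = bin(code).count("1")   # number of logic 1s in the 7-bit code
    parity_bit = ones % 2         # even parity: 1 only if the count is odd
    if odd:
        parity_bit ^= 1           # odd parity is simply the opposite choice
    return (parity_bit << 7) | code


print(format(add_parity(ord('A')), '08b'))            # 01000001 (two 1s already)
print(format(add_parity(ord('C')), '08b'))            # 11000011 (three 1s, so bit 7 is set)
print(format(add_parity(ord('A'), odd=True), '08b'))  # 11000001
```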

Similarly, when the receiving computer is presented with a character code, it counts the number of logic 1s in the seven least-significant bits to determine what the parity bit should be. It then compares this calculated parity bit against the transmitted parity bit to see if they agree.
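The receiving side could be sketched along these lines in Python. The name `check_even_parity` is an assumption for this illustration, and the sketch covers even parity only.

```python
def check_even_parity(byte: int) -> bool:
    """Return True if the received byte's parity bit matches its 7-bit code."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit byte")
    code = byte & 0x7F                    # the seven least-significant bits
    received_bit = byte >> 7              # the transmitted parity bit
    expected_bit = bin(code).count("1") % 2
    return expected_bit == received_bit


print(check_even_parity(0b11000011))  # True:  'C' with its parity bit set
print(check_even_parity(0b01000011))  # False: 'C' with the parity bit missing
```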

This form of parity checking is just about the simplest form of error check there is. It will only detect a single-bit error, because two errors will cancel each other out. (Actually, to be more precise, this form of parity check will detect an odd number of errors – one bit, three bits, five bits, and so on – but an even number of errors will cancel each other out.)

Furthermore, even if the receiving computer does detect an error, there's no way for it to tell which bit is incorrect (indeed, the main value could be correct and the parity bit itself could have been corrupted). A variety of more sophisticated forms of error checking can be used to detect multiple errors, and even to allow the computer to work out which bits are incorrect, but these techniques are outside the scope of these discussions.
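These limitations are easy to demonstrate. The Python sketch below takes the byte for 'C' with even parity and deliberately corrupts it in three ways; the name `has_even_parity` and the particular bits chosen for corruption are arbitrary picks for this example.

```python
def has_even_parity(byte: int) -> bool:
    """True if the 8-bit byte contains an even number of logic 1s."""
    return bin(byte & 0xFF).count("1") % 2 == 0


sent = 0b11000011                       # 'C' ($43) plus its even parity bit

one_error = sent ^ 0b00000100           # flip one data bit
two_errors = sent ^ 0b00000110          # flip two data bits
parity_bit_error = sent ^ 0b10000000    # flip only the parity bit

print(has_even_parity(sent))              # True:  arrives intact
print(has_even_parity(one_error))         # False: single error is detected
print(has_even_parity(two_errors))        # True:  double error slips through
print(has_even_parity(parity_bit_error))  # False: data is fine, but flagged anyway
```

In the second and fourth cases the receiver knows only that something is wrong; nothing in the byte says which bit was flipped.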

OK, that's enough about ASCII for the moment. In Part 3 of this mini-series we will consider another well-known code called EBCDIC.

 
 
 
Written by: Clive Maxfield
 
 





