Character Set in Java

A character set refers to the composite number of different characters that are being used and supported by a computer software and hardware.

Almost all computer systems and languages use the ASCII character encoding. The ASCII code represents each character using 8 bits (that is, one byte) and there are 256 different characters available. Several of these are "control characters."

Java, however, uses 16 bits (that is, 2 bytes) for each character and uses an encoding called Unicode. The first 256 characters in the Unicode character set correspond to the traditional ASCII character set, but the Unicode character set also includes many unusual characters and symbols from several different languages.

Typically, a new Java program is written and placed in a standard ASCII file. Each byte is converted into the corresponding Unicode character by the Java compiler as it is read in. When an executing Java program reads (or writes) character data, the characters are translated from (or to) ASCII. Unless you specifically use Unicode characters, this difference with traditional languages should be transparent.

To specify a Unicode character, use the escape sequence \uXXXX where each X is a hex digit. (You may use either uppercase A-F or lowercase a-f.)
Non-ASCII Unicode characters may appear in character strings or in identifiers, although this is probably not a good idea. It may introduce portability problems with operating systems that do not support Unicode fonts. The Unicode characters are categorized into classes such as "letters," "digits," and so forth.


Post a Comment

Popular Posts

Enable JMX Port in Tomcat with authentication

Enable Clustering in Openfire

Tomcat SSL/TLS Configuration | Run tomcat over HTTPS