Unicode Character Set
Unicode was originally designed as a 16-bit character set covering all the world's major living languages, along with scientific symbols and dead languages that are the subject of scholarly interest. It eliminates the complexity of the multi-byte character sets currently used on UNIX and Windows to support Asian languages. Unicode was created by a consortium of companies including Apple, Microsoft, HP, Digital and IBM, which merged its efforts with the ISO-10646 standard to produce a single standard in 1993. Unicode is the native character set of Windows NT and the versions of Windows that follow from it: Windows 2000, XP, and Vista.
Concordance is not a Unicode program. Because it is written to work on all versions of Windows, it uses the ANSI (Windows) character set. Unicode files must be converted to ANSI before Concordance can use them.
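As a rough illustration of the kind of conversion involved, the sketch below turns Unicode text into ANSI bytes with Python. It is not part of Concordance. It assumes the source file is UTF-16 (the form Windows tools usually mean by "Unicode") and that "ANSI" means Windows code page 1252; the function name unicode_to_ansi is hypothetical. Characters that have no ANSI equivalent are replaced with "?", so the conversion can lose information.

```python
def unicode_to_ansi(data: bytes, source_encoding: str = "utf-16") -> bytes:
    """Convert Unicode bytes to ANSI bytes (assumed here to be code page 1252)."""
    # Decode the Unicode input; the "utf-16" codec consumes a leading byte-order mark.
    text = data.decode(source_encoding)
    # Encode to ANSI; characters outside code page 1252 become "?".
    return text.encode("cp1252", errors="replace")


# Example: the accented letter survives, because code page 1252 contains it.
sample = "café".encode("utf-16")
converted = unicode_to_ansi(sample)
```

To convert a whole file, read it in binary mode, pass the bytes to the function, and write the result to a new file so the original is kept.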
In its original 16-bit form, every Unicode character occupies the same amount of space (later versions of the standard extend the code space beyond 16 bits). The first 256 values are the same as the ISO Latin-1 character set, which is also the basis for the ANSI character set used in Windows 3.1 and Windows 95. But Unicode goes on to define 34,168 distinct coded characters. In many character sets a single value has to stand for several characters. For example, in ASCII a "-" is used to represent a hyphen, a minus sign, a dash and a non-breaking hyphen. In Unicode each meaning is given its own code. The Unicode standard contains only one instance of each character and assigns it a unique name and code value. It also supports "combining" accent characters, which follow the base character that they modify.
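The points above can be checked with Python's standard unicodedata module. This is only a sketch for illustration; the variable names are made up for the example, and the character data comes from whatever Unicode version ships with the Python installation.

```python
import unicodedata

# The first 256 Unicode values coincide with ISO Latin-1 (ISO 8859-1).
latin1_matches = all(bytes([i]).decode("latin-1") == chr(i) for i in range(256))

# ASCII has one "-" for several meanings; Unicode names each one separately.
dash_like = ["\u002D", "\u2010", "\u2011", "\u2212", "\u2014"]
names = [unicodedata.name(ch) for ch in dash_like]
for ch, name in zip(dash_like, names):
    print(f"U+{ord(ch):04X}  {name}")

# A combining accent follows the base character that it modifies.
base_plus_accent = "e" + "\u0301"   # "e" followed by COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", base_plus_accent)
```

Here base_plus_accent holds two code values while composed holds the single precomposed character U+00E9; both display as the same accented letter.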
For more information on Unicode, visit the Unicode Web Site.