UNICODE
Unicode is increasingly being accepted as a standard for
information interchange worldwide, as most of the major
IT companies have declared their support for it. The Unicode
codes for Indian languages are based on ISCII-88 and not on
ISCII-91, which is the latest official standard. It was felt
necessary that the Indian Government should make representations
to the Unicode Consortium for the necessary modifications to the
codes pertaining to Indian language scripts, and hence the
Department of Information Technology became a full member of the
Unicode Consortium with voting rights.
16 Bit (2 Byte) UNICODE
The Unicode Standard is the universal character encoding standard
used for the representation of text for computer processing.
It provides the capacity to encode all of the characters used
for the written languages of the world, together with information
about the characters and their use. The Unicode Standard is very
useful for computer users who deal with multilingual text, business
people, linguists, researchers, scientists, mathematicians and
technicians. Unicode uses a 16-bit encoding that provides 65,536
code points. The Unicode Standard assigns each character a unique
numeric value and name. The Unicode Standard and the ISO/IEC 10646
standard provide an extension mechanism called UTF-16 that allows
for encoding as many as a million additional characters. Presently,
the Unicode Standard provides codes for 49,194 characters.
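As a rough illustration of the 16-bit range and the UTF-16 extension mechanism described above, the following Python sketch splits a code point that lies beyond the 16-bit range into a pair of 16-bit units (a surrogate pair) and checks the result against Python's own UTF-16 encoder. The helper name `surrogate_pair` and the sample characters are choices made for this example only.

```python
def surrogate_pair(code_point):
    """Split a code point above 0xFFFF into two 16-bit UTF-16 units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point fits in one 16-bit unit or is out of range")
    v = code_point - 0x10000              # 20-bit value to distribute
    high = 0xD800 + (v >> 10)             # top 10 bits
    low = 0xDC00 + (v & 0x3FF)            # bottom 10 bits
    return high, low

# A single 16-bit unit can take 65,536 distinct values.
basic_range = 2 ** 16

# 1,024 high units x 1,024 low units give the additional code points.
additional = 0x400 * 0x400

ka = "\u0915"          # DEVANAGARI LETTER KA, inside the 16-bit range
clef = "\U0001D11E"    # MUSICAL SYMBOL G CLEF, beyond the 16-bit range

# Two bytes for KA, four bytes (two 16-bit units) for the clef.
print(len(ka.encode("utf-16-be")), len(clef.encode("utf-16-be")))
print([hex(u) for u in surrogate_pair(ord(clef))])
print(basic_range, additional)
```

The 1,024 x 1,024 combinations of high and low units are where the figure of about a million additional characters comes from.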
The Unicode Consortium has laid down a policy on character
encoding stability under which no character can be deleted and
no character name can be changed; only annotations may be
updated:
1. Once a character is encoded, it will not be moved or
removed.
2. Once a character is encoded, its character name will
not be changed.
3. Once a character is encoded, its canonical combining
class and decomposition (either canonical or compatibility)
will not be changed in a way that would affect normalization.
4. Once a character is encoded, its properties may still
be changed, but not in such a way as to change the fundamental
identity of the character.
5. The structure of certain property values in the Unicode
character database will not be changed.
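The properties that these rules refer to (character name, canonical combining class and decomposition) can be inspected with Python's standard `unicodedata` module. The sketch below is only an illustration; the sample characters are chosen for this example and the values shown depend on the Unicode data shipped with the Python version in use.

```python
import unicodedata

qa = "\u0958"       # DEVANAGARI LETTER QA (a precomposed letter)
nukta = "\u093C"    # DEVANAGARI SIGN NUKTA (a combining mark)

# Rule 2: the character name is fixed once assigned.
qa_name = unicodedata.name(qa)

# Rule 3: combining class and decomposition feed into normalization,
# so they must not change in a way that alters normalized text.
nukta_class = unicodedata.combining(nukta)
qa_decomposition = unicodedata.decomposition(qa)
qa_nfd = unicodedata.normalize("NFD", qa)

print(qa_name)
print(nukta_class)
print(qa_decomposition)
print([hex(ord(c)) for c in qa_nfd])
```

Because normalized text is stored and compared by many systems, a later change to either property would silently make old and new data disagree, which is what rule 3 guards against.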
ISCII uses an 8-bit code which is an extension of
the 7-bit ASCII code and contains the basic alphabet required
for the 10 Indian scripts that originated from the
Brahmi script. There are 15 officially recognized languages
in India. Apart from the Perso-Arabic scripts, all the other
10 scripts used for Indian languages have evolved from
the ancient Brahmi script and have a common phonetic structure,
which makes a common character set possible. The ISCII code
table is a superset of all the characters required in
the Brahmi-based Indian scripts. For convenience, the
alphabet of the official script, Devanagari, has been used
in the standard.
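The idea of a common character set can be seen in the Unicode code blocks for these scripts, which place corresponding letters at the same offset within each script's block. The Python sketch below illustrates this with Unicode code points rather than ISCII byte values; the table `BLOCK_START` and the helper `convert_script` are hypothetical names introduced for this example, and the conversion is only meaningful for letters that exist in both scripts.

```python
import unicodedata

# First code point of each script's 128-position block in Unicode.
BLOCK_START = {
    "DEVANAGARI": 0x0900, "BENGALI": 0x0980, "GURMUKHI": 0x0A00,
    "GUJARATI": 0x0A80, "ORIYA": 0x0B00, "TAMIL": 0x0B80,
    "TELUGU": 0x0C00, "KANNADA": 0x0C80, "MALAYALAM": 0x0D00,
}

KA_OFFSET = 0x15  # position of the letter KA inside every block

def convert_script(text, src, dst):
    """Move each `src` script character to the same offset in `dst`."""
    lo = BLOCK_START[src]
    shift = BLOCK_START[dst] - lo
    out = []
    for ch in text:
        cp = ord(ch)
        if lo <= cp < lo + 0x80:
            candidate = chr(cp + shift)
            # Keep the original if the target position is unassigned.
            out.append(candidate if unicodedata.name(candidate, "") else ch)
        else:
            out.append(ch)
    return "".join(out)

for script, start in BLOCK_START.items():
    print(script, unicodedata.name(chr(start + KA_OFFSET)))

# KA, MA, LA in Devanagari moved to the Bengali block.
bengali = convert_script("\u0915\u092E\u0932", "DEVANAGARI", "BENGALI")
print([hex(ord(c)) for c in bengali])
```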
Recommendations of DIT, Ministry
of Communications & Information Technology, on the
Unicode Standard for proper representation of Indic Scripts
Unicode 3.0 includes standard code sets for Indic scripts
based on the ISCII-1988 document. The present national standard
is ISCII-1991 (Indian Script Code for Information Interchange,
IS 13194:1991). Some modifications need to be incorporated
in the Unicode Standard for proper representation of Indic
scripts.
The Ministry, in deliberations with industry, academia
and R&D institutions, has finalized the proposed
changes for all the Indian languages. Drafts of the
proposed changes were published in the following issues of
TDIL's newsletter Vishwabharat@tdil:
Devanagari (Newsletter Jan 2002 pdf) (For Devanagari & Devanagari based languages)
Gujarati, Malayalam (Newsletter April 2002 pdf) (For Gujarati & Malayalam)
Oriya, Gurmukhi & Telugu (Newsletter April 2002 pdf) (For Oriya, Gurmukhi & Telugu)
Bangla (Newsletter July 2002 pdf) (For Bangla & Bangla based languages)
Tamil, Kannada (Newsletter Oct 2002 pdf) (For Tamil & Kannada)
Arabic-Urdu, Sindhi, Kashmiri (Newsletter Oct 2002 pdf) (For Arabic-Urdu, Sindhi, Kashmiri)
Vedic (Newsletter Oct 2002 pdf) (For Vedic Sanskrit)