TDIL

UNICODE

What is UNICODE?

Unicode is increasingly being accepted as a standard for information interchange worldwide, as most of the major IT companies have declared their support for it. The Unicode encoding for Indian languages is based on ISCII-88 and not on ISCII-91, which is the latest official standard. It was therefore felt necessary that the Indian Government be represented in the Unicode Consortium to seek the necessary modifications to the code for Indian language scripts, and hence the Department of Information Technology became a full member of the Unicode Consortium with voting rights.

16 Bit (2 Byte) UNICODE
The Unicode Standard is the universal character encoding standard used to represent text for computer processing. It provides the capacity to encode all of the characters used for the written languages of the world, and it supplies information about the characters and their use. The standard is very useful for computer users who deal with multilingual text, including business people, linguists, researchers, scientists, mathematicians and technicians. Unicode uses a 16-bit encoding that provides code points for 65,536 characters. The Unicode Standard assigns each character a unique numeric value and name. The Unicode Standard and the ISO/IEC 10646 standard provide an extension mechanism called UTF-16, which represents a character outside the 16-bit range as a pair of 16-bit code units and so allows as many as a million more characters to be encoded. At present the Unicode Standard provides codes for 49,194 characters.
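
The UTF-16 extension mechanism described above can be illustrated with a minimal Python sketch. The function name `to_utf16_units` is hypothetical and chosen only for this illustration; the arithmetic follows the standard surrogate-pair scheme, in which a code point above U+FFFF is split into two 16-bit units.

```python
def to_utf16_units(code_point):
    """Return the list of 16-bit UTF-16 code units for one code point."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable Unicode code point")
    if code_point <= 0xFFFF:
        # Fits in a single 16-bit unit (the original 65,536-value range).
        return [code_point]
    # Outside the 16-bit range: spread the remaining 20 bits over two units.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return [high, low]


# DEVANAGARI LETTER KA needs one unit; U+10400 needs a pair.
ka_units = to_utf16_units(0x0915)
pair_units = to_utf16_units(0x10400)
print([hex(u) for u in ka_units])
print([hex(u) for u in pair_units])

# 1024 high values x 1024 low values give the "about a million" extra characters.
extra_code_points = 0x400 * 0x400
print(extra_code_points)
```

The pair produced here can be compared with the bytes returned by Python's built-in `"utf-16-be"` codec for the same character.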

What is the Unicode policy for character encoding?

The Unicode Consortium has laid down a policy on character encoding stability under which no character may be deleted and no character name may be changed; only annotations may be updated:
1. Once a character is encoded, it will not be moved or removed.
2. Once a character is encoded, its character name will not be changed.
3. Once a character is encoded, its canonical combining class and decomposition (either canonical or compatibility) will not be changed in a way that would affect normalization.
4. Once a character is encoded, its properties may still be changed, but not in such a way as to change the fundamental identity of the character.
5. The structure of certain property values in the Unicode character database will not be changed.
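
The properties covered by these stability guarantees can be inspected with Python's standard `unicodedata` module. The following is a small sketch rather than a statement of how the Consortium checks stability; the Devanagari characters are chosen here only as examples.

```python
import unicodedata

ka = "\u0915"       # a Devanagari consonant
virama = "\u094D"   # a combining sign
nnna = "\u0929"     # a letter with a canonical decomposition

# Policy 2: the character name is fixed once assigned.
ka_name = unicodedata.name(ka)

# Policy 3: the canonical combining class and the decomposition
# feed into normalization, so they are kept stable.
virama_class = unicodedata.combining(virama)
nnna_decomposition = unicodedata.decomposition(nnna)
nnna_nfd = unicodedata.normalize("NFD", nnna)

print(ka_name)
print(virama_class)
print(nnna_decomposition)
print([hex(ord(ch)) for ch in nnna_nfd])
```

Because these values may not change in ways that affect normalization, text normalized under one version of the standard stays normalized under later versions.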

What is the basic difference between Unicode and ISCII code?
Unicode uses a 16-bit encoding that provides code points for 65,536 characters. The Unicode Standard assigns each character a unique numeric value and name, and provides the capacity to encode all of the characters used for the written languages of the world.
ISCII uses an 8-bit code that extends the 7-bit ASCII code; the extension contains the basic alphabet required for the 10 Indian scripts that originated from the Brahmi script. There are 15 officially recognized languages in India. Apart from the Perso-Arabic scripts, all 10 scripts used for Indian languages have evolved from the ancient Brahmi script and share a common phonetic structure, which makes a common character set possible. The ISCII code table is a superset of all the characters required in the Brahmi-based Indian scripts. For convenience, the alphabet of the official script, Devanagari, has been used in the standard.
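
The common phonetic structure carries over into Unicode, where each Brahmi-derived script has its own block and the blocks are laid out in parallel. The sketch below shows this using only Python's standard `unicodedata` module. It does not decode ISCII bytes (Python ships no ISCII codec), and the constant names are assumptions made for this illustration.

```python
import unicodedata

DEVANAGARI_KA = 0x0915   # code point of the letter KA in Devanagari
BLOCK_STRIDE = 0x80      # distance between successive Indic script blocks
SCRIPT_COUNT = 9         # Devanagari through Malayalam

# Step through the script blocks, keeping the same position in each one.
for i in range(SCRIPT_COUNT):
    code_point = DEVANAGARI_KA + i * BLOCK_STRIDE
    print(f"U+{code_point:04X}", unicodedata.name(chr(code_point)))

ka_names = [
    unicodedata.name(chr(DEVANAGARI_KA + i * BLOCK_STRIDE))
    for i in range(SCRIPT_COUNT)
]

# Each of these letters occupies one 16-bit unit (2 bytes) in UTF-16,
# whereas ISCII, as described above, is an 8-bit code.
utf16_sizes = [
    len(chr(DEVANAGARI_KA + i * BLOCK_STRIDE).encode("utf-16-be"))
    for i in range(SCRIPT_COUNT)
]
```

Every line printed names the letter KA in a different script. In Unicode the script is identified by the code point itself, which is why a single 8-bit table is no longer enough.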

Recommendations of DIT, Ministry of Communications & Information Technology, on the Unicode Standard for proper representation of Indic scripts

Unicode 3.0 includes standard code sets for Indic scripts based on the ISCII-1988 document. The present national standard is ISCII:1991 (Indian Script Code for Information Interchange, IS 13194:1991). Some modifications need to be incorporated into the Unicode Standard for proper representation of Indic scripts.

The Ministry, in deliberation with industry, academia and R&D institutions, has finalized the proposed changes for all the Indian languages. Drafts of the proposed changes were published in the issues of TDIL's newsletter Vishwabharat@tdil listed below.

Devanagari (Newsletter Jan 2002 pdf) (For Devanagari & Devanagari based languages)

Gujarati, Malayalam (Newsletter April 2002 pdf) (For Gujarati & Malayalam)

Oriya, Gurmukhi & Telugu (Newsletter April 2002 pdf) (For Oriya, Gurmukhi & Telugu)

Bangla (Newsletter July 2002 pdf) (For Bangla & Bangla based languages)

Tamil, Kannada (Newsletter Oct 2002 pdf) (For Tamil & Kannada)

Arabic-Urdu, Sindhi, Kashmiri (Newsletter Oct 2002 pdf) (For Arabic-Urdu, Sindhi, Kashmiri)

Vedic (Newsletter Oct 2002 pdf) (For Vedic Sanskrit)
