Indian Language Technology Proliferation and Deployment Centre

VALIDATION OF LANGUAGE TAGS FOR INDIAN LANGUAGES

The language used in a piece of information content can be indicated by labeling the content with an identifier, or "tag". These tags can be used to specify user preferences when selecting information content, or to label additional attributes of content and associated resources. The more content is tagged, and tagged correctly, the better applications can use language tags to deliver the most appropriate information to users. Language tags are useful for accessibility, authoring tools, translation tools, font selection, page rendering, search, and scripting. Screen readers and other accessibility applications also rely on them, because their output depends on the language of the content.

Applications of Language Tags

Language tags are used to identify languages, whether spoken, written, signed or otherwise, for the purpose of communication. Applications, protocols and specifications that use language tags often face the problem of identifying sets of content that share certain language attributes. Typical areas of use are:

::XML (to publish data on the web)
::Mobile applications
::Web (to browse sites in different languages)
::MAT (Machine Aided Translation)
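
As an illustration of tagging in XML, the following minimal sketch reads the standard `xml:lang` attribute with Python's standard library. The sample document, its element names and the helper `texts_by_language` are hypothetical and are not taken from this page.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this namespace-qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample document; the element names are illustrative only.
SAMPLE = """<greetings>
  <greeting xml:lang="hi">नमस्ते</greeting>
  <greeting xml:lang="ta">வணக்கம்</greeting>
  <greeting xml:lang="en-IN">Hello</greeting>
</greetings>"""


def texts_by_language(xml_text):
    """Map each xml:lang value to the text of the element that carries it."""
    root = ET.fromstring(xml_text)
    return {
        element.get(XML_LANG): element.text
        for element in root.iter()
        if element.get(XML_LANG)
    }


if __name__ == "__main__":
    for tag, text in texts_by_language(SAMPLE).items():
        print(tag, text)
```

An application can then select the element whose tag best matches the user's preferred language.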

Language tags are used in almost every area today. The major applications are:

::Language tags are required for content negotiation.
::They are also needed for OS-based negotiation.
::Server-based negotiation is a major application area of language tags.
::Text to speech (spoken form) is another application.
:: The film industry uses language tags for subtitling, where the same film has to be shown in various languages.
:: Attackers can also abuse language tags to attack insecure websites.
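
Server-based negotiation can be sketched as follows. This is a simplified illustration, assuming the client sends an HTTP `Accept-Language` header and the server knows which language tags it can serve. The function names `parse_accept_language` and `negotiate`, and the default of `"en"`, are hypothetical. The fallback truncates subtags from the right, which is a reduced form of the lookup scheme described in RFC 4647.

```python
def parse_accept_language(header):
    """Return the language ranges of an Accept-Language header, best first."""
    ranges = []
    for item in header.split(","):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a range without a q parameter has full weight
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat an unreadable weight as "not wanted"
        ranges.append((tag, quality))
    # sorted() is stable, so ranges with equal weight keep their header order.
    ordered = sorted(ranges, key=lambda pair: -pair[1])
    return [tag for tag, quality in ordered if quality > 0]


def negotiate(header, available, default="en"):
    """Pick the best available tag, dropping subtags from the right as needed."""
    supported = {tag.lower(): tag for tag in available}
    for wanted in parse_accept_language(header):
        if wanted == "*":
            continue
        subtags = wanted.split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in supported:
                return supported[candidate]
            subtags.pop()  # e.g. "hi-in" falls back to "hi"
    return default


if __name__ == "__main__":
    print(negotiate("mr-IN, hi;q=0.8, en;q=0.5", ["en", "hi", "bn"]))
```

Because the header is supplied by the client, a server should treat it as untrusted input and only ever return one of its own known tags, as this sketch does.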

Interface with other Standards

:: Relationship between ISO 639 and BCP 47 
BCP 47 describes the structure, content, construction, and semantics of language tags for use in cases where it is desirable to indicate the language used in an information object. It also describes how to register values for use in language tags and how to create user-defined extensions for private interchange. It is an Internet Best Current Practice for the Internet community and gives guidance on the use of ISO 639 codes. It specifies the 2-character code from ISO 639-1 when one exists; when a language has no 2-character code assigned, the 3-character code is used.
:: Relationship between ISO 639 and ISO 15924 
There is no definitive list associating scripts with languages, because most languages can be written in most scripts, and this actually happens when people transliterate foreign languages. ISO 639-2 also provides identifiers for groups of languages, such as language families, that together indirectly cover most or all languages of the world. The ISO 15924 list used in Unicode 4.0 includes some Indian scripts, and more scripts were added in Unicode 5.0.
::RFC 5646 and RFC 4646 
RFC 5646 describes the structure, content, construction, and semantics of language tags for use in cases where it is desirable to indicate the language used in an information object. It also describes the registration of values for use in language tags and the creation of user-defined extensions for private interchange. RFC 5646 replaces RFC 4646, which, in combination with RFC 4647, replaced RFC 3066, which in turn replaced RFC 1766. For further information: RFC 5646, RFC 4646
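
The BCP 47 rule described above (use the ISO 639-1 code when one exists, otherwise the 3-character code) can be sketched as below, together with optional ISO 15924 script and region subtags. The table is a small illustrative sample and not a complete registry, and the helper names `primary_subtag` and `make_tag` are hypothetical.

```python
# Small illustrative sample: language name -> (ISO 639-1 code or None, 3-letter code).
ISO_639 = {
    "Hindi": ("hi", "hin"),
    "Tamil": ("ta", "tam"),
    "Konkani": (None, "kok"),
    "Santali": (None, "sat"),
}


def primary_subtag(language):
    """Prefer the 2-character code; fall back to the 3-character code."""
    two_letter, three_letter = ISO_639[language]
    return two_letter if two_letter else three_letter


def make_tag(language, script=None, region=None):
    """Join language, optional script and optional region subtags with hyphens."""
    parts = [primary_subtag(language)]
    if script:
        parts.append(script.title())  # script subtags are conventionally title case
    if region:
        parts.append(region.upper())  # region subtags are conventionally upper case
    return "-".join(parts)


if __name__ == "__main__":
    print(make_tag("Hindi"))
    print(make_tag("Konkani", script="deva"))
    print(make_tag("Santali", script="olck", region="in"))
```

The casing applied here is only a convention for readability; language tags are compared case-insensitively.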

Need for Change:

::Separate language codes are used for the same language written in different scripts.
::Separate language codes are defined for different orthographies.
::Separate language codes are defined for dialects of languages.
::Indic and Indo-European languages are treated separately in ISO 639-2.
::The Prakrit languages (a collective language code) have been partially removed from ISO 639-2, and the code is obsolete in ISO 639-3.
::Apabhramsha is proposed for inclusion in ISO 639-5, although it is part of Prakrit.
::Pali is included in ISO 639-1, ISO 639-2 and ISO 639-3.

Proposed Language Tag Process:

:: When gathering data for a region and language, it is important to have multiple sources for that data to produce the most widely acceptable data. 
:: Initial versions of the data were based on the best available sources. In the case of language tags for Indian languages, the census data (Census 2001) has been taken as the base document.
:: The nomenclature for defining the 3-letter set for a particular language tag may consist of the consonants of the name of the language. Vowels should be avoided as far as possible, unless they become strictly necessary.
:: The draft language tags for Indian languages would be sent to State Governments, the Election Commission of India and the Chief Secretaries of the States, with copies to their Information Technology and Language / Linguistics departments, for their inputs, suggestions and modifications.
:: The recommended language tag set, after stakeholder consultations, will be examined for possible inclusion in ISO 639, with reference to the Indian language tags already incorporated in ISO 639-1, ISO 639-2 and ISO 639-3, and to ISO 639-5, which is currently under debate. Final approval of the release of the Indian language tags rests with the ISO committee.
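
The consonant-based nomenclature in the process above can be sketched as follows. This is only one possible reading of the rule, assuming a romanized language name, the vowel set a, e, i, o, u, and a fallback that admits the fewest vowels needed to reach three letters. The function `candidate_code` is hypothetical, its results are draft candidates rather than registered ISO 639 codes, and it does not check for collisions with existing codes.

```python
VOWELS = set("aeiou")


def candidate_code(name):
    """Derive a draft 3-letter code from the consonants of a romanized name."""
    letters = [ch for ch in name.lower() if ch.isascii() and ch.isalpha()]
    if len(letters) < 3:
        raise ValueError("name is too short for a 3-letter code: %r" % name)
    consonants = [ch for ch in letters if ch not in VOWELS]
    if len(consonants) >= 3:
        return "".join(consonants[:3])
    # Too few consonants: keep them all and admit only as many vowels as
    # are needed, taking letters in the order they appear in the name.
    vowels_needed = 3 - len(consonants)
    code = []
    for ch in letters:
        if ch not in VOWELS:
            code.append(ch)
        elif vowels_needed > 0:
            code.append(ch)
            vowels_needed -= 1
    return "".join(code[:3])


if __name__ == "__main__":
    for language in ("Dogri", "Maithili", "Bodo"):
        print(language, candidate_code(language))
```

Any candidate produced this way would still need to go through the consultation and ISO review steps listed above.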

Proposed Language Tags for feedback

::Proposed Language Tags 

You can send feedback to either of the following email addresses:

Name: Ms. Swaran Lata
E-mail Id: slata@mit.gov.in

Name: Dr. Somnath Chandra
E-mail Id: schandra@mit.gov.in

Website Last Updated on : 15 May 2017