Internationalization is the process of producing an application platform that can be localized for any cultural environment. The application may have been developed earlier for particular cultural environments and languages.
In countries such as Japan, China, Korea and Thailand, English is a second language, so there is significant demand for IT systems in the local language that ordinary people can use.
Almost all IT applications have local language support, such as:
Billing Businesses: Taxes, fees for public services, Bank transaction statements
Online processing: Railway reservation, ATMs in banks, library catalog searching
Internet: E-mail, web browsers in development
Systems: Mostly PCs with DOS/Windows, UNIX machines and mainframes.
User interface/Local: It is impossible to sell a PC-based product without Japanese language support.
Popular packages: MS Word, Lotus 1-2-3, Oracle etc.
Packages without Japanese support: Specialized software like FTP.
Main issues addressed and the approach to localization:
Standardization of characters and cultural conventions: A JIS (Japanese Industrial Standards) board was constituted. Software developers contact JIS before designing a software package, in order to standardize terminology and conventions.
JIS X0201 7-bit and 8-bit character sets: First version came in 1969, second version in 1976 and the latest revision in 1997. Based on the Latin alphabet; also defines 63 Katakana characters.
JIS X0208 7-bit and 8-bit double-byte Kanji characters: First version in 1978, revised in 1983 and 1990, final revision in 1997. Defines the combined character set, Kanji and also non-Kanji. It also defines a shifted coded expression (Shift JIS).
JIS X0212 Supplementary graphic characters: Established in 1990. 5,801 Kanji characters and 245 non-Kanji characters are defined.
JIS X0221 Universal Multiple-Octet Coded Character Set: First version established in 1993, final version in 1995. It includes all the characters defined by JIS X0201, X0208 and X0212. It can support 20,902 characters out of about 34,000 characters in total.
Coded character sets differ from product to product and from platform to platform. Code conversion programs/overlays exist for all of them.
Major US suppliers now supply APIs for multilingual information processing. Products that handle multiple character sets, such as Netscape, are now available for use with more than one language.
Other issues such as:
Rendering of mixed scripts, fonts, text layouts etc. Sorting/ordering of characters. NLP: still being contemplated.
There are two character sets: one traditional and the other simplified. The simplified set is used in urban areas of China. The traditional set is used in Hong Kong and Taiwan.
The first GB character set was established in 1988. It is a 7-bit character coding scheme. GB2312 covers the simplified character set, digits, and the Latin and Greek alphabets.
A two-byte character set is used in DOS, Windows and Open Windows.
The latest version (1995) is ISO/IEC 10646.
For the Internet: RFC 1922, "Chinese Character Encoding for Internet Messages".
Status of localization: Most applications for the common man, such as word processing, databases and e-mail, are available.
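The split between the simplified and traditional sets noted above can be illustrated with Python's standard codecs. This is a minimal sketch under two assumptions. First, Big5 stands in for the traditional set, although the notes do not name a traditional code. Second, the helper can_encode and the two sample characters are chosen only for the example.

```python
# Sketch: one simplified and one traditional form of the same character,
# checked against a simplified-set codec and a traditional-set codec.
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


simplified = "汉"   # U+6C49, simplified form
traditional = "漢"  # U+6F22, traditional form

for codec in ("gb2312", "big5"):
    print(codec, can_encode(simplified, codec), can_encode(traditional, codec))

# GB2312 stores each Chinese character in two bytes.
gb_bytes = simplified.encode("gb2312")
print(len(gb_bytes), gb_bytes.hex())
```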
Character display has four levels. The earlier character code was designed around display; the need for separate overlays for NLP was realized later.
Two types of standards. Most systems support both codes.
Status of Localization: Office Systems, DBMS available with Thai support.
Authoring tools for CAI: Authorware, ToolBook. Commercial OCRs with 90% accuracy.
Word processing, spreadsheets and DBMS are localized on DOS, Windows 3.x and Windows 95 platforms. A few MT systems (Japanese-Korean, English-Korean) have been developed for a limited vocabulary.
Three codes are popular: Wansung, Johab and Han. A separate code is used for the Web, e-mail etc.
Two types of Keyboards: KS and Hung.
Speech processing is only at the laboratory level. Speech synthesis is still far from human quality.
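The coexistence of the Wansung and Johab codes noted above means the same Hangul text has different byte forms on different systems. The sketch below assumes that Python's euc_kr codec corresponds to the Wansung (precomposed) code and its johab codec to the Johab (combining) code. The sample word is chosen only for illustration.

```python
# Sketch: one Hangul word under two Korean codes available in Python.
TEXT = "한글"

wansung = TEXT.encode("euc_kr")  # assumed to match the Wansung code
johab = TEXT.encode("johab")     # assumed to match the Johab code

# Both use two bytes per syllable, but the byte values differ.
print("euc_kr", wansung.hex())
print("johab ", johab.hex())

# Converting between the two goes through the decoded text.
converted = wansung.decode("euc_kr").encode("johab")
print(converted == johab)
```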
Mongolia, Vietnam, Brunei, Singapore etc.
Realized the need for localization a few years ago. Localized word processors are already available.
Brunei reported a problem in representing Jawi.
What level of localisation do we require?
DIR --- 'SOOCHI'
COPY --- 'NAKAL'
WINDOW --- 'KHIDIKI'
Not at this moment
What are the expectations?
IT Expert -- Able to develop an application system in the language selected by the client.
E.g. a financial statement, an inventory database or an employee database for a company can be developed by the programmer in English. The package should allow the programmer to design the I/O screens, data entry forms, menus, queries, outputs etc. in the language of choice.
Word processing: Preparation of letters etc. in the local/regional language.
Applications: Inventory, payroll, financial accounting etc. for entrepreneurs. Is it possible to develop these in one language (English) so that the user can access them in another IL? Does transliteration help?
Post offices: Registration/Speed post/M.O.s in local languages.
Word processing and spreadsheets in the local language. Menus may be retained in English but transliterated.
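The expectation above, that a programmer develops an application in English while screens and forms appear in the client's language, is commonly met by keeping labels in a per-language catalog. The sketch below is one hypothetical way to do it. The names CATALOGS, translate and render_form, the language tag "hi" and the Hindi strings are all assumptions for illustration, not part of any package discussed here.

```python
# Sketch: English source labels looked up in a per-language catalog.
CATALOGS = {
    "hi": {  # hypothetical Hindi catalog
        "Name": "नाम",
        "Salary": "वेतन",
        "Save": "सहेजें",
    },
}


def translate(label: str, language: str) -> str:
    """Return the label in the requested language, or the English label if absent."""
    return CATALOGS.get(language, {}).get(label, label)


def render_form(fields: list[str], language: str) -> list[str]:
    """Build data entry lines for a form whose fields are named in English."""
    return [f"{translate(field, language)}: ____" for field in fields]


# The programmer writes the field list once, in English.
employee_form = ["Name", "Salary", "Department"]
for line in render_form(employee_form, "hi"):
    print(line)
```

A label missing from the catalog falls back to English, so a partly translated application still runs.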
Issues: Language/cultural diversity: 18 languages with 10 scripts.
Hindi is the national language, and 40% of the population are Hindi speakers.
Is it worth developing all applications in Hindi in the first phase and then in other languages?
Only about 10-20% of the IT community are from a Hindi background.
People working in language technology/NLP for IL are getting only second-class treatment in IT.
Standardization of characters: A survey of the international scene reveals that many countries devised their codes with display in mind. They later realized the problem of CV separation and revised their character codes.
Three levels are better: Level 1: basic characters; Level 2: rendering rules/overlays for display; Level 3: rendering rules/overlays for speech/NLP.
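The separation between Level 1 (basic characters) and Level 2 (rendering rules for display) can be seen in how Unicode stores Devanagari. Unicode and the two sample syllables are assumptions chosen for illustration, since the notes do not name a specific code. The stored sequence is consonant followed by vowel sign, whatever the displayed shape or order.

```python
import unicodedata


def stored_characters(text: str) -> list[tuple[str, str]]:
    """List the code point and Unicode name of every stored character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


ki = "\u0915\u093f"          # drawn with the vowel sign to the left of the consonant
ksha = "\u0915\u094d\u0937"  # drawn as a single conjunct shape

# Level 1 is the stored sequence; the drawn shape is left to Level 2 rules.
for sample in (ki, ksha):
    print(sample, stored_characters(sample))
```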
Standardization of cultural conventions: For each region/language we should have a standard way of representing the following: month/date/year/era formats, weekdays, currency representations, measurements and units, different symbols, numbers etc., names, and transliteration rules.
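A standard set of cultural conventions for a region could be recorded as data that applications read, rather than being hard-coded. The sketch below shows the idea for three of the items listed above. The class Conventions, its fields and the sample values for Hindi are hypothetical and do not represent any agreed standard.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Conventions:
    """Hypothetical record of a region's conventions."""
    date_order: str        # order of day, month, year, e.g. "DMY" or "YMD"
    date_separator: str
    currency_prefix: str
    weekdays: tuple        # weekday names, Monday first

    def format_date(self, d: date) -> str:
        parts = {"D": f"{d.day:02d}", "M": f"{d.month:02d}", "Y": f"{d.year:04d}"}
        return self.date_separator.join(parts[key] for key in self.date_order)

    def format_amount(self, amount: float) -> str:
        return f"{self.currency_prefix}{amount:.2f}"

    def weekday_name(self, d: date) -> str:
        return self.weekdays[d.weekday()]  # date.weekday() is 0 for Monday


# Sample values only, assumed for the illustration.
hindi = Conventions(
    date_order="DMY",
    date_separator="-",
    currency_prefix="Rs. ",
    weekdays=("सोमवार", "मंगलवार", "बुधवार", "गुरुवार", "शुक्रवार", "शनिवार", "रविवार"),
)

sample_day = date(2000, 1, 31)
print(hindi.format_date(sample_day), hindi.weekday_name(sample_day))
print(hindi.format_amount(1250.5))
```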
Level 1: Low-level working group. Tasks: preparation of concatenation rules for transliteration, and of cultural conventions.
Level 2: High level, policy making. Decision making and protocol maintenance with software developers and MNCs.
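The concatenation rules for transliteration listed under Level 1 could look like the sketch below, which turns a small Roman alphabet into Devanagari. The rules and the tiny letter tables are assumptions made for illustration. A working group would have to define the real tables, and the sketch ignores issues such as the unpronounced final vowel in Hindi.

```python
# Sketch of concatenation rules, Roman to Devanagari, for a tiny alphabet.
CONSONANTS = {"k": "\u0915", "n": "\u0928", "m": "\u092e", "l": "\u0932", "r": "\u0930"}
INDEPENDENT = {"a": "\u0905", "aa": "\u0906", "i": "\u0907", "u": "\u0909"}
MATRAS = {"a": "", "aa": "\u093e", "i": "\u093f", "u": "\u0941"}  # vowel signs
VIRAMA = "\u094d"  # suppresses the vowel inherent in a consonant


def transliterate(roman: str) -> str:
    """Apply the rules left to right, matching the two-letter vowel 'aa' first."""
    out = []
    after_consonant = False
    i = 0
    while i < len(roman):
        two, one = roman[i:i + 2], roman[i]
        vowel = two if two in MATRAS else one if one in MATRAS else None
        if vowel is not None:
            # Rule 1: a vowel after a consonant becomes a vowel sign,
            # otherwise it is written as an independent vowel letter.
            out.append(MATRAS[vowel] if after_consonant else INDEPENDENT[vowel])
            after_consonant = False
            i += len(vowel)
        elif one in CONSONANTS:
            # Rule 2: a consonant with no vowel after it takes a virama.
            if after_consonant:
                out.append(VIRAMA)
            out.append(CONSONANTS[one])
            after_consonant = True
            i += 1
        else:
            raise ValueError(f"no rule for {one!r}")
    if after_consonant:
        out.append(VIRAMA)
    return "".join(out)


for word in ("kamala", "kila", "aama", "kra"):
    print(word, transliterate(word))
```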
Identify resource centres/language engineering centres:
* Needed for alpha, beta testing.
* R&D/ Productionization, Market survey
* Customer support etc.
* Frequent interaction with customers and feedback to the Standards Committee.