March 1996 BYTE Magazine / International Features / Organizing Babylon

Localizing software is not just a multilingual issue; it's also multicultural

Adele Hars

If your first language is French, Italian, German, or Spanish (known collectively in the industry as FIGS), most of the software you use is probably in your native tongue. If your first language is U.K. (or "European") English, you probably still see quite a bit of software in American English, obliging you to overlook z for s and other Americanizations. But the minute you get outside English or the FIGS languages, software in your own language has been an exception to the rule. That is now changing fast, since software companies are taking demand for localized versions much more seriously. Gone are the days when Eastern Europeans or Central Europeans could be lumped into the "they all speak German" category or people from the Nordic countries could be swept into the "their English is so good anyway" category. Gone, too, are the days when software publishers assumed it was easier for the Japanese to speak English than to create Japanese versions of their products.

Most software programs and documentation are first written in English, even by European companies such as Bull, Siemens, Ericsson, SAP, Nokia, and Olivetti. From there, the companies adapt and translate them into the local languages of their major markets.

Last year's market for products and services involved in localizing software was just under $600 million, according to Rose Lockwood, one of the lead authors of the Ovum report, "Globalisation: Creating New Markets with Translation Technology." That figure will quadruple to almost $2.4 billion in the year 2000, Lockwood says.

Serious Business

Software localization is on the verge of becoming serious business as people are demanding software not only in the major language of their country but also in the native language of their region. Windows 95, for example, is available in Catalonian and in Basque. Software AG has a project to develop software in Gallic, says internationalization manager Andreas Schutz. But demand for localized software does not stop at the borders of the European Union. Computer users in the countries of the former Soviet Union want software in their own languages, too.

Part of the drive to localize software comes from the changing profile of computer users. No longer is it just highly educated people using sophisticated software. Today's computer users extend across all layers of society and throughout a broad range of professions.

But user demand is just one of the things making companies such as Apple localize into the major dialects of the Indian subcontinent, for example. First and foremost, localization is a tool to sell more copies. Sometimes, however, vendors are pushed by a legal requirement, as in France and Spain, where availability of local versions is the law. And for in-house development it's often a practical decision. For companies that operate worldwide, their corporate ordering systems, for example, must be in the language of the local users.

Localization is not just an issue of software. Medical equipment, automation systems, and manufacturing-control devices are just three examples of product groups that are beginning to rely heavily on software-based user interfaces.

The networking needs of multinational corporations are contributing considerably to the localization rush. How would a marketing manager in a multinational company prepare, and distribute by E-mail, a demo for the sales teams in Korea, Russia, Greece, Singapore, and Finland -- where the languages all use characters that are impossible to write in standard ASCII? Accent, a software company based in Israel, addressed this problem by designing word processing programs that let you deal with multiple languages and alphabets in the same document. The company has also developed multilingual World Wide Web browsers and E-mail add-ons.

The Code-Page Problem

A number of terms describe the various stages in the process of adapting products to local markets. These terms tend to overlap to some extent, but their general usage is the following.

-- Globalization is the umbrella word that covers the entire process of creating a product with versions for users in multiple countries, from the first specification through adaptation to local markets. However, some software engineers use this term interchangeably with the word "internationalization."

-- Internationalization covers the behind-the-scenes work that software engineers do to make a program easy to localize. It includes keeping user-interface text strings separate from the rest of the code so that translations won't introduce bugs into previously tested programs, and ensuring that text associated with graphics is not part of bit-mapped images.

-- Localization means adapting a product (including software and all the documentation and related marketing material) to a specific language or culture. While some may use this term to include text translation, it also covers making sure that graphics, colors, and sound effects are culturally appropriate, and that things like dates, calendars, measurement units, and monetary notations are in the correct format.

-- Translation refers to the pure adaptation of words from the source language to the target language.
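The separation of user-interface strings from program logic described under internationalization can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the catalog layout, the names `CATALOGS`, `translate`, and `format_date`, and the sample strings are all hypothetical and are not taken from any product mentioned in this article.

```python
# Hypothetical message catalogs: UI text and date patterns live in data,
# not in program logic, so a translator never has to touch tested code.
CATALOGS = {
    "en": {"greeting": "Welcome, {name}!", "date_format": "{month}/{day}/{year}"},
    "de": {"greeting": "Willkommen, {name}!", "date_format": "{day}.{month}.{year}"},
    "fr": {"greeting": "Bienvenue, {name} !"},  # date_format not yet localized
}

DEFAULT_LANGUAGE = "en"


def translate(key, language, **values):
    """Look up a UI string by key, falling back to the default language."""
    catalog = CATALOGS.get(language, {})
    template = catalog.get(key, CATALOGS[DEFAULT_LANGUAGE][key])
    return template.format(**values)


def format_date(year, month, day, language):
    """Format a date using the pattern stored in the language's catalog."""
    return translate("date_format", language, year=year, month=month, day=day)


if __name__ == "__main__":
    print(translate("greeting", "de", name="Anna"))
    print(format_date(1996, 3, 1, "de"))
    print(format_date(1996, 3, 1, "fr"))  # falls back to the English pattern
```

Adding a language in this sketch means adding a catalog entry; the functions themselves stay unchanged, which is the point of the separation.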

Because of the heavy influence of American developers, a lot of software is designed primarily to handle the characters in the Latin-1 alphabet, which covers most of the Western European languages. But as Asmus Freytag, a vice president of the Unicode Consortium and a globalization design consultant, points out, 80 kilometers east of Berlin, you have to change this character set.

The Czech, Slovak, Polish, and Hungarian languages, for example, need the extended characters covered under Latin-2. These languages still require only 1 byte per character, but as soon as you get to the Far East, languages like Japanese require up to 2 bytes to represent characters.

In the past, all these different requirements have been handled on an ad hoc basis, primarily using "code pages" originally developed by IBM and now an ISO standard. Applications and operating systems are designed to support particular code pages that indicate which character sets to use. American English, for example, is code page 437, while Latin-2 is number 852. This means that software designed to run in multiple languages must support multiple code pages.
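A small Python sketch shows what goes wrong when text written under one code page is read under another. It uses Python's built-in `cp852` and `cp437` codecs; the function name `reinterpret` and the sample word are hypothetical choices for this illustration.

```python
def reinterpret(text, written_with, read_with):
    """Encode text under one code page, then decode the bytes under another."""
    raw = text.encode(written_with)
    return raw.decode(read_with)


if __name__ == "__main__":
    original = "Łódź"
    # Written on a Latin-2 system, read on a system that assumes code page 437.
    garbled = reinterpret(original, "cp852", "cp437")
    # Written and read with the same code page.
    restored = reinterpret(original, "cp852", "cp852")
    print(repr(original), repr(garbled), repr(restored))
```

The bytes are identical in both cases; only the code page used to interpret them differs, which is why every application and operating system along the way has to agree on it.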

To put an end to the hassle of supporting multiple code pages, some of the leading hardware and software companies have lined up behind the Unicode initiative as the most comprehensive solution. Unicode is a superset of all Windows code pages (both single-byte and double-byte), as well as of many important international or national character-set standards. It can handle virtually every written language, including some of the most obscure, such as Tibetan. Unicode also covers specialist character sets like the international phonetic alphabet. Originally developed by Apple and Xerox in the late '80s, this fixed-width, 16-bit character set representing more than 65,000 characters is steadily gaining ground.
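The fixed-width idea can be illustrated with Python, where one string can mix scripts that would otherwise need separate code pages. In this sketch, Python's `utf-16-le` codec stands in for the 16-bit form described above; that is an assumption that holds for the sample characters, each of which fits in a single 16-bit unit. The function name and sample text are hypothetical.

```python
def utf16_code_units(text):
    """Return the 16-bit code units of text, using Python's UTF-16 codec."""
    raw = text.encode("utf-16-le")  # little-endian, no byte-order mark
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


if __name__ == "__main__":
    mixed = "Łódź 日本語 Привет"  # Polish, Japanese, and Russian in one string
    units = utf16_code_units(mixed)
    print(len(mixed), "characters,", len(units), "16-bit units")
    print([hex(u) for u in units[:4]])
```

For this sample, every character occupies exactly one 16-bit unit regardless of script, so no code-page switch is needed anywhere in the string.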

Operating System Support

What's missing is support for Unicode in major operating systems. Currently, only Windows NT and version 4 of IBM's AIX fully support Unicode. Some localization experts say that Unicode suffered a major setback when Microsoft decided not to support it in Windows 95. Within the next two years, though, several major platforms are expected to incorporate full support for this multicharacter standard. While developers can write Unicode-based software for non-Unicode platforms, it will become much easier when the standard is handled at the operating-system level.

Unicode will soon be incorporated in the Hypertext Markup Language (HTML), and Sun's Web programming language Java is supporting Unicode, too. But only when Unicode becomes the basis for industry-wide development will the goal of global-ready product design be achieved. There is movement in this area, however. "The first round of office-type applications with Unicode support will be available in less than two years," says consultant Freytag. The impact for users in an international environment would be substantial. Multinational corporations, for example, could then transfer files of different applications without worrying about having to maintain code-page dependencies.

Cultural Nuances

The problems of localization are as varied as the cultures they target. There are colors to be avoided: In Russia it's yellow, in Egypt it's blue, "and in Quebec these days it's red and white," says Raphael Barron, a native of Russia and founder/CEO of the localization company, Polyglot. Some cultures associate the pointing-finger cursor with thieves, while others don't accept female voice-overs. Icons with hand gestures of any kind are a potential nightmare for localizers. And clip art of local figures and landmarks is often completely useless or obscure outside its home culture.

"It's not enough to be multilingual," adds Barron. "You have to be multicultural."

Translation Tools

The biggest, most time-consuming, and most expensive task in software localization remains language translation. In theory, at least, the cheaper and faster it is to translate software and documentation, the faster end-users will be able to obtain localized versions.

The bulk of the translation work today is still done by a cottage industry, with top translators able to convert a few thousand words a day. However, a number of emerging tools can potentially help increase production and contribute to consistency and efficiency.

These tools fall into three broad categories. The first type is machine translation (MT), which translates source text automatically, typically in batch mode, according to a set of grammatical rules. The second category is translation memory tools, which are typically interactive programs that offer suggested translations based on previously translated phrases stored in a database. And finally, there are related tools, such as on-line dictionaries and glossaries. More and more, these three groups of tools are being integrated with each other. Major firms such as Oracle, Ericsson, SAP, and Lotus are all utilizing MT engines, according to Jens Thomas Lueck, president of Logos, a company that makes a high-end MT system that uses neural nets.

The latest translation technology includes fuzzy methods, neural networks, and Unicode support. Today's state of the art, Lueck says, is to use translation memory systems when updating existing documentation to filter out the parts that are unchanged. The changed sections, referred to as "the delta," are then candidates for MT. Other localization techniques start with an internal machine translation; the post-editing work is subcontracted out. Experts reckon that the combined usage of different translation tools could automate 50 percent to 70 percent of the translation process.
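The translation-memory workflow described above, with exact reuse, fuzzy suggestions, and a "delta" left over for machine translation or a human translator, can be sketched as follows. This is a toy illustration, not how any vendor's product works: the standard-library `difflib` matcher stands in for commercial fuzzy matching, and the memory contents, function names, and the 0.8 similarity cutoff are all assumptions.

```python
import difflib

# Hypothetical translation memory: source segments paired with stored translations.
MEMORY = {
    "Save the file.": "Datei speichern.",
    "Open the file.": "Datei öffnen.",
    "Print the document.": "Dokument drucken.",
}


def suggest(segment, memory, cutoff=0.8):
    """Return (kind, stored_source, translation) for a segment.

    kind is "exact", "fuzzy", or "none" (no usable match in memory).
    """
    if segment in memory:
        return ("exact", segment, memory[segment])
    close = difflib.get_close_matches(segment, list(memory), n=1, cutoff=cutoff)
    if close:
        return ("fuzzy", close[0], memory[close[0]])
    return ("none", None, None)


def split_delta(segments, memory):
    """Separate segments reusable from memory from the delta that needs work."""
    reusable, delta = [], []
    for segment in segments:
        kind, _, translation = suggest(segment, memory)
        if kind == "exact":
            reusable.append((segment, translation))
        else:
            delta.append(segment)
    return reusable, delta


if __name__ == "__main__":
    updated_manual = ["Save the file.", "Save the files.", "Install the printer driver."]
    reusable, delta = split_delta(updated_manual, MEMORY)
    print("Reused:", reusable)
    for segment in delta:
        print("Delta:", segment, "->", suggest(segment, MEMORY))
```

In this sketch a fuzzy match is only offered as a suggestion and still counts as part of the delta, reflecting the interactive nature of translation memory tools.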

Multilingual 95

Windows 95 was one of the biggest software localization projects to date. Microsoft released its newest operating system simultaneously in 21 European languages. "Keeping the text strings separate from the code was carved in stone from the beginning," notes Tony Burke, a Microsoft product unit manager responsible for the project.

The translators and coders who were working on Windows 95 made the big August 24th release date for all the so-called first-round languages, including German, French, Spanish, Italian, Swedish, Dutch, Finnish, Danish, Norwegian, and Portuguese. Within four months, they finished other languages, including Czech, Polish, Hungarian, Russian, Slovenian, Greek, Turkish, Catalonian, and Basque. A pan-European version in English, which covers all the fonts for languages into which Windows 95 was not localized (such as Lithuanian, Latvian, and Estonian), is also available now, and a Slovakian version is under way.

Microsoft and a few other international software developers have learned the importance of customizing products for local languages and customs. It's a lesson more companies are starting to learn.


WHERE TO FIND


Accent Software International Ltd.
Watford, Hertfordshire, U.K.
Phone:    + 44 1923 208235
Fax:      + 44 1923 208230
E-Mail:   74774.264@compuserve.com

Logos
Eschborn, Germany
Phone:    +49 6196 5903-0
Fax:      + 49 6196 5903-15
E-Mail:   info@logos.de
Internet: http://www.logos-ca.com

Ovum Ltd.
London, England
Phone:    + 44 171 255 2670
Fax:      + 44 171 255 1995
E-Mail:   info@ovum.mhs.compuserve.com

Polyglot
San Francisco, CA
Phone:    + 415 512 8800
Fax:      + 415 512 8982
E-Mail:   polyinfo@polyglotint.com

Software AG
Darmstadt, Germany
Phone:    +49 61 51/92-0
Fax:      + 49 61 51/92-1191

Unicode Consortium
San Jose, CA
Phone:    + 408 777 5870
Fax:      + 408 777 5082
E-Mail:   unicode-inc@unicode.org
Internet: http://unicode.org


Multilingual Windows 95

Windows 95 launched in 21 European languages.


Adele Hars is BYTE's Paris correspondent and a technology journalist at the GEID Press Agency. You can reach her at 100325.3703@compuserve.com