UNICODE IN JAPAN.

A long and detailed explanation of the history of Unicode with respect to Japanese; it’s subtitled “Guide to a technical and psychological struggle” and is very interesting even if Unicode isn’t your thing. The author begins with a brief pre-Unicode history:

Before the arrival of Unicode on the scene, the Japanese government produced various standard lists of characters for various different purposes. Three government departments (the ministries of industry, culture and justice) have been involved with creating character sets. In order to understand the decisions made by these departments, it is necessary to bear in mind that the Japanese language was dramatically simplified and reorganized after World War 2, and for some decades thereafter the aim of Japanese language standards was to change and simplify the language, not to describe it.

During the 19th century, the number of kanji required for literacy in Japan was perhaps about 4,000. Even at that time, there were many people calling out for a rationalization and pruning of the writing system. In the early 20th century, the Ministry of Education (now the Ministry of Culture) issued a list of common kanji and a new kana system. Newspapers also announced their own plans of restricting kanji to some sensible subset (although these subsets appear very large and baroque by modern standards). However, opposition from traditionalists effectively postponed reform until after World War 2. In 1945, the Yomiuri newspaper announced that the abolition of kanji would now finally be possible, which at the time wasn’t too extreme a position—others were advocating the total abandonment of the Japanese language!

Japanese character sets as we know them, therefore, have arisen from a background of rapid change and strong reformism.

It goes on to many other topics, including this excursus on personal names:

There is one interesting property of Japanese names that, while not directly relevant, sometimes gets thought of as a character set issue. Most Japanese people have a hanko, a seal which has the individual’s name carved on it and works like a signature. To be valid on legal documents, a hanko must have a certain level of complexity and uniqueness. The same variants of the same characters written in the same style still won’t count as a signature; the exact precise glyph (including wear and damage) that appears on the hanko is the one that constitutes the individual’s signature. Therefore, not merely a character and a variant but an actual glyph is recorded for many Japanese people’s names—a unique situation. Luckily, character sets are not concerned with particular glyphs (except possibly Mojikyo) so this issue does not affect us.

They also use those seals in Taiwan; I wish I knew what happened to mine, since it produced a very handsome impression.

(Via MetaFilter and No-sword.)

Comments

  1. I’m surprised that even a capsule summary of Unicode in Japan could apparently fail to mention Xerox. I have a copy of XNSS 058605 right here on the shelf next to Unicode v1.0 (still an ordinary-sized book). See, for instance, Ten Years of Unicode 1988 – 1998.
    This blog, by a Microsoft Unicode tech lead, has a collection of bite-sized posts on these collisions of language tradition, history, politics, and character encoding, entitled Every Character Has a Story.

  2. It’s amazing how easy it is to get passionate about something abstruse like character encoding standards. This gentleman, for example, has built up a website dedicated to the TRON system over the last 10 years and presents a more partisan view.
    My own passionate gripe about this essay is that its author doesn’t really emphasize how radical Unicode thinking is: unlike in previous systems, the Unicode representations (called code points) are better thought of, in Joel Spolsky’s phrase, as “platonic ideals…floating in heaven.”

  3. John Cowan says

    Alas, Michael Kaplan’s blog is deceased. As is he (complications of multiple sclerosis).

  4. I’m sorry to hear it; thanks for providing the author’s name, which I inexplicably failed to do (I’m usually scrupulous about attribution).
