Victor Mair at Language Log has posted about a new dictionary that ups the ante in the East Asian contest “to see who can produce a dictionary with the most entries”:
The Koreans at Dankook University have just pulled off the amazing feat of compiling a dictionary that has outstripped anything yet generated by the Japanese or the Chinese themselves. After 30 years of labor and investing more than 31,000,000,000 KRW (equal to more than 25 million USD), the South Koreans have just published the Chinese-Korean Unabridged Dictionary in 16 volumes. This humongous lexicon contains nearly half a million entries composed of 55,000 different characters.
Which is interesting in itself, but I’m mainly linking to the post for Victor’s discussion of why “there will never be an end to the compilation of ever larger single character dictionaries, since the Chinese writing system is essentially open-ended,” and why it’s pointless to try to accumulate as many characters as possible: “most of the characters in these mega-dictionaries can only be attested as having occurred once in history, and that often in lexicons of obscure characters!” There’s a very interesting graph of “number of characters” versus “rate of coverage” showing that 6,600 characters cover 99.999% of what’s found in actual text. That means a massive compilation like the Zhonghua zihai 中華字海, with over 85,500 different characters, is an exercise in overkill.