One of Languagehat’s favorite lexicographers, Erin McKean, has a post at the NY Times Opinionator blog expanding on her ideas about the dictionary not being the be-all and end-all of the lexicon (see this 2006 LH post), including a startling statistic with which I’ll begin my excerpt:
Scholars recently analyzed more than five million digitized books, about 4 percent of all the books ever printed. Publishing their findings in “Science,” the researchers discovered that, by their estimation, “52 percent of the English lexicon – the majority of the words used in English books – consists of lexical ‘dark matter’ undocumented in standard references.” Some of the undictionaried words in the article were more or less morphologically transparent ones, like aridification or deletable, but others, like slenthem (a musical instrument), can’t be puzzled out from recognizable roots.
Writers constantly add to the lexical dark matter of the linguistic universe, either by writing about things so new that the terms used to discuss them are still hot from the mold, or just through pure wordsmithery, the coining of words that need to exist for evocative, rather than technical, reasons….
Even words that seem as if they would have been around for the dawn of the language can be traced back to writers who felt a need for them and didn’t stop to do an existence proof: Samuel Taylor Coleridge used the word agasp (meaning “eager”) way back in 1800. Emily Dickinson is cited for resituate more than 80 years before it was found in the Lubbock Morning Avalanche, and Charles Dickens used scrunched in “Sketches by Boz,” in 1836: “He had compromised with the parents of three scrunched children, and just ‘worked out’ his fine, for knocking down an old lady.” Now, these words are all found in the OED.
I think slenthem is an excellent example of a word that’s unquestionably necessary (pronounced /ˈslʌntəm/ [SLUHN-tuhm], it’s the name for an instrument in a gamelan orchestra, and discussions of Javanese music are full of statements like “the slenthem plays the demung part delayed by a quarter of a balungan beat”). If gamelan music were as popular in English-speaking countries as jazz, it would be in dictionaries just like saxophone (and would lose the itals), but as things are, it’s such a specialized word that it’s unlikely to find a place in any dictionary but the OED (which will probably add it when they get around to revising S). I was initially startled by the “th,” but Wikipedia explained that “Javanese, together with Madurese, are the only languages of Western Indonesia to possess a distinction between retroflex and dental phonemes…. These [retroflex] letters are transcribed as ‘th’ and ‘dh’ in the modern Roman script.” So now I know.