Mark Liberman at Language Log discusses a question that had never occurred to me but that I now desperately want an answer to: why, around two centuries ago, did people start writing American Indian names and words with hyphens between the syllables? They hadn’t done so earlier; John Eliot’s Massachusett Bible of 1663 (Book of Ruth) writes the words as single units (“Kah ìwesuonk noh wosketomp Elimelek, & ìwesuonk ummittamwussoh Naomi…”), and Mark provides images from Benjamin Smith Barton’s New views of the origin of the tribes and nations of America (1797), which does the same. But Edwin James’s Account of An Expedition From Pittsburgh to the Rocky Mountains, Performed in the Years 1819, 1820 … , published in 1823, occasionally hyphenates: “whether it would be proper to ascend Running-Water creek, (Ne-bra-ra, or Spreading water), or the Platte, (Ne-bres-kuh, or Flat water), or hunt the bison between the sources of those two streams…” Soon it becomes extremely common; for instance, from A Narrative of the Captivity and Adventures of John Tanner (1830): “The Indians who seized me were an old man and a young one; these were, as I learned subsequently, Manito-o-gheezhik, and his son Kish-kau-ko.”
There are various suggestions at Mark’s post, but here I just want to highlight one very funny anecdote:

One case arose on the Pawnee Reservation, Oklahoma, where an indian was named Coo-rux-rah-ruk-koo. Commonly he was known as Afraid-of-a-bear. A literal translation of his Indian name was “fearing a bear that is wild.” From this translation the agent recorded him as Fearing B. Wilde.


  1. Christine in Baltimore says:

  2. Henry Schoolcraft, whom Mark mentions, explicitly advocated it in his boilerplate. He relates this to long and short vowels in a strange way and doesn’t seem to follow the rule himself.
    This does nothing for antedating, but it was during the period where it seems to have most been in vogue.

  3. John Wesley Powell (or the Bureau of Ethnology under his direction), too, at least for fieldwork.

  4. SnowLeopard says:

    This may be idle speculation, but could the hyphens reflect an effort to analyze underlying word structure? All or nearly all of the hyphenated words you quote seem to be compounds, and hyphenation is used in Holmes and Smith’s “Beginning Cherokee” (1976) with this same instructive purpose. They give examples of Cherokee compounds such as “horse: so-qui-li; burden-bearer (he-carries-heavy-things)” and “attorney: di-ti-yo-hi-hi; he argues repeatedly and on purpose with a purpose”. Durbin Feeling’s Cherokee-English Dictionary (1975) does not use hyphenation in text examples, but in the word headings does follow each syllable with a superscript numeral indicating tone, with virtually the same effect. On the other hand, I don’t think I’ve ever seen hyphens used to syllabify languages like Navajo or Yup’ik, which inflect with gusto but don’t seem as enthusiastic about forming compounds as Cherokee is and the other Iroquoian languages encountered by early writers may be. Navajo and Yup’ik also apply complex phonetic rules that mean that a hyphenated analysis wouldn’t be as tidy as it seems to be in some of these other languages. Just a wild guess.

  5. Similar tendencies wax and wane in early Australian language recording. Some examples are preserved in placenames, e.g. the Ku-ring-gai Chase National Park (and council). Sometimes you can infer what u means in wordlists by the presence of a hyphen (e.g. ku-rong is likely to be /kuruŋ/, but kurong is more likely to be /karuŋ/).

  6. This was done with Chinese to reflect the writing system. Conceivably Americanists were influenced by Sinologists.

  7. David Marjanović says:

    I used to think it was because English isn’t graphemic and because it’s easier to guess the pronunciation of English syllables than of English words, where e. g. a following might influence the preceding vowel or belong to the next syllable. But some of the examples above don’t fit that.

  8. Some offline sources seem to indicate that it might have been Whitney who guided Powell.
    John Pickering doesn’t seem to mention hyphenation in his Uniform Orthography. But on the other hand, he does it himself in a footnote on the word Cherokee. His high-level inspiration was Sir William Jones, whose Dissertation on the Orthography of Asiatick Words in Roman Letters only uses them for visarga that I can tell.

  9. Lewis and Clark did it sometimes, though I can’t figure out any principle. This in their journals, not just as later published.

  10. Could it have had to do with the languages that were being encountered and studied in the 19th century as opposed to earlier? Could the languages farther West have just lent themselves to hyphenation?

  11. It may just be something in the 19th century air…..
    I’ve seen genealogical journals/books from that era (well, the 1850s and 1860s) that hyphenate a woman’s first name and her maiden name, with no hyphen or other punctuation between the maiden name and married name. So those books/journals have Mary Smith who married John Doe listed as Mary-Smith Doe.

  12. marie-lucie says:

    If you are asking people for words in their language, you probably ask them to repeat them slowly. If the words have 3 or 4 syllables, or even more, the tendency of someone repeating words is to break them down into syllables. This is also done to indicate pronunciation of unfamiliar words in some textbooks directed at children, e.g. “Nicaragua (Nee-kuh-rah-gwuh)” (I am not quoting, but this is the type of thing I mean).
    In general, people are often not aware of morpheme segmentation in their own language (even when it is possible), but only of syllables. This is one reason why some products or businesses get names which are a more or less phonetic rendering of the syllables of an ordinary word or phrase, as in E-Zee for ‘easy’.
    Hyphenation as used by Powell and others is an attempt to show pronunciation, not morphemes. For instance, in the Cherokee example quoted above: horse: so-qui-li; burden-bearer (he-carries-heavy-things), a word of 3 syllables is translated alternately as 2 words or 4 words in English. Obviously the Cherokee syllables are not each equivalent to an English word, so that the segmentation does not illuminate Cherokee word structure. On the other hand, using hyphenation in the translations aims to show that they correspond to single words in the native language.
    But this also leads some people to believe that non-European languages have this sort of structure, just syllables strung together. For instance, in British Columbia (Canada) there is a “tribe” who call themselves Nuh-Chah-Nulth. I have no idea whether this corresponds to a proper segmentation of their name, or just repeats a 19th century transcription syllabifying the word, since a modern transcription would be based on different principles.
    Sometimes this sort of transcription is used for humorous effect by the people themselves. I recall seeing an old picture of some BC native people in a small boat on a river: the name of the boat was Slo-Mo-Shun.

  13. Most English speakers mangle long words in unfamiliar languages, not only getting sounds wrong but also deleting and transposing them. Besides the vagaries of English spelling, part of the problem is the monotony of long words in a linear script. Educated native speakers of a language have learned to deal with long words and may even find them more compact to write (note that Eliot’s Bible was meant for native speakers) but foreign casual users haven’t (most of the later examples quoted above are trying to enhance the chances of English-language readers achieving a reasonable approximation). Personally, I have a much harder time reading long words than short words, in the non-Roman alphabets I have a little knowledge of.
    The feeling that long words are educated and short or segmented words are kiddie stuff that is degrading for adults seems to be ingrained in educated English-speakers. I’ve seen some Westerners earnestly urge the use of Vietnam or Hongkong as somehow more dignified than Viet Nam or Hong Kong, spellings they have assumed must surely reflect Western racism viewing Asians as childish; and others hail conversion of Chinese to Roman script with demarcation of words not syllables as the key to real literacy, education and democracy for the Chinese masses. Meanwhile the Vietnamese stubbornly continue to write syllables separately even though they use a Roman alphabet and even for multisyllable Western loanwords, and many Chinese quoting Chinese words in English (presumably techies with some exposure to programming and less exposure to Western literary prejudices) intuitively use CamelCase to show the Chinese syllable structure, to the horror of Western readers who find this barbaric.
    But speaking of kiddie stuff, has segmentation with hyphens or something else been used in teaching English-speaking children to read English, and might this have become common in the early 1800s, leaving this and later generations familiar with this technique for explaining unfamiliar long words?

  14. michael farris says:

    I’m not sure if this is entirely relevant, but …
    In some Amerindian languages with a written tradition there’s a tendency to try to shorten written words (that otherwise might be too long according to the CW) by writing affixes as separate words. In Creek the patient and locative prefixes were often written as separate words (ce hecvyvnks instead of cehecvyvnks ‘I saw you’) and in Choctaw any suffix beginning with a consonant has often been written as a separate word (chi pisa li tok instead of chipisalitok with the same meaning).
    I think the idea is rooted in reading theories where shorter written words are easier to process than longer ones (especially for people expected to do most of their reading in English).

  15. marie-lucie says:

    Even when people are used to literacy in a dominant language such as English, if they are learning to write their own language, (which they have hardly ever seen written, and in which they have never had any instruction such as how to identify words and their components), they go slow, and pronounce words carefully before writing them. In so doing they tend to pronounce distinct syllables, and therefore tend to write the syllables separately.
    Writing systems set up by linguists or missionaries – persons coming from literate backgrounds and trained to analyze words – identify whole words, for the benefit of the speakers (as for instance the early translators into Algonquian languages), but unless speakers are well trained in this way of writing, and have access to samples such as printed texts in their languages, they also tend to separate syllables, or at least some of them. (Incidentally, the Cherokee who developed a writing system for his language made up a syllabary, not an alphabet).
    Persons like Major Powell, on the other hand, were not trying to teach speakers to write their own languages, but instead they were attempting to record the pronunciation of native words for the benefit of readers who spoke European languages (usually English, but sometimes German), and it was often easier to do this with syllables joined by hyphens rather than with words written whole (as David M pointed out), even though the various recorders who participated in the project were unevenly trained and might not always have followed this rule faithfully.

  16. David Marjanović says:

    (Incidentally, the Cherokee who developed a writing system for his language made up a syllabary, not an alphabet).

    Yes, but the reason for this seems to be that a syllabary is easier to invent from scratch than an alphabet. Legend also has it that Sequoyah* first tried to invent a logographic script and found that too much work.
    Another factor to consider might be how easily syllables can be recognized in a language. Many of the languages mentioned above are more or less limited to CV and CVC (…though Nuu-chah-nulth, aka nuučaan̓uł or T’aat’aaqsapa, most certainly isn’t…), while for example in German syllable boundaries routinely run through morphologically single consonants or are delocalized throughout a consonant cluster. In the orthography of Serbocr… BCSM, you are even allowed to choose (in both alphabets) whether to separate ze-mlja or zem-lja (where lj is considered a single letter).
    * What exactly is that h doing there? Is it just for telling English speakers that the a is meant seriously?

  17. marie-lucie says:

    * What exactly is that h doing there? Is it just for telling English speakers that the a is meant seriously?
    It must be. The ah and the lth, as well as the syllabification, are typical 19th century spellings, most likely from a missionary tradition. In the present world, they have the advantage of not requiring any diacritics or special characters (a problem when names are quoted in newspapers, for instance), and the further advantage of a link with a tradition that is still within living memory.
    In many endangered languages, spellings developed by missionaries 100 or more years ago may not be entirely adequate, but they are recognizable to the older generations of speakers, while newer, more accurate spellings developed by linguists are considered unreadable by those speakers (like English speakers trying to decipher phonetic transcriptions of English). This widens the already considerable gap between the few surviving fluent speakers and their less than fluent grandchildren learning the new spelling in school: the elders cannot help the children pronounce words that they themselves cannot read.
    About Cherokee, yes, indeed a syllabary is easier than an alphabet, for those languages which have simple syllables, which is the case for Cherokee (alias Tsalagi), as it is for Japanese, but the examples given earlier, written alphabetically, also separate the syllables, a natural tendency in attempts to write such languages even when an alphabet is available.

  18. Eliot’s Indian Primer (1669) proceeds as follows:

    1. Alphabet.
    2. Possible syllables.
    3. Brief passages with hyphens.
    4. Lord’s Prayer.
    5. Longer passages.

    So, that third section seems to be following the convention from the earliest time; later ones dispense with the hyphens. It begins:

    Wa-an-tam-we . uſ-ſeonk. ogke-tam un-at . Ca-te-chi-ſa-onk.
    Ne-gon-ne . og-kee-taſh. Primer.
    Na-hoh-to-eu . og-kee-taſh.
    Ai-uſ-koi-an-tam-o-e . weh-kom-a-onk.
    Ne-it . og-kee-taſh . Bible.

    Which I believe is recommending a graduated reading program from among his many publications. If you’ve got access to Early American Imprints : Evans, and many American libraries do (the BPL has a proxy), it’s page 8 of that scan.
    It seems that it is indeed, as marie-lucie hypothesized, a scheme for pedagogy and narrow transcription that leaked over into normal writing for a while.
    Speaking of that and of Sequoyah, here’s Se-quo-yah, the American Cadmus and Modern Moses.
    Although it uses spaces and not hyphens to break up syllables, the dialogs in this book are so far gone that you’d almost think it’s a parody, but I fear it isn’t.

