BURUSHASKI.

There’s something romantic about language isolates. The most famous is Basque (subject of much crackpottery); others are Ainu and the Siberian languages Ket and Nivkh (also known as Gilyak). In and around the Hunza Valley of northern Pakistan, almost 90,000 people speak a language called Burushaski; I’ve known about it for over 30 years, ever since I read W.B. Lockwood’s A Panorama of Indo-European Languages in grad school and found a paragraph on it full of wonderfully exotic names:

In the western part of the Karakorum an isolated language, Burushaski, survives in two enclaves: an eastern form found in Hunza and Nagar, a western form in Yasin, where it is termed Werchikwar. To the north there is contact with Wakhi, in Yasin also with Khowar, otherwise with Shina, a language which has advanced in the Gilgit area at the expense of Burushaski… Dumaki forms a diminutive Indo-European enclave within the Burushaski of Hunza and Nagar. To all intents and purposes, Burushaski is a purely oral medium.

Well, in a comment to this post, David Marjanović linked to an online version of a book containing a compact grammatical description of the language, Dick Grune’s Burushaski − An Extraordinary Language in the Karakoram Mountains (pdf, HTML cache), whose very first words told me I’d been pronouncing the name wrong all these years: “Burúshaski (stress on the second syllable)…” The book is very clearly and enjoyably written, an unusual pleasure in this kind of text (“The bad news is that Burushaski has perhaps as many paradigms as Latin, but the good news is that they are much more regular”). Grune discusses the possible relationships of the language:

Although Burushaski has been compared to almost any language on earth, no fully convincing relationships have yet been established. Modern taxonomic methods are, however, beginning to yield results. Ruhlen (1989) [lit.ref. 7] still classified Burushaski as a language isolate: ‘its genetic affiliation remains a complete mystery’ (p. 126), but Ruhlen (1992) [lit.ref. 7] reports on a possible classification of Burushaski as a separate branch of a newly proposed Dené-Caucasian superstock. More recently, Blažek and Bengtson (1995) [lit.ref. 8] list tens of etymologies relating Burushaski to the Yeniseian languages, spoken by a hundred people along the Yenisei river in Siberia. Where appropriate, we have included these etymologies in this survey.

(I’m not sure what the “lit.ref.” numbers refer to; the list of references at the end is not numbered and has only one entry for Ruhlen, his Guide to the World’s Languages: Volume 1 [1987; 1991].) He begins his description of the language with this summary:

For all its romantic and exotic associations, Burushaski is not much weirder than Latin, Turkish or Finnish; of these three it is most reminiscent of Turkish in its structure. It has two or three cases for the nouns (see below) and a small number of locative suffixes; it has essentially one conjugation for the verb, plus a number of composite conjugations; and its sentence structure is similar to that of Turkish but much simpler. Its most remarkable features are that it has four genders for the nouns and that the indications of the object of the verb are the same as those for possession on the noun: ‘I hit him’ is expressed roughly as ‘I do his hitting’, as in many Amerind languages.

The four genders (I know you’re wondering) are human males (which he abbreviates hm), human females (hf), animals and countable objects (x), and materials and abstracta (y). Another interesting feature is the consecutive, “which has no counterpart in English. It has the meaning of ‘after having done so and so’ or ‘when such and such state had arisen’; it is a kind of adverbial past participle and it is used very, very frequently in Burushaski.” The higher numbers are vigesimal: 20 is áltar, 30 áltar tórum ‘twenty ten,’ 40 altó-áltar ‘two-twenty,’ 50 altó-áltar tórum, and so on. I greatly enjoy this kind of compendious description; it gives me the sense of getting a handle on a language without having to do any real work. Thanks, David!
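For readers who like to see the arithmetic, the vigesimal pattern can be sketched in a few lines of Python. This is only an illustration of the counting scheme, not anything from Grune's book: the function name `vigesimal_gloss` is made up, the output is an English gloss rather than Burushaski, and extending the pattern beyond the four forms quoted above (20 to 50) is an assumption based on the "and so on".

```python
# English glosses for the multipliers; these are NOT Burushaski forms.
# Only 20, 30, 40 and 50 are attested in the post; 60-90 are an assumed extension.
MULTIPLIER_GLOSS = {2: "two", 3: "three", 4: "four"}


def vigesimal_gloss(n: int) -> str:
    """Gloss a multiple of ten (20..90) as a count of twenties plus an optional ten."""
    if n < 20 or n > 90 or n % 10 != 0:
        raise ValueError("this sketch only covers 20, 30, ..., 90")
    twenties, remainder = divmod(n, 20)
    # One twenty is just 'twenty'; more than one is 'two-twenty', 'three-twenty', ...
    head = "twenty" if twenties == 1 else f"{MULTIPLIER_GLOSS[twenties]}-twenty"
    # A leftover ten is simply appended, as in 'twenty ten' for 30.
    return head if remainder == 0 else f"{head} ten"


for number in (20, 30, 40, 50):
    print(number, vigesimal_gloss(number))
```

The four values printed correspond to the glosses given for áltar, áltar tórum, altó-áltar and altó-áltar tórum.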

Comments

  1. This is very interesting. I haven’t read much about this language, but from your description the “consecutive” reminds me of an aspect/tense I built into a conlang (constructed language) I created for a race of spirits. I think I’ll do a little research on Burushaski and see exactly what the consecutive does.

  2. John Emerson says:

    I’ve been working on bringing up Burushaski crackpottery to the Basque level. But I can always use help (hint, hint)!

  3. The lit.ref. numbers do look to be in the order given, so perhaps it’s just the two editions of Ruhlen that are being contrasted, with some confusion about dates.

    The author [Lorimer] deliberately makes no attempt to distinguish between the results of the inherent inaccuracy of normal speech and those of grammatical processes and just recorded what he heard.

    Evidently it was actually Mrs. Lorimer who did the grunt field work.

  4. What leapt at me was the description of Partawi Shah (Dr. Hunzai) as “the 1st poet of Burushaski language.” From what I can tell, the claim is that he was the first person to write poetry in Burushaski. Fortunately, Burushaski does not seem to be a language devoid of poetry, because some more scrounging turned up this page of flute music from Hunza Valley where a person who goes by the handle ‘shugulo’ says:

    An extraordinary effort to promote the traditional music of North. Need to expand the site by adding the reconstructed old songs with their background. Mr. Shahid Akhtar Qalandari is doing such an effort to collect and reconstruct the songs,which were sung,written or translated by miscellaneous various personnel’s. An example is of a famous song written by Raja Mehboob Ali Khan of Nagar (former ruler of State of Yasin).The stanzas of this song have become extinct. One needs to recollect from any source and the best sources may be to contact the decendents of Raja Mehboob Ali Khan. The poetry in his song has laid the foundations of burushasky poetry in Burushaski Litrature. A song written by Mahabat Khan (younger brother of Grandfather of Shahid Akhtar Qalandari) has another very good example of Burushaski poetry. There is a need to have a survey to collect such songs written by unknown people and are becoming extinct. This is the duty of youth to collect such songs from elders otherwise we will loose a golden treasure of burusho literature.

    As I am at work I can’t check out the music, but it promises to be very interesting.
    A page on the Overseas Pakistanis Foundation site has this to say on the subject of Burushaski literature:

    The written literature in the language is scanty and scattered. A few poems and stories have been written in Burushaski by some obscure authors from Hunza, using a modified Perso-Arabic script.

    Finally, here’s an interesting tidbit Google turned up in the Chinese Wikipedia entry on Burushaski:

    Tibetan sources also record a Bru-sá language of the Gilgit valley, which appears to have been Burushaski. The Bru-sá are credited with bringing the Bön religion to Tibet and Central Asia, and their script is supposed to have been the ancestor of the Tibetan alphabet. However, no Bru-sá manuscripts are known.

    I had never heard of the Bön religion before. Suddenly I feel like a character in a story by Jorge Luis Borges.

  5. Why is the Chinese wikipedia entry in English??

  6. The Chinese version is plainly a clone of the English version, or rather an earlier state of it.
    I too always said Burusháski, which is yet more evidence that the default stress in English is on the penultimate (the large number of inherited Germanic words conceal this).

  7. marie-lucie says:

    By all means, more knowledge about Burùshaski and the other languages mentioned would be all to the good.
    About larger classifications: Ruhlen is not reliable, as he tends to accept any new proposal (see a comment to that effect by Bill Poser some time ago). Bengtson is not reliable either. Although there seems to be mounting evidence for a Dene-Yenisei relationship, Dene-Caucasian is another matter. Finally, lately the meaning of the word “etymology” seems to have changed from “items on the accepted family tree of a particular word” (as in Eng head, Old English heafod, [related to] German Haupt, Latin caput) to “group of words presumed to be related”, often on the basis of superficial resemblances – not the same thing at all. This second meaning is the one in the quote in LH’s message.
    An example of Ruhlen’s unreliability:
    Look up Zuni (should have a tilde on the n) and you might read that it has sometimes been linked to the “Penutian superstock” of Western North America, although the latter group (proposed by Edward Sapir) is not generally accepted, because of the diversity of the 15-odd language families (not just single languages) included in it. Ruhlen has written somewhere that “Zuni is obviously Penutian” (on what basis, he does not say), and that only old diehards stuck in the mud (or words to that effect) are unwilling to accept the connection. Actually, the connection is based on a single article published in 1965 in the International Journal of American Linguistics (IJAL, a journal dedicated to Native languages) by the well-known American linguist (now deceased) Stanley Newman, but there was for a long time a persistent rumor among Americanists that it might have been a hoax. It turns out that this was indeed the case, and the evidence (based on interviews with several linguists who were in on it, some of whom I know personally) was published in 2002, unfortunately in an article about Southwestern native agriculture that few persons interested in language are likely to read. I know about it because I know the author personally, the very respected linguist and anthropologist Jane Hill, who showed me a copy of her article.
    The reason behind the hoax was that many linguists thought that the editor of the IJAL at that time, Carl Voegelin, was very lax about the standard of the articles he accepted, and the Zuni paper was meant as a test of whether or not he would swallow the bait. He did, and the article, which contains only rather unsystematic resemblances of vocabulary, was published. Apparently Newman was embarrassed about it, but did not want to embarrass Voegelin in turn by revealing the true state of affairs.
    As for any language being “obviously Penutian” as Ruhlen says, I myself have been doing research in this group for a number of years, and the overall resemblances between the 15 or so language families included by Sapir (as opposed to the languages within the families) are far from obvious (and, in case you think that Newman could have been right anyway, Zuni definitely does NOT fit the group). More recently, Greenberg included an even larger number of languages and families, including Zuni, under the umbrella term “Penutian”, a part of his so-called “Amerind”, largely on the basis of yet other proposals which have never been accepted. (Greenberg’s method, upheld by Ruhlen: accept the proposal, then look for words which might fit it – rather than the “old-fashioned” method: look for resemblances in structure as well as vocabulary, determine their systematicity, and propose groupings accordingly).

  8. Wow, what a fascinating story about the IJAL hoax!

  9. I can honestly (and carefully) say that I have never pronounced “Burushaski” incorrectly. Not even once.

  10. David Marjanović says:

    “The bad news is that Burushaski has perhaps as many paradigms as Latin, but the good news is that they are much more regular”

    Well. The chaotic noun plurals remind me of German — you can guess the right one in maybe 40 % of the cases.
    The consecutive is cool, though…

    Ruhlen is not reliable […] Bengtson is not reliable either.

    Now, I agree that Ruhlen’s method — a phenetic method that counts similarities instead of shared innovations — is fair enough for generating hypotheses, but bad at testing them. Also, if you write books about all languages in the world, mistakes are bound to creep in. But Bengtson? Please explain.
    On German Haupt, please note that its meaning “head” is nowadays entirely poetic and is completely unknown in at least my dialect. Most of the time it’s not a word, but a prefix that means “main”.

    I can honestly (and carefully) say that I have never pronounced “Burushaski” incorrectly. Not even once.

    So you have never pronounced it?
    I stressed it on the 3rd syllable before I found the German Wikipedia article on it. Maybe the default stress in non-compounds in German is on the penultimate, too… I’m not sure if I ever said the word aloud, though. :o)
    BTW, the paragraph from the Chinese Wikipedia article is still in the English one.

  11. I had a tiny epiphany when I realized that “hauptman” meant “captain” and that “hetman” was a derivative.

  12. Well, the OED says “Believed to be derived from Ger. hauptmann,” but as beliefs go, that’s a reasonably reasonable one.

  13. Curses! What’s the HTML entity for a quick, smoke bomb-assisted getaway?

  14. Re “hetman”: not all -man are IE men. Wiki: “Dragoman designates the official title of a person who would function as an interpreter, translator and official guide between Turkish, Arabic, and Persian-speaking countries and polities of the Middle East and European embassies, consulates, vice-consulates and trading posts.” That word has roots extending to Akkadian “targumannu”.
    I wouldn’t be surprised if I found that “hetman” is a loan from a Turkish word which looks like a calque on an Arabic word.

  15. “Turkomen” is also pure Turkish, but “men” is a kind of collective suffix.
    I thought for a while that “hetman” might be of Turkish derivation, from the Tatars to the Cossacks, but at this point it doesn’t seem so.

  16. According to the dictionaries, ‘talisman’ is also unrelated to ‘man’.

  17. Interesting — the OED says (or said, a century ago) only that talisman “appears to be a corrupt or mistaken form of some Arabic, Persian, or Turkish spoken word, imperfectly caught by early travellers,” but Merriam-Webster confidently traces it back to “Arabic tilsam, from Middle Greek telesma, from Greek, consecration, from telein to initiate into the mysteries, complete, from telos end.”

  18. Tim May says:

    Hmm, Hat, the online OED (I happened to have the window open) has two entries for “talisman”, and the one in the relevant sense (talisman²) says:

    = 17th c. F., Sp., Pg. talisman, It. talismano, ultimately representing Arab. ṭilsam, in same sense, ad. Gr. τέλεσμα [see TELESM]. The final -an is not accounted for.

    An Arabic pl. ṭilsamān, alleged by Diez s.v., and thence in various recent dictionaries, is an error: no such form exists in Arabic, Persian, or Turkish. The only Arabic form at all similar would be a relative adj. *ṭilsimānī (one) dealing with talismans, if this were in use. The identity of talisman with τέλεσμα was first pointed out by Salmasius, Hist. Augusta 1620.

    The bit with “appears to be a corrupt or mistaken form” &c. is in talisman¹, “A name formerly applied to a Turk learned in divinity and law, a Mullah; sometimes to a lower priest of Islam, a religious minister, a muezzin.”

  19. David Marjanović says:

    I wouldn’t be surprised if I found that “hetman” is a loan from a Turkish word which looks like a calque on an Arabic word.

    Ataman looks pretty Turkic, doesn’t it?

  20. Tibetan sources also record a Bru-sá language of the Gilgit valley, which appears to have been Burushaski. The Bru-sá are credited with bringing the Bön religion to Tibet and Central Asia, and their script is supposed to have been the ancestor of the Tibetan alphabet. However, no Bru-sá manuscripts are known.

    Coincidentally, I have spent the last three weeks researching the Bru-sha and related Zhang Zhung scripts found in Bön texts. I’ll be blogging on them in a couple of days, but if you want a sneak preview of what the putative Bru-sha script looks like, its letters are shown here.
    The script is quite similar to Lantsa, and it is highly unlikely that it is anywhere near as ancient as is claimed or that it derives from Bru-sha (whether or not the identification with Burushaski is accepted). The claim that it is ancestral to the Tibetan script (specifically the cursive dbu-med style favoured by Bönpos) is, in my opinion, totally without foundation.

  21. Great — please leave a link to your post when it appears!

  22. I am also a speaker of the Burushaski language. I am happy to read this article; it will be very useful for a thesis.

  23. David Marjanović says:

    Belatedly:

    I too always said Burusháski, which is yet more evidence that the default stress in English is on the penultimate (the large number of inherited Germanic words conceal this).

    Meanwhile I think what’s going on is that (in German, English and likely others) any word of four or more syllables is presumed to be a compound, a compound that has four syllables is likely to consist of two parts with two syllables each, and each part is (still) stressed on its first syllable by default, which means there’s at least some stress on the third syllable of the compound.

    Why there isn’t even more stress on the first is less clear to me, but German placenames with a hyphen in them are stressed on the last element.

  24. There is a rhythmic element to it, but that can’t be the whole story in English. Consider infinitesimal, which has antepenultimate stress. The first syllable is unreduced, which gives it a sense of secondary stress which matches the antepenultimate stress on infinite. But the same is true of antepenultimate itself, where the first three syllables are neither a word nor even two morphemes (penultimate is monomorphemic in English, just as peninsula is).

  25. David Marjanović says:

    English productively uses the Latin rule of stress on the antepenultimate on Classical loans – after putting them in English form; some even apply it to nomenclature.

    German instead usually stresses the syllable that is stressed in the original language (when there is one and the word wasn’t composed from Classical roots later), which is often now the last. No doubt this was helped along by French – whereas French, strangely, has had no effect on English in this respect. Natur, Universität, stabil, Physik… are all stressed on the last syllable. (Instabil is stressed on the first, but I think that’s contrastive stress which has gone lexical in this instance.)

  26. Raja Mehboob Ali Khan of Nagar is the first poet of the Burushaski language. He composed a book (Diwan) of Burushaski poetry. His family members may have this book.

  27. (Instabil is stressed on the first, but I think that’s contrastive stress which has gone lexical in this instance.)

    Maybe it’s different in Austria, but normally all words prefixed with negative in- are stressed on the first syllable, perhaps indeed due to contrastive stress or perhaps due to analogy with the native prefix un-.

  28. David Marjanović says:

    …You’re right, in- is always stressed.

  29. On Nivkh – this article (PDF) claims that it’s not an isolate, but related to Algic, which would make it the third pre-Columbian language family relationship between the Old and the New world (besides Eskimo-Aleut and Dene-Yenisean). Maybe someone who knows more on the language families involved can judge how good the reconstruction and offered cognates are.

  30. Algic is apparently what we now say instead of Algonquian, for those as out of it as I am.

  31. Looks weak, I’m sorry to say, from brief inspection. No grammar at all (unlike Dené-Yeniseian, which started off with comparing verbal morphology.) Spot checking the proposed etyma finds a lot of weak etymologies of the kind Campbell wrote so much about: one-syllable matches, procrustean semantics, items of wider distribution. Quite a few segments are not consistent with the given table of sound correspondences.

  32. ‘Algic’ is not a substitute for ‘Algonquian’, it’s just a larger family which also includes the two Californian sister languages of Algonquian.

  33. Thanks, I was shooting from the hip as usual.

  34. David Marjanović says:

    I’ll need to read the whole thing; I’m surprised by the conclusion (in the abstract) that there’s no particular relationship between Nivkh and Chukchi-Kamchatkan, because a good argument for their being sister-groups was presented in this paywalled paper.

  35. David Marjanović says:

    …The paper I linked to is actually cited. Will be interesting to read what Nikolaev has to say about that.

    Best passage so far (2nd page):

    Sapir’s “Algonkin-Wakashan” (or “Almosan”) remains a speculative hypothesis, not to mention Joseph Greenberg’s “Almosan–Keresiouan”^2. While Mosan is considered as a probable (although not properly demonstrated) diachronic unit with features typical of a Sprachbund (Beck 1997), both “Almosan” and “Almosan–Keresiouan” have been rejected by most specialists in Native American languages (Campbell 2000: 327–328). Nevertheless, the reasoning of the “non-believers” is no more or less convincing as that of the “believers”, since both positions remain equally unfounded.

    Footnote 2: “Greenberg (1987) included Sapir’s Algonkin–Wakashan (denoted as ‘Almosan’) into the ‘Almosan–Keresiouan’ phylum along with the Caddoan, Iroquoian, Keresan, and Siouan–Catawban families. This hypothesis presumes an exclusive distant relationship and has not been properly supported with standard methods of comparative linguistics.”

  36. marie-lucie says:

    Greenberg admitted that he arrived at his supergroups mostly by putting together language groups proposed by others, whether generally accepted or simply suggested, together with a few additional groupings of his own. His method was “mass comparison” of lexical items, something fraught with many pitfalls (elsewhere itemized by Campbell as “the methods”) in the absence of serious morphological analysis (which many people forget was basic to the beginning of serious historical linguistics). Although some of Greenberg’s criticisms of the practices of some scholars of Native American languages are valid, his own contributions to this particular area of scholarship cannot be taken seriously.

  37. marie-lucie says:

    Nivkh and Chukchi-Kamchatkan

    I have not read the paper, but the author, Michael Fortescue, has at least worked on Chukchi-Kamchatkan for years.

  38. Fortescue’s paper is available here.

  39. David Marjanović says:

    From September 2:

    Looks weak, I’m sorry to say, from brief inspection. No grammar at all (unlike Dené-Yeniseian, which started off with comparing verbal morphology.) Spot checking the proposed etyma finds a lot of weak etymologies of the kind Campbell wrote so much about: one-syllable matches, procrustean semantics, items of wider distribution. Quite a few segments are not consistent with the given table of sound correspondences.

    I’ve now read the paper and skimmed Fortescue (2011) again. The latter paper is actually weaker than I remembered; the vocabulary is not very basic, it and the grammar often needs to invoke special phonetic developments (irregular contractions or assimilations), and with grammatical elements one-syllable matches are often inevitable.

    Nikolayev (2015) has some of the same issues. It’s stated to be the first in a series of papers, and the table of sound correspondences is explicitly stated to be simplified, with positional developments saved for a later paper, which unfortunately means that the sound correspondences in the compared vocabulary are a bit hard to judge. One-syllable matches are plentiful, because Nikolayev tried to fill in all of Starostin’s 110-item word list (Swadesh’s 100-item word list plus 10 that Starostin thought were also useful). Even so, however, this means that Nikolayev compares a lot of basic vocabulary. The assumed changes in semantics seem rather harmless to me on the whole.

    Nikolayev explicitly takes Nivkh out of Nostratic, where it never fit quite comfortably. The personal pronouns make more sense now. Instead of beginning with *m-, “I” in Nivkh is *nʲi; a sound change [mi] > [mʲi] > [mnʲi] > [nʲi] isn’t impossible or unknown (as Fortescue explained at some length), but the corresponding forms in Proto-Salishan, Proto-Wakashan and Proto-Algic all have *n- (there’s *l- in Chimakuan); that’s an easier correspondence. “You (sg.)” is *cʰi in Nivkh; it isn’t hard in principle to derive that from *t-, but somewhat easier (and, says Nikolayev, regular) correspondents are found in Proto-Chimakuan *ki-, Proto-Algic *ke- and, more distantly, Proto-Salishan *kʷə-. Proto-Wakashan, though, is different, showing unexpected *suː- (with the same vowel as in 1sg and 1pl). For “we”, Nivkh has both *me- and *nʲɨ-. Of course *m- is all over Nostratic, but interestingly it’s also in Proto-Chimakuan, and there’s a *-m- in Proto-Salishan *n-ʔim-. The only Nostratic *n- form appears to be PIE *ne-¹; on the other side, *n- is found in Proto-Wakashan and Proto-Algic (with the same vowels as in the singular, though).

    Comparative morphology is relegated to footnote 10 – which is quite interesting:

    It makes little sense to discuss morphological similarities between languages that are so remotely related, but it may be noted that Proto-Wakashan, Proto-Nivkh and Proto-Algic are reconstructed as polysynthetic languages with weak prefixation and well-developed suffixation, including incorporation of nominal and verbal roots as “lexical suffixes”. In this respect Nivkh may be considered as the most archaic constituent, since, although the “incorporated” nominal and verbal forms in Nivkh are marked with morphophonemic sound alternations, they have not been transformed into proper suffixal forms, the way it happened in Proto-Chimakuan-Wakashan and in Proto-Algic. A peculiar feature of these languages is suppletion in the sphere of body part terms and in some other lexemes, when independent and suffixal forms are derived from different roots (a serious problem for lexicostatistical work on those of the languages that are poorly documented). Polysynthesis is also well developed in Na-Dene, Chukchi-Kamchatkan, and Eskimo-Aleut languages, i. e. it can be considered a Sprachbund-level phenomenon. Formal borders between noun and verbal stems are rather arbitrary. Several “non-trivial” PAW affixes may be reconstructed, such as *ŋV-, attached to inalienable nouns, or the plural infix *-Ay-. Several other common monosyllabic nominal and verbal suffixes have also been noted, but they are generally irrelevant for the demonstration of remote relationship, since similar auxiliary morphemes with the appropriate grammatical meanings may be found in the majority of the world’s language families.

    Beware the one-syllable matches, then; but having such things as inalienable nouns or plural infixes is in itself unusual enough on a global scale to count for something, and suppletion is always good.

    ¹ Dravidian isn’t in Table 6. Probably, Nikolayev follows the rest of the Moscow School in now restricting the term “Nostratic” to its supposed northern branch (similar but not identical to Greenberg’s Eurasiatic), excluding Dravidian and Afroasiatic which are thought to be more distantly related. Unnecessary confusion like the two meanings of “Indo-European”, I would say.

  40. David Marjanović says:

    BTW, does anybody know why Nikolayev reconstructs the oblique stem of PIE “I” as h₁me-? Is there any evidence for the laryngeal? It can’t be from supposed external evidence, because it corresponds to nothing in the other cited Nostratic forms.

    Unfortunately, I understand quite well why Nikolayev cites PIE “tongue” as *dlengʲʰw-, and I’m not happy about it. PIE “night” as *nok(t)- isn’t good either. Fortunately neither of them matters for the arguments in the paper…

    the vocabulary is not very basic, it and the grammar often needs to invoke special phonetic developments

    I mean both the vocabulary and the grammar often need to…

  41. BTW, does anybody know why Nikolayev reconstructs the oblique stem of PIE “I” as h₁me-?
    Probably because of Greek ἐμοῦ, ἐμοί, ἐμέ; Armenian forms with initial im-; Hittite oblique amm-; one can find that reconstruction with initial laryngeal in a lot of the modern IEanist literature, especially from the Leiden school.

  42. David Marjanović says:

    Ah. That was a genuine gap in my knowledge, then.

  43. Anybody who likes the idea that IE is an M/T (“Mitian”) language may analyse the extra *h₁(e)- as a deictic element (“me here”), perhaps identical with the verbal augment. In the languages that use the augment, imperfects and aorists without it serve as “injunctives” (expressing pure aspect, unmarked for tense). Note that the so-called “primary” endings (active -mi, -si, -ti and their middle counterparts) already contain a tense marker (*-i or -r), and so they don’t take an augment. My tentative guess is that the augment was an adverbial pronoun meaning, more or less, ‘here, then, at this point in time’ (in telling a story).

  44. David Marjanović says:

    Anybody who likes the idea that IE is an M/T (“Mitian”) language may analyse the extra *h₁(e)- as a deictic element (“me here”), perhaps identical with the verbal augment.

    Or, of course, everyone but IE could have simplified the consonant cluster; no comparable clusters and no glottal stops or similar phonemes are currently reconstructed for the other M/T protolanguages.

    But I knew next to nothing about the augment, so thanks 🙂

  45. David Marjanović says:

    …I just remembered Greenberg’s interpretation of the PIE “I” word, which he reconstructed as some kind of *eghom, as *e-gho-m “this-is-me”.

    (I’m aware that current reconstructions and interpretations, including those by Moscow School Nostraticists, are different.)

  46. marie-lucie says:

    David: Greenberg’s interpretation of the PIE “I” word, which he reconstructed as some kind of *eghom, as *e-gho-m “this-is-me”.

    I wonder what qualified Greenberg to reconstruct anything in PIE (or to analyze an accepted reconstruction – I think I have seen *eghom before).

  47. The nominative of the 1sg. pronoun is strange — suppletive and clearly changing its structure between PIE proper and the common ancestor of the crown group (the Anatolian and Tocharian forms are different). Its ending does look conjugational — especially the variants *-om ~ *-ō, as in thematic aorists and presents, respectively (as if we were dealing with a perfective and an imperfective personal pronoun). Perhaps it was somehow assimilated to 1sg. verbs, or indeed involved an obscured finite verb, but a verb like *h₁eǵ(h₂)- seems to be otherwise unknown. I’m not sure if the evidence of Vedic is sufficient to reconstruct the second laryngeal, but supposing that the *h₁e- part is a proclitic deictic pronoun, *-ǵh₂- could be the residue of a reduced verb root. Still, *ǵeh₂- doesn’t ring a bell either.

  48. OMG, there should be an “edit” button somewhere:

    a perfective and a perfective –> a perfective and an imperfective

    Sorry for this mess.

  49. I am happy to serve as your edit button.

  50. I am happy to serve as your edit button.

    For which I am immensely grateful. Still, I should spend more time thinking before I post. For example, I should not have said this:

    …the variants *-om ~ *-ō, as in thematic aorists and presents, respectively (as if we were dealing with a perfective and an imperfective personal pronoun).

    The variant *-om is in fact aspect-independent; the aorist and the imperfect shared it, so the contrast here is actually between non-present (whether perfective or imperfective) and present (obligatorily imperfective) inflections.

  51. David Marjanović says:

    I’ve spent half a day rereading this:

    Бабаев К.В. Ностратический личный показатель *q. Orientalia et Classica XIX. Труды Института восточных культур и античности. Аспекты компаративистики 3. Москва: РГГУ, 2008. С. 473-498.

    The pdf must have been on starling.rinet.ru at some point, or else I wouldn’t have it. It’s no longer there or anywhere else known to Google. I copied the citation from here, because – apart from its page numbers – the pdf gives no indication of belonging to any particular book or journal. (A “Chapter 1” is mentioned, but Babaev’s chapter isn’t numbered itself.)

    There is an English abstract, but it doesn’t contain any spoilers except for the hypothesis developed at the very end. So I’ll try to paraphrase the paper (in greatly shortened form and not entirely in the original order, because Babaev reviews a lot of literature):

    First Babaev takes the IE “I” word apart and lists all the problems with reconstructing it:

    – There are reflexes with *-m and reflexes without. In the latter there’s *-o-, which has been explained as the thematic vowel. As Piotr said, this looks like the thematic and athematic verb conjugations. However, while there are languages that have “proverbs” instead of pronouns, it’s strange that there’s no other trace of such a verb anywhere in IE. Alternatively, this *-m could be cognate with the possessive suffix (apparently 1sg; Babaev doesn’t say) that nouns get in the Anatolian languages. Either way, whether noun or verb, the word must have been independent in PIE, not some kind of clitic or affix.
    – Indo-Iranian and Balto-Slavic (how?) seem to point to *gʲʰ; Germanic, Latin and Greek point to *gʲ instead. Babaev agrees with some earlier works that these can be reconciled as *gʲH, a cluster with a laryngeal. (Bizarrely, and without further explanation, Babaev assumes that PIE only had a single laryngeal, which he writes *H. He mentions the “multi-laryngeal hypothesis” once, and that’s it. Fortunately, every *H in the paper seems to be *h₂, except for one or two in the explanation of a hypothesis he ends up not agreeing with.)
    – The other vowel was *e according to Greek, Latin and Germanic, but Lithuanian and Hittite instead point to *o (…or, I suppose, *a), and the Slavic reflex *azъ “demonstrates” or . These could all be the same thing with ablaut, though the meaning of this is “absolutely unclear” in this case; anyway, the function of this element was “probably” emphasis.
    – It is impossible to tell whether there was *h₁ at the very beginning of the word (except, Babaev doesn’t say, if you assume that PIE didn’t allow words to begin with a vowel).

    This leaves us with *e/o-gʲ-H-(m/oH). Of these four elements, *gʲ and *H are “stably reconstructed” and can be considered the root. One of these might be the actual person marker, and the other have some kind of auxiliary meaning (e.g. emphasis); or both might be person markers that have “contaminated each other”. During a page of literature review, Babaev finds Kortlandt’s idea that *gʲ indeed represents an emphatic particle, which survived elsewhere in IE:
    – Greek ἐμέ-γε (Aeolian -γα) “me” (acc.);
    – Germanic *mi-k “me” (acc.);
    – Armenian ինձ inʣ “to me” (dat.);
    – the Venetic oblique form meχo, “if not by analogy with eχo ‘I'”, a comment that should also belong to the Germanic form;
    – Hittite ammu-k compared to Luwian amu (meanings not given);
    – the Tocharian 1sg feminine emphatic particle -k (Toch. A, as in ñuk), -k(e) (Toch. B), though Babaev cites three alternative explanations for this;
    – Vasmer added to these the Slavic emphatic word že, Lithuanian betai-ga “but” and Old Prussian anga “if”. But Babaev doesn’t mention that these point to *g or , not to *gʲ!

    If we accept that this *gʲ had an emphatic function, things get interesting:

    Turkic has a “close parallel” in the Proto-Turkic word *ok “self”, which occurs as an independent word or as an enclitic in different languages:
    – Old Turkic bän ök “I (and nobody else)”, öz-üm ök “I myself”;
    – Kirgiz öz-um oq “I myself, only I”;
    – Yakut -oχ “self”;
    – Altai ol oq “he (emphasis)”.

    There’s a Mongolic emphatic particle *kü/gü.

    Uralic suggestions include Northern Mansi am-ki “I myself”.

    South Yukaghir adds emphasis to met “I” by turning it into mete-k̔, except I have no idea what the inverse apostrophe is meant to mean; the Yukaghir languages are not supposed to have aspirates or ejectives or anything.

    Bomhard has brought up the Kartvelian demonstrative pronouns ege “that”, igi “that one farther away”; Babaev does not comment on the fact that this fits – unless I’ve missed something big – neither Bomhard’s nor the Moscow School’s regular sound correspondences between IE and Kartvelian.

    There are “insufficiently clear” Dravidian forms in -k-: Kurux and Malto have eŋ-g- < *eṉ-k- “I”, Pengo has naŋ-g- < *naṉ-k- “I”. Apparently such forms are especially common in the dative, which makes sense, says Babaev, citing a source and not explaining further.

    In the Proto-Afro-Asiatic *(ʔan-)ʔaku, finally, both *ʔan- and -ku seem to be affixes, at least the former demonstrative.

    Thus, Babaev suggests that the *gʲ in *egʲHom was an emphatic particle and goes back to a Proto-Nostratic *k. The *H, then, should be the actual person marker; it would regularly come from Proto-Nostratic *q.

    PIE *H is indeed found in the 1sg endings of:
    – the stative (“perfect”), *Ha;
    – the “thematic” conjugation of the present, *oH;
    – the middle voice, *-H-;
    – the Anatolian ḫi-conjugation, -ḫi/-ḫa.

    But wait: in the stative, the *H isn’t a person marker, it’s the aspect marker:
    1sg *-H-o/e
    2sg *-t-H-o/e
    3sg *(-H)-o/e

    But that’s a strange paradigm for IE standards. Not marking the 1st person is strange, and putting the aspect marker behind the person marker (in the 2nd person) is strange, too. Perhaps, then, *H started as a person marker, was reinterpreted as an aspect marker (see below on why that would be fairly easy) and then trickled down the paradigm.

    In the ḫi-conjugation, the form -ḫa is widely acknowledged as older. It can then relatively effortlessly be derived from the “perfect”.
    The middle and the “perfect” are also somehow connected; among other things, the “perfect” doesn’t itself have a middle voice.
    The thematic ending evidently contains the thematic vowel *o/e, also found in other forms of non-perfect verbs.

    Summarizing a page and a half of literature review on what all this means, Babaev concludes “that the person marker with IE *H unites forms with meanings of stativity, inactivity, perfectivity and intransitivity – that is, it largely repeats the semantics of the […] personal pronoun *egʲHom.” That this word had such meanings in IE is not demonstrated further. Instead, Babaev looks outside IE for first-person verb markers that could be derived from Nostratic *q:

    In Uralic, there’s the Hungarian 1sg marker -k in the indefinite conjugation; with a definite object, -m is used instead. Several hypotheses have been proposed on where this -k could come from, all requiring rather strange reanalyses. The Samoyedic Selkup language uses -k (and -m) apparently the same way as Hungarian (qoŋa-k “I know (myself/us)”) and also forms “predicative nouns” with -k (kum-ak “I am a person”). As a verb ending it further occurs (as -g) in the Permic languages, apparently with the same meaning again. Further, in the 3rd person, the definite conjugation takes an ending, and the indefinite one does not; this is typologically common for stative verbs, giving the impression that this *-k once had a stative meaning.

    In Turkic there’s a 1st-person marker -k in the “preterit”/”aorist”. Babaev cites a reference for considering it common Turkic despite its absence in Old Turkic texts and Siberian “dialects” and then spends over a page on reviewing hypotheses of its origin, concluding it’s ancient as a 1pl marker at least.

    In Dravidian languages “a whole list of interesting but, unfortunately, so far underresearched elements” are found that resemble the Uralic and Turkic *-k. In Brahui, the oblique stem of the 1st-person pronouns is kan-, which is not found anywhere else in the family. Further, the clitic 1st-person possessive pronoun of Brahui is -ka. This k might or might not come from *y; the historical phonology of Brahui is not well understood. Anyway, a Proto-Dravidian 1sg present/future verb ending *-N-ku and a 1pl exclusive counterpart *-N-kum have been reconstructed.

    Chukchi has a 1st-person ending -k for intransitive verbs. Interestingly, Babaev’s only source is from 1922.

    Like Chukchi, the Eskimo-Aleut languages have ergative-absolutive alignment. They put the 1sg possessive suffix -ma on the agent (ergative) and -ka on the patient/experiencer (absolutive), and -ka apparently also goes on transitive verbs to mark the 1sg patient/experiencer. (I’m piecing this together from two widely separated paragraphs.) The Sirenik language has a 1pl patient/experiencer marker -ki.

    Elamite had a 1pl exclusive pronoun nuku, “the morphemic status of which is unclear”. However, -k shows up in noun conjugation (…Elamite is one of about two languages that is known to conjugate nouns) as the 1sg suffix. From Middle to Achaemenid Elamite, this suffix spread from noun conjugation to verb conjugation in a perfective-preterite meaning; Babaev is reminded of Turkic.

    The *k of Uralic, Altaic, Dravidian, Eskimo-Aleut and Chukchi-Kamchatkan represents a merger of Proto-Nostratic *k and *q (and, in some of these branches, yet other things). The litmus test are IE and Kartvelian: PN *k should give PIE *gʲ/g/gʷ and PK *k, while PN *q should yield PIE *H and PK *q or possibly . Babaev spends over a page on discussing a possible Kartvelian candidate with and concluding that it doesn’t fit. This leaves IE, and thus *q anyway.

    This *q wasn’t simply a 1sg marker, but had “stative”, “intransitive”, “absolutive” meanings.

  52. David Marjanović says:

    I should have mentioned that, in the same year, Babaev published a 300-page book on the origins of the IE person markers:

    Бабаев К.В. Происхождение индоевропейских показателей лица. Москва – Калуга: «Эйдос». 2008. 298 с. ISBN 978-5-902948-30-8.

  53. the “thematic” conjugation of the present, *oH

    Almost everybody accepts it, but I have my doubts. Eugen Hill has recently (2012) come up with a pretty solid argument vindicating Warren Cowgill’s old hypothesis that 1sg. *-ō comes from *-ōi, which reflects the regular development of *-omi (or rather *-omj, with a non-syllabic glide) in this position (in the final syllable, preceded by a non-high vowel). It may seem a crazy idea, but it explains some otherwise mysterious phenomena in PIE morphology, and in my opinion should be taken seriously. It seems strange, anyway, for *h₂ to have spread to the active thematic present, leaving *-om alone in the parallel non-present 1sg. ending (especially in the imperfect, which is basically the very same thing as the present, modulo tense markers).

  54. Thanks for the exhaustive and useful summary, David, and for the hat-tip to my erstwhile diss director, Piotr!

  55. David Marjanović says:

    Oh yes – I read Hill’s paper on Google Books or somewhere at some point, but forgot all bibliographic information about it. For what that’s worth, I agree it’s a really good idea.

  56. David Marjanović says:

    …and not just the bibliographic information, obviously!

  57. Thus, Babaev suggests that the *gʲ in *egʲHom was an emphatic particle and goes back to a Proto-Nostratic *k. The *H, then, should be the actual person marker

    Isn’t it a problem that, in all the cited comparanda both in and out of IE, the emphatic particle follows the person marker, while just in the PIE form it’s supposed to be preceding it?

  58. George Gibbard says:

    What does conjugating nouns in Elamite mean, and is it Akkadian that is the only other language that does it? I’m thinking of šarr-āku ‘I am king’; the odd thing about the Akkadian construction is not that the copula is a clitic, but that (I think) this is only attested with unmodified nouns/adjectives, not with complex noun phrases. So is this also true for Elamite? (In the rest of Semitic what is called the “Perfect” is derived from this construction with some sort of participle, but the construction doesn’t otherwise survive.)

  59. marie-lucie says:

    Elamite is one of about two languages that is known to conjugate nouns

    I know next to nothing about Elamite except its name and the approximate region where it was spoken, but “conjugating nouns” suggests to me that nouns could be sentence predicates just like verbs, something that is not too unusual in languages which do without a “copula”.

  60. David Marjanović says:

    Isn’t it a problem that, in all the cited comparanda both in and out of IE, the emphatic particle follows the person marker, while just in the PIE form it’s supposed to be preceding it?

    Perhaps. But then, the other emphatic particle – *(h₁)e- – also precedes it.

    What does conjugating nouns in Elamite mean, and is it Akkadian that is the only other language that does it?

    Not Akkadian, but extant Nama; of the Wikipedias I’ve checked, only the Russian one describes it.

    Elamite examples:
    sunkik “I, the king”
    sunkit “you, the king”
    sunkir “(someone else,) the king”
    sunkip “(others, the) kings”
    u sunkik Ḫatamtik “I, king of Elam”

    u untaš-GAL šak ḫumpanummenake sunkik anzanšušunka
    I Untas-大 son Ḫ.-1sg king-1sg Anzan-Susa-1sg
    “I, the Great Untas, son of Ḫ., king of Anshan and Susa”

  61. George Gibbard says:

    Far out! I’m glad I asked.

  62. I love seeing Sumerian gal ‘great’ turning up all over the Middle East, whatever the language, much as basic Arabic words have infiltrated the same region today. Ah, memories of grad-school LÚ.GAL GAL!

  63. On the subject of the Indo-European first person singular nominative pronoun, I will admit I found Eric Hamp’s analysis very persuasive, not least because his analysis explains why it failed to survive in Celtic:

    http://kuscholarworks.ku.edu/bitstream/handle/1808/8573/SCN%201_2011_Hamp.pdf?sequence=1

  64. marie-lucie says:

    u untaš-GAL šak ḫumpanummenake sunkik anzanšušunka
    I Untas-大 son Ḫ.-1sg king-1sg Anzan-Susa-1sg
    “I, the Great Untas, son of Ḫ., king of Anshan and Susa”

    Could this mean literally: I … son of H I am, king I am, (of) A and S I am ??

  65. David Marjanović says:

    I love seeing Sumerian gal ‘great’ turning up all over the Middle East, whatever the language, much as basic Arabic words have infiltrated the same region today.

    Well. Sumerograms stand around in non-Sumerian texts like kanji in Japanese texts; there’s often no telling if they stood for loanwords or for native words with the same meaning. The Hittite word for “bread” was apparently only discovered a few years ago because the Hittites just about always wrote it with a sumerogram.

    Could this mean literally:

    That doesn’t work for u sunkik Ḫatamtik “I, king of Elam”.

    The English Wikipedia has a bit more on this (also scroll down to “Language samples”).

    The French Wikipedia, it turns out, has a much better explanation!

  66. Trond Engen says:

    Cliticification of a conjugated copula does seem likely. How does the Nama conjugation work?

    Maybe related, it struck me that the repetitious use of the first person ending might be a matter of style or register. A royal proclamation was something that was shouted out from … wherever Elamite heralds went to shout.

  67. Clitic copulas are possible in Polish (though rare these days in formal, standard language). They can still be used with adjectives, but in older Polish they could also be attached to nouns. Etymologically, they are reduced variants of the common Slavic conjugational forms of ‘to be’, but synchronically they are analysed as “mobile” verb endings. They used to behave in accordance with Wackernagel’s Law: like other “sentence particles”, they were placed after the first stressed word in a clause. In present-day Polish in most types of sentences they don’t leave the main verb.

    Formal: Jesteś głupi ‘you_are-2SG. stupid’.
    Colloquial (or dated formal): Głupiś ‘stupid-2SG.’ (from Old Polish głup(i) jeś).

  68. Henry the Eighth I am!

  69. “The lugal Eannatum is come! Let all leave this land or yield them up!” It gave the enemy more thought if the heralds used that name.

  70. David Marjanović says:

    How does the Nama conjugation work?

    At the link above, the Russian Wikipedia says that person, gender and number are expressed in a single element that is usually called a clitic and goes between the root and the case marker. Then it presents the table of these so-called clitics. And then it gives three examples, one of which fits the table; another fits almost, and the third just doesn’t. ~:-|

    I’m now reading Hamp’s paper. “So far as I can see, there is one approach — the correct approach — that has not yet been seriously considered.” 😀 Something tells me this paper wasn’t peer-reviewed!

  71. David Marjanović says:

    The paper does seem to solve a bunch of riddles. But I wonder what would have happened to *egʲH in the branches where Hamp reconstructs just *egʲ. Are there even any other cases of word-final clusters of plosive + laryngeal? And concerning specifically Germanic, what about the early round of apocope that e.g. robbed the vocative of its ending? – I’ll need to see how much of Ringe (1996) Google Books lets me read this time. Usually it’s not much.

  72. Are there even any other cases of word-final clusters of plosive + laryngeal?

    One particularly good case is the athematic adjective meaning ‘big’, whose reconstructed neuter form is *meǵh₂ (Gk. μεγά, Ved. máhi, OHitt. mēk). Unfortunately the original stem has been replaced by a more complex thematic derivative in Germanic (*mek-ila- ≈ Gk. μεγάλος), and Celtic *magjo- may not even be the same root.

  73. David Marjanović says:

    Ringe (1996)

    2006 of course.

    OHitt. mēk

    This at least fits ūk “I” (the vowel of which, Hamp thinks, is somehow copied from tū).

  74. This at least fits ūk “I” (the vowel of which, Hamp thinks, is somehow copied from tū).

    Kloekhorst, pp. 135-143, esp. 139 ff., argues against that idea. Based on an analysis of the Anatolian forms against what is attested in Rest-IE, he reconstructs this system for PIE:
    1. Sg. Nom. H1ég’H-, obl. H1mn-; 2. Sg. Nom. tiH1, obl. tu-
    The reason for the (from a conventional IEanist view) strange 2Sg. Nom. are the Anatolian forms (Hittite zik, something like ti or ti: in the other Anatolian languages). Kloekhorst generally (and not really controversially) assumes a clear split between Anatolian and Non-Anatolian PIE, and he here assumes that the Anatolian forms are archaic, while Rest-IE has imported the /u/ from the oblique cases into the nominative. Hittite, OTOH, has first imported the /u/ from the 2 sg. oblique into the 1st sg. oblique, and from there it wandered into the 1st. sg. Nom. I find his reconstruction quite appealing.

  75. That was supposed to be “Kloekhorst, p. 135-143” and “Hittite”. Sorry!

  76. David Marjanović: If you will contact me via e-mail I will put you in touch with something you will like.

  77. It’s no longer there or anywhere else known to Google

    Babaev’s paper (in fact the entire volume) is still accessible online [linked from the website of the Институт восточных культур и античности]

  78. I will put you in touch with something you will like

    I think you mean “you will hear something to your advantage”.

  79. In the linked article, Eric Hamp sometimes tries to eat the cake and have it too. He wants the pronoun to end in *ǵ, he is convinced in advance that the “North European” branches preserved the bare nom.sg. without any extensions; therefore inconvenient forms like Slavic *jazъ have to be explained away. An Iranian loan? — the 1sg personal pronoun, really? Anything can be borrowed, but this is one of the least borrowable items cross-linguistically, so one has a right to demand extraordinarily solid evidence. The Slavic long vowel reflex, to begin with, is not really explained: there’s no independent support for the ad hoc suggestion that the fronted *a in Pontic Steppe Iranian sounded longisch to a Slavic ear (which is supposed to be why *ě ~ ja was substituted for it). Where are independent examples of such an outcome in Slavic? And what about Elder Runic eka, ika beside ek, ik? The final vowel in emphatic forms may even have been long in PGmc. (it survives e.g. in OHG ihha and West Frisian ikke). Hamp should at least have mentioned such complications.

  80. David Marjanović says:

    …Thanks for the links; I shall never be bored again. *wide-eyed look*

    Nonetheless, I’ve sent the e-mail. 🙂

    An Iranian loan? — the 1sg personal pronoun, really? Anything can be borrowed, but this is one of the least borrowable items cross-linguistically, so one has a right to demand extraordinarily solid evidence.

    I can see borrowing the 1sg personal pronoun for extreme emphasis; English has sort of jokingly done it with moi. …But I do agree that establishing the mere theoretical possibility isn’t the same as showing that this is what happened.

    The Slavic long vowel reflex, to begin with, is not really explained: there’s no independent support for the ad hoc suggestion that the fronted *a in Pontic Steppe Iranian sounded longisch to a Slavic ear (which is supposed to be why *ě ~ ja was substituted for it).

    On this detail he could have been right for the wrong reason: if the quality of *ě was [æ], then it may well have been an exact match for that of Iranian short *a (indeed, a and ā are [æ] and [ɒ(ː)] in modern Persian in Iran today; vowel length is being lost).

    Elder Runic eka, ika […] OHG ihha

    …Oh.

    *pretends being able to raise one eyebrow*
    Fascinating.

    West Frisian ikke

    This -e is also part of the stereotype about the accent of Berlin. I’m not sure if I’ve personally heard it (I have heard [ʔɪk] often), but…

  81. On this detail he could have been right for the wrong reason: if the quality of *ě was [æ], then it may well have been an exact match for that of Iranian short *a (indeed, a and ā are [æ] and [ɒ(ː)] in modern Persian in Iran today; vowel length is being lost).

    Well, there are fairly long lists of suspected Iranicisms in Slavic, and of course many of them have short *a on the Iranian side. The Slavic counterpart is almost invariably *o (which was phonetically [a] in Proto-Slavic, in all likelihood), but sometimes also *ъ, possibly depending on the stress pattern.

    If we start with *h₁eǵ(h₂)om, however, we get *(j)azъ as the expected regular reflex (with length due to Winter’s Law). The variant *ja can be explained as the outcome of apocope — a trivial thing to happen in a personal pronoun (function words are hotspots of phonetic reduction). The Runic forms are usually explained as stressed *ek-a(n) vs. weak *ik (with very early apocope — practically the same thing as in Slavic), with some mutual contamination leading to “mixed” variants. A heavier suffix (*ek-ō?) is posited by Ringe to account for some of the Germanic reflexes, but we need *ek-a(n) anyway to explain e-breaking in East Scandinavian (Old Swedish iak, etc.). All this is completely straightforward and requires no special pleading.

  82. marie-lucie says:

    borrowing the 1st singular pronoun

    It is true that personal pronouns are rarely borrowed, but in the cases considered here the “personal pronoun” is an independent word or even phrase while (what seem to me to be) the true pronouns are the pronominal affixes.

    A speaker often feels the need to stress their own participation or responsibility in the act or situation described in their utterances, hence the frequent occurrence of an emphatic independent pronoun such as French moi often duplicating the reference shown by the personal affix on the verb. On the other hand, politeness or diplomacy often causes the speaker to try to avoid attracting too much attention to their own participation, so it should not be surprising that there are often circumlocutions (and possibly eventually replacements) for the 1st sg morpheme, which may become “pronouns” if the old morpheme is lost (eg Fr obligatory je from Latin emphatic ego, itself apparently from a PIE phrase).

    Circumlocutions often treat the speaker as a 3rd person, as in English this writer, yours truly, your humble servant and others. In French you can refer to yourself as moi qui vous parle (often used in recounting an anecdote from the past which the hearer might not believe, usually one that involves the speaker as participant or witness in the same way as others who were in the same situation then, who are not present now) or in relatively formal writing l’auteur de ces lignes. Another case for which I don’t know examples in other languages is the very colloquial self-reference using the apparent name or nickname Bibi (with 3rd person agreement as with other names). It seems to me that I have heard it only or mostly in the context of complaining of unfair treatment which should have been shared by a group: for instance, if you and your friends were together at a restaurant and the others disappeared before paying their due so you were left to take care of the entire bill: Qui est-ce qui a payé? Bibi! “Who was it who paid? – Bibi!” Similarly if your group scatters at the sight of the police arriving and you are the only one arrested.

    I don’t know where Bibi comes from. (It is much older than the current Israeli president).

  83. (It is much older than the current Israeli president)

    Older by far: http://www.cnrtl.fr/definition/bibi

  84. David Marjanović says:

    with length due to Winter’s Law

    *lightbulb moment*

    Also, I think this is a “how stupid of me not to have thought of this myself” moment.

    Older by far:

    Aw. I was hoping for an obscurum per obscurius connection with biloute (explained in Bienvenue chez les Ch’tis as “ch’est le churnom à tout l’monde”)

  85. marie-lucie says:

    Piotr, merci!

    According to the TLFI (the source of your link), the citations for “Bibi = moi” are late 19C. One of Daudet’s characters uses it, and the author feels obliged to explain that this character likes to “call himself” by that name (it is not a name that others call him). This suggests that it was a recent innovation. Alternately, since Daudet was a Southerner, probably bilingual in French and Occitan since most of the South spoke Occitan at the time, it is possible that he only encountered the usage after living among “Parisians” (= Northerners). (“Bibi” as the nickname of an actual person in Marseille two centuries before seems to be a coincidence).

  86. I don’t know where Bibi comes from. (It is much older than the current Israeli president).

    Benjamin Netanyahu is the current Israeli prime minister.

  87. marie-lucie says:

    PO, sorry for the error, thanks for the correction.

  88. David Marjanović says:

    Way above I wrote:

    During a page of literature review, Babaev finds Kortlandt’s idea that *gʲ indeed represents an emphatic particle, which survived elsewhere in IE:
    – Greek ἐμέ-γε (Aeolian -γα) “me” (acc.);
    – Germanic *mi-k “me” (acc.);
    – Armenian ինձ inʣ “to me” (dat.);
    – the Venetic oblique form meχo, “if not by analogy with eχo ‘I’”, a comment that should also belong to the Germanic form;

    Perhaps surprisingly, Ringe (2006: 124) agrees on the equation of at least the Germanic and the Greek forms:

    PIE ⋆m̥(m)é ge ‘me!’ (with enclitic emphasizing particle, cf. Gk ἐμέγε /emége/) ↣
    PGmc acc. *mek, unstressed *mik ‘me’ (cf. Anglian OE mec but ON mik, OHG mih).

  89. l’auteur de ces lignes

    English the present author, a curious metaphor (says Frye) because from the reader’s point of view the author is in fact absent.

    Qui est-ce qui a payé? Bibi!

    I would say this as “And who got stuck with the check? Yours truly!” The use of this standard closing phrase to a letter as a jocular first person pronoun is decidedly strange: the OED’s first recorded use is by Dickens in 1833. But there it is.

  90. David Marjanović says:

    Less strange if you consider its predecessors as closing phrases to letters, like your humble servant.

    because from the reader’s point of view the author is in fact absent

    I’ve always understood this present as referring to time.

  91. Yeah, but when someone writes a book in 1848 and calls himself “the present author”, he is present neither in time nor in space when I read the book today. “The present book” makes perfectly good sense, on the other hand; perhaps the word was transferred from that expression.

  92. David Marjanović says:

    Updates 3 years in the making…

    (Bizarrely, and without further explanation, Babaev assumes that PIE only had a single laryngeal, which he writes *H. He mentions the “multi-laryngeal hypothesis” once, and that’s it. Fortunately, every *H in the paper seems to be *h₂, except for one or two in the explanation of a hypothesis he ends up not agreeing with.)

    The reason seems to be that the Moscow School regards every laryngeal that isn’t spelled out in Anatolian as belonging to internal reconstruction of PIE, thus not a part of PIE proper. This is arguably true of some of them, but certainly not all.

    South Yukaghir adds emphasis to met “I” by turning it into mete-k̔, except I have no idea what the inverse apostrophe is meant to mean; the Yukaghir languages are not supposed to have aspirates or ejectives or anything.

    If the superscript should be a subscript, we’re looking at [q] in the Uralic Phonetic Alphabet.

    In the ḫi-conjugation, the form –ḫa is widely acknowledged as older.

    Specifically, -ḫi is considered a contraction of -ḫa-i.

    Cliticification of a conjugated copula does seem likely. […]

    Maybe related, it struck me that the repetitious use of the first person ending might be a matter of style or register. A royal proclamation was something that was shouted out from … wherever Elamite heralds went to shout.

    The Wikipedia article already linked to presents examples that do suggest clitics (“sunki Hatamti-p (or, sometimes, sunki-p Hatamti-p) = ‘the kings of Elam'”; “temti riša-r = ‘great lord’ (lit. ‘lord great’)”), but it’s also clear that noun (phrase) conjugation was grammaticalized and was used together with, rather than instead of, verbs: “I, name, son [of] name-1sg, king-1sg place place-1sg, bricks molded-1sg and throne_hall-acc [of] name god-3sg I-3sg-acc ?with? made-1sg-loc?”.

    OHG ihha and West Frisian ikke

    A few years ago I heard a little girl in Berlin insist aber ich! – as [ʔabaʔɪçːɵː]. Spurious final vowels like that, added just for prosodic purposes, are found in French, but hardly in German.

    A heavier suffix (*ek-ō?) is posited by Ringe to account for some of the Germanic reflexes, but we need *ek-a(n) anyway to explain e-breaking in East Scandinavian (Old Swedish iak, etc.). All this is completely straightforward and requires no special pleading.

    In sum, it looks like the stressed PGmc form was *eką, straight from *h₁eǵh₂om; maybe there was also an *ekō from *h₁eǵoh₂ without the third IE emphatic particle *-m.

    ==========

    Finally, Hill’s paper on the thematic 1sg *-ō and other things is here.

  93. David Eddyshaw says:

    Is this K V Babaev the same bod as the praiseworthy Africanist Kirill? He seems to have made a fairly radical change in his interests since then if so (but I think Kirill B is off to SE Asia nowadays, come to that, so maybe he just gets bored after a bit and moves on.)

    Nahuatl conjugates nouns, of course, not only in that nouns take verbal morphology instead of appearing with a copula, but in that nouns are always marked for the person of the referent. (Michel Launey called this “omnipredicativity”, and wrote a book about it.) I sort-of assumed Nahuatl was the other one of the “about two.”

    According to The Elamite Language, by Erica Reiner (Brill 1966), Elamite does this with animate-reference nouns: sunkik “I, the king”, sunkir “he, the king”, sunkime “kingdom.”

    But as marie-lucie pointed out a few years back in this thread, regularly marking person-reference nouns for the person of the referent even when they’re arguments and not predicates isn’t unparalleled by any means. Iroquoian languages do it, for example.

  94. David Marjanović says:

    Is this K V Babaev the same bod as the praiseworthy Africanist Kirill?

    Yes.

    The Wikipedia article on Nahuatl only offers this:

    It has been argued that Classical Nahuatl syntax is best characterised by “omnipredicativity”, meaning that any noun or verb in the language is in fact a full predicative sentence.[101] A radical interpretation of Nahuatl syntactic typology, this nonetheless seems to account for some of the language’s peculiarities, for example, why nouns must also carry the same agreement prefixes as verbs, and why predicates do not require any noun phrases to function as their arguments. For example, the verbal form tzahtzi means “he/she/it shouts”, and with the second person prefix titzahtzi it means “you shout”. Nouns are inflected in the same way: the noun “conētl” means not just “child”, but also “it is a child”, and ticonētl means “you are a child”. This prompts the omnipredicative interpretation, which posits that all nouns are also predicates. According to this interpretation a phrase such as tzahtzi in conētl should not be interpreted as meaning just “the child screams” but, rather, “it screams, (the one that) is a child”.[102]

    That’s not very similar to Elamite.

    I haven’t found any hint yet in the Wikipedia grammars of Iroquoian languages, but all of them are very short.

  95. David Eddyshaw says:

    That’s not very similar to Elamite.

    Launey’s Nahuatl grammar has (26.2) “Noms à la 1e et 2e personne”, examples like

    Tlein ticcuāzquê in ticnōtlācâ
    “Qu’allons-nous manger, nous pauvres gens?”

    where the ti- “1pl” appears on the verb and the subject. It looks pretty like the Elamite to me.

    The particle in doesn’t seem to me to be fully explained, but it certainly isn’t any sort of relative marker, as it appears all the time with bare nouns with 3rd person reference in their citation forms; unless you assume (as Launey actually does) that Nahuatl 3rd-person-reference-noun bare citation forms actually are predicates by default.

  96. David Eddyshaw says:

    Arthur Anderson (the other major modern grammarian of Nahuatl) really goes a bundle on this business of all Nahuatl nouns being predicates by default, to the point where he writes it in to every single gloss, including glosses of single inanimate nouns in isolation. After a while, you end up going “OK! I get it! All Nahuatl nouns are sentences. Stop! Just stop already!”

    Surely the Elamite facts lend themselves to just the same interpretation?

  97. David Marjanović says:

    I think you mean that Nahuatl lacks a distinction between nouns and verbs? Elamite has a pretty strong one. Case suffixes and (at least from Middle Elamite onward) possessive suffixes, which are different from the person-marking suffixes, go on nouns but not on verbs. Verbs can take TAM markers and can be nominalized; one set of TAM is marked by person endings different from those of nouns (and the others are periphrastic, involving participles that conjugate as nouns). Nouns are not marked for TAM. Word order is pretty strict, very much unlike Nahuatl, and personal pronouns are used all the time despite the presence of person markers on half the words in any sentence! (The “resumptive pronouns” in front of the conjugated verb remind me of Yeniseian.) In short, Elamite nouns are not verbs, and yet they’re conjugated.

    What I can find on Iroquoian grammars on Wikipedia doesn’t tell if nouns are predicates. There is a robust noun-verb distinction, however, even though free-standing nouns aren’t used much; the word order seems to be “new information first”.

  98. David Eddyshaw says:

    Nahuatl doesn’t lack a distinction between nouns and verbs at all; they are morphologically very clearly distinct. What is unusual about it is precisely that nouns and verbs are both freely used both as predicates and as arguments.

    From Montgomery-Anderson’s Cherokee grammar: human-reference nouns include a prefix for the person of the referent: jisgaya “I’m a man”, hisgaya, “you’re a man”, asgaya “man” or “he’s a man.” I can’t see from his account whether you can do the Nahuatl thing of having a 1sg-marked noun as subject along with a 1sg-subject verb; I think part of the difficulty is that for pragmatic reasons it’s likely to be uncommon in any case.

  99. David Eddyshaw says:

    This looks apropos, and discusses some of the relevant issues:

    http://web.stanford.edu/group/cslipublications/cslipublications/HPSG/2014/hahn.pdf

  100. David Marjanović says:

    human-reference nouns include a prefix for the person of the referent: jisgaya “I’m a man”, hisgaya, “you’re a man”, asgaya “man” or “he’s a man.”

    That’s widespread; türküm “I am a Turk”. But, as far as I can see, Elamite didn’t stop there. “I, the king-1sg, did-1sg stuff” apparently cannot be parsed as “I am the king and did stuff”, because there’s a word for “and” that is used just like in SAE in the one clear example on the Wikipedia article and is not used in this construction. (Unexpectedly, the generally more detailed German version doesn’t elaborate.)

    I can’t see from his account whether you can do the Nahuatl thing of having a 1sg-marked noun as subject along with a 1sg-subject verb; I think part of the difficulty is that for pragmatic reasons it’s likely to be uncommon in any case.

    In Elamite it was obligatory, and then people piled a pronoun on top of that.

  101. David Eddyshaw says:

    It seems likely to me that given the rarity of compulsory marking for person of the referent on nouns that when it turns up it’s likely to be a manifestation of “omnipredicativity”, but I must admit the notions are logically separable.

    I rather like the idea of conflating person and gender. This is actually quite a tempting move in a number of Niger-Congo languages. In Kusaal, although I followed the crowd in labelling the gender distinction as animate/inanimate, as a matter of fact “animate” 3rd person pronouns are used precisely in those cases where it’s possible to conceive of 1st or 2nd person pronouns being applied: “animate” should really be “potential participant in a conversation” (which isn’t identical in the Kusaasi Weltanschauung to what it would be in SAE.) In the related languages which still preserve lots of different “inanimate” genders with separate pronouns, you could often quite easily and naturally have a supercategory in which “second person” is a gender, and from a morphological point of view have it all work out very neatly. But it would come at the cost of “animate”-reference nouns varying between three “genders” promiscuously, so you’d just be sweeping the complexity under the carpet.

  102. David Marjanović says:

    This looks apropos, and discusses some of the relevant issues:

    It’s definitely interesting, I’ll read it in detail at… some point. 🙁

  103. David Eddyshaw says:

    That’s widespread; türküm “I am a Turk”.

    Sure. But is Türk. by itself, the normal way to say “He is a Turk.” (as opposed to replying to a question like “What nationality is he?”, where ellipsis comes into play?) Otherwise, there seems to be no reason to suppose that Türk when used as an argument is anything more exotic than “Turk” as an argument in English, and no reason to suppose that there’s any compulsory person-marking of nouns in Turkish. Admittedly the fact that asgaya means both “man” and “he’s a man” doesn’t prove that Cherokee has that either; but it seems a sine qua non for it to be possible. To prove the point, as you rightly imply, you’d need to find cases of coreference between person-marked subject nouns and verbs. I think that is likely to be a highly marked and uncommon situation with 1st persons outside of the particular genre of royal self-laudatory inscriptions. (One of Launey’s other examples is actually In NiMotēuczōma, ca Mexìco nitlàtoāni, “Moi, Moctezuma, je suis roi de Mexico”, from a person with the same mindset as the Elamite author …)

    Anyhow, if Launey and Anderson are correct, person-marking on nouns for the person of the referent is compulsory in Nahuatl, just as (apparently) in Elamite.

  104. John Cowan says:

    In the deutero-canonical book of 2 Esdras (4 Esdras to the Catholics and 3 Esdras to the Orthodox, both of whom call the books of Ezra and Nehemiah “1 Esdras” and “2 Esdras”), the opening of book 3 is (in the Common English Bible of 2008-11) “In the thirtieth year after our city [Jerusalem] was destroyed, I, Salathiel, who am also Ezra, was in Babylon.” That “who am” is decidedly weird, but a number of translations use it. The Latin recension, which is the oldest we have (but internal evidence shows that there were Hebrew and Greek versions, as 2 Esdras 3-14 are a Jewish apocalypse), reads “Anno tricesimo ruinae civitatis eram in Babylone, ego Salathihel qui et Ezras”. How weird that is in Latin I can’t say.

  105. You corrected yourself to 1 Esdras back in 2012.

  106. David Eddyshaw says:

    Surely it’s no weirder than the familiar “Our Father, who art in heaven”?

    Don’t know if that’s a Latinism in the English, but person agreement of the relative is standard Latin. No verb there at all in the Greek, of course.

    The omission of the copula altogether in the Esdras bit seems a pretty peculiar Latin usage to me, but my Latin Sprachgefühl isn’t what it was a couple of thousand years ago. I blame Barbarians.

    Addendum: idly googling, I found pretty much straight away Acts 13:9

    Saulus autem qui et Paulus repletus Spiritu Sancto intuens in eum

    … so what do I know? It’s probably standard Latin for AKA.

  107. Quoth Ren (6m36): “They think I’m crazy. But I know better… it is not I who am crazy; it is I who AM MAD!”

  108. David Marjanović says:

    I don’t know what we have in Elamite other than royal self-laudatory inscriptions, prayers and bureaucracy.

    In NiMotēuczōma

    That’s interesting, especially because in is… apparently some kind of demonstrative, in any case not a personal pronoun.

    But is Türk. by itself, the normal way to say “He is a Turk.”

    …To my surprise, that seems to be possible. Another option, mentioned only in the “vowel harmony” section of the “Turkish language” article, appears to be a copula clitic: türkdür… though maybe that implies definiteness.

  109. David Marjanović says:

    Omission of the copula is unusual in Latin, but found, e.g. et in Arcadia ego.

  110. David Eddyshaw says:

    Looking at Lewis’ Turkish grammar, he says that the copula is suffixed -dir “in writing and in formal speech” but zero in informal speech in A = B sentences like Kızın adı Fatma “The girl’s name is Fatima.” But he says that if the subject is omitted, even the colloquial uses -dir: yaman bir adam-dır “he is a remarkable man”; or that you use the 3rd sg pronoun, as in o, yaman bir adam or yaman bir adam, o.

    However, this is from the 1967 edition. Turkish might have got all omnipredicative since then for all I know.

    What struck me as odd was not the omission of the copula in Latin as such, but its omission in a relative clause. But I seem to have been wrong about that anyway.

  111. David Eddyshaw says:

    I’m not altogether sold on what seems to be the consensus view of Nahuatl being omnipredicative (but my knowledge is minuscule compared with that of Launey, Anderson etc etc.) Still, FWIW, it doesn’t really seem to be the case that you would actually say (say) Cuahuitl, just like that, for “It’s a tree”; you’d need to say Ca cuahuitl, where ca marks something as a predicate, though it clearly isn’t itself a copula. There doesn’t seem to be any similar restriction against saying (say) Tzàtzi in pilli “The child is shouting”, although you can say Ca tzàtzi in pilli. You could maybe argue that this isn’t a world away from the situation in Turkish.

    It seems about analogous to the (different) question of whether there are languages which make no grammatical distinction between nouns and verbs; the candidates seem to melt away under close examination, but nevertheless it’s pretty clear that there really are languages where the distinction is at very least a whole lot more subtle than in SAE.

  112. David Eddyshaw says:

    In other words, it looks more like the position in Nahuatl is not that nouns and verbs are really syntactically equivalent, but that while both nouns and verbs can be used (remarkably) freely as predicates or as arguments, nouns still default to being arguments and verbs to being predicates. (Duh.)

    Which is in principle orthogonal to the question of whether Nahuatl nouns are obligatorily marked for the person of the referent; it looks like it, but the constructions that might decide the issue are pretty marked and peculiar pragmatically*, to the extent that a less outré analysis might even be more parsimonious. (Anderson gets pretty shirty about this, implying that any such attempts must be ipso facto mere glottocentricism if not positively racist. It’s a point of view …)

    *I mean, how often do you really say: “I, the X, do thus”? Unless you’re the God-Emperor of Mankind and like to keep reminding the peasantry of the fact on the off-chance that they might have forgotten.

  113. John Cowan says:

    No, 2 Esdras it is. I added a correction to the miscorrection at the other post. 1 Esdras is basically a separate recension of Ezra.

  114. David Eddyshaw says:

    Quoth Ren

    Oh, that Ren.
    (Well, it could have been Kylo. I can picture it.)

  115. I took a look at Andrews, Introduction to Classical Nahuatl, and Bierhorst, A Nahuatl-English dictionary and concordance to the Cantares Mexicanos. That in particle is, roughly, a demonstrative. Molina, the first Nahuatl grammarian, called it an “adornment”. Andrews sees it as an optional honorific. Bierhorst draws a parallel with Richard II: “This royal throne of kings, this scepter’d isle, this earth of majesty, this seat of Mars…” Bierhorst’s grammatical supplement, in general, makes for delightful reading. He’s constantly thinking of the grammar from the translator’s point of view.

    Another example in Bierhorst: quinõyacuili ynin tepoztopili ixpayolme, freely translated as ‘this one has taken the lance from the Spaniards’, literally ‘this one has taken it from them, it is lance, they are Spaniards.’ This kind of construction reminds me of the syntax of some sign languages (if you agree with his analysis.)

    Paging Magnus Pharao Hansen…

  116. David Eddyshaw says:

    Whatever else it does, in marks the word it’s attached to as an argument, rather than a predicate. It can’t just be that, because it’s not mandatory with arguments, but even so, that seems to be the single clearest thing about it. It’s the opposite (in that sense) of ca, which flags up the next word as a predicate instead of an argument. The fact that there even are such words in Nahuatl shows there’s a lot more going on than just all words being predicates. And Bierhorst’s rendering of quinõyacuili ynin tepoztopili ixpayolme “literally” is frankly just silly. I’d say it shows major conceptual confusion (happily, this is not at all a barrier to sensitive understanding of the actual language in practice.)

    The whole area of usage of these and similar words in Classical Nahuatl was evidently tied up with all kinds of issues of focus and backgrounding and textual cohesion which are the devil to pin down even in a language where you’ve got real live informants to share their intuitions with you. Even in English, the most intensively studied language of all time, we don’t understand them at all well. Doing it in a dead language ranges from difficult to impossible; but people make progress anyhow (the heroes of Egyptian grammatical study have done near-miracles with even more unpromising materials.)

  117. If we are going to get into fictional characters named “Ren,” the most linguistically interesting is probably the prince of Octopon from The Pirates of Dark Water. The show was notable for its time for being a cartoon with an actual progressing plot arc, as well as for its meticulous world building, including a whole range of fictional profanity.

  118. David Eddyshaw says:

    If we are going to get into fictional characters named “Ren”

    Yes, please. Also, who does not like fictional profanity? Examples?

  119. @David Eddyshaw: The entire series can be found online, but here’s a brief video featuring the show’s two most common expletives.

  120. Et in Arcadia ego seven years ago.

    Loglan and Lojban are of course truly omnipredicative, and have an optional premarker of the predicate, ga in the first, cu /ʃu/ in the second. This is required to separate consecutive predicate words which would otherwise create a noun-noun (verb-verb) compound. Thus in Lojban le zdani blanu is a bare NP, literally ‘that-which-is house-blue’, roughly speaking something which is blue in the way that a house is (whatever that means), but le zdani cu blanu is a sentence, literally ‘that-which-is-a house is-blue’.

  121. January First-of-May says:

    Thus in Lojban le zdani blanu is a bare NP, literally ‘that-which-is house-blue’, roughly speaking something which is blue in the way that a house is (whatever that means)

    Now I wonder what the Lojban for “school bus yellow” is… though, come to think of it, does Lojban even have a word for “school bus”? It has to be a pretty parochial concept.

  122. John Cowan says:

    Sure: ckule karce pelxu. Karce is etymologically car + Mandarin 車 chē ‘wheeled vehicle’, but its semantics is in between: it means ‘motorized wheeled vehicle (car, bus, truck, van, etc.)’ Etymologically related words are carce ‘cart (unmotorized)’ and marce ‘vehicle in general’.

    However, it’s also possible that ckule karce could mean something like ‘motorized vehicle used as a mobile school’, as the semantics of compounding in Lojban is deliberately not precisely defined. One could make a neoclassical-style (stump) compound kulkarce (the rules for doing this are precise) and define it to mean precisely ‘school bus’, if this seems useful enough. You could also add more parts to the compound to try to denote ‘school bus’ specifically, but each additional word-joint brings in more possible semantic ambiguity. “The price of infinite precision is infinite verbosity.”

    Lojban is left-grouping and modifier-head, so ckule karce pelxu is ‘school-bus yellow’, as opposed to ckule karce bo pelxu or ckule ke karce pelxu, both meaning ‘school bus-yellow’, whatever that may be. You can think of bo as a hyphen and ke as a left parenthesis. Lojban letters are IPA, except c for /ʃ/, j for /ʒ/, y for /ə/, the apostrophe ' for /h/ (used only intervocalically); h q w are not used.
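
The grouping rules just described can be sketched in a few lines of Python. This is only an illustration, not a real Lojban parser: the function name parse_tanru is made up for the example, the closing bracket ke'e (not mentioned above) is assumed to be the optional terminator for ke, and bo is assumed to group to the right when chained. Each modifier-head pair is returned as a nested tuple.

```python
def parse_tanru(text):
    """Return the grouping of a Lojban compound as nested (modifier, head) tuples.

    Illustrative sketch only: handles plain words, 'bo', 'ke' and an
    optional closing "ke'e". Empty input is not handled.
    """
    tokens = text.split()
    pos = 0

    def parse_unit():
        # A unit is a single word or a bracketed ke ... (ke'e) group.
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "ke":
            group = parse_sequence()
            if pos < len(tokens) and tokens[pos] == "ke'e":
                pos += 1  # consume the optional closing bracket
            return group
        return tok

    def parse_bo_chain():
        # 'bo' binds tighter than juxtaposition (assumed right-grouping).
        nonlocal pos
        left = parse_unit()
        if pos < len(tokens) and tokens[pos] == "bo":
            pos += 1
            return (left, parse_bo_chain())
        return left

    def parse_sequence():
        # Plain juxtaposition groups to the left: a b c -> ((a, b), c).
        result = parse_bo_chain()
        while pos < len(tokens) and tokens[pos] != "ke'e":
            result = (result, parse_bo_chain())
        return result

    return parse_sequence()


print(parse_tanru("ckule karce pelxu"))
print(parse_tanru("ckule karce bo pelxu"))
print(parse_tanru("ckule ke karce pelxu"))
```

The first call groups ckule with karce before attaching pelxu (‘school-bus yellow’); the other two both pair karce with pelxu first (‘school bus-yellow’).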

  123. David Marjanović says:

    and marce ‘vehicle in general’.

    😮
    As if extracted from *karce-marce “a car or something”!

  124. In fact marce owes its /m/ to Russian mashin and Arabic markaba. The etymologies file attributes the /c/ in carce to a Spanish form char-, but I can’t find any such word; French char would seem more probable, but French words were not used in etymologies.

  125. French words were not used in etymologies.

    How very odd! Why not?

  126. David Eddyshaw says:

    Sheer spite. It’s because everyone knows that French is the logical language. Everyone French, anyhow.

  127. John Cowan says:

    The six numerically largest languages of the world were used in accordance with a figure-of-merit algorithm that weighted the languages according to their numbers of speakers and the candidate words according to the match between their phonemes and the source phonemes when Lojbanized, deaffricated, and stripped of affixes. At the time, those were Mandarin, English, Spanish, Hindi/Urdu, Russian, and Arabic. Adding more languages, given the tight phonotactic constraints (all primitive words are CVCCV or CCVCV), just created too many ties between different possibilities. For the most part, Chinese and English dominate anyway, though dakfu ‘knife’ owes only its /d/ to Chinese and its /f/ to English.

    Some words, the metric prefixes, scientific units, a few religious terms, technical terms of Lojban grammar, and words for significant world cultures/languages, were exempt from the algorithm; these all end in -o (e.g. brazo ‘Brazilian’). O is the rarest vowel in Lojban because it does not exist in Chinese (written o, wo, uo is really /wɘ/ and could just as well be we, ue). Near the end, we got a little whimsical: the word for ‘Antarctican’ is dzipo, a portmanteau for (ca)dzu cipni ‘walking bird’.

    In some cases there were also human errors: thus the source forms in the six languages (in the above order) for ‘broken’ are po, brokn, tut, rot, sloman, kasar, for which spotu was the best candidate (figure of merit 55.66), but by mistake spofu (figure of merit only 37.00) became the actual Lojban word. These errors were never fixed, because the whole idea of the algorithm is to create words that are faintly reminiscent of their meanings to a good many people, but not so reminiscent that their actual semantics is swallowed up by their semantics in the source language(s).
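
To make the idea of a figure of merit concrete, here is a toy Python sketch. It is not the actual Lojban word-making algorithm and will not reproduce the 55.66 and 37.00 quoted above: the match measure (longest common subsequence divided by source-form length) and the equal language weights are assumptions chosen for illustration, and the function names are hypothetical. Only the six source forms are taken from the comment.

```python
# Source forms for 'broken' as quoted above.
SOURCE_FORMS = ["po", "brokn", "tut", "rot", "sloman", "kasar"]

# Hypothetical equal weights; the real algorithm weighted by speaker numbers.
WEIGHTS = [1 / len(SOURCE_FORMS)] * len(SOURCE_FORMS)


def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def figure_of_merit(candidate, forms=SOURCE_FORMS, weights=WEIGHTS):
    """Weighted sum of how much of each source form the candidate recalls."""
    return sum(
        w * lcs_length(candidate, form) / len(form)
        for form, w in zip(forms, weights)
    )


for word in ("spotu", "spofu"):
    print(word, round(figure_of_merit(word), 3))
```

Even under these simplified assumptions spotu comes out ahead of spofu, because its t matches two more of the source forms.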

  128. marie-lucie says:

    David E: It’s because everyone knows that French is the logical language. Everyone French, anyhow.

    Count me out, I am a linguist.

  129. David Marjanović says:

    written o, wo, uo is really /wɘ/

    Sure, but phonetically it’s [(w)ɔɑ̯] or thereabouts, making o quite reasonable.
