Gandhari and Other Long-lost Languages.

John Preston writes about people trying to decipher ancient languages; he starts with a nice anecdote:

One day in 1994 Richard Salomon, professor of Asian Languages and Literature at the University of Washington, received a small package in the mail. Inside were a number of blurry black and white photographs and an accompanying letter from the British Library asking if they might be of any interest.

Salomon started looking at the photos – first idly, and then with growing disbelief. “I could see pretty quickly they were the real deal.” The photos showed various inscriptions that were written on a series of scrolls – scrolls of bark that the British Library had been given by an anonymous donor, who in turn, had bought them from an anonymous buyer based somewhere in Pakistan.

The inscriptions Salomon saw were written in Gandhari, a middle Indo-Aryan language closely related to Sanskrit that was in use from the third century BC to the fourth century AD. It was hardly surprising that the British Library had come straight to him. Salomon was one of the few, the very few, people in the world who could read Gandhari – or at least read some of it. “I knew the basic grammar, but there were an awful lot of words that I didn’t know.”

Up until then Salomon had been working on the only known example of a Gandhari manuscript ever discovered – it’s also reckoned to be the oldest surviving example of an Indian text. This discovery, though, changed everything.

A few days later, Salomon flew to London to have a look for himself.

Because they’re written on bark, Gandhari manuscripts are much more fragile than anything on paper, or vellum. A French archaeologist who discovered some in the 1830s found that they literally crumbled to dust as soon as he touched them. Rolled up, the manuscripts Salomon saw resembled enormous cigars. Unrolled, some of them were more than 8ft long. As he gazed at them, something strange happened. “Literally, it was as if my life flashed before my eyes.” Straight away, Salomon realised that there was so much new material here he was going to be spending the rest of his career working on it. Sure enough, 20 years on, he’s still hard at it. “I know a lot more now than I did, but there’s still a long way to go.”

He goes on to discuss Tangut (see this LH post), Sogdian (“The other day for instance I came across the Sogdian word for liver,” says Sims-Williams. “That was quite a big moment”), and Rongorongo (see this LH post), inter alia; I liked this bit on Linear A:

Trying to unpick a lost language is also very solitary work. “Yeah, it’s not exactly something you can have out with the family over dinner,” says Younger. “But that’s fine for me – I love working on puzzles and I love detective work. For instance, I couldn’t sleep last night so I got up at 2am and started working on Linear A.” Younger receives a steady stream of carefully thought out theories from fellow specialists.

But he also has to contend with a regular influx of deeply eccentric suggestions.

“Oh yes, you get a lot of nuts,” he says cheerfully. “I’m a real magnet for mad people. At the moment for instance I’ve got one woman telling me that Linear A is Japanese, someone saying it’s Celtic and someone else saying it’s proto-Persian. But like the story about the troop of monkeys eventually typing up Shakespeare, they do occasionally send in quite plausible suggestions.”

And Richard Salomon, the guy mentioned at the start of the piece, was actually quoted in this forlorn 2003 LH post. Thanks, Trevor!


  1. “For instance, I couldn’t sleep last night so I got up at 2am and started working on Linear A.”

    Ha, just yesterday I posted some observations on my own blog that being so keen on the study of a few languages has led to unhealthy habits and obsessions in my case. I am glad to get confirmation so straightaway that it’s not just me.

  2. ə de vivre says:

    Take Etruscan, for instance. Etruscan was the main spoken and written language of the Etruscan civilisation that held sway in Italy from 700BC to 500AD. Today, we only understand a few hundred words of it. As for counting in Etruscan, if you can make it to six you’re a shoo-in for a Nobel Prize. And then there’s the Elamite language, spoken in Iran almost 5,000 years ago. This has had scholars banging their heads against library walls for generations – partly because it seems to bear no resemblance to any other script.

    I’m almost impressed by the density of inaccuracies in this paragraph.

  3. January First-of-May says:

    I’m almost impressed by the density of inaccuracies in this paragraph.

    Here’s the ones I could pick up myself:

    – there were probably no Etruscans still left by 500 AD
    – even if there were, they hadn’t “held sway” in Italy for many centuries before that
    – …if ever, depending on how big a part of Italy we’re talking about
    – the big problem in Etruscan numbers is figuring out 4 versus 6, and IIRC even that one had been solved recently; certainly it was nothing deserving a Nobel Prize
    – also, pretty sure that the amount of understood Etruscan words is in double, not triple digits
    – the Elamite language had been written in at least three or four different scripts over its long history…
    – …including a cuneiform-style script from about 2500 years ago that shows up in the huge Behistun trilingual, and had thus been relatively easily deciphered
    – also scripts vary so much that “no resemblance to any other script” doesn’t really mean much
    – in any case, script resemblance is really only a factor in deciphering when the same signs are used for the same sounds over multiple languages (which is admittedly fairly common)

    Anything I missed, or stated incorrectly?

  4. Per Alfred Nobel’s will, the Nobel prize in Etruscan studies specifically excludes anything having to do with numerals.

  5. ə de vivre says:

    Wikipedia says that the most recent Etruscan text is from ~50 AD. My guess is that the extra zero’s a typo, and the author didn’t think too much about the difference between a language and a civilization. And there’s a general confusion between languages and scripts.

    The author got really confused about Elamite, which is probably because it’s really confusing. Before the Elamite language proper was deciphered, “Elamite” was a geographical term with no linguistic connotations. Proto-Elamite was the one written ~5,000 years ago. That one hasn’t been deciphered (although, like Linear A, we know what many of the signs mean, we just don’t know anything about the language they mean it in), but it bears a very strong resemblance to (and shares a few signs with) its contemporary Uruk-era cuneiform. There’s a good chance Proto-Elamite wasn’t used to write (the ancestor of) the Elamite language, since the region called “Elam” was, linguistically, much less Elamite than the highlands to the east. Linear Elamite is the one that bears no resemblance to any other scripts, but it came about 1500 years after proto-Elamite. It was probably used to write Elamite proper, but the corpus is so small it’s hard to say much about it.

  6. “Although it looks like Chinese to the untutored eye, it bears no resemblance to it at all” is pretty self-contradictory. A resemblance to Chinese is just what it does have.

    I was glad to learn of the Rohonc Codex, however, because unlike our host I am interested in mysterious manuscripts like this.

  7. January First-of-May says:

    Linear Elamite is the one that bears no resemblance to any other scripts, but it came about 1500 years after proto-Elamite. It was probably used to write Elamite proper, but the corpus is so small it’s hard to say much about it.

    I wanted to mention Linear Elamite with something along the lines of “the main reason it hadn’t been deciphered is because we have so little of it”. I’m not sure why I forgot; maybe because I wasn’t sure whether the claim referred to Linear Elamite or Proto-Elamite, and generally didn’t recall much about either.

    Certainly it shouldn’t really matter whether the script looks like anything else known or not (excluding, again, the case of shared signs) – especially if the language is already known (or suspected).
    The problems happen mostly when there isn’t enough text in it to figure it out – which is very much the case for Linear Elamite*, and to a lesser extent also for Etruscan and Linear A.

    (Of course, IIRC, the corpuses of some of the Sabellic languages make Linear Elamite look large [aren’t a few of them like three words?], but there we’re lucky to have shared signs – they’re basically written in Greek – and they’re probably Indo-European, which helps as well!)

    *) though new long inscriptions were apparently found last year

  8. David Marjanović says:

    The most tantalizing case is North Picene.

  9. I’m almost impressed by the density of inaccuracies in this paragraph.

    Forget it, Jake, it’s journalism.

  10. I went through all the Sabellic languages on WP, and none of them are supported by as few as three words, though some have only three inscriptions or less. As for North Picene, it’s a presumptive isolate represented by one decent-sized inscription and three fragments, and whatever it is, it is not Sabellian.

  11. It’s a 6th century BC Voynich manuscript.

    Totally invented language to torment future linguists

  12. ə de vivre says:

    Languages like Etruscan are the most frustrating: there’s enough text to give a tantalizing idea of what the language was like, but not enough to really flesh it out.

  13. David Marjanović says:

    North Picene is worse: there’s enough to show it’s neither IE nor Etruscan nor anything else recognizable, and that’s all we can say about it…

  14. Marja Erwin says:

    I don’t know the debate, but Václav Blažek (2008) concludes that North Picene is Indo-European and close to Italic.

  15. Read Blažek’s article. He says baleśtenag is borrowed from Greek ballistarius “artilleryman”.

    Now, I notice that the text also has a similar word, krúviśtenag, which suggests that *enag is a suffix similar to the English suffix -nik (of Slavic origin).

    In fact, there is an even better Slavic form of this suffix matching -enag almost perfectly: -niak, e.g. Russian gorniak “miner”.

    baleśtenag then can be rendered in English as ballistnik – operator of ballista, ancient Greek missile weapon.

  16. krúviśtenag is clearly from a well-attested Indo-European word meaning “blood, flesh wound” and particularly close to the Slavic form *krū.

    The word as a whole could be reconstructed as a hypothetical Slavic word kruvishnik, which likely meant an object of blood feud. Less likely, it could simply denote a very cruel and bloody person/deity.

    I think I’ll stop here lest I venture far into Russian Etruscans territory

  17. Trond Engen says:

    I was just staring at the NP text thinking that there’s no repetition or recurring patterns to work with.

  18. David Marjanović says:

    Ah, I missed Blažek’s paper (which isn’t cited in the WP article or its talk page). Could someone post a link?

  19. Casual Browser says:

    What I really miss is all of that work the emperor Claudius did on the Etruscan history and language, none of which has survived. Would have cheerfully swapped the work of some of the drearier Church fathers as a survival from classical times

  20. @SFReader: It seems to me that in English, the suffix –nik is only productive in terms of mockery.

  21. David Marjanović says:

    Thanks for the link, the paper looks solid.

  22. Blažek has some sensible ideas, but I don’t see how he can declare NP an IE language with so little lexical material identified.

    The supposed Greek loans are possible, but without context they can be just as easily chance resemblances. That λ in the supposed Greek words is reflected sometimes as l, sometimes as r, does not help.

  23. Trond Engen says:

    I’m not ready to accept all his conclusions, but there are many valid observations.

    I should say that after squinting for a while, although not seeing any patterns, I too recognized iśperion as Greek. I was a little torn, because I also wanted to see gaareśtadeś as a Germanic perfect meaning “erected”, or possibly “carved”. And maybe iśairon as Celtic or Germanic “iron”.

  24. A recent paper (Romain Garnier and Benoît Sagot, A shared substrate between Greek and Italic, Indogermanische Forschungen 122(1):29–60, 2017) suggests that an unknown centum IE substrate is the source of many of the obscure etymologies of Greek (often assigned to “Pelasgian” or “Pre-Greek”), and of Proto-Italic as well. I don’t know enough to judge its quality. They do go into historical phonological detail (good), but the vocabulary which represents this supposed substrate doesn’t show any semantic coherence (not so good).

  25. Stephen C. Carlson says:

    A three-page conference paper version of Garnier & Sagot is found here:

  26. David Marjanović says:

    I’ve downloaded the paper itself and… intend to read it at some point.

  27. The authors’ thesis is interesting, but if the conference paper version is anything to go by they really need to learn to express themselves more clearly: For example, they offer five phonological features which allegedly define this Indo-European substrate language (AKA “Crotonian”) as distinct from both Latin and Greek, but the first of these (“voiceless reflexes of PIE voiced aspirated stops”) puzzled me, as Greek devoiced Indo-European aspirate stops: on the basis of their examples I realized that what they meant was that Crotonian had turned Indo-European voiced aspirate stops into stops which are both voiceless *and* unaspirated, a treatment which is indeed equally alien to Latin and Greek.

  28. Stephen C. Carlson says:

    I don’t know enough to judge its quality. They do go into historical phonological detail (good), but the vocabulary which represents this supposed substrate doesn’t show any semantic coherence (not so good).
    My impression is the same. I’m not a fan of their “Crotonian,” however: wasn’t Croton a Greek colony?

  29. David Marjanović says:

    I’ve read the paper. It looks very promising.

    wasn’t Croton a Greek colony?

    Yes, but apparently it wasn’t founded in a previously uninhabited place. If its name had a Greek etymology, surely that would be known?

  30. Trond Engen says:

    Is this where we look into Messapic?

  31. David Marjanović says:

    Messapic lacks the devoicing of initial aspirates: bilia “daughter”.

  32. Trond Engen says:

    Yes, if that’s the correct meaning. It seems to me that if there’s a common substratum in Greek and Italic, a thorough look at the Iapyges would be hard to avoid.

  33. David Marjanović says:

    …So there could be a phonological substrate in Greek!

  34. David Marjanović says:

    These two papers from 2009 and 2003 propose a kentum substrate and/or a pre-satəm one (or several of each) in Balto-Slavic… and antedate Trond’s suggestion of using ante-X for non-X substrates in X.

  35. Trond Engen says:

    Not my suggestion (though I wish, so thanks!). ’twas Lars.

    And thanks for the papers. It makes for exciting holiday reading, even if (or especially since) I’m preconceptually sceptical of a claim of finding more than one similar substrate.

  36. David Marjanović says:

    Oops. I’ve always been bad at distinguishing people.

    sceptical of a claim of finding more than one similar substrate

    The papers mostly present the material, grouped by type of weirdness. The claims about substrates are kept very general.

    I should have pointed out the revelation that there’s a French dialect with [st] for ch-.

  37. By the way, in addition to the single Gāndhārī manuscript known before 1994 (found in Xinjiang a century earlier) and the 29 scrolls acquired by the British Library in 1994, we now have a large number of further fragmentary manuscripts from various locations in Afghanistan and Pakistan, making Gāndhārī a relatively well-studied language.

  38. marie-lucie says:

    David M: the revelation that there’s a French dialect with [st] for ch-.

    I haven’t had time to read the papers. Can you be more precise?

  39. Savoyard [ster] ‘dear’, [stã] ‘field’, [sto] ‘hot’, etc., in the 2003 paper, p. 54.
    His sources are Duraffour (1969), Glossaire des patois francoprovençaux, and Martin & Tuaillon (1971, 1974, 1978), Atlas linguistique et ethnographique du Jura et des Alpes du nord (francoprovençal central).

  40. Marie-Lucie: this “revelation” refers to some Franco-provençal varieties having (for instance) /ster/ as a reflex of Latin CARUM, corresponding to French “cher”. Both Central Old French (i.e. non-Norman and non-Picard dialects of Old French) and “Old Franco-provençal” had /tʃ/ as a reflex of Latin /k/ before /a/. In Modern French this /tʃ/ became the fricative /ʃ/, while in the Franco-provençal variety discussed it shifted to /st/, presumably via a stage /ts/, which in some Franco-provençal varieties is still preserved as such; other Franco-provençal varieties have (inter alia) /θ/ or /s/ as their reflex of this phoneme.

    David: two things:

    1-The fact that this /st/ reflex of the Indo-European voiceless palatal stop is only found in Baltic makes me suspect it indeed is a phonological adaptation, in Baltic, of an affricate reflex of the voiceless palatal stop (possibly from pre-proto-Slavic) rather than an indication that /st/ was the regular reflex of this phoneme in some extinct Indo-European dialect: if such a dialect had existed why are there no /st/ reflexes of the voiceless palatal stop in Slavic? Word-initially, Slavic preserves Indo-European */st/, and thus at any stage of its history Slavic speakers would have had no difficulty reproducing an initial /st/ cluster in borrowed words. And mark you, such words needn’t have been borrowed directly from this hypothesized Indo-European dialect: they could have entered Slavic via Baltic.

    2-Why on earth does Andersen not have Holzer (1989) in his bibliography? I’m no Baltic or Slavic scholar (Understatement of the century, that!), but even I know of his work on the topic of an otherwise unknown Indo-European substratum in Baltic and Slavic. Is there a grudge/rivalry between those two I should know about?

  41. marie-lucie says:

    Merci Etienne, I was totally unaware of this dialectal peculiarity.

  42. Arpitan is certainly full of weird dialectal stuff, since it has never undergone any period of standardization that would tend to force convergence.

  43. David Marjanović says:

    if such a dialect had existed why are there no /st/ reflexes of the voiceless palatal stop in Slavic?

    Because that dialect was spoken too far west to have any contact with Slavic? Just guessing, but I’m not aware of identified Baltic loanwords in Proto-Slavic, so I don’t expect that any /st/ words would have been necessarily passed on from Baltic to Slavic.

    Is there a grudge/rivalry between those two I should know about?

    No idea, but I’ve often been struck by publications in historical linguistics or synchronic phonetics/phonology not citing or using relevant information (much more often than I’m used to from my field). I think this is simply because this information is scattered over a lot of tiny journals and obscure, expensive books, and hope the Internet will fix that.

    Another paper by the same author, but from 1969, complains about “a tradition in American linguistics” to ascribe word-final devoicing to German (north of the White Sausage Equator), when in fact it’s syllable-final fortition. Well? Nobody “in America” seems to have read that paper, which is in English, in the 49 years since then; references to “word-final devoicing in German” remain all over the place. Word-final devoicing is found in Dutch and Russian and a lot of other languages, but in German the phenomenon this is meant to refer to (where it exists) is syllable-final (with different syllabifications of plosive + resonant clusters in different accents), and it happens in most or all of the area with Inderior German Gonsonant Weagening, where all obstruents (or nearly so) are voiceless all the time.

    I remember complaining here, probably 3 years ago, about a paper that tried to explain the High German consonant shift but got its outcome – just the observed facts – wronger than Wikipedia in a pretty glaring aspect… and that paper has been cited by later works.

    since it has never undergone any period of standardization that would tend to force convergence

    Also, alpine valleys are great places for innovations to stay isolated.

  44. Trond Engen says:

    Etienne: 2-Why on earth does Andersen not have Holzer (1989) in his bibliography?

    And, if he’s concerned with Balto-Slavic accent, why no Jay Jasanoff or Miguel Carrasquer Vidal?

    Disclaimer: I have still not read beyond the opening paragraphs of the papers.

  45. /st/ in Baltic: I’ve wondered before if stirna might not be simply unrelated to *ḱerh₂- and instead cognate to German Stirn. They look quite well derivable from a quick-and-dirty LPIE *stirn-; or, in light of three sonorants in a row, more probably a common northern European substrate source. (I also have a possible phonological explanation I sketch out in the linked blog post, but it seems at least equally speculative.)

    Maybe also worth remarking: Finnic *tuhat : tuhante- does not require an a-grade formation **tūšamt-, it can also be from pre-resonant vocalization *tūšm̥t-, or a phonetic intermediate stage *tūšəmt-. Compare e.g. *härkä ‘bull’ < *šärkä ← *žr̥gas > *žirgas > Lith. žirgas ‘steed’, Latv. zirgs ‘horse’.

  46. David Marjanović says:

    Thanks for reminding me of Weise’s law!

    Semantically, I suppose Stirn f. “forehead”, older Stirne, is close enough, if a meaning like “pair of antlers” can be postulated as intermediate. Maybe.

    On the Germanic side, Wiktionary mentions OHG stirna and OE steorn for PGmc. *stirnō and then mentions Greek stérnon “chest, breastbone, heart”. I don’t understand where the *i comes from; there don’t seem to be any triggers for Proto-Germanic *e…i > *i…i as seen right behind the forehead in Hirn, OHG hirni.

    On the Baltic side, there’s the claim: “More recently, it has been suggested that stirna might come from Proto-Indo-European *ser- (“red, pink”) in the reduced grade *sr̥-no-, causing t epenthesis in Baltic.” Then comes a list of Slavic cognates, followed by a reference, so I don’t know if the reference applies just to the list or to the claim as well. The reference sounds authoritative: “Karulis, Konstantīns. 1992, 2001. Latviešu etimoloģijas vārdnīca. Rīga: AVOTS.”

  47. Word-final devoicing is found in Dutch

    It turns out to be more complicated. Dutch has syllable-final fortition just like German, which takes the form of devoicing. But there is a further constraint that obstruent clusters have to share the same voicing, and it turns out that the head morpheme wins. So zagde ‘said’ devoices to [zakde] and then assimilates to [zakte], whereas zakdoek ‘handkerchief’, lit. ‘pocket-cloth’ assimilates in the reverse direction to [zagduk].

  48. David Marjanović says:

    the head morpheme wins



    Surely [x ~ χ] and not [k]?
