Gandhari and Other Long-lost Languages.

John Preston writes about people trying to decipher ancient languages; he starts with a nice anecdote:

One day in 1994 Richard Salomon, professor of Asian Languages and Literature at the University of Washington, received a small package in the mail. Inside were a number of blurry black and white photographs and an accompanying letter from the British Library asking if they might be of any interest.

Salomon started looking at the photos – first idly, and then with growing disbelief. “I could see pretty quickly they were the real deal.” The photos showed various inscriptions that were written on a series of scrolls – scrolls of bark that the British Library had been given by an anonymous donor, who in turn, had bought them from an anonymous buyer based somewhere in Pakistan.

The inscriptions Salomon saw were written in Gandhari, a middle Indo-Aryan language closely related to Sanskrit that was in use from the third century BC to the fourth century AD. It was hardly surprising that the British Library had come straight to him. Salomon was one of the few, the very few, people in the world who could read Gandhari – or at least read some of it. “I knew the basic grammar, but there were an awful lot of words that I didn’t know.”

Up until then Salomon had been working on the only known example of a Gandhari manuscript ever discovered – it’s also reckoned to be the oldest surviving example of an Indian text. This discovery, though, changed everything.

A few days later, Salomon flew to London to have a look for himself.

Because they’re written on bark, Gandhari manuscripts are much more fragile than anything on paper, or vellum. A French archaeologist who discovered some in the 1830s found that they literally crumbled to dust as soon as he touched them. Rolled up, the manuscripts Salomon saw resembled enormous cigars. Unrolled, some of them were more than 8ft long. As he gazed at them, something strange happened. “Literally, it was as if my life flashed before my eyes.” Straight away, Salomon realised that there was so much new material here he was going to be spending the rest of his career working on it. Sure enough, 20 years on, he’s still hard at it. “I know a lot more now than I did, but there’s still a long way to go.”

He goes on to discuss Tangut (see this LH post), Sogdian (“The other day for instance I came across the Sogdian word for liver,” says Sims-Williams. “That was quite a big moment”), and Rongorongo (see this LH post), inter alia; I liked this bit on Linear A:

Trying to unpick a lost language is also very solitary work. “Yeah, it’s not exactly something you can have out with the family over dinner,” says Younger. “But that’s fine for me – I love working on puzzles and I love detective work. For instance, I couldn’t sleep last night so I got up at 2am and started working on Linear A.” Younger receives a steady stream of carefully thought out theories from fellow specialists.

But he also has to contend with a regular influx of deeply eccentric suggestions.

“Oh yes, you get a lot of nuts,” he says cheerfully. “I’m a real magnet for mad people. At the moment for instance I’ve got one woman telling me that Linear A is Japanese, someone saying it’s Celtic and someone else saying it’s proto-Persian. But like the story about the troop of monkeys eventually typing up Shakespeare, they do occasionally send in quite plausible suggestions.”

And Richard Salomon, the guy mentioned at the start of the piece, was actually quoted in this forlorn 2003 LH post. Thanks, Trevor!


  1. “For instance, I couldn’t sleep last night so I got up at 2am and started working on Linear A.”

    Ha, just yesterday I posted some observations on my own blog that being so keen on the study of a few languages has led to unhealthy habits and obsessions in my case. I am glad to get confirmation so straightaway that it’s not just me.

  2. ə de vivre says:

    Take Etruscan, for instance. Etruscan was the main spoken and written language of the Etruscan civilisation that held sway in Italy from 700BC to 500AD. Today, we only understand a few hundred words of it. As for counting in Etruscan, if you can make it to six you’re a shoo-in for a Nobel Prize. And then there’s the Elamite language, spoken in Iran almost 5,000 years ago. This has had scholars banging their heads against library walls for generations – partly because it seems to bear no resemblance to any other script.

    I’m almost impressed by the density of inaccuracies in this paragraph.

  3. January First-of-May says:

    I’m almost impressed by the density of inaccuracies in this paragraph.

    Here’s the ones I could pick up myself:

    – there were probably no Etruscans still left by 500 AD
    – even if there were, they hadn’t “held sway” in Italy for many centuries before that
    – …if ever, depending on how big a part of Italy we’re talking about
    – the big problem in Etruscan numbers is figuring out 4 versus 6, and IIRC even that one had been solved recently; certainly it was nothing deserving a Nobel Prize
    – also, pretty sure that the amount of understood Etruscan words is in double, not triple digits
    – the Elamite language had been written in at least three or four different scripts over its long history…
    – …including a cuneiform-style script from about 2500 years ago that shows up in the huge Behistun trilingual, and had thus been relatively easily deciphered
    – also scripts vary so much that “no resemblance to any other script” doesn’t really mean much
    – in any case, script resemblance is really only a factor in deciphering when the same signs are used for the same sounds over multiple languages (which is admittedly fairly common)

    Anything I missed, or stated incorrectly?

  4. Per Alfred Nobel’s will, the Nobel prize in Etruscan studies specifically excludes anything having to do with numerals.

  5. ə de vivre says:

    Wikipedia says that the most recent Etruscan text is from ~50 AD. My guess is that the extra zero’s a typo, and the author didn’t think too much about the difference between a language and a civilization. And there’s a general confusion between languages and scripts.

    The author got really confused about Elamite, which is probably because it’s really confusing. Before the Elamite language proper was deciphered, “Elamite” was a geographical term with no linguistic connotations. Proto-Elamite was the one written ~5,000 years ago. That one hasn’t been deciphered (although, like Linear A, we know what many of the signs mean, we just don’t know anything about the language they mean it in), but it bears a very strong resemblance to (and shares a few signs with) its contemporary Uruk-era cuneiform. There’s a good chance Proto-Elamite wasn’t used to write (the ancestor of) the Elamite language, since the region called “Elam” was, linguistically, much less Elamite than the highlands to the east. Linear Elamite is the one that bears no resemblance to any other scripts, but it came about 1500 years after proto-Elamite. It was probably used to write Elamite proper, but the corpus is so small it’s hard to say much about it.

  6. “Although it looks like Chinese to the untutored eye, it bears no resemblance to it at all” is pretty self-contradictory. A resemblance to Chinese is just what it does have.

    I was glad to learn of the Rohonc Codex, however, because unlike our host I am interested in mysterious manuscripts like this.

  7. January First-of-May says:

    Linear Elamite is the one that bears no resemblance to any other scripts, but it came about 1500 years after proto-Elamite. It was probably used to write Elamite proper, but the corpus is so small it’s hard to say much about it.

    I wanted to mention Linear Elamite with something along the lines of “the main reason it hadn’t been deciphered is because we have so little of it”. I’m not sure why I forgot; maybe because I wasn’t sure whether the claim referred to Linear Elamite or Proto-Elamite, and generally didn’t recall much about either.

    Certainly it shouldn’t really matter whether the script looks like anything else known or not (excluding, again, the case of shared signs) – especially if the language is already known (or suspected).
    The problems happen mostly when there isn’t enough text in it to figure it out – which is very much the case for Linear Elamite*, and to a lesser extent also for Etruscan and Linear A.

    (Of course, IIRC, the corpuses of some of the Sabellic languages make Linear Elamite look large [aren’t a few of them like three words?], but there we’re lucky to have shared signs – they’re basically written in Greek – and they’re probably Indo-European, which helps as well!)

    *) though new long inscriptions were apparently found last year

  8. David Marjanović says:

    The most tantalizing case is North Picene.

  9. I’m almost impressed by the density of inaccuracies in this paragraph.

    Forget it, Jake, it’s journalism.

  10. I went through all the Sabellic languages on WP, and none of them are supported by as few as three words, though some have only three inscriptions or less. As for North Picene, it’s a presumptive isolate represented by one decent-sized inscription and three fragments, and whatever it is, it is not Sabellian.

  11. It’s a 6th century BC Voynich manuscript.

    Totally invented language to torment future linguists

  12. ə de vivre says:

    Languages like Etruscan are the most frustrating: there’s enough text to give a tantalizing idea of what the language was like, but not enough to really flesh it out.

  13. David Marjanović says:

    North Picene is worse: there’s enough to show it’s neither IE nor Etruscan nor anything else recognizable, and that’s all we can say about it…

  14. Marja Erwin says:

    I don’t know the debate, but Václav Blažek (2008) concludes that North Picene is Indo-European and close to Italic.

  15. Read Blažek’s article. He says baleśtenag is borrowed from Greek ballistarius “artilleryman”.

    Now, I notice that the text also has a similar word krúviśtenag which suggests that *enag is a suffix similar to English suffix -nik (of Slavic origin)

    In fact, there is an even better Slavic form of this suffix matching -enag almost perfectly – niak, eg, Russian gorniak “miner”.

    baleśtenag then can be rendered in English as ballistnik – operator of ballista, ancient Greek missile weapon.

  16. krúviśtenag is clearly from well attested Indo-European word meaning “blood, flesh wound” and particularly close to Slavic form *krū.

    The word as a whole could be reconstructed as a hypothetical Slavic word kruvishnik which likely meant an object of blood feud. Less likely, it could simply denote a very cruel and bloody person/deity.

    I think I’ll stop here lest I venture far into Russian Etruscans territory

  17. Trond Engen says:

    I was just staring at the NP text thinking that there’s no repetition or recurring patterns to work with.

  18. David Marjanović says:

    Ah, I missed Blažek’s paper (which isn’t cited in the WP article or its talk page). Could someone post a link?

  19. Casual Browser says:

    What I really miss is all of that work the emperor Claudius did on the Etruscan history and language, none of which has survived. Would have cheerfully swapped the work of some of the drearier Church fathers as a survival from classical times

  20. @SFReader: It seems to me that in English, the suffix –nik is only productive in terms of mockery.

  21. David Marjanović says:

    Thanks for the link, the paper looks solid.

  22. Blažek has some sensible ideas, but I don’t see how he can declare NP an IE language with so little lexical material identified.

    The supposed Greek loans are possible, but without context they can be just as easily chance resemblances. That λ in the supposed Greek words is reflected sometimes as l, sometimes as r, does not help.

  23. Trond Engen says:

    I’m not ready to accept all his conclusions, but there are many valid observations.

    I should say that after squinting for a while, although not seeing any patterns, I too recognized iśperion as Greek. I was a little torn, because I also wanted to see gaareśtadeś as a Germanic perfect meaning “erected”, or possibly “carved”. And maybe iśairon as Celtic or Germanic “iron”.

  24. A recent paper (Romain Garnier and Benoît Sagot, A shared substrate between Greek and Italic, Indogermanische Forschungen 122(1):29–60, 2017) suggests that an unknown centum IE substrate is the source of many of the obscure etymologies of Greek (often assigned to “Pelasgian” or “Pre-Greek”), and of Proto-Italic as well. I don’t know enough to judge its quality. They do go into historical phonological detail (good), but the vocabulary which represents this supposed substrate doesn’t show any semantic coherence (not so good).

  25. Stephen C. Carlson says:

    A three-page conference paper version of Garnier & Sagot is found here:

  26. David Marjanović says:

    I’ve downloaded the paper itself and… intend to read it at some point.

  27. The authors’ thesis is interesting, but if the conference paper version is anything to go by they really need to learn to express themselves more clearly. For example, they offer five phonological features which allegedly define this Indo-European substrate language (AKA “Crotonian”) as distinct from both Latin and Greek, but the first of these (“voiceless reflexes of PIE voiced aspirated stops”) puzzled me, as Greek devoiced Indo-European aspirate stops. On the basis of their examples I realized that what they meant was that Crotonian had turned Indo-European voiced aspirate stops into stops which are both voiceless *and* unaspirated, a treatment which is indeed equally alien to Latin and Greek.

  28. Stephen C. Carlson says:

    I don’t know enough to judge its quality. They do go into historical phonological detail (good), but the vocabulary which represents this supposed substrate doesn’t show any semantic coherence (not so good).

    My impression is the same. I’m not a fan of their “Crotonian,” however: wasn’t Croton a Greek colony?

  29. David Marjanović says:

    I’ve read the paper. It looks very promising.

    wasn’t Croton a Greek colony?

    Yes, but apparently it wasn’t founded in a previously uninhabited place. If its name had a Greek etymology, surely that would be known?

  30. Trond Engen says:

    Is this where we look into Messapic?

  31. David Marjanović says:

    Messapic lacks the devoicing of initial aspirates: bilia “daughter”.

  32. Trond Engen says:

    Yes, if that’s the correct meaning. It seems to me that if there’s a common substratum in Greek and Italic, a thorough look at the Iapyges would be hard to avoid.

  33. David Marjanović says:

    …So there could be a phonological substrate in Greek!
