INDUS SCRIPT SQUABBLE.

May 4, 2009 by languagehat 90 Comments

I imagine most LH readers are at least vaguely familiar with the mysterious script associated with the Indus Valley Civilization that flourished over four millennia ago; the great question is whether it is a writing system representing a spoken language (the question of what that language might have been is another issue) or simply a collection of symbols. The argument for a writing system has always boiled down to the fact that the civilization was widespread, advanced, and in contact with other civilizations that used writing; how could they not have had it themselves? The contrary argument has been that almost all the inscriptions are extremely short and don’t show any clear evidence of being linguistic in nature.

Now a group of authors (Rajesh P. N. Rao, Nisha Yadav, Mayank N. Vahia, Hrishikesh Joglekar, R. Adhikari, and Iravatham Mahadevan) have published a paper in Science claiming that “the script’s conditional entropy is closer to those of natural languages than various types of nonlinguistic systems.” This has stirred up a fair amount of controversy. While I’m not competent to deal with the information-theoretic arguments deployed, I’m willing to take the word of Cosma Shalizi and Mark Liberman that the paper doesn’t prove what it claims to prove (Fernando Pereira says, “Once again, Science falls for a glib magic formula that purports to answer a question about language”); one of the authors of the paper responds to criticism by Steve Farmer and Michael Witzel but does not address the problem Shalizi and Liberman point out. In a MetaFilter thread on the subject, the evidently knowledgeable Sova (Сова being the Russian word for ‘owl’) says “the idea that there is a language beneath these symbols, or even that the symbols have the kind of order to encode any information – linguistic or otherwise – is up for debate,” and that’s still my default position, but I’m wondering if any LH readers have something to say about it.

Comments

dimrub says

May 4, 2009 at 2:24 pm

I used to be interested in the subject once to the extent that I tried to read the book by Asko Parpola on the subject. His case sounded convincing to a lame me, but I heard there was lots of criticism of his methods coming from the linguists working in this field.
MattF says

May 4, 2009 at 2:37 pm

From just looking (briefly!) at the various links, the big hole in the Science paper is that it’s entirely model-free. Entropy is only a measure of how much ‘room’ there is in state-space, and doesn’t say anything about what’s filling that room.
So, lots of people with various analytical tools, e.g., MATLAB, or R, or Python, on their computers proceeded to hack together non-linguistic models that had the same entropic behavior as the symbol sequences examined in the Science paper.
I’d say there’s a lot of weak argumentation on both sides of the issue, but maybe that’s the point.
VS says

May 4, 2009 at 2:38 pm

I read a good account of the controversy in a recent issue of Russian “Newsweek,” and the graphs looked pretty convincing for a layman; it was also clear that the subject is, well, controversial. If it’s true that the authors of the original paper made a claim to the effect that the language in question was related to Tamil, it’s way off field.
SnowLeopard says

May 4, 2009 at 3:14 pm

Based on the one-page Science Express article under discussion, it seems that you and Sova are correct that the question is very much up for debate. I don’t have an information-science background, but the authors seemed, loosely speaking, to be comparing the amount of variation in the sequence of Indus script symbols to other sorts of sequences: if there’s either too much or not enough rigidity or predictability (= “conditional entropy”?) in how the “tokens of language” follow one another, it’s unlikely to be a language, goes the argument. They compared the Indus script to Sumerian writing, Old Tamil, Sanskrit, English words, English alphabet-letters (presumably taken from a text sample), DNA and protein sequences, the programming language Fortran, and a few other artificial but non-linguistic sequences like boundary stones and deity markers that they did not identify with much detail. They found the conditional entropy of the Indus script to be close to the linguistic samples used, especially Old Tamil and Sumerian, and far from the non-linguistic samples used. I didn’t see any discussion of the statistical significance of these findings, or any discussion of alternative interpretations.
Anyway, while this may literally be “quantitative evidence for the existence of linguistic structure in the Indus script”, as the authors claim, that does not mean the evidence is determinative or even particularly persuasive, in no small part because of the limited range of possibilities the authors were considering. The analysis they made here would not persuade me, for example, that the Indus script wasn’t a particularly complex and ingenious form of musical notation, since music too arguably has an intermediate level of “conditional entropy” as I broadly misinterpret their term. Or maybe they reflect some other phenomenon, such as using the flights of birds to forecast the future, or an eager arithmetician trying to calculate pi, logarithms, or cube roots. The only confident conclusion I would take home from this study on its current terms is that the Indus script may in fact be human-made, and we already had reason to believe that.
Stuart says

May 4, 2009 at 4:48 pm

Judging from comments about this on an Indian cinema forum, the decision to close comments at the Log on this topic because “it tends to generate more heat than light” was right on the money. It is a very hot-button issue for many Indian nationalists, and so I wouldn’t be surprised if your mention of the Rao paper drew the attention of the RSS crowd.
Jim says

May 4, 2009 at 5:53 pm

“It is a very hot-button issue for many Indian nationalists, ”
Ee-yup. They have lots of hot button issues around this hot button issue. It’s a hot button issue for the Tamil nationalists. It gets mixed up in the controversy over Aryan Invasion Theory, which is a hot button issue for the Hindutva people, who call it colonialist and racist, and the Tamil nationalists who praise it because they read it to mean that the northerners are all colonialists and racists.
Stuart says

May 4, 2009 at 6:20 pm

the Tamil nationalists who praise it because they read it to mean that the northerners are all colonialists and racists.
A viewpoint that’s not too hard to understand, given the linguistic imperialism of the North. I had a friend from Poona dismiss Kerala’s 90%+ literacy rate as irrelevant because it was “only in Malayali, not Hindi or English”. So I think you’re right, it will be interesting to see the Dravidian nationalists (present Dravidians excluded) and the Hindutva nationalists go at each other over this. 🙂
Doc Rock says

May 4, 2009 at 6:30 pm

Without regard to the current paper and its thrust, I’d just like to point out the striking similarity of the Indus Script characters and those of the Rongo-Rongo [ˈɾoŋoˈɾoŋo] boards of Easter Island.
http://www.geocities.com/script_rongorongo/frgm16.gif
Also, I believe twenty or so years ago there was a Russian group which claimed to have deciphered the Indus Script using computers and cryptographic algorithms.
http://anthroglobe.info/docs/rjabchikovs_protoindian2_061022.html
David Marjanović says

May 4, 2009 at 6:36 pm

Is Parpola the one who offered a fairly detailed interpretation of the Indus script as a more or less logographic script for a Dravidian language? Because, not having really followed this issue, I don’t see why that isn’t simply accepted as the most parsimonious interpretation. That the inscriptions are all short fits the interpretation that they’re mostly name tags for goods, fitting the fact that some are on seals.
Nationalists coming in in 3… 2… 1…
language hat says

May 4, 2009 at 6:38 pm

(N.b.: I will ruthlessly delete any comments with a nationalist agenda, so anyone reading this who wants to promote such an agenda, don’t bother.)
David Marjanović says

May 4, 2009 at 6:38 pm

I’d just like to point out the striking similarity of the Indus Script characters and those of the Rongo-Rongo [ˈɾoŋoˈɾoŋo] boards of Easter Island.

Which is far less drastic when you look at it, and other scripts, more closely.
Stuart says

May 4, 2009 at 6:39 pm

Hat, that was why I made my first post to this thread – I wanted to warn you of the possibility/probability of such posts given the subject matter. I hope that was OK.
John Cowan says

May 4, 2009 at 7:42 pm

The most parsimonious explanation of the Indus script is that it’s non-linguistic signs, as the most parsimonious explanation for the Phaistos Disc is that it’s a game board.
language hat says

May 4, 2009 at 8:28 pm

I hope that was OK.
Oh, absolutely.
marie-lucie says

May 4, 2009 at 9:36 pm

I read Parpola’s book a few years ago and was impressed by the extremely detailed and painstaking work that went into it, which takes into account all sorts of possibly relevant factors, cultural as well as linguistic. I know that there have been criticisms and that some people do not consider the script to be true writing, for instance some said that the characters are just symbols for individual gods (but why so many gods should be mentioned on seals used to identify merchandise, without other description, seems to need explaining). I am not in a position to really evaluate Parpola’s work (for instance, I don’t know any Dravidian), but it doesn’t look like the people who criticize him have done anything like a fraction of the amount of work he and his team did.
Jim says

May 4, 2009 at 11:00 pm

The central issue is which ethnicity gets to claim the Indus Valley civilization and thereby claim to be more truly Indian than anyone else. The tone of the little I have read of it reminds me of Afrocentric “scholarship” in the US – long on belief and a show of scholarliness, short on solid evidence. Maybe only foreigners like Parpola and Witzel are disinterested enough to make untainted arguments.
As for parsimony, that’s good, but it’s not conclusive, any more than arresting someone holding a bloody knife standing over a corpse is going to be a slam dunk case. The simple fact is that it has been a long time since anything other than an Indo-Aryan language was spoken in the region and the case is very cold by now. Witzel goes into a lot of detail and seems to end up unconvinced even by his own hypothesis.
As for disinterestedness, Parpola is not off the suspects list either until I see where he has renounced that suggestion that Uralic and Dravidian are related.
John Emerson says

May 5, 2009 at 12:24 am

But of course they’re related. Ultimately, all languages are dialects of Dravidian.
dearieme says

May 5, 2009 at 8:33 am

Have these methods been applied to the Pictish symbol stones? INWN?
language hat says

May 5, 2009 at 9:11 am

INWN?
Trey says

May 5, 2009 at 9:40 am

For those whose eyes glaze over when the math comes out, here’s the lowdown on the conditional entropy models used. Basically the question is how well does one symbol predict the next symbol? In English, “q” is not followed by “u” 100% of the time, but it is a good predictor. “t” is often followed by “h” or “r” or a vowel but rarely “q” or “b”.
If the conditional entropy is very low, then previous symbols are very good predictors of following symbols, which is not very language-like. If “a” means “b” is 99% likely to follow and “b” means “c” is 99% likely to follow, and “c” means “d” is 99% likely to follow, then there is no room for originality or creativity, and thus it is hard to encode a message.
If the conditional entropy is very high, then the stream of symbols is more and more random, which means there are no patterns, which is also not language-like.
The conditional entropy analysis here has failed to show that the symbol system is very much unlike a language, which the authors seem to have taken as evidence that it must be a language. This is a logical fallacy that the critics are jumping on; with the more mathematically inclined creating symbols systems that are not languages but nonetheless have the same statistical properties as language.
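To make the measure concrete, here is a minimal Python sketch of a bigram conditional-entropy estimate. The function name, the toy sequences, and the fixed random seed are illustrative assumptions; it is a plain plug-in estimate with no smoothing, and it is not the estimator used in the Science paper.

```python
import random
from collections import Counter
from math import log2


def conditional_entropy(seq):
    """Plug-in estimate of H(next symbol | previous symbol), in bits."""
    pair_counts = Counter(zip(seq, seq[1:]))   # counts of adjacent pairs (a, b)
    prev_counts = Counter(seq[:-1])            # counts of a as the first of a pair
    total_pairs = sum(pair_counts.values())
    h = 0.0
    for (a, _b), n in pair_counts.items():
        p_pair = n / total_pairs               # estimated P(a, b)
        p_next_given_prev = n / prev_counts[a] # estimated P(b | a)
        h -= p_pair * log2(p_next_given_prev)
    return h


# Toy data (assumed for illustration only).
rigid = "abcd" * 500                                   # each symbol fixes the next
rng = random.Random(0)
loose = "".join(rng.choices("abcd", k=2000))           # no dependence at all

h_rigid = conditional_entropy(rigid)
h_loose = conditional_entropy(loose)
print(f"rigid: {h_rigid:.3f} bits, random: {h_loose:.3f} bits")
```

The rigid sequence scores 0 bits (nothing is left to predict), and the random one comes out close to the 2-bit maximum for four symbols; on the argument above, language-like sequences are expected to land somewhere in between.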
I’ve done something similar before for SpecGram (http://specgram.com/CL.3/02.letters.html). This text meets the statistical specification of English at the trigram level (3-letter combinations, rather than bigrams, or 2-letter combinations), so if you use any of the statistical language identifiers out there on the web, it will usually most strongly identify as English, though clearly it is not:

Anot Lanywassufte:
Carapes the ditl isch prentele whic che fiene Unincip-ikedfuls Que pland trial laing expror, no the thent acards, wal of of Eng Evis, forigh Worics on ousunt heard In youle not to linet med, mants of sen gic spers of at nam at mands wouremay.
“For efillyin froccut werepty to oreings; thicialy, sualich.” Goverphose blit.
Wes coved sell the wrikerearthis wicad whistivem the of cledull-inal an of ve froullestione onfestian el emerster 200 youst-suced.
Thumay Cationsitens
Ares Aftence Ge
Psionfustuatenterences

The interesting thing to me is that if you didn’t know it was fake text, it feels vaguely like it could be a Germanic language or a not-so-Romance Romance language (like the way Romanian has been heavily influenced by contact with Slavic languages). Since Germanic + a-bit-of-Romance is a fair characterization of English, you can see why the stats are fooled.
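For anyone who wants to try this at home, here is a minimal Python sketch of a character-trigram generator of the general kind described above. It is not the code used for the SpecGram piece; the training sentence, function names, and seed are assumptions made up for illustration, and a real experiment would train on a much larger sample.

```python
import random
from collections import defaultdict

# Tiny stand-in training text (an assumption; use a large corpus in practice).
SAMPLE = ("the weather in the north was rather better than the other "
          "travellers thought when they gathered there in the morning ")


def build_trigram_model(text):
    """Map each 2-character context to the list of characters seen after it."""
    wrapped = text + text[:2]          # wrap around so every context has a successor
    model = defaultdict(list)
    for i in range(len(text)):
        model[wrapped[i:i + 2]].append(wrapped[i + 2])
    return model


def generate(model, start, length, rng):
    """Extend `start` one character at a time, sampling by trigram frequency."""
    out = start
    while len(out) < length:
        out += rng.choice(model[out[-2:]])
    return out


model = build_trigram_model(SAMPLE)
fake = generate(model, SAMPLE[:2], 80, random.Random(1))
print(fake)
```

Every three-letter window of the output occurs somewhere in the training text, which is exactly why a trigram-based language identifier would be fooled, even though the words themselves are mostly nonsense.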
language hat says

May 5, 2009 at 10:21 am

Thanks very much, Trey, I actually understood that! (And to think I was once a math major…)
bulbul says

May 5, 2009 at 10:26 am

Thanks indeed, Trey.
And to think I was once a math major…
Thank you for sharing your dark and shameful secret with us.
Jim says

May 5, 2009 at 10:31 am

“then previous symbols are very good predictors of following symbols, which is not very language-like. ”
Trey, I am missing something. The way you explain this, it sounds like that method treats all symbols in a system as equally predictive or non-predictive. That state of affairs is also not very language-like.
Trey says

May 5, 2009 at 10:42 am

Jim, not all symbols need to be equally predictive. The conditional entropy is an overall measure of the predictability of the whole system. In English, “q” is much more predictive of the next symbol than “e”. The conditional entropy formulas roll all that up into a single number, which represents the amount of uncertainty in all the predictions you would try to make over a relatively long run of text.
As a simple example, imagine a system in which every letter in the Latin alphabet is followed by the next alphabetical letter, except “z”, which is followed by a completely random letter. Overall, this system is highly predictable, so the conditional entropy would be very low, even though “z” by itself is maximally unpredictable.
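The alphabet example can be worked through numerically. The sketch below is only an illustration under stated assumptions: it treats the system as a first-order Markov chain over 26 letters, takes "completely random" to mean uniform over all 26 letters (including "z" itself), and finds the long-run letter frequencies by simple power iteration.

```python
import string
from math import log2

letters = string.ascii_lowercase
n = len(letters)

# Transition matrix: P[i][j] = probability that letter j follows letter i.
P = [[0.0] * n for _ in range(n)]
for i in range(n - 1):
    P[i][i + 1] = 1.0            # a -> b, b -> c, ..., y -> z with certainty
P[n - 1] = [1.0 / n] * n         # z -> any letter, uniformly (assumption)

# Long-run frequency of each letter, found by repeated multiplication.
pi = [1.0 / n] * n
for _ in range(2000):
    pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

# Uncertainty about the next letter, given each current letter (in bits).
row_entropy = [-sum(p * log2(p) for p in row if p > 0) for row in P]

# Conditional entropy: per-letter uncertainty weighted by how often it occurs.
h = sum(pi[i] * row_entropy[i] for i in range(n))
print(f"after 'z': {row_entropy[-1]:.2f} bits; overall: {h:.2f} bits")
```

Under these assumptions "z" carries the full log2(26), about 4.70 bits, of uncertainty, but it only turns up 2 times in 27 on average, so the overall figure is about 0.35 bits.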
Hope that helps.
Jim says

May 5, 2009 at 1:45 pm

OK. That makes sense now.
AJP Crown says

May 5, 2009 at 2:08 pm

That’s great, Trey, thanks. Even I understand it.
I’m pretty sure there’s a connection between being good at math and being good at linguistics (which is not to say that every linguist is good at math).
dearieme says

May 5, 2009 at 2:28 pm

“INWN” = if not why not. EP.
Trey says

May 5, 2009 at 2:59 pm

AJP, there’s also a connection between being good at linguistics and being an awesome killing machine, I think. I found out today that Jason Bourne eventually became a linguistics professor.
Stuart says

May 5, 2009 at 4:24 pm

I’m pretty sure there’s a connection between being good at math and being good at linguistics
That probably also highlights the difference between good at linguistics and being good at languages. My maths aptitude fits comfortably inside a full matchbox.
bulbul says

May 5, 2009 at 4:40 pm

Trey,
it feels vaguely like it could be a Germanic language
I’d go with Frisian.
And as for Jason Bourne, you mean The Bourne Legacy, right? I liked that one and not only for the Georgetown connection, though I had to raise the pet peeve alert. A Hungarian guy named Stepan Spalko, seriously? That’s almost as bad as a Czechoslovak guy named Victor Laszlo…
David Marjanović says

May 5, 2009 at 5:48 pm

The most parsimonious explanation of the Indus script is that it’s non-linguistic signs, as the most parsimonious explanation for the Phaistos Disc is that it’s a game board.

Just to make sure – you’re joking, right?

As for parsimony, that’s good, but it’s not conclusive, any more than arresting someone holding a bloody knife standing over a corpse is going to be a slam dunk case.

Of course. It just means if you disagree with it, you have to put something better on the table – a hypothesis that is either even more parsimonious given the same data, or explains more data, or both.

The conditional entropy analysis here has failed to show that the symbol system is very much unlike a language, which the authors seem to have taken as evidence that it must be a language. This is a logical fallacy that the critics are jumping on; with the more mathematically inclined creating symbols systems that are not languages but nonetheless have the same statistical properties as language.

OK, but, in the real world, what could such a system be?
marie-lucie says

May 5, 2009 at 6:40 pm

(being good at linguistics and at …)
I think that the various parts of linguistics require different aptitudes and tastes. Formal syntax (a la Chomsky) attracts nerdy types, acoustic phonetics is for engineers, comparative-historical linguistics is for those who love puzzles and mysteries, and so on: there is something in linguistics for just about everyone. An aptitude for linguistics is not necessarily linked to an aptitude for learning languages in the practical sense, but they often go together.
there’s also a connection between being good at linguistics and being an awesome killing machine, I think
This statement really hit me hard when I first read it – WHAT? so I had to read through Wiki’s plot summaries of the Jason Bourne novels, which I had never even heard of: it’s amazing what some authors imagine a linguistics professor’s life is like, or how one ends up becoming one. I was breathless just reading the plots of the first three novels.
… creating symbols systems that are not languages but nonetheless have the same statistical properties as language. –
OK, but, in the real world, what could such a system be?
Would music qualify?
Mark Liberman says

May 6, 2009 at 7:31 am

David Marjanović: “This is a logical fallacy that the critics are jumping on; with the more mathematically inclined creating symbols systems that are not languages but nonetheless have the same statistical properties as language.”
OK, but, in the real world, what could such a system be?
The statistical properties under discussion are not at all “the same as language”, but in fact are limited to a unigram perplexity of about 69, and a bigram perplexity of about 27 (with higher-order models having no further effect).
This means that if you choose a symbol at random (from among the 400 or so that exist, depending on whose count you accept), your uncertainty about what that symbol will be is the same as if there were 69 equally-likely alternatives. But if you know what the previous symbol was, your uncertainty is reduced to about what it would be if there were 27 equally-likely alternatives.
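The link between entropy and "equally-likely alternatives" is just perplexity = 2 raised to the entropy in bits. Here is a minimal Python sketch; the Zipf-like weights over 400 symbols are an assumed stand-in for a skewed frequency distribution, not the actual Indus sign counts.

```python
from math import log2


def entropy_bits(probs):
    """Shannon entropy of a probability distribution, in bits."""
    return -sum(p * log2(p) for p in probs if p > 0)


def perplexity(probs):
    """Number of equally likely alternatives with the same uncertainty."""
    return 2 ** entropy_bits(probs)


# 69 equally likely symbols: perplexity is 69 by construction.
uniform_69 = [1 / 69] * 69

# 400 symbols with frequency proportional to 1/rank (assumed, for illustration).
weights = [1 / rank for rank in range(1, 401)]
total = sum(weights)
skewed_400 = [w / total for w in weights]

pp_uniform = perplexity(uniform_69)
pp_skewed = perplexity(skewed_400)
print(f"uniform over 69: {pp_uniform:.1f}; skewed over 400: {pp_skewed:.1f}")
```

The point of the second case is only that a skewed distribution over 400 symbols behaves like a much smaller set of equally likely ones; the exact figure depends entirely on the assumed weights.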
Is this a strong argument that the symbol-sequences are some sort of script? Not really, because exactly the same sorts of statistical properties arise in many non-linguistic situations. For example, in a bird census for a region with 400 native species, your uncertainty about the next bird you see will be much lower (because all birds are not equally likely); a distribution with a perplexity of about 69 is plausible. And if you know what bird you’ve just seen, your uncertainty about the next bird to come along will be considerably reduced, since different species are more or less likely in different places and times (and some birds come in flocks).
Obviously the Indus symbols are not records of a bird census. But they might be symbols arising from many social or cultural processes that share the (very simple) statistical properties shown to apply in this case.
And the particular parameters found are not all that script-like in detail. The unigram perplexity of a logographic system is likely to be much higher than 69, and the bigram perplexity much higher than 27. There could be a syllabary with numbers like that, but it would be a surprise to find that knowing the two previous symbols gives you no more predictive ability than knowing one previous symbol does.
Mark Liberman says

May 6, 2009 at 7:43 am

I should add that the n-gram perplexity numbers just cited come from Rao et al.’s Supporting Online Material. The graph in their paper proper deals only with the way that the bigram entropy changes as you look at increasing fractions of the vocabulary, sorted by relative frequency. What I (and Cosma and Richard) observed, and showed by simulation, is that exactly the same sort of curve will arise from any process with a vaguely appropriate frequency-distribution (e.g. power-law or harmonic or log-normal), even one that is memoryless (i.e. where each symbol is drawn at random with no dependence on the previous one).
And as is all too well known, many phenomena (from city sizes to animal-species populations to internet links) have distributions in that general class.
Again, this doesn’t show that the Indus inscriptions are not samples of a script. I personally have no opinions on that subject, because I don’t know enough about it. But this does show that the arguments made by Rao et al. are remarkably weak ones.
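A simulation of this kind can be sketched in a few lines of Python. This is not the code behind the simulations mentioned above; the 1/rank weights, the 400-symbol vocabulary, the corpus length of 10,000 tokens, the seed, and the simple "keep bigrams whose symbols are both in the top k" truncation are all assumptions chosen for illustration.

```python
import random
from collections import Counter
from math import log2


def conditional_entropy(pairs):
    """Plug-in estimate of H(second | first) in bits over a list of pairs."""
    pair_counts = Counter(pairs)
    first_counts = Counter(a for a, _ in pairs)
    total = len(pairs)
    return -sum(n / total * log2(n / first_counts[a])
                for (a, _b), n in pair_counts.items())


rng = random.Random(0)
vocab = list(range(400))                          # 400 abstract symbol types
weights = [1 / (rank + 1) for rank in vocab]      # harmonic (1/rank) frequencies

# Memoryless source: every token is drawn independently of the one before it.
tokens = rng.choices(vocab, weights=weights, k=10_000)

ranked = [sym for sym, _ in Counter(tokens).most_common()]
bigrams = list(zip(tokens, tokens[1:]))

# Bigram conditional entropy restricted to the k most frequent symbols.
curve = []
for k in range(20, 401, 20):
    top = set(ranked[:k])
    kept = [(a, b) for a, b in bigrams if a in top and b in top]
    curve.append((k, conditional_entropy(kept)))

for k, h in curve:
    print(f"top {k:3d} symbols: {h:.2f} bits")
```

The estimate rises steadily as more of the vocabulary is included while staying below log2(k), even though the source has no memory at all. Any apparent gain from knowing the previous symbol here is purely a small-sample effect of the plug-in estimate.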
AJP Crown says

May 6, 2009 at 8:56 am

m-l: comparative-historical linguistics is for those who love puzzles and mysteries
Hmm. Now who do we know who likes reading mystery stories?
marie-lucie says

May 6, 2009 at 9:02 am

AJP, Who do you think?
Etienne says

May 6, 2009 at 11:34 am

Two observations, if I may:
1-The impact of identity politics in India does seem to have had an indirect impact on Indus script research: as far as I know, scholars seem to have always assumed that the underlying language is either Dravidian or Indo-Aryan.
Has any serious research ever been undertaken on the Indus script which started from the assumption that the language is neither Indo-Aryan nor Dravidian? Better yet, that it was typologically unlike either?
Considering the fact that several language isolates are attested further West (Elamite, Hurrian, Sumerian…), it is quite possible that the Indus valley language too is unrelated to any known language of India (or is related to a non-Indian language, perhaps one of the above).
2-It seems to me that statistical analysis of the Indus script makes one huge assumption: that the script was used to represent a single language. But if this assumption is false, and the Indus Valley script represented two or more typologically dissimilar languages (not unlikely, considering the large geographical area involved: again, compare to the Ancient Near East and its cuneiform writing, which was used for a number of unrelated languages), what impact would this have on a statistical analysis of symbol distribution?
My guess is that different regularities in different languages would cancel each other out, making the distribution of the symbols look too random to be the representation of a language, but that’s just a guess: if Trey or some other hatter with a good knowledge of statistics could make a comment on this I think I would not be the only interested reader.
Jim says

May 6, 2009 at 11:47 am

I think that simply talking about “symbols” is ignoring something basic that throws a flaw into the analysis. There is a difference between alphabets and syllabaries and logographic systems that affects the randomness of their component symbols’ occurrence.
The symbols in a logographic system like Chinese are independent units of meaning where letters in an alphabet are not. That means that the logographic symbols are freer to occur with others where the letters of an alphabet are limited by simple mechanical phonetic constraints among other things, as logograms are not.
Another problem is that when you compare logograms with letters, you are not comparing like with like. Letters (really the sounds they represent) combine to represent units of meaning, morphemes, for the language they represent, whereas logograms individually represent the morphemes directly. So really letters are equivalent to, say, the strokes of a Chinese logogram. You are comparing symbols in one system which are, in effect, components of the symbols in the other system. So the issue is what are you counting when you are calculating the randomness of occurrence?
John Cowan says

May 6, 2009 at 12:04 pm

Jim: Logograms, which should really be called morpho-syllabograms, are indeed freer than alphabetic letters, but that does not make them “independent units of meaning”. Instead, they represent morphemes, which show just as much patterning, albeit at a higher level, as phonemes do.
An argument against the Indus signs being a script for either an Indic or a Dravidian language is that the four known independently created writing systems (Sumerian, Egyptian, Chinese, Mayan) are all morpho-syllabic, and all reflect languages with approximately one morpheme per syllable and vice versa. (Writing systems created by stimulus diffusion, on the other hand, tend to be pure syllabaries.)
Jim says

May 6, 2009 at 1:07 pm

John,
Logogram is probably an old term.
“Instead, they represent morphemes, which show just as much patterning, albeit at a higher level, as phonemes do.”
Yes, but that patterning is going to show a wide range of variance depending on the language. If you were to invent a system of morpho-syllabograms for, say, Navajo – what sadist would do this? – there would be a lot less randomness than there would be for Chinese for example, just due to the structure of Navajo verbs.
Jim says

May 6, 2009 at 1:21 pm

“Has any serious research ever been undertaken on the Indus script which started from the assumption that the language is neither Indo-Aryan nor Dravidian? Better yet, that it was typologically unlike either?”
Not that I have heard of. Witzel doesn’t think the language was necessarily Dravidian or Indo-Aryan, although he points out that Gujarat and Sindh obviously were Dravidian-speaking early (obvious to him; I can’t evaluate anything like that), and he ended up proposing a language he called Para-Munda, but didn’t analyze the script. He evaluated lots of possible sources for non-IE words in the Vedas, since they arose in that same general area, paid some attention to the ethnonym “Meluhha” and evaluated the chances of it referring to “Brahui” or the ‘burush’ part of Burushaski, and Para-Munda was what he came up with.
Etienne says

May 6, 2009 at 3:10 pm

Jim: Yes, I should have remembered “Para-Munda” (I read a good deal of Witzel’s work long ago). It does seem to confirm my claim about the “Indocentricity” of researchers (even foreign ones!): after all, why has Munda been proposed as a comparandum for the Indus script, instead of (say) Sumerian or Kartvelian? The latter two are geographically about as far from the Indus valley as the Munda family, after all, and since the language spread of Indo-Aryan took place starting in the North-West there is nothing illegitimate in exploring the possibility of a connection between the Indus valley language(s) and languages/language families spoken further West.
Incidentally, if the Indus script indeed represents more than one language, this might go a long way in explaining the difficulties with deciphering it: for a long time Celtiberian suffered from the same problem, as it was written in the same script used to write Iberian, and attempts at deciphering the entire corpus failed until it was realized that the same script represented two distinct languages.
Jim says

May 6, 2009 at 5:17 pm

Sometimes a little ignorance can be a big help. No one knew to confuse Mixe-Zoque with Mayan when they were deciphering that script because fortunately the corpus was overwhelmingly Mayan. When they finally figured out that it was a syllabary, and therefore phonetic, they tried reading them in various Mayan languages and finally settled on Cholan. Fine. Then they noticed that they still couldn’t read the Olmec inscriptions. Then someone fiddled around and sounded out the Olmec inscriptions and guessed they looked like a Mixe-Zoque language, with a lapse of 2,000 years, and shazam! – it turned out to fit. It beats spinning in place with a bunch of irreducibly contradictory data.
Indocentricity. Witzel looked at the BMAC as a sister culture/civilization and considered North Caucasian or something like that. Part of that was to tease out another possible source of non-IE material in Sanskrit and Avestan. But he did finally dismiss it as a contender for the Indus area.
Indocentricity in this may not be so hard to defend. Kartvelian may be the same distance as the modern-day Munda languages are, but people aren’t crows and it’s a lot easier to get to the Indus from Orissa or the other areas where the Munda languages are than from the southern Caucasus.
As for Sumerian, he actually did propose it, in a backdoor kind of way. He basically says that Sumerian may be related to the Munda languages because of similarities in derivational processes, but he didn’t adduce any actual cognate forms – cognate between attested Munda languages and Sumerian.
The other proposal that pokes its head up now and then is that it’s an Indic language. But Witzel’s white (I guess) so he’s hardly going to get away with proposing that.
marie-lucie says

May 6, 2009 at 5:29 pm

Jim: he didn’t adduce any actual cognate forms
You mean, “potentially cognate forms”. You can’t jump to the conclusion that resemblant forms are cognate until you have a) good reason to think that the languages are related and b) sufficient examples of correspondences. A few resemblant forms here and there are not enough. (I am sure you know that but many people are confused about the true meaning of “cognates” as opposed to “faux amis” – some of which are cognate but the two definitions are different).
Jim says

May 6, 2009 at 5:53 pm

You mean, “potentially cognate forms”.
Yeah. And he didn’t even go that far. What he noticed were basically typological similarities, though granted, they were very anomalous similarities for that part of the world.
AJP Crown says

May 7, 2009 at 2:51 am

m-l: AJP, Who do you think?
A complete mystery.
Etienne says

May 7, 2009 at 2:03 pm

Jim: granted that “people aren’t crows”, still, both Indo-Aryan and Dravidian appear to have entered the Indian subcontinent from the North-West, so there is nothing implausible, A PRIORI, about the Indus valley language(s) also having relatives to its West (i.e. outside the Indian subcontinent).
Furthermore, Munda is clearly part of Austroasiatic, a language family whose center of gravity lies much further to the East: I agree with Colin Masica, who wrote that given this it seems a bit of a stretch to have Munda or Para-Munda as far West as the Indus Valley.
Finally: has anyone worked on the Indus valley script, starting from the assumption that the underlying language is either Burushaski or a relative thereof? Geographically that language would seem the likeliest candidate.
Jim says

May 7, 2009 at 3:18 pm

“Jim: granted that “people aren’t crows”, still, both Indo-Aryan and Dravidian appear to have entered the Indian subcontinent from the North-West, so there is nothing implausible, A PRIORI, about the Indus valley language(s) also having relatives to its West (i.e. outside the Indian subcontinent).”
OMG! The evil Aryan Invasion Theory! I wasn’t saying that it’s implausible, just radioactive. Hindutva types in California have attacked Witzel himself.
I think it makes a lot more sense to say that IE and Dravidian entered India from the north because there’s a tail of populations along that route for those languages, whereas there isn’t anything like that for Kartvelian or North Caucasian across the Iranian Plateau, just miles and miles of Indo-Europeans to push through.
“Furthermore, Munda is clearly part of Austroasiatic, a language family whose center of gravity lies much further to the East: I agree with Colin Masica, who wrote that given this it seems a bit of a stretch to have Munda or Para-Munda as far West as the Indus Valley.”
Well by that logic English can’t very well be IE or Navajo be Athapaskan. In any case the center of gravity of Austroasiatic is not really the issue, the center of Munda is. Austroasiatic languages have been in the area for plenty long enough for people to move a lot further west than the Indus. The center of diffusion for Munda may have been the Ganges Delta – if IE could move all the way from the Indus to the Ganges Delta, surely Munda could have done the same in reverse at an earlier time. And a Munda language, Korku, is spoken in northern Maharashtra, maybe only a few hundred miles from the Indus – maybe even at about the same distance as Burushaski. In fact there are Harappan sites in Gujarat, which puts Korku even closer.
It seems to me that there are really only four candidates for the language behind the script: something IE, something Dravidian, something related to the Munda languages, something related to Burushaski. That’s not an impossibly large range of possibilities to work.
John Cowan says

May 8, 2009 at 1:52 am

there are really only four candidates for the language behind the script
Five. There is also “no language at all”.
Jim says

May 8, 2009 at 10:36 am

That one too! They could be like gang graffiti – readable, but not a language.
iakon says

May 8, 2009 at 1:06 pm

This is the first time I’ve read that Dravidian languages came from the northwest (probably because I haven’t read about them per se — yes, yes, fools rush in). I recently read (in an aside from the main theme of whatever I was reading) that the Tamil have a legend that they came from the south: from the long tail of the subcontinent (down to Diego Garcia) that sank beneath the rising Ocean after the last ice age.
iakon says

May 8, 2009 at 1:33 pm

‘…whatever I was reading’: geneticist Stephen Oppenheimer’s Eden in the East.
David Marjanović says

May 8, 2009 at 7:24 pm

Very good point that the Indus script might represent more than one language.
Also, if we’re merely going by geographic vicinity, what about Elamite? (The evidence for a close relationship between Elamite and Dravidian is much weaker than used to be thought, BTW.)
And haven’t the people who work on Vedic Sanskrit identified lots of substratum words that are not Dravidian and apparently not Munda either? Like amba “mother”?
But, more importantly, has anyone ever actually disproven Parpola’s interpretation? Does that interpretation lead to any internal contradictions or to any other obvious nonsense? How probable is it that the “star”-“fish” connection is a coincidence?

from the long tail of the subcontinent (down to Diego Garcia) that sank beneath the rising Ocean after the last ice age.

Wrong. This area is not continental crust and was never land. Not even the way Iceland is. The area between the mainland and Sri Lanka falls dry every ice age, but that’s it, then comes the continental slope which goes 4000 m downward.
Sure, island-hopping, the way Polynesia was settled, would of course be possible. But how do you reach Diego García in the first place? The same way (or straight across the ocean, which is not an option).
marie-lucie says

May 8, 2009 at 7:26 pm

iakon: Dravidian languages came from the northwest
I suppose that this refers to the Elamo-Dravidian hypothesis (Elam having been on the coast of present-day Iran), which does not seem to be accepted nowadays (but since Elamite is only poorly known, it is difficult to know what to make of the hypothesis), and also to the existence of the Dravidian outlier Brahui in the area generally associated with the Harappan culture. Was the geneticist agreeing with this hypothesis or not? (assuming that one can equate genes and language at least in this case, something that would need very careful consideration).
David Marjanović says

May 8, 2009 at 7:38 pm

The symbols in a logographic system like Chinese are independent units of meaning where letters in an alphabet are not. That means that the logographic symbols are freer to occur with others where the letters of an alphabet are limited by simple mechanical phonetic constraints among other things, as logograms are not.

Freer, yes, but by no means completely free, because Chinese has a strict word order.
(Never mind, of course, that Chinese is probably less logographic than what little I remember of Parpola’s interpretation of the Indus script.)

the four known independently created writing systems (Sumerian, Egyptian, Chinese, Mayan) are all morpho-syllabic, and all reflect languages with approximately one morpheme per syllable and vice versa.

Is that so with Egyptian? I thought Egyptian had three-consonant, two-consonant and one-consonant signs? Were the three-consonant ones all monosyllabic, or were they much rarer than I used to think?
Regarding the affiliations of Sumerian… Bomhard appears to have found lots of Nostratic words in it, while the morphology (polysynthetic verbs and all) looks Dené-Caucasian, and the personal pronouns also look vaguely Dené-Caucasian… vaguely… In any case, it’s clear that Sumerian hasn’t got any reasonably close known relatives.
David Marjanović says

May 8, 2009 at 7:47 pm

Oops, blockquote fail.

I suppose that this refers to the Elamo-Dravidian hypothesis

Not necessarily. There appears to be some evidence for Dravidian all the way to Uzbekistan or something… but as far as I remember it all comes from the Rgveda or something… I have to read up on that again, and probably won’t have time this weekend. 🙁
Anyway, here’s the paper that argues against the Elamo-Dravidian hypothesis.
iakon says

May 8, 2009 at 8:36 pm

I don’t think the geneticist referred to Elamite, m-l. As I said, his remark about the Tamil belief that they came from the south was merely an aside, although he does detail linguistic (as well as other kinds of) evidence supporting his theme, which was the migration of peoples from Sundaland (now part of island Southeast Asia) after the Ice Age.
Thanks, DM; if I still had my National Geographic Atlas I would have checked the bathymetry. It looks like the Sinhalese-speakers have perhaps been pushing the Tamil-speakers north for quite some time.
marie-lucie says

May 8, 2009 at 10:30 pm

DM, thank you for the link to Starostin’s paper. I don’t know enough about the various languages to make an informed evaluation, but I will print the article and keep it for future reference.
iakon, thank you for putting me on the track of the Oppenheimer book, which sounds very interesting. I looked up Sundaland on Wiki and found a review of the book. I have no problem accepting that the rise of the oceans through the melting of the glaciers resulted in more than the separation of Northern Asia from Northern America, and probably sent the inhabitants of rapidly sinking islands on boats back to continents with higher ground, but I am eager to learn more details of what the hypotheses are based on. For example, for the spread of Indo-European and Austronesian the reviewer does not seem to realize that the migration of a people and the spread of a language are not necessarily the same, and that there are ways to classify languages and to determine the relative ages of their branchings. But it sounds intriguing enough that I will try to get hold of it.
arun says

May 8, 2009 at 10:55 pm

The “lost continent” in Tamil belief is Kumari Kandam, which is postulated to have sunk beneath the Indian Ocean, with the current Tamils and Dravidians being remnants thereof. It was a popular hypothesis in the 19th and early 20th centuries, because it supported (by parallel analogy) the Aryan Invasion Theory. Incredibly, it was favored by pretty much everyone — British imperialists, because it supported that Indians were not of the same descent as Europeans; Dravidian nationalists, who claimed Kumari Kandam to be the original India and non-Dravidians to be aggressors; by non-Dravidians, who claimed the exact opposite; and also by geologists and other scientists, who found it VERY convenient to explain common fossils of flightless birds found in both India and Madagascar but not in mainland Africa.
The whole Kumari Kandam thing, like Atlantis, has been quietly kicked out after plate tectonics was accepted. And thus we lurch forward towards progress.
iakon says

May 8, 2009 at 11:51 pm

arun: The plate tectonics you refer to explains the breakup of Gondwanaland and the presence of flightless birds and their fossils in many places. You may also like to Google Socotra to see extremely alien plants apparently found also on Madagascar.
I’m glad Languagehat doesn’t mind digression — this one is pretty extreme.
marie-lucie says

May 9, 2009 at 12:17 am

arun, I think there is a difference in credibility between the rise in sea levels making some lands into islands (eg separating the British isles from the mainland) and a whole continent or at least large island sinking without a trace to the bottom of the ocean (as in Atlantis and perhaps Kumari Kandam, of which I had never heard). Legends about the latter may just be reinterpretations of a dim memory of something more correctly interpreted as the former, especially if they have passed through more than one ethnic group. And as to Atlantis, one origin or at least contributing factor to the legend seems to have been the explosion of Santorini where the centre of the original volcanic island did collapse into the sea, leaving only a portion of the volcano’s rim unsubmerged.
Etienne says

May 9, 2009 at 2:14 am

Marie-Lucie: the theory that Dravidian spread into India from an entry point in the Northwest is due to the internal structure of Dravidian itself: the Northwesternmost Dravidian language, Brahui, spoken in Western Pakistan (whose existence certainly makes it likelier that the Indus valley civilization was at least partly Dravidian-speaking), seems to have been the first Dravidian language to branch off from Proto-Dravidian: its position seems rather analogous to that of Blackfoot within Algonquian.
I imagine that part of the reason the Elamo-Dravidian theory was so well-received (thanks for the paper, by the way, David) was because Elamite does seem to be geographically where we’d expect a relative of Proto-Dravidian to be.
And conversely, the two southernmost Dravidian languages, Tamil and Malayalam, both belong to the same subgroup, and indeed only became different languages (i.e. from one another) a millennium or so ago.
Iakon: the very fact that Tamil in Sri Lanka is a dialect of Tamil, rather than a wholly separate Dravidian language, makes it very clear that Tamil must at some point have expanded into Sri Lanka, quite possibly *after* the expansion of Indo-Aryan (Sinhalese) there: I believe an indigenous group there (the Vedda) still preserves elements of a non-Indo-Aryan, non-Dravidian language.
Considering the state of relations between Tamil and Sinhalese speakers, though, I’m afraid scientific detachment on the part of local scholars on this issue will be impossible for quite some time, and either the next generation of scholars, or foreign scholars, will be the ones to reconstruct the processes (and chronology) whereby both languages came to dominate Sri Lanka.
marie-lucie says

May 9, 2009 at 2:39 am

Etienne, thank you for the clarification of Dravidian subclassification. I was not sure where Brahui was but on the wiki map of Dravidian Brahui looks like it is smack in the area of the former Harappan culture.
Adrian Morgan says

May 9, 2009 at 6:41 am

I admit that I just don’t understand Mark’s or Cosma’s analyses. What is the point of generating a counterexample by means of an abstract mathematical algorithm selected for the purpose of generating a counterexample? For an alternative hypothesis to the claim that the Indus script is linguistic, you need not just any old algorithm that creates similar patterns, but one that people from the Indus Valley civilisation could conceivably have used.
Why don’t Mark or Cosma give a concrete example of such an algorithm? A game, perhaps. Let’s say you play your favourite card game, and before shuffling afterwards, record the order of cards in the deck. The result will be neither completely random nor completely deterministic. Whether it will resemble linguistic data is another question.
I do think the argument ought to concentrate on concrete counterexamples rather than abstract mathematical ones.
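To make the quantity under dispute concrete, here is a minimal Python sketch of bigram conditional entropy (how unpredictable the next symbol is, given the current one). It is not the method of the Science paper or of Mark's or Cosma's posts; the function name `conditional_entropy`, the three toy generators, and the parameters (four symbols, a 0.8 repeat probability, 5000 draws) are all assumptions chosen purely for illustration. The "sticky" generator stands in for any simple non-linguistic process that is neither rigid nor random.

```python
import math
import random
from collections import Counter

def conditional_entropy(seq):
    """Plug-in estimate of H(next symbol | current symbol), in bits."""
    pairs = Counter(zip(seq, seq[1:]))   # counts of adjacent pairs
    firsts = Counter(seq[:-1])           # counts of the first member of each pair
    total = sum(pairs.values())
    h = 0.0
    for (a, _b), n in pairs.items():
        p_pair = n / total               # joint probability of the pair
        p_next_given_a = n / firsts[a]   # conditional probability of the follower
        h -= p_pair * math.log2(p_next_given_a)
    return h

rng = random.Random(0)                   # fixed seed so runs are repeatable
symbols = "ABCD"                         # hypothetical four-sign inventory

# 1. Completely deterministic: every sign has exactly one possible successor.
rigid = list(symbols * 1250)

# 2. Completely random: every sign equally likely at every position.
uniform = [rng.choice(symbols) for _ in range(5000)]

# 3. "Sticky" process: usually repeat the previous sign, otherwise pick at random.
sticky = [rng.choice(symbols)]
for _ in range(4999):
    sticky.append(sticky[-1] if rng.random() < 0.8 else rng.choice(symbols))

for name, seq in [("rigid", rigid), ("sticky", sticky), ("uniform", uniform)]:
    print(f"{name:8s} {conditional_entropy(seq):.3f} bits")
```

The sticky sequence lands between the two extremes without encoding any language, which is the general shape of the objection; whether a process like it is one the Indus people could plausibly have used is exactly the question raised above.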
language hat says

May 9, 2009 at 8:17 am

The “lost continent” in Tamil belief is Kumari Kandam, which is postulated to have sunk beneath the Indian Ocean, with the current Tamils and Dravidians being remnants thereof. It was a popular hypothesis in the 19th and early 20th centuries, because it supported (by parallel analogy) the Aryan Invasion Theory. Incredibly, it was favored by pretty much everyone — British imperialists, because it supported that Indians were not of the same descent as Europeans; Dravidian nationalists, who claimed Kumari Kandam to be the original India and non-Dravidians to be aggressors; by non-Dravidians, who claimed the exact opposite; and also by geologists and other scientists, who found it VERY convenient to explain common fossils of flightless birds found in both India and Madagascar but not in mainland Africa.
That’s one of the more interesting things I’ve learned recently. Thanks, arun, and this is why I enjoy digression!
Stuart says

May 9, 2009 at 8:28 am

the very fact that Tamil in Sri Lanka is a dialect of Tamil, rather than a wholly separate Dravidian language, makes it very clear that Tamil must at some point have expanded into Sri Lanka
In what sense is Sri Lankan Tamizh a dialect? I’m curious because Indian friends of mine who have Tamizh as their common tongue tend to refer to Sri Lankan Tamizh in reverential tones, speaking of its “purity”, and how much “better” it is than Indian Tamizh. One even said that she tends to feel ashamed of her Tamizh when speaking it with Sri Lankans. They told me that this was because Sri Lankan Tamizh was older and nearer the original, whatever that may mean. Also one “teach yourself” book I got said that the distinction between written and spoken Tamizh, which are often presented as practically separate entities in Indian Tamizh, is much less significant in Sri Lankan Tamizh. Given this I am interested in the use of the word dialect for Sri Lankan Tamizh.
SnowLeopard says

May 9, 2009 at 8:31 am

the four known independently created writing systems (Sumerian, Egyptian, Chinese, Mayan) are all morpho-syllabic, and all reflect languages with approximately one morpheme per syllable and vice versa
A minor point, perhaps, but I don’t think this is an accurate characterization of Egyptian, Mayan, or Sumerian writing or language. For examples, see “nfr” (beautiful), “balam” (jaguar), and “ninda” (bread, rations) respectively, all of which are written using one glyph/symbol in the languages to which they belong.
iakon says

May 9, 2009 at 10:35 am

Arun: I feel compelled to come back to inform you that there is no such thing as ‘progress’. Google The Whig Interpretation of History. The Whigs were businessmen, and business people today are pretty well the only ones who use the word ‘progress’, particularly when they say that people who are against, for instance, subdividing farmland for housing, are against ‘progress’. Please don’t take my rant to be against you. It’s just that increase in knowledge can slow, stop and even reverse. Something to keep in mind when we’re living in the time that George Orwell warned us against.
And etienne: I’ve learned that it’s not a good idea to use words like ‘fact’ and ‘clear’, particularly with intensifiers. As Sergius, the oldest white man on Haida Gwaii, who had a grade eight education just like my father, and reads books only to ‘pass the time’ and help him fall asleep, says: ‘It’s only opinion, isn’t it?’
And yes, that’s what this comment is.
language hat says

May 9, 2009 at 3:42 pm

In what sense is Sri Lankan Tamizh a dialect?
“Dialect” is not an insult, it refers to any distinct form of a language.
David Marjanović says

May 9, 2009 at 5:36 pm

And as to Atlantis, one origin or at least contributing factor to the legend seems to have been the explosion of Santorini where the centre of the original volcanic island did collapse into the sea, leaving only a portion of the volcano’s rim unsubmerged.

Yes, except the center was already under the sea before that explosion (due to a previous explosion, of course), and Plato’s text just doesn’t fit it. Probably the whole story is actually about Troy and merely got a bit obscured by being translated to Egyptian and back.

perhaps Kumari Kandam, of which I had never heard

Maybe you have under the name “Lemuria”. That’s the one the biogeographers used.

“Dialect” is not an insult, it refers to any distinct form of a language.

Except, usually, the more artificial ones of the standard languages of the world — Standard German for example, which was to some extent deliberately designed to be comprehensible across dialect boundaries.

I believe an indigenous group there (the Vedda) still preserves elements of a non-Indo-Aryan, non-Dravidian language.

Yes. I don’t know if any further research whatsoever has been done on this, though.

I’ve learned that it’s not a good idea to use words like ‘fact’ and ‘clear’

Well, of course one has to be careful, but “fact” has a definition, and sometimes it applies. We’re not postmodernists, you see 🙂
==========
Incidentally, I’m not aware of any fossils of flightless birds from India. Australia perhaps?
Stuart says

May 9, 2009 at 5:56 pm

“Dialect” is not an insult, it refers to any distinct form of a language.”
Yes, that I know well. There can be a difference between the definition of a word and the perception of its use, though, as evidenced by the brouhaha over Poser’s use of “linguist” over at the Log.
I wasn’t suggesting that “dialect” was being used pejoratively, simply asking for clarification. The context in which it was used left me with the impression that Sri Lankan Tamizh was a newer offshoot of a standard language, whereas my friends had left me with the impression that it was the other way round. That’s all.
marie-lucie says

May 9, 2009 at 8:49 pm

Tamizh language or dialect:
A “dialect” is simply a variety (usually regional) of a language, and in 19th century Europe dialectology began with the serious study of regional (usually rural) varieties, which gave the word “dialect” a pejorative meaning in the public mind, for which the “untutored” dialects were less evolved, had less vocabulary, carried a less refined culture, etc than the standard language, usually the variety spoken by the educated upper classes in the capital city (the lower levels there often having their own, stigmatized dialect). In modern sociological terms, the standard language (the variety used by government, the press, the school system, etc) is the “prestige dialect”, others having more or less (usually less) prestige for various reasons.
Another factor documented by sociolinguists (and important for the historical study of a language) is that rural dialects are not recent innovations by ignorant people, away from the “good” standard language, but separate developments from a common ancestor; instead of the dialects having diverged from a stable standard, it is usually the speech of large cities, especially the capital, which has evolved the most rapidly, so that dialects may preserve archaic features which are no longer part of the standard language. This at first paradoxical fact results from the migration of people of all sorts of linguistic backgrounds towards the larger cities, where people become used to a variety of accents and some oddities of vocabulary and syntax, and have to adapt their own speech to those around them, so that eventually big city speech becomes a common denominator while the more remote areas preserve some formerly standard features, as well as their own specificities (and even some areas of the city, or some specific segments of its society, may develop and preserve their own “sociolect”). For instance, I have mentioned earlier the simplification of the Standard French vowel system, started earlier in some cases but definitely observable and still in progress within my lifetime.
For the Tamil/Tamizh situation, the bulk of the speakers, and the large cities, are on the Indian mainland, not the Sri Lankan fringe. This means that one expects the Sri Lankan Tamils to speak a more archaic variety of the language than their counterparts on the mainland, especially those in the large cities (considering strictly the linguistic evidence, this expectation would be compatible both with a relatively recent migration of Tamils to SL or with a Tamil origin on SL itself and later migration to the mainland, so it cannot be used to support or to deny either position). In this specific case it appears that some of the Sri Lankan features are the same as in the ancient Classical Tamil literary language, so that the rural Tamils of Sri Lanka sound to the mainlanders like the old poets and their speech is admired for it. I don’t know how far this identification is actually true, or perhaps on a par with the rumour that Appalachian English is the same as Elizabethan English, or Québec French as Renaissance French. This is not to deny the archaisms in these cases, but it is easy to exaggerate them.
Back to Europe, there is a short story by Kafka which plays on this factor of archaism away from the capital: I don’t remember the title but it is about a document received by a government (at some point in history), a rare request for help coming from a very remote part of the country and written in what seems to be such an archaic style that nobody believes that it has been written recently and that the danger described by the writers is actually imminent.
MMcM says

May 9, 2009 at 10:50 pm

I don’t remember the title
„Beim Bau der Chinesischen Mauer”. (Fifth paragraph from the end. Can only find snippets of Englishing; sorry.)
Stuart says

May 9, 2009 at 10:51 pm

Marie-lucie, thank you. That was yet another simple and lucid explanation. You understood the question I had asked and answered it thoroughly, once more using examples that were relevant and clear. “the rural Tamils of Sri Lanka sound to the mainlanders like the old poets and their speech is admired for it” seems to be very near the mark, and your reference to the potential similarity with the “Elizabethan Appalachian” helped, since I am familiar with that old rumour. Great job, thanks!
John Atkinson says

May 9, 2009 at 11:55 pm

Etienne writes: “…the theory that Dravidian spread into India from an entry point in the Northwest is due to the internal structure of Dravidian itself: the Northwesternmost Dravidian language, Brahui, spoken in Western Pakistan (whose existence certainly makes it likelier that the Indus valley civilization was at least partly Dravidian-speaking), seems to have been the first Dravidian language to branch off from Proto-Dravidian”
I disagree with this. Brahui is a North Dravidian language, most closely related to Kurux and Malto in Bihar and Nepal (the far north-east of Dravidia). Krishnamurti (“The Dravidian Languages”, a Cambridge Green book, p 491) claims that “in terms of shared phonological and morphological innovations, it [Brahui] could not have been separated for more than a thousand years from Kurux-Malto”. And FWIW, the Brahui themselves apparently claim to have migrated to their present location from further east.
I think this view is pretty much generally accepted by Dravidianists these days. It means that the present location of Brahui tells us nothing whatever about the Indus valley language(s). The best evidence that it/they might be proto-Dravidian or something close to it lies in the apparent Dravidian influences in early Sanskrit.
However, going by the centre of diversity of present Dravidian languages, the Dravidian homeland would have been in or around Orissa and/or Madhya Pradesh. (Of course this sort of argument is suggestive at best.)
marie-lucie says

May 10, 2009 at 12:47 am

Thank you for locating the Kafka story, MMcM. I read this story so long ago that I didn’t remember that the remote dialect anecdote was only incidental to a longer story. I don’t speak German much but I can read it to a certain extent, and I find Kafka easy to read as he uses simple vocabulary and sentence structure.
Stuart, thank you for your nice comments. Of course I could be wrong!
JA: After locating Brahui on the map I thought that it was a little too convenient that the language happened to be spoken in the Harappa region millennia after the ancient culture had disappeared. As you say, the location of the present-day language tells us nothing about what the ancient inhabitants spoke.
arun says

May 10, 2009 at 3:40 pm

@iakon: You’re absolutely right about plate tectonics and Gondwana. Thanks for the pointer to Socotra.
@marie-lucie and David: Yes, Kumari Kandam and Lemuria are used interchangeably for the same hypothesized landmass. This is a modern conception of where it was supposed to be: http://en.wikipedia.org/wiki/File:Kumari_Kandam_map.png
@David: There are no extant flightless birds in India, but there used to be one which was a common ancestor of the elephant bird and the ostrich, and thereby related to other ratites. The spread of ratites across the different continents happened before Gondwana broke up, and thereafter their evolution diverged. In fact, the modern ostrich is held to have migrated to Africa via Arabia from India, which was bearing its ancestors and heading north from Madagascar. A good non-technical description of the process and evidence is in The Ancestor’s Tale by Richard Dawkins (p 235 onwards).
@Stuart: marie-lucie is absolutely right in her comment about the Tamil dialects. I would also add that Tamil/Tamizh/Thamizh specifically has a rather extreme case of diglossia, where the spoken version (“kodunthamizh”) is markedly different from the written form (“senthamizh”). The written form (of the language, not the script), has not changed perceptibly since the time of the Sangam literature (300 BCE to 300 CE).
On the other hand, not only have the various spoken versions diverged from the written form, the spoken version can have very perceptible changes in vocabulary and word inflection (like the choice of suffixes to make a verb a participle and so on) with the location or the speaker, leading to city-specific dialects or caste-specific dialects. Among the various spoken forms in India, the Madurai dialect is unofficially considered the most neutral, specifically the variant spoken by educated non-Brahmins. [Harold Schiffman, “Standardization or restandardization: The case for ‘Standard’ Spoken Tamil”. Language in Society 27 (1998), pp. 359–385].
How much the local spoken form changes from the written form depends on the size and population heterogeneity of the city. The Tamil spoken by most Sri Lankan Tamils (the ones who migrated around 100 BCE), and (to a lesser extent) the Indian Tamils of Sri Lanka (also known as the Hill Country Tamils, who were transplanted as tea plantation laborers by the British in 1820-1930), is much closer to the written form than the spoken variants, and therefore sound more cultured and respectable to Indian Tamil speakers, as marie-lucie pointed out. In Sri Lanka, the Jaffna dialect is held to be the unofficial standard for the spoken form of Tamil.
Stuart says

May 10, 2009 at 4:14 pm

@arun: much closer to the written form than the spoken variants, and therefore sound more cultured and respectable to Indian Tamil speakers
Thanks for that confirmation, arun. Does the Sri Lankan form have fewer borrowings from other languages? When I watch Tamizh films or listen to my collection of A.R. Rahman’s greatest hits, the only words I understand (apart from obvious ones like kadhal) are imports from English or Hindi. If there are more of such imports in the Indian form, could this also contribute to that impression of greater “respectability” for the Sri Lankan form?
David Marjanović says

May 10, 2009 at 6:06 pm

There are no extant flightless birds in India, but there used to be one which was a common ancestor of the elephant bird and the ostrich, and thereby related to other ratites. The spread of ratites across the different continents happened before Gondwana broke up, and thereafter their evolution diverged. In fact, the modern ostrich is held to have migrated to Africa via Arabia from India

Not quite so fast! All of this depends on the idea that 1) the ratites are each other’s closest relatives and 2) lost flight only once.
Last year two papers came out that show that the tinamous sit highly nested inside the ratite tree. And they can fly.
Add to this that ostriches are known from the last 50 (since the Eocene), perhaps 60 (since the Paleocene), million years of Europe; you’re right to say they didn’t originate in Africa.
And then consider the lithornithids, flying birds of the Paleocene and Eocene of the northern hemisphere that are probably closely related to the “ratites” and tinamous. We don’t know more about their place in the phylogenetic tree, because nobody has ever sat down and tried to find out.
Further take into account the complete lack of Cretaceous ratites so far. At the end of the Cretaceous, Madagascar was an island; a diverse fauna is known, but nothing remotely similar to an elephantbird has been found to date.
References on paleognath phylogeny:
Shannon J. Hackett, Rebecca T. Kimball, Sushma Reddy, Rauri C. K. Bowie, Edward L. Braun, Michael J. Braun, Jena L. Chojnowski, W. Andrew Cox, Kin-Lan Han, John Harshman, Christopher J. Huddleston, Ben D. Marks, Kathleen J. Miglia, William S. Moore, Frederick H. Sheldon, David W. Steadman, Christopher C. Witt, Tamaki Yuri: A Phylogenomic Study of Birds Reveals Their Evolutionary History. Science 27 June 2008:Vol. 320. no. 5884, pp. 1763 – 1768 DOI: 10.1126/science.1157704
John Harshman, Edward L. Braun, Michael J. Braun, Christopher J. Huddleston, Rauri C. K. Bowie, Jena L. Chojnowski, Shannon J. Hackett, Kin-Lan Han, Rebecca T. Kimball, Ben D. Marks, Kathleen J. Miglia, William S. Moore, Sushma Reddy, Frederick H. Sheldon, David W. Steadman, Scott J. Steppan, Christopher C. Witt, and Tamaki Yuri: Phylogenomic evidence for multiple losses of flight in ratite birds. Proceedings of the National Academy of Sciences Vol. 105, no. 36. September 9, 2008. pp. 13462 – 13467.
David Marjanović says

May 10, 2009 at 6:14 pm

Forgot to mention:
– In the two new studies (which are bigger than all previous ones), the ostriches come out as the sister-group of all other paleognaths ( = the other “ratites” + tinamous). So maybe there was one dispersal event across the Tethys or the Caribbean that distributed the paleognaths between Laurasia and the fragments of Outer Gondwana ( = Gondwana without Africa, which was already too far gone).
– New Zealand broke off from Antarctica-Australia 88 million years ago, deep in the Cretaceous. Yet it carries two clades of “ratites” which probably aren’t sister-groups. It’s difficult to imagine that they were already flightless when they arrived in NZ. Various dinosaurs are known from near the end of the Cretaceous in NZ, by the way.
Stuart says

May 10, 2009 at 6:22 pm

Various dinosaurs are known from near the end of the Cretaceous in NZ, by the way.
More specifically, most have been found in my part of the country, many of them discovered by an amateur palaeontologist of some renown in our little part of the world.
Lugubert says

May 12, 2009 at 4:06 am

Prof. Parpola deserves much credit for collecting reproductions of seals etc. I think, however, that the foundations of his theories aren’t sufficient to support what he has built on them.
I find the two (or more) languages possibility interesting and not too far from my not entirely unserious theory that the seals are just intended to produce name and address tags, and that the longest inscription was displayed as a house name.
David Marjanović says

May 12, 2009 at 5:13 pm

most have been found in my part of the country

Yep. Along with marine fauna including lots of mosasaurs.
marie-lucie says

May 12, 2009 at 6:02 pm

Stuart: the amateur paleontologist Joan Wiffen sounds like she could be the reincarnation of Mary Anning, the self-taught English fossil hunter.
John Cowan says

May 30, 2020 at 11:01 am

New Zealand broke off from Antarctica-Australia 88 million years ago

As memorably retold in “Moving Apart”, a short work of continental romance fiction. Money line: “New Zealand also said that Australia had given him possums, which was incredibly rude and completely beside the point even if it was true.”
David Marjanović says

May 30, 2020 at 11:23 am

It wasn’t true, needless to say. New Zealand did have a mysterious non-bat mammal as late as the Miocene, but it wasn’t a marsupial. The possums (Trichosurus vulpecula) that are chewing up the countryside today were imported from Australia in the 19th century because some dolt thought it sounded like a cool idea.
Rodger C says

May 31, 2020 at 10:25 am

because some dolt thought it sounded like a cool idea.

That’s how America got its plague of starlings: from a dolt who wanted America to contain every bird mentioned by Shakespeare, and introduced some to Central Park, where perhaps he expected them to stay.
David Marjanović says

May 31, 2020 at 11:06 am

Sparrows, too.
Tim May says

May 31, 2020 at 7:03 pm

Whenever someone mentions Eugene Schieffelin and the American Acclimatization Society, I wonder if he intended to get around to the estridges (ostriches) mentioned in Henry IV Part I.
January First-of-May says

June 1, 2020 at 8:49 am

Considering that Henry IV Part I happens to also contain the only place in Shakespeare’s works where starlings are mentioned, surely he must have.

But perhaps he might not have realized that “estridge” is a kind of bird, and/or which particular kind of bird it was (there had apparently been some debate on the subject in the 19th century).
