India’s Hidden Languages.

Agnee Ghosh writes for BBC Future about an effort to track down vanishing languages of India:

It was 2010 and Ganesh N Devy was concerned about the lack of comprehensive data on the languages of India. “The 1961 [Indian] census recognised 1,652 mother tongues,” says Devy, “but the 1971 census listed only 109. The discrepancy in numbers frustrated me a lot.” So, Devy decided to find out what was going on himself. […]

As a professor of English at Maharaja Sayajirao University of Baroda in Gujarat, Devy has always had an interest in languages. He has founded a number of organisations for their study, documentation and preservation, including the Bhasha Research and Publication Centre in Baroda, the Adivasis Academy in Tejgadh, the DNT-Rights Action Group, among others. As part of his work at the organisations, he used to go to villages where tribal populations lived and research them. He started noticing that these tribes have their own languages, which often do not get reported in the official government census. […]

Devy felt that it would take a long, arduous process to document every language in India, so he stepped in to help. He launched the People’s Linguistic Survey of India (PLSI) in 2010, for which he put together a team of 3,000 volunteers from all over the country. Most of these volunteers weren’t researchers, but writers, school teachers, and other non-professional-linguists who possessed an intimacy with their mother tongue that was invaluable to Devy.

In a survey conducted during 2010-2013, Devy and his team recorded 780 languages and 68 scripts across the country. Devy says that nearly 100 languages could not be documented, either because of remoteness of the region or conflict, so the true number of languages in India continues to be hidden from us. Since 2013, the PLSI has published 68 volumes, featuring detailed profiles of each language that Devy came across. The remaining 27 volumes will be published by 2025.

Take the state of Odisha, which has the largest number of tribal communities in India. Devy always knew it would be a linguistic goldmine for him, but he could not find a linguist there who would be able to work on these remote languages. Around this time, he came across a taxi driver who used to work for the district magistrate in Odisha. Whenever the district magistrate used to go for a visit in the villages, the driver preferred talking to the villagers rather than sitting in his car. “Over the years, he had mastered four languages and he had constructed grammar for those four languages and had collected folk songs and stories,” says Devy. “It was material that was worthy of giving him a doctorate, maybe two doctorates.”

It ends with an account of the previously unknown Rai-Rokdung language of Sikkim:

In 2017, the researchers at Sidhela discovered the Rai-Rokdung community completely by chance, when a student at their university informed them about it. Rokdung is one of the pacha (divisions) of the Bantawa clan within the Rai community. The Rokdung clan is mostly located in East Sikkim and the members of the community claim to possess a distinct language of their own separate from the Bantawa. There has been no earlier mention of the language in the linguistic history of the region. Since then, research of the community, which has a population of just 200 people, has been ongoing.

Hima Ktien, a linguist at Sidhela who spearheaded the documentation project of the language of the Rai Rokdung community, spoke about how most Rokdung community members self-identify as members of the Rai groups, while some say they are Nepali. Only a few members choose to identify themselves as being a part of the Rokdung Yupacha or as Yaku. “We found only 20 people who could speak the Rai-Rokdung language,” says Ktien. If a language isn’t taught in school, then there is no means to get to use the language. So people attempt to assimilate into society by adopting the language of the majority, and because of that the Rai-Rokdung language suffered because people started shifting to other languages. […]

But Ktien tells me that they witnessed something remarkable towards the end of their research. In a hopeful turn of events, during their field visit in January 2020, they noticed that the speakers of the Rokdung language were coming together weekly in an effort to revitalise Rokdung. The older generation, who were mostly grandparents, were trying to pass on the language to the generation below, who were mostly parents. These willing pupils would note down the words they were unfamiliar with and actively try to use them. “It was very gratifying to witness the change that was brought on by our work in documenting the language,” says Ktien.

(We discussed the New Linguistic Survey of India in 2007 and 2013.) Thanks, Trevor!


  1. J.W. Brewer says

    It certainly wouldn’t surprise me if this has in fact happened and just didn’t get mentioned in the article, but I would think that part of a sensible reaction to the 1652 v. 109 languages only a decade apart stats would be to hypothesize a switch from a splitter approach to a lumper approach by the census authorities and see if you could trace some of the “missing” languages into some still-reported language they had been lumped into.

    Note also that the Bantawa “language” is already said to have a whole bunch of dialects, some of which are more divergent than others. Do we suppose there’s a basis for thinking Rai-Rokdung so different that it cannot be legitimately classified as just another Bantawa dialect, or is this a situation where people’s sense of ethnic/political identity and difference is driving what does and doesn’t get counted as a distinct “language”?

  2. Good questions all!

  3. The first sentence of that Wikipedia page already shows evidence of past editorial disagreement about whether Bantawa should count as a single language or multiple languages:

    The Bantawa Language (also referred to as An Yüng, Bantaba, Bantawa Dum, Bantawa Yong, Bantawa Yüng, Bontawa, Kirawa Yüng), is a Kiranti languages spoken in the eastern Himalayan hills of eastern Nepal by Bantawa Rai ethnic groups.

  4. Hat, J.W. Brewer: According to Colin Masica (THE INDO-ARYAN LANGUAGES, p. 421) Indian authorities were alarmed at the sharp increase in the number of individuals who, within the “Hindi belt”, had claimed a regional language as an L1 in the 1961 census and decreed that, from the 1971 census onwards, all such regional languages would be counted as Hindi pure and simple. I suppose that, when it comes to census results relating to the rest of India (i.e. outside the Hindi belt), similar such “lumper” approaches must have been encouraged if not required (perhaps unofficially).

    I never was very impressed with any media outlet’s treatment of linguistic issues (I think the same is true of most hatters), but this direct quote in the article REALLY made me stop and think:

    “I wanted to write down these words using the English alphabet but it was difficult because of syntactical differences.”

    Is this some specialized/old-fashioned/specifically Indian (South Asian?) meaning of “syntactical” which I am unaware of, or is this as nonsensical as it looks?

  5. I wonder if there is a bias toward splitting among Indo-Aryan languages, and lumping among others, either among linguists or government language counters.

  6. J.W. Brewer says

    Around page 8 of this interesting publication by the Indian census authorities you get details with how they did their partial-lumping for the 2011 census. The idea is that everyone is asked an open-ended free-response question re “mother tongue” and then the table shows a tabulation of e.g. 50+ “mother tongues” chosen by >10,000 people that are listed individually but then also lumped into the larger “Hindi” total.

    There are a bunch of “Other” line items lumping together reported mother tongues (by lumped affiliation with larger language) that don’t get recorded by name in this publication because they did not meet the 10,000-speaker threshold. I don’t know how much more detailed data about them is published elsewhere. But big picture:

    “At the 2011 census, the number of such raw returns of mother tongues has totaled 19569. Since mother-tongues as returned in the census are basically the designations provided by the respondents of the linguistic mediums in which the respondents think they communicate, they need not be identical with the actual linguistic mediums. For assessing the correlation between the mother tongue and designations of the census and for presenting the numerous raw returns in terms of their linguistic affiliation to actual languages and dialects, 19569 raw returns were subjected to thorough linguistic scrutiny, edit and rationalization. This resulted in 1369 rationalized mother tongues and 1474 names which were treated as ‘unclassified’ and relegated to ‘other’ mother tongue category. The 1369 rationalized mother tongues were further classified following the usual linguistic methods for rational grouping based on available linguistic information. Thus, an inventory of classified mother tongues returned by 10,000 or more speakers are grouped under appropriate languages at the all India level, wherever possible, has been prepared for final presentation of the 2011 mother tongue data. The total number of languages arrived at is 121.”

  7. David Eddyshaw says


    To be fair, Mossang is described as a “farmer”, so can perhaps be allowed to be unclear about exactly what “syntax” means (though somebody might perhaps have told him.)

    However, the customary nonsense does appear in the journalist’s own text, e.g.

    She was the last fluent speaker of Bo – one of the oldest languages in the world, dating back to pre-Neolithic times

    Just like Welsh!

  8. Woops! Well, you hereby get belated credit for the link.

  9. Did you know that the feeling of love can be described in 9,000 words if we put all the Indian languages together? A country that is often touted as the treasure trove of linguistic diversity, has 780 languages of which 600 of them are on the verge of becoming non-existent.
    In the last 60 odd years itself, close to 250 languages have become extinct in India, which means speakers of these languages have migrated onto other languages. This also means that they have left behind a significant part of their culture and identity.
    So how does one go about preserving a language?
    There are several routes like commercialising the language and encouraging it in school curriculum.
    But what happens when the script of a language is missing? The ability of putting our thoughts into words and spreading information is worthless if there is no script. With no documentation of words that we speak, how does one pass it on to the next generation?
    Even if a parent tries to teach their child the language, it will differ slightly from what their neighbours might teach their kids. This difference between spoken words eventually increases to a point where it becomes two different dialects.
    This was the case with the Tangsa tribe of Arunachal Pradesh. The community is further divided into 40 sub-tribes, each having its own dialect. Over the years, the meaning of sentences and words grew so apart that one sub-tribe could not understand the other.



  10. The NYT has a good piece by Sameer Yasir about Devy and his work; here’s a sample:

    Amit Shah, India’s powerful home minister, has often promoted the idea of using Hindi to replace English as the de facto national language of communication.

    “If there is one language that has the ability to string the nation together in unity, it is the Hindi language,” Mr. Shah said in 2019. India’s Constitution designates both Hindi and English as official languages for government business, but it’s not compulsory to teach Hindi in public schools in some states, and many millions of Indians do not speak the language. The government wants to change that.

    “Time has come to make the official language an important part of the unity of the country,” Mr. Shah said in April, staking out a stance that generates resentment among Indians who do not speak Hindi.

    Mr. Devy suggested the government’s efforts could backfire. “Whenever there is a war on your mother tongue, there is division — and identity becomes strong,” he said. Many Hindus, Mr. Devy noted, do not speak Hindi as their first language. “People in the south do not look at Hinduism as being Hindi-based,” he said. “Far from it, they think the tolerant version of Hinduism that they developed through the centuries is the more authentic Hinduism.”

    Thanks, mapache!

  11. Is “string the nation together” usual Indian English? It sounds vaguely sinister to me, but maybe that’s the context.

  12. By the way, why were Orissa and Oriya recently renamed to Odisha and Odia?

  13. John Cowan says

    Only the transcription into Latin letters is affected, and I think the idea is that anglophone /d/ is closer to Oriyan [r] than anglophone /r/ is (which is incontrovertible).

  14. The 13th consonant and the 3rd letter of the ‘Ṭa’ series (cerebral), corresponding to the ‘d’ sound. When it occurs at the beginning of Oria words it is pronounced as in ḍay and when at the end or middle of a word it is pronounced as r͟ḍ in bird.

    Says the illustrious author of Purnachandra Odia Bhashakosha (or Purnachandra Oriya Bhasha Kosh or Pūrṇṇachandra Or͟ḍīā Bhāshākosha in his transliteration). Here

  15. It has been said earlier that there are a few redundant letters in the orthography. Although these letters do not have distinct value in pronunciation, they are useful in retaining the spelling of the borrowed or the derived words from Sanskrit. These are:

    ଶ -palatal (tâlabya) ú, pronounced s.
    The other modifying symbol which is used in Oriya writing is a dot below ଡ–ḍa and ଢ-ḍha. The dot below these two letters signifies that the stops should be read as flaps, i.e., ṛ and ṛh respectively. These allographs of ḍ and ḍh occur intervocalically only. Sometime ago, a few other letters like ଚ-ca used to be written also with a dot below, but this practice is no longer in vogue except perhaps in old handwriting.

    Bijay Prasad Mahapatra, A Synchronic Grammar of Oriya, 2007

    ú must be a typo, it is ś.

  16. John, thanks. I thought it is an important political decision:

    BHUBANESWAR: The name of Orissa was officially changed to Odisha and its language from Oriya to Odia following presidential assent today to the bill passed by Parliament and issuance of a notification.
    In the first official letter using Odisha in the letterhead, Chief Minister Naveen Patnaik thanked President Pratibha Patil for her "historic decision".
    "I convey to you the deep sense of gratitude of our people for the state to be known now as Odisha," Patnaik wrote.
    There were celebrations with fire crackers burst tonight following the notification.
    At the secretariat the chief minister, ministers and officials watched a sparkling display of fireworks.
    The chief minister declared a holiday tomorrow. All government offices, schools and colleges will remain closed.
    Sweets were distributed among those present at the celebrations, organised at short notice following instruction from Patnaik.
    Earlier an unanimous resolution was passed in the state assembly for change in the name of the state from Orissa to Odisha in 2008 which received the nod of both the houses of the Parliament in 2010.
    The President’s assent and notification was issued today.
    The state which was formed on linguistic basis on April 1, 1936, was known as Orissa and its language was Oriya since then.


    So now I have three hypotheses:
    – etymological
    – peculiarities of English /r/
    – variation in Odia ɽ

  17. David Eddyshaw says

    The new transliteration seems in fact to be a straightforward error, justified neither by the actual phonology of the Oriya language, nor by any existing transliteration system.

    Who would have thought that politicians (of all people) could be completely wrong about a linguistic matter?
    (I suspect that somehow this has morphed into a proxy for some other issue; maybe objection to the creeping imposition of Hindi as the national language?)

    Incidentally, why is our glorious Brexity government still acquiescing in the insulting French habit of calling the capital of England Londres? Surely Global Britain should be insisting that benighted foreigners everywhere call the city by its correct name? A poor show, I call it!

    [The correct name is, of course, actually Llundain, but one does not really expect the English to be able to cope with that.]

  18. @DE, possibly ɽ is a positional allophone of the retroflex ḍ (compare “gotta”).
    It is written with the same symbol as ḍ with a dot below (nuqta), ଡ and ଡ଼….

    As for ś, if the grammar (above) is accurate, it is there only graphically (and supposedly marks an etymological ś).

    Which means “Odisha” is a transliteration and not transcription.

  19. David Marjanović says

    I can hear the [ɖ], probably preceded by [ɻ] as described above.

    Still, the ś is a completely unambiguous laminal [s], so “Odisha” is indeed a transcription of the written form and not a transcription of how it’s pronounced.

    (“Transliteration” means 1 : 1 equivalence, in this case “Oḍiśa” for example.)

  20. I suspect that somehow this has morphed into a proxy for some other issue

    Well, duh; it can be taken as an axiom that any and all renaming issues have nothing to do with actual linguistic facts. And yet people insist on taking the excuses given by politicians at face value and solemnly insisting (to take an example that is always in my mind) that it is our duty to say and write “Myanmar” rather than “Burma” because a bunch of thugs who happened to wind up running the country and oppressing their fellow Burmese for a while decided they liked it better for their own reasons.

  21. @LH, I suspect the same because of “fireworks” but I can find a motivation.

    For example, Oriya has /r/ and /ḍ/ and /d/.
    /ḍ/ has allophones ɖ and ɽ.
    Possibly for speakers it is not convenient when one of them merges with /d/ and the other with /r/.

  22. Is this (“I now declare the Orissa world cup open”) r retroflex? And earlier

  23. The guy is the head of the state.

    It seems, if an English speaker can do flaps, then she can use a flap here with native speakers.
    As for -sh-, it is the same letter as in Sri/Shree. In transliteration ś. In Oriya it sounds the same as s.

  24. Article 351 of the Indian constitution is interesting:

    “351. Directive for development of the Hindi language.—It shall be the duty of the Union to promote the spread of the Hindi language, to develop it so that it may serve as a medium of expression for all the elements of the composite culture of India and to secure its enrichment by assimilating without interfering with its genius, the forms, style and expressions used in Hindustani and in the other languages of India specified in the Eighth Schedule, and by drawing, wherever necessary or desirable, for its vocabulary, primarily on Sanskrit and secondarily on other languages.”

  25. “There are a few things I would like to mention. I am reminded of an incident which took place in Switzerland about 40 years back. A Swiss-German asked me which part of India do you belong to ? I replied ‘Orissa’. He searched for the name in his map written in German and said that he has found ‘Odisha’ but not ‘Orissa’. Then I explained to him that thanks to the British rule we are forced to use this name. He was surprised and I was upset”

  26. The wonderful thing is that the proponents said that they changed it according to the local pronunciation.
    They say [oɽisa] and they write Odisha. They write Odisha and they say [oɽisa].

    Input from a local linguist would have helped :(

  27. i didn’t realize that the engineering of a deeper hindi/urdu split was explicitly enshrined in the indian constitution! fascinating to see how it’s laid out: the distinction from “hindustani” (presumably meaning a vernacular register that has very few obvious sanskritisms or persianisms); the incorporation of elements from other vernaculars “without interfering with its genius*”; the prioritization of sanskrit-oriented language engineering.

    equally interesting: the constitution of pakistan does not do anything of the kind:

    251 – National language.
    (1) The National language of Pakistan is Urdu, and arrangements shall be made for its being used for official and other purposes within fifteen years from the commencing day.
    (2) Subject to clause (1), the English language may be used for official purposes until arrangements are made for its replacement by Urdu.
    (3) Without prejudice to the status of the National Language, a Provincial Assembly may by law prescribe measures for the teaching, promotion and use of a provincial language in addition to the national language.


    * this genius, we seem meant to understand, ends geographically at the partition line, and socially at the border of brown-shorts-approved hinduism.
