India’s Hidden Languages.

Agnee Ghosh writes for BBC Future about an effort to track down vanishing languages of India:

It was 2010 and Ganesh N Devy was concerned about the lack of comprehensive data on the languages of India. “The 1961 [Indian] census recognised 1,652 mother tongues,” says Devy, “but the 1971 census listed only 109. The discrepancy in numbers frustrated me a lot.” So, Devy decided to find out what was going on himself. […]

As a professor of English at Maharaja Sayajirao University of Baroda in Gujarat, Devy has always had an interest in languages. He has founded a number of organisations for their study, documentation and preservation, including the Bhasha Research and Publication Centre in Baroda, the Adivasis Academy in Tejgadh, the DNT-Rights Action Group, among others. As part of his work at the organisations, he used to go to villages where tribal populations lived and research them. He started noticing that these tribes have their own languages, which often do not get reported in the official government census. […]

Devy felt that it would take a long, arduous process to document every language in India, so he stepped in to help. He launched the People’s Linguistic Survey of India (PLSI) in 2010, for which he put together a team of 3,000 volunteers from all over the country. Most of these volunteers weren’t researchers, but writers, school teachers, and other non-professional-linguists who possessed an intimacy with their mother tongue that was invaluable to Devy.

In a survey conducted during 2010-2013, Devy and his team recorded 780 languages and 68 scripts across the country. Devy says that nearly 100 languages could not be documented, either because of remoteness of the region or conflict, so the true number of languages in India continues to be hidden from us. Since 2013, the PLSI has published 68 volumes, featuring detailed profiles of each language that Devy came across. The remaining 27 volumes will be published by 2025.

Take the state of Odisha, which has the largest number of tribal communities in India. Devy always knew it would be a linguistic goldmine for him, but he could not find a linguist there who would be able to work on these remote languages. Around this time, he came across a taxi driver who used to work for the district magistrate in Odisha. Whenever the district magistrate used to go for a visit in the villages, the driver preferred talking to the villagers rather than sitting in his car. “Over the years, he had mastered four languages and he had constructed grammar for those four languages and had collected folk songs and stories,” says Devy. “It was material that was worthy of giving him a doctorate, maybe two doctorates.”

It ends with an account of the previously unknown Rai-Rokdung language of Sikkim:

In 2017, the researchers at Sidhela discovered the Rai-Rokdung community completely by chance, when a student at their university informed them about it. Rokdung is one of the pacha (divisions) of the Bantawa clan within the Rai community. The Rokdung clan is mostly located in East Sikkim and the members of the community claim to possess a distinct language of their own separate from the Bantawa. There has been no earlier mention of the language in the linguistic history of the region. Since then, research of the community, which has a population of just 200 people, has been ongoing.

Hima Ktien, a linguist at Sidhela who spearheaded the documentation project of the language of the Rai Rokdung community, spoke about how most Rokdung community members self-identify as members of the Rai groups, while some say they are Nepali. Only a few members choose to identify themselves as being a part of the Rokdung Yupacha or as Yaku. “We found only 20 people who could speak the Rai-Rokdung language,” says Ktien. If a language isn’t taught in school, then there is no means to get to use the language. So people attempt to assimilate into society by adopting the language of the majority, and because of that the Rai-Rokdung language suffered because people started shifting to other languages. […]

But Ktien tells me that they witnessed something remarkable towards the end of their research. In a hopeful turn of events, during their field visit in January 2020, they noticed that the speakers of the Rokdung language were coming together weekly in an effort to revitalise Rokdung. The older generation, who were mostly grandparents, were trying to pass on the language to the generation below, who were mostly parents. These willing pupils would note down the words they were unfamiliar with and actively try to use them. “It was very gratifying to witness the change that was brought on by our work in documenting the language,” says Ktien.

(We discussed the New Linguistic Survey of India in 2007 and 2013.) Thanks, Trevor!


  1. J.W. Brewer says

    It certainly wouldn’t surprise me if this has in fact happened and just didn’t get mentioned in the article, but I would think that part of a sensible reaction to the 1652 v. 109 languages only a decade apart stats would be to hypothesize a switch from a splitter approach to a lumper approach by the census authorities and see if you could trace some of the “missing” languages into some still-reported language they had been lumped into.

    Note also that the Bantawa “language” is already said to have a whole bunch of dialects, some of which are more divergent than others. Do we suppose there’s a basis for thinking Rai-Rokdung so different that it cannot be legitimately classified as just another Bantawa dialect, or is this a situation where people’s sense of ethnic/political identity and difference is driving what does and doesn’t get counted as a distinct “language”?

  2. Good questions all!

  3. The first sentence of that Wikipedia page already shows evidence of past editorial disagreement about whether Bantawa should count as a single language or multiple languages:

    The Bantawa Language (also referred to as An Yüng, Bantaba, Bantawa Dum, Bantawa Yong, Bantawa Yüng, Bontawa, Kirawa Yüng), is a Kiranti languages spoken in the eastern Himalayan hills of eastern Nepal by Bantawa Rai ethnic groups.

  4. Hat, J.W. Brewer: According to Colin Masica (THE INDO-ARYAN LANGUAGES, p. 421) Indian authorities were alarmed at the sharp increase in the number of individuals who, within the “Hindi belt”, had claimed a regional language as an L1 in the 1961 census and decreed that, from the 1971 census onwards, all such regional languages would be counted as Hindi pure and simple. I suppose that, when it comes to census results relating to the rest of India (i.e. outside the Hindi belt), similar such “lumper” approaches must have been encouraged if not required (perhaps unofficially).

    I never was very impressed with any media outlet’s treatment of linguistic issues (I think the same is true of most hatters), but this direct quote in the article REALLY made me stop and think:

    “I wanted to write down these words using the English alphabet but it was difficult because of syntactical differences.”

    Is this some specialized/old-fashioned/specifically Indian (South Asian?) meaning of “syntactical” which I am unaware of, or is this as nonsensical as it looks?

  5. I wonder if there is a bias toward splitting among Indo-Aryan languages, and lumping among others, either among linguists or government language counters.

  6. J.W. Brewer says

    Around page 8 of this interesting publication by the Indian census authorities you get details of how they did their partial lumping for the 2011 census. The idea is that everyone is asked an open-ended free-response question re “mother tongue” and then the table shows a tabulation of e.g. 50+ “mother tongues” chosen by >10,000 people that are listed individually but then also lumped into the larger “Hindi” total.

    There are a bunch of “Other” line items lumping together reported mother tongues (by lumped affiliation with larger language) that don’t get recorded by name in this publication because they did not meet the 10,000-speaker threshold. I don’t know how much more detailed data about them is published elsewhere. But big picture:

    “At the 2011 census, the number of such raw returns of mother tongues has totaled 19569. Since mother-tongues as returned in the census are basically the designations provided by the respondents of the linguistic mediums in which the respondents think they communicate, they need not be identical with the actual linguistic mediums. For assessing the correlation between the mother tongue and designations of the census and for presenting the numerous raw returns in terms of their linguistic affiliation to actual languages and dialects, 19569 raw returns were subjected to thorough linguistic scrutiny, edit and rationalization. This resulted in 1369 rationalized mother tongues and 1474 names which were treated as ‘unclassified’ and relegated to ‘other’ mother tongue category. The 1369 rationalized mother tongues were further classified following the usual linguistic methods for rational grouping based on available linguistic information. Thus, an inventory of classified mother tongues returned by 10,000 or more speakers are grouped under appropriate languages at the all India level, wherever possible, has been prepared for final presentation of the 2011 mother tongue data. The total number of languages arrived at is 121.”

  7. David Eddyshaw says


    To be fair, Mossang is described as a “farmer”, so can perhaps be allowed to be unclear about exactly what “syntax” means (though somebody might perhaps have told him.)

    However, the customary nonsense does appear in the journalist’s own text, e.g.

    She was the last fluent speaker of Bo – one of the oldest languages in the world, dating back to pre-Neolithic times

    Just like Welsh!

  8. Woops! Well, you hereby get belated credit for the link.

  9. Did you know that the feeling of love can be described in 9,000 words if we put all the Indian languages together? A country that is often touted as the treasure trove of linguistic diversity, has 780 languages of which 600 of them are on the verge of becoming non-existent.
    In the last 60 odd years itself, close to 250 languages have become extinct in India, which means speakers of these languages have migrated onto other languages. This also means that they have left behind a significant part of their culture and identity.
    So how does one go about preserving a language?
    There are several routes like commercialising the language and encouraging it in school curriculum.
    But what happens when the script of a language is missing? The ability of putting our thoughts into words and spreading information is worthless if there is no script. With no documentation of words that we speak, how does one pass it on to the next generation?
    Even if a parent tries to teach their child the language, it will differ slightly from what their neighbours might teach their kids. This difference between spoken words eventually increases to a point where it becomes two different dialects.
    This was the case with the Tangsa tribe of Arunachal Pradesh. The community is further divided into 40 sub-tribes, each having its own dialect. Over the years, the meaning of sentences and words grew so apart that one sub-tribe could not understand the other.


