Introducing CHIRILA.

Anyone interested in Australian linguistics will be gratified by this Anggarrgoon post:

I am very pleased to announce that the first phase of CHIRILA (Contemporary and Historical Resources for the Indigenous Languages of Australia) has been released. This represents approximately 180,000 words from 155 different Australian languages. It is a subset of the full database (of approx 780,000 items); eventually I hope to be able to release most of the data. Currently, the first phase is that for which we have explicit permission, or which is already in the public domain.

The material is hosted at pamanyungan.net/chirila; please see the web site for more information about the contents of the database, how to download data, what formats are available, and the like. We do not provide a web interface to the data; you download it and use Excel or a database program to read the files. We hope the data will be useful to researchers, community members, and others with an interest in Australia’s Indigenous language heritage. pamanyungan.net/chirila also includes access to the preprint of a paper describing the database (both the online and full versions).

I’m not sure why they don’t provide a web interface, but I imagine there are good and sufficient reasons.

Comments

  1. Time and money, I’m sure.

  2. By now, surely everybody has a twelve-year-old nephew who could set up a website for a few soft drinks and a kind word.

  3. Nah, they are too busy getting views on Periscope so they can impress the girls. Try a ten-year-old, or better bribes.

    EDIT: Also, you want a web interface? Import the files into Google Sheets. It’s on the web, and it’s got an interface. And you get cooperative comments for free.

  4. You have to agree to various academic-ish things about attribution and non-commercial use before you can get the data, which suggests that public queries by randoms is not what they have in mind.

  5. David Marjanović says:

    a twelve-year-old nephew

    Or, failing that, a grad student.

  6. Wow, plenty of gender-related assumptions about coding in the comments here. Classy.
    We don’t provide a web interface because it’s my strong impression from previous queries that the vast majority of the people who want data either a) want to download a particular language, or b) want to manipulate data in their favorite program (R, python, etc). If I start hearing differently, we’ll do at least a simple interface where you can look for particular Aboriginal or English words.

  7. Wow, plenty of gender-related assumptions about coding in the comments here. Classy.

    Seconded.

  8. Is it a stereotype if it’s true?

    Despite the best efforts of the school system and lots of parents here in the land of genus discourse to counteract the skewed roles in popular culture, the sad truth still is that you’re going to have much better luck asking a young boy for help with computers than a young girl. Counterexamples exist, but the numbers say to bet on the nephew.

    The age group where a sizable proportion had their own iPad mini before they could talk is still only about six, here. Will be interesting to see how things fall out in five years.

  9. This sophistry doesn’t change the matter that these comments were insulting and distasteful to the very linguist whose work merited a post here, one who has worked on a number of lexical databases, and who I imagine could easily build a fine web interface if she so chose.

    I am of course a big fan of this blog, but it has long bothered me that there are so few women commenters here, even as women are becoming a large proportion of linguists today. I don’t know what can be done about it, but these kinds of comments sure don’t help.

  10. I can understand if the way gender roles play out among twelve year olds is distasteful to you.

    But nobody is saying that women can’t create web pages just as well as men, just that they do it more rarely. At all ages. I don’t understand what you think will be achieved by suppressing reference to that fact.

    That aside, I suspect that there are more women commenters here than meets the eye — there are many places on the Internet where identifying as female distracts attention from your contributions, and while the Hattery may think itself a better place, many may think that it’s better to be safe than sorry.

    This sophistry doesn’t change the matter that these comments were insulting and distasteful to the very linguist whose work merited a post here, one who has worked on a number of lexical databases, and who I imagine could easily build a fine web interface if she so chose.

    Jesus Christ. I insulted Claire because I happened to mention a nephew rather than a niece? Please.

  12. Maybe that plus Lars’s “impress the girls”?

  13. ə de vivre says:

    I insulted Claire because I happened to mention a nephew rather than a niece?

    I can’t speak for Claire, but no one’s said anything about being insulted (and as a woman on the internet, I suspect she has thicker skin than most of us men). You and Lars made comments that assumed that a generic coder was a boy who’s interested in impressing girls. I don’t think anyone’s accusing anyone else of acting in bad faith, just pointing out that maybe it’s not a great habit if you don’t think that having things like coding be heavily gendered is a good thing.

    And when a bunch of men start explaining to a woman how what they said absolutely wasn’t sexist, that’s maybe a sign that there’s something to learn here.

  14. Eli Nelson says:

    @Lars: I also find the explanation of how “it’s not a stereotype because it’s true” to be unnecessary. Nobody is suppressing any facts; Claire’s comments were in response to the assumption that a hypothetical young coder would be male. Nobody needs to do any betting on unknown nieces and nephews either; presumably people who have actual nieces or nephews can ask them in person if they would like to help set up a website, without having to guess based on only their gender.

    @ə de vivre: Y brought the word “insulting” into the conversation.

  15. ə de vivre says:

    Fair enough. Funny how when a woman calls out gendered language, all the men want to do is talk about feelings 🙂

  16. Eli Nelson says:

    I actually don’t find this situation funny. Unfortunate is the word I’d use. I hope people can get over their emotional reactions and possibly learn something from this thread.

  17. ə de vivre says:

    “Funny” as in ironic, not ha-ha funny.

  18. Eli Nelson says:

    An uncharitable reader could easily see your previous comment as condescending, with the word “funny” and the smiley-face emoticon. It seems to me that it might tend to inflame people’s feelings rather than cool them down. I don’t think that sort of thing is helpful in this kind of situation, which is why I objected to your wording.

  19. ə de vivre says:

    Condescending to who? I’m not asking rhetorically. My goal was to remind people that I’m not trying to shame anybody, but since the whole point I was trying to make is that intent doesn’t control your words’ effects I’m open to being wrong here. I’m not exactly a long-time commenter, but as a general rule I’ve found most of the other commenters are willing to assume that everyone’s acting in good faith.

    I guess I find some humour in the ironic reversal of gender stereotypes and observed behaviour. I don’t think that necessarily diminishes the points people have made about gendered language and computer science, but if you think otherwise we can have that conversation too.

  20. Funny how when a woman calls out gendered language, all the men want to do is talk about feelings

    I don’t think anybody but you was aware you were a woman. As for the comments at issue, I think mine was harmless unless you think all comments about theoretical coders should in principle be about women to encourage the cultural shift (a position with which I would be in sympathy, and if I’d been making a thoughtful comment about coding instead of a dumb joke, I probably would have written differently), and even so I think a mild suggestion of better phrasing would be all that was required or even defensible. Which brings us to this:

    it has long bothered me that there are so few women commenters here

    I am greatly bothered by the implication that there is something I am doing or not doing that could be considered the cause of this purported disparity (as Lars points out, there is no way to know who is or is not female) and therefore reprehensible. I have been defending women’s rights and attacking the patriarchy since around 1970 (for evidence, see my contributions to MetaFilter since 2002), and on this blog I have taken pains to highlight women’s contributions to scholarship and literature when the opportunity has arisen (given that this is, after all, a blog about my thoughts about my reading rather than a sociological or political site). Just today I wrote to Google Books complaining about pages missing from their scan of an 1853 issue of Moskvityanin, saying “I wouldn’t care if it was only a few bad poems, but it happened to a chunk of a short novel by an important woman writer who (like most women writers of the day) never had her works collected in nice bound sets like the big boys, so this is the only way she can be read.” I say all this not to pat myself on the back — I am aware that good intentions are no guarantee of results, and I am always willing and eager to improve — but to point out that I am not some oblivious male who needs the obvious spelled out. If you have evidence that I have been posting and commenting in sexist ways that are driving away women, by all means let me know. Otherwise, I feel like I am being treated like a Usual Suspect for no good reason. I might note that this very post represents an attempt to support women’s scholarship (and I’m a bit surprised that Claire, with whom I have been in friendly correspondence for over a decade, was so ungracious about it).

  21. Society is sexist. That’s why nephews are vastly more likely to be coders than nieces. You don’t magically make society less sexist by suggesting that people are likely to have 12-yo nieces who can set up websites, because that’s false. It’s possible, but not likely.

    What will help is to ask the nieces first — you might be surprised, and you’ve shown them that some people don’t subscribe to the stereotype.

    (The stereotype is that ‘girl coders are weird’, or that ‘only boys can be coders.’ The statistical fact is that ‘most girls don’t code’. I hope we can agree that those are different statements).

    As to what’s on the brain of twelve year old boys — have you had the pleasure of parenting one recently? The priority of doing an elderly relative a favor in exchange for soft drinks is extremely low. The thing about impressing girls is an exaggeration, of course, impressing their friends is just as important, but it’s all about the peer group — and that goes for the girls as well, though they go about it in other ways. The adult world is pretty irrelevant at that stage.

  22. if you don’t think that having things like coding be heavily gendered is a good thing

    I don’t know, is it a bad thing? Interest in coding will lead you to maybe a programmer’s job, which is a nice middle-class job, but not something really outstanding. I mean, if there are girls/women who have interest in coding, nobody should mind them being in the game. If they are somewhat disadvantaged because stereotypes are stacked against them then the high-minded society can help. But overall the dearth of women coders does not strike me like something obviously wrong.

    P.S. I don’t have much experience here, but I don’t think asking a random 12-year-old boy or girl to set up a website will lead to anything reasonable. Playing videogames is not enough to build the skilz.

  23. Hat, just to be absolutely clear, I’m not at all blaming you, and I recognize that you have long been forceful and unabashed about highlighting the contributions of women, especially ones made against societal pressure. As I said, I don’t know what can be done about encouraging anyone in particular to read or comment here. All I meant was that it’s too bad about the imbalance, and I wish it weren’t so.

    The comment exchange in this particular blog entry is a rare case indeed of this sort of thing. I don’t think your ‘nephew’ comment was to blame by itself. But once Lars’s comment followed, the comments became a conversation, and the line connecting these two comments aims in a tasteless direction.

    I can’t see judging Claire for being offended/insulted/piqued/what-have-you, but I will say that I would have phrased the complaint differently.

  24. It was surprising to me that, in a thread about a database on Australian languages, the discussion was a bunch of loaded assumptions about who does coding. My comment was directed at the commenters, not at Steve.

  25. ə de vivre says:

    Hat:
    For the record, I’m not a woman. I thought I’d disambiguated that when I’d said “us men”, but I realize now I was not as clear as I thought I was. Only speaking for myself, I think I was assuming that the emotional stakes here were much lower than other people were taking them to be. I thought your comment was pretty anodyne, and I only got involved because I figured everyone involved was more or less already on the same page and we were just talking past each other a little. As you say, a mild suggestion of better phrasing was all I was going for, with some gentle ribbing too I guess, but it’s hard to convey tone on the internet and I’m not always the best at judging how tense things are. So I retract what I said, and consider yourself mildly suggested.

    Let’s all try and be better (myself included) and smash some fuckin’ patriarchy, eh?

  26. When the good fight amongst themselves, evil laughs all the way to the bank.

    There definitely are institutional barriers against women in programming, or more precisely against anyone but young(ish) white or East Asian males. They enter classes at a high rate, but disappear before running the whole gauntlet. My non-statistical impression is that this actually got better for a while, and then got worse again with the rise of startup culture, which uses “not a cultural fit” as code for “too black/female/Indian/old to be acceptable”. I just got turned down for a job for the last of these (these reasons are illegal in the US, but nobody can say anything if a company thinks “you won’t be happy with us”), so I’m even more annoyed with it all than usual at the moment.

    It’s getting to the point where I’m going to start asking prospective employers about their gender balance in tech jobs, and if they don’t have any women (my most-recently-previous employer had none), I’ll want to know what they are going to do about it. If anyone hears of any jobs, please let me know.

  27. marie-lucie says:

    Claire: Congratulations on getting this big project underway and completing the first phase. It is an enormous piece of work.

    It was surprising to me that, in a thread about a database on Australian languages, the discussion was a bunch of loaded assumptions about who does coding.

    Perhaps because few people here are linguists (or women!), let alone linguists familiar with Australian languages (I am a linguist, but my language specialty is elsewhere), so they picked on your comment and its potential implications rather than the news of the database, its contents and format. Social assumptions change slowly.

  28. ə de vivre says:

    Right on! Making more data about Australian languages available is a Big Deal! Learning about ergativity and the weirdness (from this L1 English speaker’s subjective point of view) of languages like Dyirbal was a big part of what made me fall in love with linguistics (much to the chagrin of my career prospects).

    startup culture, which uses “not a cultural fit” as code for “too black/female/Indian/old to be acceptable”

    No amount of higher education spent reading Gramsci, Althusser, and Spivak could radicalize me more than the year I spent working for Amazon.

  29. marie-lucie says:

    ə de vivre: Learning about ergativity and the weirdness … of languages like Dyirbal

    Ergativity is not limited to Australia. In Europe it is characteristic of Basque and some of the Caucasian languages, and there are a number of examples also in the Americas. Once you get used to it it doesn’t seem so weird!

  30. ə de vivre says:

    No argument from me about that. Dyirbal just happened to be my first introduction to ergativity (or any non nominative-accusative system), and since it’s one of the more robust examples of syntactically ergative languages it popped up a lot over the course of my studies. The idea that there were entirely new ways of expressing the basic concepts I still took for granted after studying a couple languages beyond my native one was a big revelation. My impression is that Australian languages have some areal features that are fairly uncommon elsewhere (not that there aren’t other under-documented areas that also have important things to say about Language), so if you’re trying to figure out how natural human languages do and do not differ it’s a really important set of data.

  31. The more information about Australian indigenous languages that is publicly accessible, the better. I’ve been pleasantly surprised coming back to Australia this time to find that things have gradually changed in the 20-odd years I’ve been away. There are now indigenous bodies around Australia fostering interest in indigenous cultures and languages, there are bodies dedicated to raising literacy and numeracy among indigenous peoples, in indigenous languages, and even the Prime Minister started a speech with a few sentences in an indigenous language in Parliament recently. Needless to say, Australian languages are by no means mainstream amongst ordinary Australians, but the changes are encouraging.

  32. David Marjanović says:

    When the good fight amongst themselves, evil laughs all the way to the bank.

    Rarely. Evil tends not to be a monolith either.

    Ergativity is not limited to Australia. In Europe it is characteristic of Basque and some of the Caucasian languages

    All of the Caucasian languages in the phylogenetic meaning of that term, AFAIK; beyond that, the neighboring Kartvelian languages are split-ergative (nominative/accusative in the present tense and ergative/absolutive in the past, IIRC), as are at least some Iranian languages (notably Kurdish).

  33. Not to mention Hindi and other Indic languages, which have an ergative pattern in preterite tenses (but only with full NPs rather than pronouns) and an accusative pattern elsewhere.

  34. David Marjanović says:

    Yes; I was exploring meanings of “Caucasian”.

  35. I had thought that the current orthodoxy is that Caucasian has no phylogenetic meaning: there is Kartvelian (formerly South Caucasian), Nakho-Dagestanian (formerly Northeast Caucasian), and Abkhazo-Adyghean (formerly Northwest Caucasian), all of them unrelated except by the dubious conjectures of long-range comparisonists. (The North Caucasian hypothesis: “Let’s do it like this: if you like hairy noun cases, walk east, and if you prefer complicated verbs, peregrinate westward. That way we can stop all this pointless infighting.”) The whole place is a relic area, and it may well be that relatives of any or all of these families were once spoken outside the Caucasus too, just as Indo-European, Turkic, Semitic, and Mongolic all have Caucasian and non-Caucasian representatives.

  36. David Marjanović says:

    I had thought that the current orthodoxy is that Caucasian has no phylogenetic meaning:

    I may have silently exaggerated the support outside the “Moscow School” for the hypothesis that (North)east and (North)west Caucasian are sister-groups. However, everyone seems to agree these days that Kartvelian is not at all closely related to (either of) these.

    Whether there’s a Dag(h)estanian branch exclusive of Nakh seems to be unclear still; the alternative is that Nakh is nested inside it.

    and if you prefer complicated verbs, peregrinate westward

    While perhaps less extreme, East Caucasian verbs are pretty impressive, too.

    The whole place is a relic area, and it may well be that relatives of any or all of these families were once spoken outside the Caucasus too

    Absolutely.

  37. Indeed, there is another idea (the “Pontic hypothesis”) that the sister group of Abkhazo-Adyghean is Indo-European, but it seems based on little more than typology (as indeed does the North Caucasian hypothesis).

    Whether there’s a Dag(h)estanian

    Typo: my bad.

    branch exclusive of Nakh seems to be unclear still

    Yes, well, IE is not divided into an Indic branch and a European branch. Names are but names.

  38. David Marjanović says:

    Indeed, there is another idea (the “Pontic hypothesis”) that the sister group of Abkhazo-Adyghean is Indo-European, but it seems based on little more than typology

    It is also based on common vocabulary, including a few grammatical elements, that can apparently all be explained as loans.

    as indeed does the North Caucasian hypothesis

    No, that one is based on regular sound correspondences; indeed, Proto-West-Caucasian can almost entirely be derived from Proto-East-Caucasian as reconstructed by the same people. The downsides are: 1) many of the PWC innovations are losses, as the frontness, roundedness and even length of vowels were transferred to the preceding consonants, grammatical affixes were lost and the roots were generally shortened, so PWC zero corresponds to several different things in PEC; 2) the PEC reconstruction can certainly still be improved, because many extant EC languages – often endangered one-village languages, but even literary Chechen – are insufficiently described. In particular, several have phonemic tone, the origins of which cannot be reconstructed due to simple lack of information.

    Typo: my bad.

    No, I just meant that both spellings exist. 🙂

  39. Johanna Nichols thinks that the NC dictionary suffers badly from confirmation bias and the failure to work up Nakh-Daghhhhestanian carefully first.

  40. ə de vivre says:

    Re: Dag(h)istan
    Going through the spellings in the local(-ish) languages, looks like the ‘g(h)’ runs the gamut from stop to fricative to glide to zero, between velar and uvular. Spell it however you want, you’ll be equally right and wrong.

  41. David Marjanović says:

    Johanna Nichols thinks that the NC dictionary suffers badly from confirmation bias and the failure to work up Nakh-Daghhhhestanian carefully first.

    Possible, though I’m not aware of any more detailed comments from her or anyone. When the dictionary came out, she was even against recognizing Nakh-Daghestanian, but she accepts it nowadays based on her own work.
