Grammars Across Time Analyzed.

Via Ionuț Zamfir’s Facebook post, Blum, F., Barrientos, C., Ingunza, A. et al., “Grammars Across Time Analyzed (GATA): A dataset of 52 languages” [Sci Data 10, 835 (2023)]; the abstract:

Grammars Across Time Analyzed (GATA) is a resource capturing two snapshots of the grammatical structure of a diverse range of languages separated in time, aimed at furthering research on historical linguistics, language evolution, and cultural change. GATA comprises grammatical information on 52 diverse languages across all continents, featuring morphological, syntactic, and phonological information based on published grammars of the same language at two different time points. Here we introduce the coding scheme and design features of GATA, and we describe some salient patterns related to language change and the coverage of grammatical descriptions over time.

It’s open access, so you can read the whole thing; if you have thoughts, please share. (And if you’re curious, Ionuț is the Romanian diminutive of Ion; the English equivalent is Johnny.)

By the way, a public service announcement: Huddie “Lead Belly” Ledbetter’s given name is properly pronounced /ˈhjuːdi/; the 1900 Census lists him as “Hudy Ledbetter,” and if he’d kept that spelling it would have been easier on everyone. I mention it because I keep forgetting it, and was reminded once again today. So now you know, if you didn’t already.

Comments

  1. J.W. Brewer says

    I can understand why the “Huddie” spelling cues /hʌ/ rather than /hju/ in the first syllable, but I’m not sure that the “Hudy” spelling affirmatively cues the /hju/ option, not least because of the /hu/ third option. Is there some predictable pattern of the distribution of /hu/ v. /hju/ in AmEng that I probably know subconsciously but that this spelling is failing to trigger?

  2. David Eddyshaw says

    The illustrations in the introductory material suggest that the focus is on the different approaches to the writing of grammars over time rather than actual diachronic changes in grammar; I suppose that is what it says on the tin, but it’s a bit disappointing.

  3. J.W. Brewer says

    Okay, wikipedia says helpfully “The diphthongs /juː/ or /ɪʊ̯/ are most commonly indicated by the spellings eu, ew, uCV (where C is any consonant and V is any vowel), ue and ui, as in feud, few, mute, cue and suit, while the historical monophthong /uː/ is commonly indicated by the spellings oo and ou, as in moon and soup.” So I guess this is “uCV,” but on the other hand if you think “looks like it rhymes with ‘Judy’ and ‘Rudy’” and don’t subconsciously back out the yod-dropping in those two names you will be led astray in this instance. “Like ‘Huey’ but insert a /d/” is not an approach that sprang to mind initially.

  4. It may help to remember it in Pete Seeger’s voice: https://youtu.be/RLQbNRuD1ZE.

  5. I suppose that is what it says on the tin

    “the grammatical structure of a diverse range of languages separated in time” is about language and not its description.

    (We can, of course, claim that all structures only exist in our minds (the universe itself being just “stuff” which we describe as a function in a space, and when we look at this function in space we say that this arbitrary region is “sun”) but we at least see the same structures.)

    “and we describe some salient patterns related to language change and the coverage of grammatical descriptions over time.” – both.

  6. Hypothesis:
    Grapheme u may be /ʌ/ only ever in old English words and /ju/ only ever in anglicised words. In more exotic borrowings it must always be /u/

    Anglicised:
    Hugo, Hubert, humour, human, humerus, humid, humus, huge

    Exotic:
    Hutu, hula, Huberman, Hunan

    Special pleading:

    Huron – early explorers spellings are anglicised, later ones exotic.

    Hudibras – I have been saying /hu/, now I have doubts.

    Huitt – a variant of Hewitt. Surnames are wack.

    hubris – /hju/. I have no doubts, just questions

  7. Huberman with only one n I’d say has a decent shot at having /hju/. Certainly my American violin teacher Mrs Huber had /hju/.

  8. Hummus (foreign) and Hutterite (anglicised), with ʌ’s, are unaccounted for, on account of the geminates.

  9. Just to be clear, I’m not saying that the “Hudy” spelling unambiguously indicates /ˈhjuːdi/, which would be absurd; I’m saying that that spelling is compatible with the pronunciation, so that if you have seen the one and heard the other you can keep them connected in your head. “Huddie,” on the other hand, is completely incompatible with that pronunciation, which consequently is easily forgotten.

  10. Somewhat relatedly, I’ve just heard a human audiobook reader pronounce demur as demure.

  11. PlasticPaddy says

    I am actually surprised I have never heard this pronunciation with “concur”, where concure would give a curative aspect to the concurrence.

  12. Hurons appeared in a poem by Lomonosov where he mocked the proposal to assign [ɦ~ɣ] to г in some words and [g] in others (the idea was that no one will be sure in what word what sound should be used).

  13. demur as demure

    I’ve heard that from more than one audiobook reader unfortunately.

  14. Stu Clayton says

    By “audiobook reader” do you mean a person or a voicebot? I know audiobooks only as recordings of persons reading (a printed form of) books out loud. Has this “voice software” made inroads here as well ?

    This week I was about to read a short article in the WaPo, then saw there was a “listen” button so I pressed that – something I’d not done before. A lady voice, of that chirpy-confiding radio presenter kind, started up.

    It quickly became clear that this was a phoney chirpy-confiding voicebot reading out the article. The intonation was slightly robotic in many places, and there was one unnaturally long pause between words, at which point I turned the voice off in annoyance. Cumulative annoyance, because none of the infelicities by itself was anything to shake a cane at.

    Checking the article, I saw that the pause occurred before the noun of a longish noun phrase.

    I give ESL speakers a break, since they try – or at least did way back in the past – to speak correctly. But I stick with technical podcasts by Indians/Pakistanis (there are thousands of such podcasts) only when the content is highly interesting, and well-presented instead of being friendly, bubbly and repetitive.

    It mentally exhausts me to be hacking through an accent jungle while trying to keep track of the ocelot.

    Last not least: what kinds of books contain the word “demur” ? If it were “demure”, I would know they were written by Dame Edna or Rosamunde Pilcher.

  15. Keith Ivey says

    They were human readers. I haven’t listened to any audiobooks with bot readers, though I imagine those will eventually achieve acceptable quality.

  16. J.W. Brewer says

    mollymooly’s hypothesis seems plausible, but does not resolve the difficulty of classifying “Hudy” between the “good old English” and “anglicized many centuries ago” boxes. Not least because it feels like a hypocoristic from some other name beginning Hud-, but what would that be?

  17. J.W. Brewer says

    Maybe this is all just a set-up for the Big Reveal that the first syllable of “Leadbelly” should actually be pronounced as /lid/ rather than /lɛd/, i.e. FLEECE rather than DRESS. I mean, both are compatible with the spelling …

  18. January First-of-May says

    Not least because it feels like a hypocoristic from some other name beginning Hud-, but what would that be?

    Hudson? Hudibras? Huddersfield?

    …I checked some name websites and the only remotely good option listed there is Hudson (which has /ʌ/, of course, as a good old English word).

  19. Huddie “Lead Belly” Ledbetter’s given name is properly pronounced /ˈhjuːdi/

    That’s interesting because “Hiúdaí” is also an Irish name, a familiar form of “Hugh” or “Aodh”. I wonder whether the name somehow percolated into the US South with Irish immigration.

    I heard a story about Leadbelly, that he had a long discussion once with an Irish singer (can’t remember who) about Irish folk song, and the song “An Droimeann Donn Dílis” was mentioned. This is a song about a cow. Leadbelly was very intrigued that anyone would write a song about a cow, and he set out to do his own version.

    Unfortunately, Pete Seeger heard Leadbelly’s song, stole it, and turned it into “Kisses Sweeter than Wine”. So we will never get to hear Leadbelly’s original version of “An Droimeann Donn Dílis”.

  20. January First-of-May says

    So we will never get to hear Leadbelly’s original version of “An Droimeann Donn Dílis”.

    …it was recorded, apparently; here’s a rendering on YouTube.
    It is, in fact, (mostly) about a cow, though reportedly the Irish original had a completely different plot.

  21. John Cowan says

    Not least because it feels like a hypocoristic from some other name beginning Hud-, but what would that be?

    Per Wikt, it is a hypocoristic of Hudde, a surname of unknown origin. The same article says it can also be an anglicization of Irish O hUada; alas, Uada is itself of unknown origin. Ignotus per ignotium, or, mysteries on mysteries.

  22. Stu Clayton says

    Ignotus per ignotium

    recte ignotum per ignotius.

    ignotum per aeque ignotum describes how nominalists explain things. It is less adventurous than the other way.

    Both are rhetorical procedures, not methods of proof.

  23. From WP: “Another example would be referencing Rayleigh scattering as an explanation for why the sky is blue, when a more apt explanation would be simply that air is blue.”

    No, if the sun appeared blue and we could see stars during daytime, even then “air is blue” would be just tautology, not an explanation…
    As it is, it mostly equates to “you know, apart from the sky/atmosphere/air/what-d’you-call-it there are other objects that appear bluish for a similar reason, compare the Tyndall effect”.

  24. January First-of-May says

    My meta-title-text for the corresponding comic

    “This is actually an unrelated effect: yes, air is blue, and this is why far-off mountains look blue, but there’s so little air in the direction of the sky that the color produced by Rayleigh scattering takes over. (Incidentally, air is blue because oxygen is blue. Nitrogen is colorless.)”

    [AFAIK the causal part of the statement in brackets is in fact wrong. Yes, that was intentional.]

  25. PlasticPaddy says

    @JFOM
    It seems to be a Jacobite song (or an older song repurposed). There is a version in the collection “An Duanaire: Poems of the Dispossessed”. Here is another (cut-and-pasteable) version with my translation:

    “A dhroimeann dubh dílis, a scoith shíoda na mbó,
    cá bhfuil do mhuintir nó an maireann siad beo?”
    “Tá siad sna díogaibh sínte faoin bhfód,
    ag súil le Rí Séamus a théacht insa gcoróin.”

    Dá bhfaighinnse cead aoibhnis(1) nó radharc ar an gcoróin,
    thriallfainn go Sacsain d’oíche is de ló,
    ag siúl bogaigh is curraigh is sléibhte dubha ceo,
    nó go seinnfear ar dhromaibh “An Droimeann Dubh Ó”!

    “Dia do bheatha don mbaile, a dhruimeann dubh ó!
    ba mhaith do chuid bainne is ba mhilis le hól,
    do chaoinfinn do leaca is do chom cailce mar rós,
    do mhalairt ní dhéanfad, a dhroimeann dubh ó!”

    “O dear Droimeann Dubh, silky peeress of cows,
    Where are your people, or do they still live?”
    “They are in trenches stretched out under the soil,
    Waiting for King James to come into his crown.”

    If I got permission to speak, or a glimpse of the crown,
    I would proceed towards England, night and day,
    Walking the bogs and the rough patches and the mountains black with fog,
    Until one would play “An Droimeann Dubh Ó” on drums (or [English] backs?).

    A blessing from home, O Droimeann Dubh!
    Your milk was the best and the sweetest to drink,
    I mourn your rosy cheeks and white waist [OK, it’s just a cow],
    They will not replace you, O Droimeann Dubh!

    (1) aoibhnis “bliss” should be aighnis “speech/argument”, as in the version from An Duanaire, so permission to speak, not ?permission to feel bliss?

  26. We discussed the blue of the firmament and the atmosphere extensively here.

  27. @mollymooly, who said “Grapheme u may be /ʌ/ only ever in old English words and /ju/ only ever in anglicised words. In more exotic borrowings it must always be /u/”.

    At least in my British English, /ʌ/ can be used in foreign names like Humboldt, and /ʌ/ is the correct vowel in the first syllable of Punjab, and there are words not derived from old English and (which I take to be your meaning) not present in pre-modern English, like jumbo and rucksack, which have /ʌ/. Some people pronounce rucksack ‘rooksack’, that is true; I think it’s quite a recent borrowing from German and was earlier on less Anglicised.

  28. Where Punjab originally has a (or ə). (I know, everyone knows that.)

  29. Where Punjab originally has a (or ə).

    This is excessively unclear. You mean Punjab is written Panjab (with a) in a technical transcription, but that first a is pronounced /ə/ and therefore was written with u in the traditional Anglicized spelling, since the letter u suggested the correct sound (as in the word “pun”). Unfortunately, modern people (I want to say “idiots,” but that’s not nice) who are intent on trying to reproduce “correct” pronunciations (i.e., those of the people from whom a word or name was borrowed) and are convinced that the written letter u should always mean /u/ wind up using the absurd pronunciation “Poon-jahb.”

  30. David Marjanović says

    Aodh

    Is that pronounced [eː]?

    wind up using the absurd pronunciation

    That also happens when English spellings spread beyond English, as with Kalkutta and Burma ~ Birma in German, and of course la Birmanie in French…

    Conversely, Laos: that’s les Laos, plural of Lao with the silent plural marker of Written French.

  31. @Languagehat, yes.
    I honestly don’t know what the range of pronunciations is, or from what exact source English speakers borrowed it. I think [a] is possible in Indic languages here.
    This is why I wrote it in such a sloppy way.

    “Poon-jahb.”

    Well, “poon” does look like something Indian.

    Wiktionary informs me that in English it means “any of several East Indian trees of the genus Calophyllum, yielding a light, hard wood used for masts, spars, etc.” (Origin: 1690–1700; compare Tamil புன்னை (puṉṉai), Malayalam പുന്ന (punna), names for Calophyllum inophyllum. Doublet of punnai.) and is also “vulgar slang” for “poontang”, which means 1. vagina 2. sexual intercourse (Origin uncertain. Likely from French putain (“prostitute”). Compare Jamaican Creole punani (“female genitalia”); also Tagalog putang ina (“contemptible person”, literally “prostitute-mother”) from Spanish puta (“prostitute”)).

  32. I think [a] is possible in Indic languages here.

    For short a? Not in the ones I know, and certainly not in Hindi, from which the word was borrowed.

  33. David Eddyshaw says

    You mean Punjab is written Panjab (with a) in a technical transcription, but that first a is pronounced /ə/

    Indeed: as Pāṇini Himself remarks: a a.

  34. LH, well, then I was warong* (I was not sure anyway). Thanks.

    *Count it as an epenthetic typo and read it as you please:)

    PS. perhaps what confused me is this (wt for the English word):

    “(South Asia) IPA: /pə̃.ˈdʒaːb/, /pan.ˈdʒaːb/”

  35. David Marjanović : I thought Kolkata for Calcutta was a folk etymology forced by the BJP for religious reasons, and the original etymology is not known?

  36. Foreign names in Modern Hebrew are often torn between phonetic and orthographic faithfulness. Lincoln got it bad: לינקן lost and לִינְקוֹלְן won, and the name is pronounced /ˈlinkoln/ or, by the less deft, /ˈlinkolen/.

  37. David Marjanović says

    I thought Kolkata for Calcutta was a folk etymology forced by the BJP for religious reasons, and the original etymology is not known?

    Oh, I didn’t know any of that. But surely the original wasn’t pronounced with [a] and [ʊ]?

  38. January First-of-May says

    I thought Kolkata for Calcutta was a folk etymology forced by the BJP for religious reasons, and the original etymology is not known?

    AFAICT (from Wikipedia), the British colony of Calcutta was established in 1690 in or near a (scantily attested) pre-existing local village named Kôlikata (or similar – spellings differ even within Wikipedia), and got its name from said village.

    I’m not aware of any specific etymology proposed for the “Kolkata” spelling; Wikipedia claims that the Bengali name was continuous throughout. There’s apparently a lot of proposed etymologies for the original village name, none of which sound particularly plausible to me (but I don’t know remotely enough about the relevant languages).

    and the name is pronounced /ˈlinkoln/ or, by the less deft, /ˈlinkolen/.

    Previously on LH (the entire thread might be of some relevance).

  39. drasvi, were you thinking of Pune, by any chance?

  40. @Y, I think no, I did not mean specifically Pune/Poona…

    Rather, the proliferation of “oo” in Indian spellings, which unexpectedly makes LH’s transcription of the hypercorrect pronunciation of Punjab look soo (falsely) Indian.

  41. January First-of-May: Apparently Modi decided that Kolkata sounded like a Hindu term.

    “the British colony of Calcutta was established in 1690 in or near a (scantily attested) pre-existing local village named Kôlikata (or similar – spellings differ even within Wikipedia), and got its name from said village.”

    That’s how the city got its name, correct, but apparently the BJP decided it had to sound more religiously Hindu.
