Written Language Quiz.

From Anotherquiz.com: Can You Identify 11 Languages By Their Writing? They call it “Our hardest trivia yet!” but I found it ridiculously easy; I got 11 of 11, and there were only a couple about which I felt even a momentary doubt, enough to make me take a closer look before hitting the button. It seemed like they weren’t even trying to make it hard; on a couple they could have made me sweat a little if they’d chosen languages that use similar-looking systems instead of just random languages from around the globe. And some of the language samples themselves are weird in ways I won’t specify because I don’t want to give away answers, but you’ll see what I mean if you know any of the languages. Kvetch, kvetch, kvetch! But it’s fun anyway, all language quizzes are fun, so go ahead and give it a try (obviously if you haven’t spent a lot of time splashing around in foreign languages you may find it more challenging), and don’t go into the comment thread if you don’t want spoilers, because I expect people will be discussing the samples and their results.

Comments

  1. The Hebrew and Arabic… YIKES!

    Hebrew: someone, a person or a computer, translated “How do you do today” to Hebrew איך אתה עושה היום, which would literally mean “How do you do today”: this would be merely opaque nonsense, except it’s also ungrammatical, since עשה ‘do’ is transitive (except in some slang expressions, where it means something like ‘do it’ in English). And then, as a bonus, the letters are set from left to right, with the finals ך and ם on the right end of the word.

    Arabic: I hardly know any Arabic. I think they meant to write كيف حالك اليوم (which I think is correct) but used the non-combining forms for all the letters, and again set them from left to right.
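The left-to-right setting described for the Hebrew sample can be illustrated with a small check. This is a minimal sketch, assuming the text is stored in logical (reading) order, as Unicode text normally is. `FINAL_FORMS` and `looks_reversed` are hypothetical names, and the heuristic only catches words that happen to end in one of the five final letters.

```python
# Hebrew letters that have a distinct word-final form.
# Written with escapes so the stored order is unambiguous in any editor.
FINAL_FORMS = {
    "\u05DA",  # final kaf
    "\u05DD",  # final mem
    "\u05DF",  # final nun
    "\u05E3",  # final pe
    "\u05E5",  # final tsadi
}


def looks_reversed(word: str) -> bool:
    """Heuristic (an assumption, not a standard test): a final form stored
    as the FIRST character of a word suggests the letters were reversed."""
    return (
        len(word) > 1
        and word[0] in FINAL_FORMS
        and word[-1] not in FINAL_FORMS
    )


# Two words from the sample in logical order: the final form is stored last.
EIKH = "\u05D0\u05D9\u05DA"         # alef, yod, final kaf
HAYOM = "\u05D4\u05D9\u05D5\u05DD"  # he, yod, vav, final mem

for word in (EIKH, HAYOM):
    # First value: word as it should be stored; second: the same word reversed.
    print(looks_reversed(word), looks_reversed(word[::-1]))
```

For both words the correctly ordered form is not flagged and the reversed form is.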

  2. George Grady says:

    That was easy. I wouldn’t have guessed the Bulgarian one in isolation, but it was obvious by process of elimination.

    What were the marks above the я in сегодня in the Russian sample?

  3. Oh, and when you’re done, you get a box urging you to get attached to them via Facebook, by the irresistible caption Cras erat arcu, cursus et sodales nec, tincidunt ac augue. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas.

  4. I wouldn’t have gotten the Bulgarian on its own either, and even with the choices I thought a split second about Mongolian. Everything else was pretty easy, some insanely so.

  5. Hmmm. It didn’t tell me how many I got right, but I suspect even I guessed them all. Bulgarian I wouldn’t have gotten in isolation, but with the options it was easy.

  6. . . .pretty easy, some insanely so.

    My sediments exactly. Couldn’t say I ‘liked’ it.

  7. Jongseong Park says:

    George Grady, it looks like they missed deleting some devanagari marks from the previous sample after their copy-paste job. It’s a bit sloppy even for a quick online quiz.

    The Korean sample is 오늘 어때요, which is literally “how is today”. It could mean “how about today?”, or it could mean “how are you/how is [some other understood subject] today?”. You would normally expect a question mark. It makes me wonder what procedure was followed to generate the samples, which seem to be variations of “how do you do today” at first glance, though I’m not sure about the Japanese where I only recognize the kanji for machine and don’t see the expected kanji for “today”.

  8. I really expected there’d at least be some Georgian or Sinhala or Cherokee or something.

  9. John Emerson says:

    Oddly, I guessed Mongolian instead of Bulgarian, because I knew that Mongol was written in several different scripts and didn’t know that about Bulgarian. So my very small knowledge of Mongol Cyrillic betrayed me.

  10. Mongolian Cyrillic can be distinguished by use of two vowels absent in Russian or Bulgarian – Өө (ö) and Үү (ü) – and by frequent duplication of letters to represent long vowels – like in өнөөдөр (önöödör).

    Unfortunately, the letters Өө and Үү are also used in several dozen other languages spoken in the former Soviet Union (along with many more additional letters).

    Kazakh, for example, has them too.
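The two hints above (the extra letters and doubled vowels) can be sketched as a rough check. This is only an illustration under stated assumptions: `mongolian_hints` is a hypothetical helper, the vowel set used for the doubling test is my own guess at a reasonable list, and, as just noted, Kazakh and other languages share the extra letters, so a positive result is a hint rather than an identification.

```python
# The two letters named above, written as escapes to avoid look-alike characters.
EXTRA_LETTERS = {"\u04E9", "\u04AF"}  # Cyrillic small barred o, small straight u

# Assumed vowel set for the doubled-vowel test: a, e (U+044D), i, o, u + extras.
VOWELS = {"\u0430", "\u044D", "\u0438", "\u043E", "\u0443"} | EXTRA_LETTERS


def mongolian_hints(text: str) -> dict:
    """Report which of the two hints are present in a Cyrillic sample."""
    lowered = text.lower()
    has_extra = any(ch in EXTRA_LETTERS for ch in lowered)
    # A doubled vowel is the same vowel letter twice in a row.
    has_doubled = any(
        a == b and a in VOWELS for a, b in zip(lowered, lowered[1:])
    )
    return {"extra_letters": has_extra, "doubled_vowel": has_doubled}


print(mongolian_hints("өнөөдөр"))  # the Mongolian word cited above
print(mongolian_hints("сегодня"))  # the Russian word from the quiz sample
```

The Mongolian word triggers both hints and the Russian word neither, though a Russian word with a doubled о would trip the second test, which is why neither hint is decisive on its own.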

  11. Bulgarian is written entirely in Cyrillic. If you don’t know any Slavic languages, the best hint would be the frequent use of the letter ъ to represent unstressed vowels.

  12. I think it would be very easy to make the quiz much harder if there was a choice between Hindi and Marathi or Bulgarian and Russian (though even Mongolian-Bulgarian choice seems too hard for many).

  13. I’d like it better if it was designed by and for people who like language and think about it a lot (like this noble company), rather than by some hacks at an address-harvesting click-bait operation.

  14. The Japanese was correct but a bit old-fashioned. Perhaps they were influenced by the recent drama series on NHK about the translator of Anne of Green Gables? In this series a similar phrase was used a lot.

  15. @Jongseong Park

    It was ご機嫌はいかがですか
    きげん4【機嫌】 ローマ(kigen)
    a mood; a humor; spirits; temper. [⇒ごきげん]
    ごきげん【御機嫌】 ローマ(gokigen)
    ご機嫌いかがですか. How are you (getting along)? | 〔病人に〕 How ┌do you feel [are you feeling] today?

  16. I really expected there’d at least be some Georgian or Sinhala or Cherokee or something.

    Yup!

    I think it would be very easy to make the quiz much harder if there was a choice between Hindi and Marathi or Bulgarian and Russian (though even Mongolian-Bulgarian choice seems too hard for many).

    Indeed!

    I’d like it better if it was designed by and for people who like language and think about it a lot (like this noble company), rather than by some hacks at an address-harvesting click-bait operation.

    Exactly!

    Nice to get such a thorough sense of vindication; y’all reacted just the way I did. And I’m glad to get the details of what was wrong with some of the samples where I knew the alphabets but wasn’t able to tell exactly how screwed-up the sample was.

  17. Trond Engen says:

    I decided most of them in a glance before looking at the alternatives, but I wasn’t sure it was Thai rather than Lao and Hindi rather than Marathi. Also, I sort of expected it to increase in difficulty to some arcane minority languages written in, say, Arabic script. The alternatives removed all doubt, except that I too paused for a split second to be sure that Bulgarian wasn’t Cyrillic Mongolian.

  18. I wish somebody would do a better version; it’s a great idea.

  19. Lao script is more rounded than Thai. (There is also a popular font in Thailand which is even more squared and makes Thai look like Latin or Greek.)

  20. Also, large chunks of Japanese text can be written entirely in kanji, and distinguishing that from traditional Chinese is not very easy for someone who can’t read Japanese or Chinese.

  21. And it could be quite a challenge to tell Persian from Arabic or Urdu from Persian.

  22. Athel Cornish-Bowden says:

    Absurdly easy! I hardly had to think about any of them apart from Somali/Korean, and in that case I made a foolish analysis and came up with Somali. I would guess that Somali uses Arabic script (or conceivably Amharic). Although this was clearly not Arabic or Amharic I thought it looked a bit crude for Korean, so I guessed it might be an archaic way of writing Somali. None of the others needed more than a glance, though I thought the Bulgarian might be Russian until I saw that Russian wasn’t offered as a possibility. I might have difficulty distinguishing between those two in a very short text, unless the letter ъ occurred frequently, whereas it’s very rare in modern Russian. Even older Russian shouldn’t create a problem with this as it’s my impression that although ъ was very common at the ends of words it hardly ever occurred anywhere else (is that right?). If this was their most difficult quiz I wondered what their easy ones might be like.

  23. absurdly easy, especially considering the prompting choice. I only hesitated between Chinese/Japanese because it looked like they used old Chinese characters, long abridged.

    And a Happy New Year!

  24. The title is “Can You Identify 11 Languages By Their Writing?”, so I guess it’s not unreasonable that it doesn’t require any knowledge of the languages themselves. It’s a test of your ability to recognize common writing systems plus your knowledge of what writing system is normally used for some common languages. I wonder if the inclusion of Mongolian as an alternative in the Bulgarian question was just a mistake made by someone who didn’t realize Mongolian was written in Cyrillic.

  25. By the way, where does the h in Amharic come from? It’s amarıñña in the language in question.

  26. Jongseong Park says:

    @juha:
    Thanks for the explanations on the Japanese sample. I think Amharic derives from Amhara, which in Ge’ez is አምሐራ ʾÄməḥära (አማራ Āmara in Amharic).

  27. Somali was written in Arabic script for centuries, but the dominant script nowadays is Latin. A number of Somali-specific scripts were devised in the 20C and still see varying degrees of use.

  28. Maybe we should make a harder quiz ourselves?

    How about this one.

    “Jeg kommer fra Norge” – this sentence is written in

    a) Danish
    b) Norwegian
    c) Swedish

    Which sentence is written in Danish?

    a) Det var en fuktig, grå sommardag i slutet av juni.
    b) Det var en fuktig, grå sommerdag i slutten av juni.
    c) Det var en fugtig, grå sommerdag i slutningen af juni.

  29. @Jongseong Park
    Thanks a lot!

  30. What were the marks above the я in сегодня in the Russian sample?

    яैं – not sure if that’s rendered correctly though.

    I think it’s DEVANAGARI VOWEL SIGN AI and DEVANAGARI SIGN ANUSVARA added by accident. Hindi sample ends with them too.
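The identification of the stray marks can be checked with Python's standard `unicodedata` module. This is a minimal sketch, assuming the cluster really is Cyrillic я followed by U+0948 and U+0902 as suggested above; `describe` is a hypothetical helper name.

```python
import unicodedata


def describe(text: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode character name) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# The cluster written with escapes: Cyrillic ya plus the two suspected marks.
cluster = "\u044F\u0948\u0902"

for code_point, name in describe(cluster):
    # Combining marks have a general category starting with "M".
    is_mark = unicodedata.category(chr(int(code_point[2:], 16))).startswith("M")
    print(code_point, name, "(combining mark)" if is_mark else "")
```

The names reported for the second and third characters are DEVANAGARI VOWEL SIGN AI and DEVANAGARI SIGN ANUSVARA, both combining marks, which is why they render on top of the я.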

  31. Ha! Boy, they sure worked hard to get such a lousy result. You’d think finding sample sentences would be an easy matter.

  32. Which sentence is written in Danish?

    I have no idea, but it’s just obvious that they all mean “That was a fuckup, gray summer days and the sluts of June.”

  33. Melchior AJP Anderegg says:

    c is dansk.

  34. They don’t have sluts in Denmark, they have slutnings.

  35. Match the texts (all different, all from Wikipedia) with the languages (all Romance):

    1. Su trenu ’e Casteddu arribàda a mesudì. Ma de candu intràda in is furriàdas de Tacchenurri e ancora no si bìat, s’intendìat su ciuff ciuff e su fragu ’e su fumu, màssima candu tiràt bentu estu.
    2. Tutti i ggìenti nascianu libberi e ‘gguali all’àtri ppì ddignità e diritti. Ognunu tena cirbìeddru raggiune e cuscìenza e s’ha de cumbortà cull’atri cumu si li fòssaru frati.
    3. La gualp eara puspe egn’eada fumantada. Qua â ella vieu sen egn pegn egn corv ca taneva egn toc caschiel ainten sieus pecel.
    4. Sei bedda chi dugna cori / s’innammurigghja di te / pa l’occhj mei un fiori / ed è la meddu chi c’è.
    5. Tuota nuester, che te sante intel sil, sait santificuot el naun to. Vigna el raigno to. Sait fuot la voluntuot toa, coisa in sil, coisa in tiara.

    a) Sutsilvan Romansh
    b) Campidanese Sardinian
    c) Gallurese Corsican
    d) Dalmatian
    e) Cosentino Calabrian

  36. David Marjanović says:

    letter ъ to represent unstressed vowels

    It’s a full-fledged phoneme that occurs in stressed and unstressed positions like the other five vowel phonemes.

    “Jeg kommer fra Norge” – this sentence is written in

    Not Nynorsk, where it’s Noreg rather than Norge. But I’m not sure about the other three options.

    Which sentence is written in Danish?

    c, because it has -gt- instead of -kt-.

    Match the texts (all different, all from Wikipedia) with the languages (all Romance):

    1) Sardinian. Su trenu can’t be anything else.
    3) Romansh. Lots of words ending in a consonant, general weirdness, vowel clusters, and the sch doesn’t exactly hurt. 🙂
    5) Vegliot (northern “Dalmatian”), because of the unique first word, the extreme change to Latin /aː/, the unique to, general weirdness that doesn’t seem Romansch, and… I’ve read the Wikipedia article a few times. :-]

    That leaves Corsican and Calabrian. 2) seems less close to Standard Italian, showing the southern u for Standard o, so that’s got to be Calabrian.

  37. Thank goodness somebody upheld the honor of the Hattery — I couldn’t have!

  38. Yes on all five—give the man a cigar!

    If I had to solve this, I’d have picked Vegliote first because it’s the only ‘old’ text; to make it harder I could have picked Christian texts for the other languages too. But I’d have a hard time telling Corsican and Sardinian apart.

  39. I have no idea what language the 2nd sentence is written in, but I can understand it easily.

    If it’s indeed Calabrian, then I will add Calabrian to list of languages in my CV….

  40. Trond Engen says:

    I wouldn’t weigh in too early on the Nordic quiz, but I’m pretty sure that both Danish and Norwegian Bokmål are valid answers. For Nynorsk, many write Norge these days, some even choose the new weak present kommer for the strong kjem and fra for frå, but Eg rather than Jeg is a litmus test.

    (In really archaic Eastern Nynorsk, however, one might use jeg, but that would be accompanied by very archaic forms elsewhere too: Jeg kjem fraa Norig or something like that.)

    Swedish has Jag and från.

  41. You mean Norwegians can’t tell their own language from Danish?

  42. Trond Engen says:

    Oh, we can, but for historical reasons Bokmål is quite close to Danish, and it just happens that none of the tells are present in your short example sentence (1).

    Your second question is easy, but it would have been harder without the alternatives. I’m quite sure I wouldn’t remember that Danish preferred slutningen for slutten. I’m not sure I’d remember that about Swedish slutet either. But I’d still recognize Swedish because of the retained a in unstressed position in sommar (2).

    1) I have a feeling that Norwegians are more likely than Danes to choose the phrase Jeg kommer fra, but that doesn’t make it unidiomatic in Danish.

    2) Nynorsk shares this, but also retains diphthongs: Det var ein varm, fuktig sommardag i slutten av juni.

  43. I once met a Ukrainian who spoke Russian, but didn’t know it was Russian and mistakenly believed he was speaking Ukrainian.

    Norwegians are not at this level of confusion yet, but getting close enough…

  44. I don’t know what I’d say instead of “Jeg kommer fra Norge.” Perfectly idiomatic to me. I guess “Jeg er fra Norge” would work too.

  45. They don’t have sluts in Denmark, they have slutnings.

    We have both. Slutnings are juvenile sluts. They grow up so fast.

  46. John Emerson says:

    Here’s a very familiar text in an obscure Romance language.

    Tată a nostru, ți eșci tu țerl,
    s’ayisească numa a Ta,
    s’yină amirăriľa a Ta,
    si facă vreare a Ta,
    cum tu țerl, ași ș’pisti locl.
    Pânea a nostră ațea di cathi dzuă dă-nă-u ș’ază
    și ľartă-nă amărtiile a noastre
    ași cum ľi ľirtăm ș’noi a amărtoșlor a noșci.

    The church I was raised in had a schism around 1920-1930 which I think included a switch from Bokmål to Nynorsk, or maybe a refusal to switch. No one alive to ask about it, and the records are sketchy. Around 1948 they cancelled the Norse service and went to English-only, after a considerable bilingual period.

  47. The use of the letter ľ probably indicates that this Pater Noster is in Vlach/Aromanian.

  48. I wonder if Tata is related to Tuota in Dalmatian.

    Is it a Slavic borrowing?

  49. AJP Eggedosis says:

    You mean Norwegians can’t tell their own language from Danish? …Norwegians are not at this level of confusion yet, but getting close enough.

    Haha. No. There’s nothing to be confused about. Bokmål + written Danish are very similar, but there’s never going to be the faintest doubt about which language someone’s speaking.

  50. David Marjanović says:

    I’d have a hard time telling Corsican and Sardinian apart

    They’re very different. Sardinian is the sister-group to all other Romance languages together, with su and sa as definite articles, and with no changes to the Latin vowel system except the loss of length. Corsican is very close to Tuscan and thus to Standard Italian; this is slightly obscured here by spelling out the raddoppiamento (s)sintattico, the perfectly standard lengthening of word-initial consonants when the previous word ends in a vowel, which produces Standard de + la = della, in + la = nella, a + la = alla, e + pur = eppur and so on.

    I once met a Ukrainian who spoke Russian, but didn’t know it was Russian and mistakenly believed he was speaking Ukrainian.

    This blows my mind.

  51. Early 20th century censuses in Western Belarus discovered that the majority of the rural population didn’t know what nationality they were and, when asked, replied simply that they were “tutajszij” (from here).

    Of course, they couldn’t name their own language either and simply called it “mowa” (language).

  52. never going to be the faintest doubt

    No, indeed. All you have to do is determine if the potato is in or out.

    su and sa as definite articles

    The only other Romance variety like this is Algherese Catalan, where there is a contrast between default ipse-based articles and specialized ille-based articles. In particular, la mort is ‘Death’ (abstract or personified), whereas sa mort is ‘the death (of which we were speaking)’. Since Alghero is on Sardinia, this isn’t too surprising.

    However, Gallurese and Sassarese, which are spoken in northern Sardinia, are Corsican/Tuscan with a massive Sardinian substrate/adstrate, and they are often called “Sardinian”, which confuses the issue.

  53. There are other dialects of Catalan that use es, sa.

  54. First: */felike annu novu/ to all.

    SFReader: you’re half-right: Dalmatian TUOTA and Romanian TATA are related, but this is no loanword, but an attested Latin term (TATTA) which was originally confined to children’s language.

    John Cowan: Rodger C. is quite correct, reflexes of IPSUM, IPSAM are used as the definite article in other dialects of Catalan, and there is at least one “dialect” of Italian which may still do so, or perhaps did so well into the twentieth century.

    John Cowan, David: making the whole thing even more confusing is the fact that Gallurese and Sassarese are much closer to Southern lects of Corsican spoken in Corsica itself than the latter are to lects of Corsican spoken further North. The internal diversity of Corsican is remarkable, has only been partly lost through the influence of Pisan Tuscan over the past thousand years or so, and has only been documented recently. It is quite ancient, and indeed in some ways Corsica is more diverse, linguistically, than the rest of Romance-speaking Europe combined. Really.

    This is quite a contrast to the lack of diversity of “core Sardinian” (Campidanese + Logudorese: both of these dialects can be derived quite unproblematically from the Old Sardinian language attested in Medieval texts and Charters). My own hunch (for whatever that is worth) is that this difference is due to Corsica having been linguistically romanized at a much earlier date than Sardinia (i.e. pre-Old Sardinian must have been spoken either outside Sardinia, or on a very small part of Sardinia, at a time when the dominant Romance variety of Corsica had already split into distinct forms ancestral to the present-day dialects).

  55. David Marjanović says:

    The internal diversity of Corsican is remarkable, has only been partly lost through the influence of Pisan Tuscan over the past thousand years or so, and has only been documented recently. It is quite ancient, and indeed in some ways Corsica is more diverse, linguistically, than the rest of Romance-speaking Europe combined. Really.

    That’s… not surprising, but I had no idea of it. Thank you!

    Corsica having been linguistically romanized at a much earlier date than Sardinia

    Also faster and… more thoroughly, I suppose. There are things (used to be in Wikipedia, can’t find them anymore) that Sardinian has in common with Basque and apparently nothing else, and on top of that Sardinian has things like the unique prefix ti- on some animal names of Latin origin… fascinating stuff.

  56. I had no idea of it. Thank you!

    Same here!

  57. — Sardinian has in common with Basque

    Sardinians and the Basques are descendants of the first wave of neolithic colonization from Asia Minor circa 6000 BC.

    The Basques even managed to keep their language intact

  58. indeed in some ways Corsica is more diverse, linguistically, than the rest of Romance-speaking Europe combined

    How fortunate we are, then, to know that the Romance languages did not arise on Corsica.

  59. David Marjanović says:

    Sardinians and the Basques are descendants of the first wave of neolithic colonization from Asia Minor circa 6000 BC.

    Yep.

    The Basques even managed to keep their language intact

    Looks like it (a thick layer of Latin and Romance loanwords excepted, and there may be Celtic ones, too).

    How fortunate we are, then, to know that the Romance languages did not arise on Corsica.

    True, and a very good point.

    There are cases like this in biology as well. People used to wonder for a long time whether parrots come from South America or from Australia, their two areas of greatest current diversity… in the last 20 years, the fossil record has spoken up and shown that parrots are fundamentally European.

  60. Classic quiz question for confused Slavic studies majors.

    Which of the following is the self-name of the Slovak language?

    a) slovenčina
    b) slovenščina
    c) slovienčina
    d) slovenština
    e) slovinčtina
    f) slovinština
    g) slovinščina
    h) slovaščina
    i) słowakšćina
    j) słowjenšćina
    k) slovanština
    l) słowiński
    m) słowiański
    o) słowacki
    p) słowiański
    q) słoweński
    r) slavenski
    s) slovački
    t) slovenački
    u) słowińska
    v) słowakska
    w) słowjańska
    x) słowjeńska
    y) słowińsczi
    z) słowacczi

    and two extra choices for the most erudite students

    27) słowiańsczi
    28) sloweńsczi

  61. fundamentally European

    Even if the oldest fossils are European, that’s not necessarily diagnostic either; that might be an artefact of chance preservation. In any case, Wikipedia can’t make up its mind whether the European parrots are true psittacids or only psittaciforms.

  62. John Cowan, David, SFReader, Hat, and whoever else might be interested:

    1-Considering the extreme internal diversity of Corsican and how unlike the rest of Romance Sardinian is, a linguist knowing nothing of the history of the Roman Empire but familiar with basic principles of linguistic geography might well correctly deduce that the Urheimat was in the middle of the Italian peninsula, not least since among Continental Romance varieties (=Romance minus Sardinian and Corsican) it is those of the Italian peninsula which exhibit the most diversity.

    2-Sardinian and Basque do indeed share a number of words, but it most certainly does not follow therefrom that the pre-Romance language of Sardinia (=Palaeo-Sardinian, or Proto-Sardinian, as it is sometimes called) was Basque-like: to repeat a point first made by Luis Michelena (or Mitxelena, to use the Basque orthography), Basque pre-Romance words shared with Romance varieties needn’t be ancient, indigenous Basque words: they could have entered Basque via Latin.

    3-David, Hat: you’re both quite welcome.

  63. And then again your linguist might decide that Proto-Romance originated on the islands and then spread to the peninsula and half Europe.

  64. David Marjanović says:

    Which of the following is the self-name of the Slovak language?

    Either a) or b). And the other is the name of the Slovene language for itself.

    …I’m pretty sure a) is Slovak.

    In any case, Wikipedia can’t make up its mind whether the European parrots are true psittacids or only psittaciforms.

    What it actually says is the point: stem-group psittaciforms are known only from the northern continents, so that’s where the origin of the crown-group must lie. The crown-group, by definition, consists of the last common ancestor of all extant psittaciforms plus all descendants of that ancestor; the extant psittaciforms are all parrots proper, cockatoos or lories. The stem-group is all the rest. The Wikipedia article calls the crown-group “modern parrots” and explains that its oldest known fossils are from Europe as well, though fossils from New Zealand are only a little younger.

    I was wrong about the restriction of stem-group parrots to Europe, though; I had forgotten about the one from the Green River Formation (a bit older than Messel or the London Clay, IIRC) and didn’t know about the one from India. “Fundamentally Laurasian” it is, then… or maybe “pantropical”.

  65. -…I’m pretty sure a) is Slovak.

    Correct!

    I did not include choices for slovenský and slovenski, which are further self-designations of Slovak and Slovene respectively.

    All of the remaining choices were taken from words for Slovak, Slovene and Slavonic in every Latin-script Slavic language I could find on Wikipedia.

  66. Il vergognoso says:

    The internal diversity of Corsican is remarkable, has only been partly lost through the influence of Pisan Tuscan over the past thousand years or so, and has only been documented recently. It is quite ancient, and indeed in some ways Corsica is more diverse, linguistically, than the rest of Romance-speaking Europe combined. Really.

    While Étienne is here, I would very much like to hear more about this one. Itched over this for all the four months between, really.

  67. Seconded. Tell us, Étienne!

  68. Okay, I haven’t the relevant monograph on Corsican dialectology at hand, so details will have to wait, assuming anyone is interested. Here’s the gist of it: Romance-speaking Europe can be split into three “macro-zones” on the basis of the fate of Latin stressed vowels. Leaving aside diphthongs, Latin had ten vowel phonemes: long and short /a/ /e/ /o/ /i/ /u/. The three macro-zones treat these vowels thus:

    1-In Sardinian and Southern Corsican, plus a few isolated areas of Southern Italy, Latin length distinctions were lost, with the qualities remaining intact: thus, Latin /i/ and /i:/ merge as /i/, /u/ and /u:/ as /u/, and so on.

    2-In most of the Italian peninsula and everywhere (minus Sardinia and parts of Corsica, see below) further West (France, Spain, Portugal) /i/ and /e:/, on the one hand, and /u/ and /o:/, on the other, merge and (typically) become mid-high vowels, distinct from the reflexes of high long vowels (which lose their length but not their quality: /i:/ becomes /i/, /u:/ /u/) on the one hand and the mid-low reflexes of Latin short /e/ and /o/ on the other. Latin /a/ and /a:/ merge as /a/.

    Naturally subsequent changes have affected different languages: Spanish, for example, turned its mid-low vowel phonemes into diphthongs (/je/ and /we/), and thus ended up with five stressed vowel phonemes only.

    3-In Romanian (defined broadly, i.e. including Aromanian, Megleno-Romanian and Istro-Romanian) and a few isolated parts of Southern Italy we find a sort of “compromise” system, where the back vowels have changed in the same way as in the first macro-zone (/u/ and /u:/ merge as /u/, /o:/ and /o/ as /o/) and the front vowels in the same way as in the second (i.e. short /i/ and /e:/ are merged as a phoneme separate from the reflexes of /i:/ and short /e/). Here too Latin /a/ and /a:/ merge as /a/.

    So: outside of Corsica, the above three schemata account for ALL Romance varieties.

    It turns out that much of Northern Corsican is aligned with the second macro-zone, and Southern Corsican with the first. BUT there exist at least two other groups of Corsican dialects, EACH OF WHICH exhibits a pattern of merger of Latin stressed vowel phonemes that is utterly alien to any Romance variety outside Corsica.

    And it is in this sense that I wrote that Corsican shows more internal diversity than the rest of Romance combined.
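The three macro-zone schemata above can be written out as a lookup table to make the mergers easier to compare. This is a sketch under stated assumptions: the keys follow the /i:/ notation used above, the outcome symbols "ɛ" and "ɔ" for the mid-low vowels and "e" and "o" for the mid-high ones are my own labels, and the dictionary and function names are hypothetical.

```python
# Keys: the ten Classical Latin stressed vowels (":" marks a long vowel).
# Values: the outcome each one merges into, per the three schemata above.
SARDINIAN_TYPE = {  # macro-zone 1: length lost, qualities intact
    "i:": "i", "i": "i", "e:": "e", "e": "e", "a:": "a", "a": "a",
    "o": "o", "o:": "o", "u": "u", "u:": "u",
}
ITALO_WESTERN_TYPE = {  # macro-zone 2: /i/ joins /e:/, /u/ joins /o:/
    "i:": "i", "i": "e", "e:": "e", "e": "ɛ", "a:": "a", "a": "a",
    "o": "ɔ", "o:": "o", "u": "o", "u:": "u",
}
ROMANIAN_TYPE = {  # macro-zone 3: front vowels as zone 2, back vowels as zone 1
    "i:": "i", "i": "e", "e:": "e", "e": "ɛ", "a:": "a", "a": "a",
    "o": "o", "o:": "o", "u": "u", "u:": "u",
}


def inventory(system: dict) -> list:
    """Distinct stressed vowels that remain after the mergers."""
    return sorted(set(system.values()))


def merged_classes(system: dict) -> dict:
    """Group the Latin vowels by the outcome they merged into."""
    classes = {}
    for latin, outcome in system.items():
        classes.setdefault(outcome, []).append(latin)
    return {outcome: sorted(members) for outcome, members in classes.items()}


for name, system in [
    ("Sardinian type", SARDINIAN_TYPE),
    ("Italo-Western type", ITALO_WESTERN_TYPE),
    ("Romanian type", ROMANIAN_TYPE),
]:
    print(name, len(inventory(system)), merged_classes(system))
```

Counting the outcomes gives five, seven and six stressed vowels respectively, before any later changes such as the Spanish diphthongization mentioned above.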

  69. Fascinating. Thanks, Étienne!

  70. Il vergognoso says:

    Wonderful!

  71. David Marjanović says:

    I’m having what Kids Today call a nerdgasm.

  72. So what are the vowel mergers in the remaining two Corsican dialect groups, or where is a description to be found?

  73. David Marjanović says:

    *jumps up and down in anticipation*

  74. Well, since you asked…for what follows I am indebted to Marie-José Dalbera-Stefanaggi (2002), LA LANGUE CORSE. Paris, Presses Universitaires de France.

    You might want to take notes.

    Along with the two types of vowel merger I had sketched above (the Sardinian-like type in Southernmost Corsica, and the Italo-western-like one in most of northern and central Corsica), there exists, sandwiched between the two, what is called (after the major river of the area) the Taravo-type vowel system, which is unique within Romance. Classical Latin long /i/ and long /u/, and long and short /a/, all remain unchanged in terms of vowel quality, as is the case in the other two macro-zones. However, Classical short /i/ and short /u/ both fail to merge with any other vowel, and have as their present day reflexes a mid-low front and back vowel (respectively). Classical long and short /e/ have merged as mid-high /e/, and Classical long and short /o/ have merged as mid-high /o/. You thus have seven stressed vowel phonemes, as in the Italo-western system, but with different historical origins. Crucially, it is impossible to derive the Taravo system from either the Sardinian or the Italo-western system. All three must go back to the Classical Latin system.

    The fourth type is found in the northernmost tip of Corsica (known as Cap Corse), but unlike the Taravo-type system, this one COULD be derived from the Italo-western type.

    This system is very similar to what is known as the Sicilian vowel system. In both, Classical long and short /a/ merge as /a/, long /i/ remains /i/, and long /u/ remains /u/. All the remaining back vowels (Classical long and short /o/ and short /u/) merge as /o/.

    In Sicilian you find a parallel treatment of front vowels, where short /i/, long and short /e/, all merge together as /e/. Sicilian thus has a 5-vowel system. The Cap Corse system, on the other hand, merges the same three vowels as /e/, BUT, unlike Sicilian, in some positions it keeps short Classical /e/ as a separate phoneme (mid-low front vowel). This yields a 6-vowel system (with, if you have been paying attention, the same vowel phonemes found in Proto-Romanian…but with different historical origins).

    Notice, incidentally, that the Cap Corse system, if it does not go back directly to Classical Latin, can only go back to Italo-western, as neither the Sardinian nor the Taravo systems preserves a separate reflex of Classical short /e/. The Sicilian system, on the other hand, could be derived from either the Italo-western or the Taravo system.

    And finally, there is something else about Corsican vowels which is special and quite possibly unique within Romance. In the Southern area, with Sardinian-type vowel systems, you actually find a 6-vowel system in Corsica, whereas in Sardinia you find a 5-vowel system. Whence the difference, you ask? Well, it’s because of the fate of the Latin diphthong /au/. In Sardinia it merged with /a/, in most of Corsica and in Romance varieties where it became a monophthong it typically merged with mid-low or mid-high /o/, but in the Corsican South, with Sardinian-type vowel mergers, /au/ became mid-LOW /o/, phonologically distinct from the mid-high /o/ which goes back to Classical long and short /o/.

    Thus, synchronically, in terms of vowel phonemes, you can divide Corsica into three zones. The far South, with its Sardinia-like vowel merger and Classical /au/ becoming a separate monophthong, has a 6-vowel system, with more back than front vowel phonemes. Cap Corse, in the far north, has a 6-vowel system, but with more front than back vowel phonemes. This makes the two systems mirror images of one another…and in between you’ve most of Corsica, with symmetrical 7-vowel systems going back to either Italo-western- or Taravo-type diachrony.
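    The "mirror image" claim above can be checked with a small sketch. The inventories are written out from the description in this comment; the zone labels and the front/back grouping are illustrative assumptions, and /a/ is counted as neither front nor back.

```python
# Vowel inventories as described in the comment above (IPA symbols).
# Zone labels are illustrative names only, not established terms.
FRONT = {"i", "e", "ɛ"}
BACK = {"u", "o", "ɔ"}

ZONES = {
    # Sardinian-type five vowels plus mid-low /ɔ/ from Latin /au/
    "far South": {"i", "e", "a", "o", "ɔ", "u"},
    # Sicilian-like five vowels plus mid-low /ɛ/ from Classical short /e/
    "Cap Corse": {"i", "e", "ɛ", "a", "o", "u"},
    # symmetrical seven-vowel system of most of the island
    "centre": {"i", "e", "ɛ", "a", "ɔ", "o", "u"},
}

def shape(inventory):
    """Return (total, front count, back count) for a vowel inventory."""
    return len(inventory), len(inventory & FRONT), len(inventory & BACK)

SHAPES = {zone: shape(inv) for zone, inv in ZONES.items()}

for zone, (total, front, back) in SHAPES.items():
    print(f"{zone}: {total} vowels, {front} front, {back} back")
```

    Under these assumptions the far South comes out with two front and three back vowels, Cap Corse with three front and two back, and the centre with three of each.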

    *Sigh* Apologies, I really should have kept this shorter…

  75. George Gibbard says:

    I would think you should have kept it longer with examples, but thanks.

  76. George Gibbard says:

    Etienne: some of us are willing to pay small amounts for such wisdom, but we very much like to read it free of charge, so thank you, it’s fascinating.

  77. Absolutely fascinating stuff. Going to read the Dalbera-Stefanaggi book.

  78. I agree with everyone else. Don’t apologize again or we shall have to chastise you severely.

  79. marie-lucie says:

    Je suis entièrement d’accord! Merci!

  80. David Marjanović says:

    What everyone is saying. 🙂

    Question: is there any Romance vowel system where the long and short /a/ of Classical Latin have not merged with each other?

  81. In answer to David’s question, no, there is no attested Romance variety which failed to merge Classical long and short /a/. This is unsurprising, really, since the pattern of mergers points to a general lowering of the quality of short vowels when compared to their long counterparts, and since /a/ is as low a vowel as a human mouth can produce, a general trend whereby short vowels became lower in quality would have left short /a/ untouched, so that once vowel quantity was lost phonemic merger between Classical short and long /a/ was all but inevitable.
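    The reasoning can be sketched in a few lines. This is only an illustration under stated assumptions: the starting point is a ten-vowel system distinguished by length alone, each short vowel is lowered by one step using conventional IPA symbols (the symbols are my choice of notation), and length is then dropped.

```python
# Assumed one-step lowering of short vowels, in conventional IPA notation.
# /a/ is already as low as a vowel gets, so it maps to itself.
LOWERING = {"i": "ɪ", "e": "ɛ", "a": "a", "o": "ɔ", "u": "ʊ"}
LENGTH = "ː"

def classical_inventory():
    """Five qualities, each short and long: ten monophthongs."""
    return [v + mark for v in "aeiou" for mark in ("", LENGTH)]

def lower_short(vowel):
    """Lower a short vowel by one step; long vowels are left alone."""
    return vowel if vowel.endswith(LENGTH) else LOWERING[vowel]

def drop_length(vowel):
    """Remove the length mark, modelling the loss of vowel quantity."""
    return vowel.rstrip(LENGTH)

def after_quantity_loss():
    """Distinct vowel qualities left once lowering and length loss apply."""
    return {drop_length(lower_short(v)) for v in classical_inventory()}

RESULT = after_quantity_loss()
print(len(classical_inventory()), "->", len(RESULT), sorted(RESULT))
```

    Every long/short pair except /a/ ends up with two distinct qualities, so only /a/ and /aː/ fall together, and the ten vowels reduce to the nine usually written /a ɛ e ɪ i ɔ o ʊ u/.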

    There is good evidence that long and short /a/ still existed as separate phonemes in the spoken Latin of Roman Britain. This is because Welsh/Breton/Cornish loanwords from Latin maintain separate reflexes of Latin stressed long and short /a/ (things are murkier in unstressed position). Thus, Welsh has BARF from Latin BARBA (beard) and PECHOD from Latin PECCATUM (sin). Latin BARBA has a short /a/ in its stressed (initial) syllable, Latin PECCATUM has a long stressed /a/, and in Welsh /a/ is the reflex of Latin stressed short /a/, whereas /o/ or /au/ are the reflexes of Latin stressed long /a/.

  82. David Marjanović says:

    So, while Classical Latin had a ten-monophthong system which may have been quite similar to Standard German
    /a aː ɛ eː ɪ iː ɔ oː ʊ uː/,
    Proto-Romance appears to have had only nine monophthongs
    /a ɛ e ɪ i ɔ o ʊ u/,
    and Romance never reached Britain (before 1066), or if it did, Britain was already full of Christians by then.

  83. There are those who say romance never reached Britain at all…

  84. David Marjanović says:

    Ouch. There’s a nice one in one of the middle seasons of Blackadder, though, and then there’s always the Doctor. 🙂

  85. /a aː ɛ eː ɪ iː ɔ oː ʊ uː/ is how I was taught to pronounce Latin in 1971-75, though the first two often came out merged (and backed) anyway.

  86. David: or the Classical Latin ten-vowel system may have originally been a system where differences in length were unaccompanied by any difference in quality: (a a: e e: i i: o o: u u:).

  87. David Marjanović says:

    At some point (perhaps long before Latin, perhaps not), yes, except that [ɛ ɛː ɔ ɔː] are much likelier than [e eː o oː] from first principles as well as comparative IE evidence.

  88. David Marjanović says:

    Hm, actually, given that Celtic shifted the long ones all the way to [iː] and [uː], maybe the shift from [ɛː ɔː] to [eː oː] already happened on the way to Proto-Italo-Celtic. ^_^

  89. George Gibbard says:

    I don’t know about “first principles” — one can find many languages where long and short vowels are not said to differ in quality — but what JC was taught has abundant support from Romance languages.

    But I was recently thinking about this: what about a short vowel before a hiatus in Latin, should it be close or open? I read somewhere that it should be close, a wrinkle on JC’s system, but I can’t think where I read this. Now: there is actually reason to think that /ĕ/ should originally be open before hiatus, namely Latin Dĕum/Dĕō ‘God’ > French Dieu, so we have an instance of accented Latin /ĕ/ diphthongizing in an open syllable as if it had been *[ɛ], cf. Vulgar Latin *caelōs ‘heavens’ > *[kɛːloːs] > French cieux.

    On the other hand there is evidence to favor the idea that *ĕ was close before a hiatus, namely that French Dieu [djø], cieux [sjø] have eu, unlike *bellōs > beaux [bo] with eau. So does this mean that whoever I read was right about *[ɛ] > *[e] before hiatus, but thought this was an early rule whereas in fact it postdates *[ɛ(ː)] > *[iɛ]?

  90. George Gibbard says:

    Sorry, in haste I misunderstood the “first principles” comment, and on rereading I don’t understand it at all.

  91. George Gibbard says:

    angle-bracket problem: “French Dieu [djø], cieux [sjø] have unlike *bellōs > beaux [bo]”

  92. George Gibbard says:

    still an angle-bracket problem. I refer to the fact that the two cases are spelled eu and pronounced with [ø], while the other word has eau and is pronounced with [o].

  93. George Gibbard says:

    It occurs to me that French Dieu may be by way of *Diós as in Castilian, but this will not work for Romanian Dumnezeu < *-dieu < Dɔmɪnɛ Dɛus. Does Portuguese Deus have /ɛ/ or /e/?

  94. George Gibbard says:

    That is DM, I quickly misread your comment on the assumption that you were defending JC’s Latin vowel system ‘on first assumptions’ — sorry for that. Then I saw you weren’t doing that, but making a different point, which I realized I don’t understand.

  95. He’s saying that five-vowel systems are worldwide more likely to be [a ɛ i ɔ u] like Polish than [a e i o u] like Spanish, vowel length or no vowel length.

  96. Romanian Dumnezeu and Polish “Pan Bóg” always struck me as an absurdly formal way to address deity.

    Mr. God?

  97. George Gibbard says:

    > He’s saying that five-vowel systems are worldwide more likely to be [a ɛ i ɔ u] like Polish than [a e i o u] like Spanish, vowel length or no vowel length.

    Aha. What’s the evidence?

    In Mexican Spanish I think the vowels in question are often closer to [ɛ ɔ] than [e o], but really in between. It doesn’t help that the IPA advises that, if the language doesn’t distinguish close-mid from open-mid vowels, one should say [e o], which makes you more likely to be right, but more unlikely that there is producible evidence.

  98. marie-lucie says:

    GG: dieu(x), cieux vs. beau(x)

    In at least one dialect I am aware of (part of Normandy), eau has turned into (still monosyllabic) iau [jo], so biau, viau, de l’iau, etc for beau, veau, de l’eau and similar words.

  99. David Marjanović says:

    He’s saying that five-vowel systems are worldwide more likely to be [a ɛ i ɔ u] like Polish than [a e i o u] like Spanish, vowel length or no vowel length.

    Yes; on top of that, while mid vowels (halfway between [ɛ ɔ] and [e o]) are apparently widespread in Spanish, I think actual [e o] are rare there, while actual [ɛ ɔ] are not.

    Mr. God?

    Lord God. Боже Господи.

    It doesn’t help that the IPA advises that, if the language doesn’t distinguish close-mid from open-mid vowels, one should say [e o], which makes you more likely to be right, but more unlikely that there is producible evidence.

    What’s really going on here is that 1) the earliest version of the IPA was developed for English and French; 2) the IPA is meant to look pretty when printed, so unmodified Latin letters are supposed to be used for as long as you can get away with them.

  100. Trond Engen says:

    Herregud!

  101. David: actually, Proto-Indo-European /o:/ only becomes Proto-Celtic /u:/ word-finally: in other positions it becomes /a:/. So I’m afraid the evidence doesn’t point unequivocally to pre-Proto-Celtic /e:/ having been higher than /e/, alas…

  102. David Marjanović says:

    Oh. Thanks.

    So I’m afraid the evidence doesn’t point unequivocally to pre-Proto-Celtic /e:/ having been higher than /e/, alas…

    It still does, just more weakly so, because the usual parallel for /o:/ is lacking.

    This invites comparison with Greek, where η has gone all the way to /i/, while ω is /o/ today, not /u/.

  103. David Marjanović says:

    This interesting paper on the origin of the Romance vowel systems mentions two more systems (“Marginal area”, “Outpost”) that are found in small parts of Italy; but one could be derived from the Italo-Western type and the other from the Eastern type by mergers. The Taravo system is not mentioned under any name. Corsica isn’t mentioned either, except the south with its Sardinian-type system.

  104. I fed Etienne’s five texts through GT, which identified (and garbled) them as Italian, Italian, Spanish (I thought it might pick Catalan), Italian, and Finnish (!) respectively. Note that Google Translate’s language identification algorithm is better (and more computationally expensive) than the Google language identification API, which looks (I think) only at letter frequencies and maybe a few very common words.

  105. John Cowan: the five Romance texts were shared with the hattery by Y, not by me.

  106. I look forward to seeing a more challenging set of such texts, without resorting to a bunch of closely related dialects.
