Arabic Harder to Read than Hebrew?

October 24, 2014 by languagehat 79 Comments

Or Kashti of Haaretz reports on a study that suggests that Hebrew speakers can read their native language more quickly than Arabic speakers can read theirs:

The study, conducted over the last three years, examined the speed and efficacy with which Hebrew and Arabic speakers read texts in their native languages. The texts were taken from two standardized tests, the psychometric exam and the international PISA exam.

Arabic, unlike Hebrew, is a diglossic language, meaning the oral language is different from the written (literary) one. The difference between spoken and written Arabic is so great, the researchers wrote, “that acquisition of the written language could be defined as acquiring a second language” – which in turn could influence “the development of linguistic mechanisms necessary for reading.”

Another difference is that Arabic orthography – meaning the shape of the letters and the use of diacritical marks – is more complex than that of Hebrew, making it harder to read. […]

This is one of the first studies to examine differences in reading ability among adults who have already mastered their mother tongue, as opposed to children.

The researchers found that, on average, Arabic speakers need seven seconds longer than Hebrew speakers to read 200 words aloud, while reading a 200-word text silently takes them about 16 seconds longer. And not only do Hebrew speakers read faster, but they also read more accurately, the study found.

These gaps cannot be explained by cognitive differences among the students or by other variables like parental education or socioeconomic status, the researchers said.

“The difference in reading efficiency stems from the differing speed of deciphering words in each language, something that’s apparently directly connected to the orthographic structure of the Arabic language and the fact that it’s a diglossic language,” Ibrahim said. “Reading in Arabic simply doesn’t reach the requisite level of automation, as it does for Hebrew or English readers.”

This raises all sorts of questions and requires various caveats (Prof. Rafiq Ibrahim says they should stop using texts translated from English or Hebrew on the Arabic exam), but it’s interesting enough I thought I’d pass it along and see what people have to say. (Thanks, Kobi!)

Comments

GeorgeW says

October 24, 2014 at 8:48 pm

“Another difference is that Arabic orthography – meaning the shape of the letters and the use of diacritical marks – is more complex than that of Hebrew”

I wonder if the test material was fully marked, which probably would slow most readers down. However, most Arabic texts, written for adults, use diacritical marks very sparingly. A typical newspaper will only use them to disambiguate words, and not even consistently then.

Even as a slow second-language reader, I prefer texts without diacritics. I find them distracting.
SFReader says

October 24, 2014 at 11:38 pm

—Arabic, unlike Hebrew, is a diglossic language

I would question this statement. The majority of Hebrew speakers are either first- or second-generation speakers, so a different language is regularly spoken in their families.

So diglossia is pretty common for Hebrew speakers.
Vanya says

October 25, 2014 at 2:58 am

Blaming the orthography seems like common sense. Standard Arabic is very pretty and elegant but certainly more difficult to read than other scripts. Farsi and Ottoman Turkish are also cumbersome in Arabic script. Arabic script is basically cursive, and even in English it is much easier to read Italic block print than cursive. I also find Cyrillic cursive fonts annoying to read because the “т”, “ш”, “н”, “м”, “и”, and “л” all look the same.
minus273 says

October 25, 2014 at 3:57 am

Could that also be because Modern Hebrew is usually written with (quoth Wikipedia) matres lectionis even for historically short vowels?
Athel Cornish-Bowden says

October 25, 2014 at 4:22 am

It seems like common sense to me as well. I can’t read either the Hebrew or the Arabic scripts, but I’ve made efforts to learn both, and find Hebrew much easier. The first time I was in Israel (1963) I found I could quite quickly recognize place names and other words I wanted to recognize. By the time I went the second time (1996) I had forgotten most of that and made no further progress as I barely set foot outside the Weizmann Institute. The third time, last year, I also made little progress (in a few days), but felt that with more time and more necessity I could learn all of the letters. With Arabic, even quite strenuous efforts (but not backed up by any need) have left me able to recognize lam and sin (and maybe alif on a good day), but not much else.
D.O. says

October 25, 2014 at 4:49 am

I also find Cyrillic cursive fonts annoying to read because the “т”, “ш”, “н”, “м”, “и”, and “л” all look the same.

Underscore for ш and overbar for т go a long way to cure it.
Jonathan Wright says

October 25, 2014 at 7:24 am

I don’t know much about Hebrew but I know Arabic very well. I work in it day in, day out. I’m surprised to see this research uses ‘words’ as the basis for calculating reading speed. Generally speaking, a written Arabic text conveys the same information with about 80 percent of the number of words as the English equivalent text. This could account for all of the observed slowness in reading Arabic, and possibly more. But I don’t know how ‘wordy’ Hebrew is compared with Arabic.
GeorgeW says

October 25, 2014 at 7:59 am

Jonathan Wright: Good point, certainly compared to English. Example: ‘katabaha’ (1 word) = ‘He wrote it’ (3 words). So, if it took an Arabic reader 2 seconds to process the word and an English reader 1 second to process each word, the English reader would come out way ahead by their measure.
John Cowan says

October 25, 2014 at 1:57 pm

D.O.: How usual is it for native Russian speakers to actually use those bars in handwriting?

From The Name of the Rose:

“Adso, without those wondrous oculi ad legendum [reading glasses] I cannot figure out what is written on these books. Read me some titles.”

I picked out a book at random. “Master, it is not written!”

“What do you mean? I can see it is written. What do you read?”

“I am not reading. These are not letters of the alphabet, and it is not Greek. I would recognize it. They look like worms, snakes, fly dung, ….”

“Ah, it’s Arabic. Are there others like it?”

Without the derogatory connotations, that’s my experience of looking at Arabic script: it doesn’t resolve into anything, as even Devanagari does with a little practice.
David Marjanović says

October 25, 2014 at 3:00 pm

With Arabic, even quite strenuous efforts (but not backed up by any need) have left me able to recognize lam and sin (and maybe alif on a good day), but not much else.

It surprises me that you’re doing that badly. To me it seems no worse than the more extreme kinds of Latin-alphabet handwriting – even though I’ve never tried to learn the Arabic script or any language written in it, so there are letters I don’t recognize under any circumstances and I can’t even guess at the short vowels.

Example: ‘katabaha’ (1 word) = ‘He wrote it’ (3 words).

Doesn’t Hebrew do exactly the same thing?

How usual is it for native Russian speakers to actually use those bars in handwriting?

In my limited experience, they’re treated as obligatory.

In German, BTW, an overbar or rather breve on u was ubiquitous into the 1950s; every vaguely handwriting-like font used e.g. in advertising used it, and my grandma still does.
Stephen Bruce says

October 25, 2014 at 4:11 pm

In my experience, they’re optional. They’re especially useful in words like шишка, where the letters with humps are next to each other. Some people probably use them all the time, others not so much.

Many people also use the print т instead of the cursive m, even when writing cursive. And л and м have “hooks” in front, though sometimes these get smudged.

Still, it’s easier than the German Sütterlin, which often looks to me like a bunch of zigzags.
Jonathan Wright says

October 25, 2014 at 4:12 pm

David, I don’t know how wordy Hebrew turns out to be. That’s what I was asking. You’re probably right on the katabahu point but it goes way beyond just one instance. When I translate from Arabic into English it’s consistently 25 percent more words. Basically Arabic words are longer. They don’t look it because Arabic script is of course a form of shorthand, but if you count phonemes per word, Arabic comes out high. Again, can’t speak for Hebrew.
Jonathan Wright says

October 25, 2014 at 4:17 pm

I’m also surprised that for the Arabic text in the test they use a text translated from English or Hebrew. Given the quality of many translations, that might easily account for the whole of the difference. Bad translations are horrible to read.
Y says

October 25, 2014 at 4:22 pm

SFReader: The majority of Hebrew speakers are either first- or second-generation speakers, so a different language is regularly spoken in their families.

That was true once, but not anymore. Monolingual Hebrew is now the norm in Jewish households.

Matres lectionis / vowel marks: these are not used in general-interest materials in either Hebrew or Arabic. In Hebrew at least they are not used in school texts and exams beyond second grade. Much of it is because it’s hard work typing them in. In addition, so many of the phonetic distinctions marked by them are now utterly lost that only a small minority of Hebrew speakers, even educated ones, can add them without making a lot of mistakes. In any case, the lack of them does not detract from reading speed. In fact, I would guess that I probably read slightly slower with them than without them, though it’s hard to test this objectively.

I haven’t seen the paper, but one point—the differing nature of the texts—seems sound. If you compared a native Spanish speaker reading a newspaper to a native English speaker reading the King James Bible, the latter would probably be slower too.

If you read Hebrew, there’s an opinion piece by Raphiq Ibrahim, one of the researchers. One of his claims is that “in one of the studies it has been shown that Hebrew and English scripts are efficiently processed by both cerebral hemispheres, while the Arabic script is processed efficiently only in the left hemisphere, due to the visual properties characteristic of the Arabic script.”

I still find it hard to believe that the script has anything to do with reading speed.
minus273 says

October 25, 2014 at 4:26 pm

Matres lectionis / vowel marks: these are not used in general-interest materials in either Hebrew or Arabic. In Hebrew at least they are not used in school texts and exams beyond second grade.

I didn’t mean niqquds, but writing yods and waws when there are vowels with a certain quality.
TR says

October 25, 2014 at 5:18 pm

Example: ‘katabaha’ (1 word) = ‘He wrote it’ (3 words).

Doesn’t Hebrew do exactly the same thing?

Biblical Hebrew yes, but Modern Hebrew is much more analytical. In fact it takes four words to say the above, counting the definite direct object marker et as its own word (it’s written as one).

And yes, matres lectionis are increasingly used where they historically wouldn’t be. It looks very ugly, but it does aid clarity.
Y says

October 25, 2014 at 5:49 pm

Sorry, I meant niqqudim, indeed. Matres lectionis in writing without niqqudim are a matter of balance between utility and esthetics. I personally use them more now than I used to. There are of course official standards for their use, but there are no standard standards…
D.O. says

October 25, 2014 at 6:08 pm

How usual is it for native Russian speakers to actually use those bars in handwriting?

I don’t know. Many do not connect letters at all even if they keep the shape of cursive for individual letters, or use some idiosyncratic mixture with parts of the words written in cursive, but with small interruptions between some letters. I guess in more formal writing with a fountain pen it was nearly universal. Mind, it would not help much with words like милиция.
Paul Ogden says

October 25, 2014 at 7:29 pm

There are about a zillion caveats that should be examined before pronouncing on the validity of the study.

I’m going to weigh in on a comment by Y, who from other comments at the Hattery I know has more than passing familiarity with Hebrew:

I still find it hard to believe that the script has anything to do with reading speed.

Typographers and graphic designers have long concerned themselves with the readability of printed texts. Even in the limited exposure we were given to these subjects when I studied journalism, I recall that the rule of thumb for optimal length of a typeset line (in English, and presumably for other Latin-alphabet-based languages) is 1.5 alphabets.

Any graphic designer will tell you that a serif typeface is more readable than a sans-serif typeface.

The Latin alphabet developed upper- and lower-case forms, which most alphabets, including Hebrew and Arabic, did not. When you have upper- and lower-case forms for letters, there is greater variety in the shape of letters, which enhances readability. Opportunities for ascenders (like the upper part of the left side of a lower-case h) and descenders (like the tail on a lower-case y) also enhance readability. Try this by placing a ruler on a text such that it completely covers a line of type. Move the ruler a millimeter or two down the page and try reading the line. At this point too little may be revealed to comprehend the text. Move the ruler down another millimeter or two and try again. At some point, with about the bottom half of the letters in the line still covered, the line will be readable. (Different typefaces will give different results, due to variations in the “x-height” among different typefaces.) Such other factors as space between lines (“leading,” pronounced “ledding”) also affect readability.

That test doesn’t work well in Hebrew; I’m not sure about Arabic, but I have my doubts. In my informal observation, Hebrew is naturally less readable than English. Its alphabet has few ascenders and descenders and several letters look nearly alike (e.g., ה, ח and ת; plus ר and ד). I suspect, without direct knowledge, that my observations about Hebrew apply to Arabic too.

Comparative length of translated materials: A Hebrew text of 100 words (as determined by Microsoft Word) will be about 135 words in English. Some of that is due to the ability of one Hebrew word to at times do the job of three or more in English (השמיע means ‘he caused to be heard’), while in other cases it’s because most prepositions take the form of a single letter prefixed to the following word, or the fact that the definite article is a single letter prefixed to the following noun. I presume some of these factors influence the length of Arabic texts in a similar way.
David Marjanović says

October 25, 2014 at 9:53 pm

Still, it’s easier than the German Sütterlin, which often looks to me like a bunch of zigzags.

It is a bunch of zigzags. For, presumably, some reason, e (the most common letter) looks exactly like n (the 2nd most common letter) except that it’s narrower, a rather hard distinction to make in handwriting.

Mind, it would not help much with words like милиция

I was taught to use onset hooks on л, м and я and then write фамилия ten times. 🙂
Y says

October 25, 2014 at 10:42 pm

Paul,
Not all legibility issues are the same. I agree with you fully about line length, but about the importance of ascenders and descenders, I think it’s what you grew up with that matters most. Yes, ב and כ are similar, but Israelis coming to American Jewish neighborhoods find the signs reading בּשר כּשר ‘kosher meat’ hilariously ambiguous, just because the minute serif-like projection of the ב is too minute. This is noticeable at a glance.

My grandmother grew up reading German printed in Fraktur, and later in Roman type. When I asked her if Fraktur wasn’t more difficult to read, she said that she can’t tell the difference between the two! And yet I’m sure that you can come up with studies that will clearly show that Roman type is easier to read.

So I’m skeptical of arguments based on subjective intuition that Arabic is harder to read, whether because its letters are connected or because ﺏ and ﻥ and ﺕ and ﺙ are too similar. When I am told that this part of the brain lights up in the fMRI scans more for one script than another, it likewise falls short of a clear demonstration of burdensome illegibility.

Another example: I cannot understand how any human can read or write anything in Maya hieroglyphs. They are complex, and they all look the same. When carved in rough stone they don’t look like anything at all. And yet, they were written, and they were and are read (epigrapher Ignace Gelb, to his ignominy, argued for similar reasons that Maya hieroglyphs couldn’t be a script, which anyway wouldn’t be expected of primitives.)
Michael C. Dunn says

October 26, 2014 at 1:11 am

I agree with most of this; I don’t think the orthography impedes Arabic readers to any real degree, but diglossia does, since nobody learns the literary language at their mother’s knee. Translation from English or Hebrew could also lead to unfamiliar word order or sentence structure.

On @SFReader’s point, yes, most Israelis today learn Hebrew as their first language, but there are enough recent waves of immigration (ex-Soviets in the 90s, Ethiopians and Americans) that the proportions have shifted a bit. It would be useful to know how the speakers from birth versus the speakers who learned as adults compare.
Y says

October 26, 2014 at 1:40 am

From the Israeli bureau of statistics report for 2013: Of Jewish Israelis under 20 y.o., 4% are foreign born, and 20% have a foreign-born father. Couldn’t find statistics on household languages.
Vanya says

October 26, 2014 at 2:46 am

Underscore for ш and overbar for т go a long way to cure it.

Sure, but those aren’t used in italic type fonts in Cyrillic. Handwriting is a separate issue. Even in print, Cyrillic seems to me slightly less efficient than Latin, for the reasons Paul Ogden cites above – greater number of similar letters, fewer ascenders and descenders.
GeorgeW says

October 26, 2014 at 6:07 am

“because most prepositions take the form of a single letter prefixed to the following word, or the fact that the definite article is a single letter prefixed to the following noun. I presume some of these factors influence the length of Arabic texts in a similar way.”

Yes, in Arabic as well.
GeorgeW says

October 26, 2014 at 6:13 am

P.S.

filbait (fi+al+bait) = in the house.
David Marjanović says

October 26, 2014 at 9:23 am

Sure, but those aren’t used in italic type fonts in Cyrillic.

They are in Serbian – but that’s because handwritten as well as italic printed т is officially ш-shaped there. The same holds for п and и; additionally, г is pointed at the top and always gets an overbar. It’s all a nightmare for Unicode. 🙂

Lots of people don’t distinguish n from u in Latin handwriting; that’s what the Sütterlin ŭ was for.
David Marjanović says

October 26, 2014 at 9:24 am

Uh, п is the one that gets the overbar; and while ш gets an underscore, и does not.
languagehat says

October 26, 2014 at 9:44 am

I was taught to use onset hooks on л, м and я

Me too, and I’m shocked that people are too lazy to — and then complain about illegibility! What’s this younger generation coming to??
John Cowan says

October 26, 2014 at 11:57 am

For, presumably, some reason

Reason has little to do with it. There are two competing evolutionary pressures: ease of writing (which tends to make all letters scribbled and all alike) and ease of reading (which tends to make all letters different). Arabic moved very far toward the first objective, hence the invention of the now-mandatory dots and other consonant discriminators.
David Marjanović says

October 26, 2014 at 7:10 pm

A loop is easier to write than an n-shape. The latter must be an attempt to reproduce some peculiarity of blackletter e…
Eli Nelson says

October 27, 2014 at 9:51 pm

The specific Sütterlin script was apparently designed by a particular person, Ludwig Sütterlin, so presumably some consideration of design went into its construction, rather than it resulting from a totally organic evolutionary process. However, the copious zigzag letters of Sütterlin do appear to be inherited from older German handwriting styles.
AnWulf says

November 13, 2014 at 7:22 am

“The difference between spoken and written Arabic is so great, the researchers wrote, “that acquisition of the written language could be defined as acquiring a second language” – which in turn could influence “the development of linguistic mechanisms necessary for reading.”

This is fast becoming the way for English (some [sum] 500 ways to write about 40 sounds). The -ough has at least eight ways to say it which is why I shun it and write thru and tho (both being written in formal writings in Am. English for over 100 years).

As for Russian cursiv, it can be tuff to read when handwritten but that goes for Latin letters when handwritten too. I hav no problems with the Russian alphabet when I was at DLI (Defense Lang. Inst.), the alphabet was the first thing we learn’d and likely the easiest thing of the whole course.
GeorgeW says

November 13, 2014 at 10:30 am

“This is fast becoming the way for English (some [sum] 500 ways to write about 40 sounds). The -ough has at least eight ways to say it which is why I shun it and write thru and tho”

This is a serious problem for learning English reading, and particularly writing. We get words like eminent/imminent/immanent or right/rite/wright/write that sound the same but are written differently. Or letters that have no unique sound (like [c]) or single letters that represent multiple sounds (like [x]) or multiple letters that represent one sound (like [th]).

This is quite different from the Arabic problem of diglossia. Arabic orthography is “shallow” where there is a close correspondence between the sounds and the written form. The problem with Arabic is the widely different forms of the spoken and written language.

What they do share, as you say, are problems with the written language.
languagehat says

November 13, 2014 at 10:59 am

In my dialect, imminent and immanent sound the same but eminent is different (starting with /e/ vs. /i/).
John Cowan says

November 13, 2014 at 11:01 am

I don’t consider wurds that sound the same but ar written differently to be a really serious problem in Inglish orthography. What makes Inglish damnably hard to learn to read (for nativ speakers, who are for the moste part lerning the written form ov words they already knoe) is the 15% or so ov wurds that ar irregularly spelled (as opposed to spelled regularly with complex rules). Most texts are “write wunce, read menny times”, and it is much better for the writer to hav to remember whether the standard spelling is “imminent” or “immanent” or “immanunt”, than for each reader to hav to figure out whether “lead” is ment to be past tense or present tense. (If a few function words like of, are, have are exempted from reform, it woodnt hurt much eether — or yther, if you prefer that.)

(By the way, most anglophones don’t fuse eminent with the other two in speech, only those with the pin-pen merger.)
GeorgeW says

November 13, 2014 at 11:24 am

I didn’t mean to suggest that every single English speaker pronounces eminent/imminent/immanent exactly the same. But, some do.
Piotr Gąsiorowski says

November 13, 2014 at 4:56 pm

Re: The pin/pen merger.

I noe at leest wun immanent fonetician (frum Oaklahoma) hoo cawls it the “pin/pin merger”.
GeorgeW says

November 13, 2014 at 5:32 pm

“hoo cawls it the “pin/pin merger”.

I like that since there is no difference in pronunciation 😉
SFReader says

August 23, 2016 at 10:50 am

On the hardness of Hebrew, a quote from an excellent English translation of the Jesuit Ratio Studiorum of 1599 (a set of rules governing education in Jesuit colleges):

“Finally, he [Professor of Hebrew] should so plan his teaching techniques as to reduce and relieve by his efforts that outlandish harshness which in the minds of some bedevils the study of this language”
Lazar says

August 23, 2016 at 11:41 am

The main thing that flummoxes me about Arabic writing is the homophony: for example, unvoweled رجل can be rajila, rajil, rajl, rujila, rajjala, rajul or rijl – and that’s without case endings! I know people can generally get by from context, but it does add one more intimidating hurdle from a learner’s perspective.

Modern Hebrew, of course, mitigates this problem with ktiv male. Arabic writing does have one advantage over Hebrew, though, which is that you can generally know how a word is written from hearing it. In MH this isn’t possible owing to all the merged consonants.
Lazar says

August 23, 2016 at 2:13 pm

homophony

Homography, or more precisely heteronymy. Silly me.
Elessorn says

August 23, 2016 at 2:59 pm

Still, I bet that native educated readers of Arabic can read it just as fast as I read English. The difference between native and even very advanced non-native control of a script is in my experience a fathomless, gaping chasm. I can read Japanese very fluently, but when it comes to, say, scanning a block of text for a specific word, I could easily lose out to a sharp high-school kid. By contrast, I find I can *scan* Greek or Cyrillic almost as well as hiragana, though I would read either very clumsily. My assumption is that my Latin-script fluency is somehow carrying over.

At least I think there’s something to be said for a concept like “script fluency.” At this point I’m almost prepared to believe people could speed-read subtitles in Mayan hieroglyphics, given the proper environment to develop the skill natively as children.
languagehat says

August 23, 2016 at 3:30 pm

The difference between native and even very advanced non-native control of a script is in my experience a fathomless, gaping chasm.

In mine too, and I haven’t seen much discussion of this interesting fact.
Elessorn says

August 24, 2016 at 4:10 am

It certainly should be studied more, at the very least because it doesn’t work in intuitive ways. I find that, for example, though kana are of course easier to read than characters, the gap between non-native and native processing speeds is definitely larger for the phonetic script.

For the same reason I often wonder if we overestimate the difficulty of Ottoman script practices back in the day. I guess a fluent Uyghur reader could inform us, but given, well, Persian, I doubt there really would have been a problem in making it work with mass literacy. From my experience teaching English to Japanese speakers, at least, true reading fluency seems to come when learners *stop* reading words phonetically and start just recognizing them. Which makes me wonder whether even scripts that are phonetically readable in theory really are read phonetically by natives in fact.
Lars says

August 24, 2016 at 4:29 am

@Elessorn, I tried to introspect a little reading your post, and I find myself skipping back and forth over sentences in a way that would be impossible if I had to construct a coherent surface spoken form before parsing. I believe there are old eye tracking experiments showing the same thing. And English isn’t my native language, though it uses the same script and I’ve been doing most of my reading in it for 40+ years.

I also remember reading out loud for my daughter from Astrid Lindgren books (in Danish), and being very conscious of multitasking between reading ahead to figure out the construction of sentences and intent of speakers — and who the speaker was — and reading out the text with proper sentence intonation and character voice, sometimes a whole line or two delayed from the part I was trying to figure out. I don’t think that would be possible if I needed the ‘surface production’ part of my brain for reading ahead as well.
juha says

August 24, 2016 at 5:52 am

@Elessorn: Speaking of kanji, have you ever come across 鷸 with a kun-yomi はしばみ meaning はいたか?
Lazar says

August 24, 2016 at 9:53 am

I guess a fluent Uyghur reader could inform us,

The Uyghur Arabic alphabet is very different from the Ottoman one: it was constructed anew by Chinese officials after a failed cyrillization attempt, and it shows all vowel and consonant phonemes distinctly. I’m no expert on Turkish, but it does seem to me that Ottoman writing was poorly suited to its language compared with the Perso-Arabic alphabets still in use.
John Cowan says

August 24, 2016 at 11:14 am

Ottoman writing was poorly suited to its language

Indeed it was: three vowel letters for eight vowels (sixteen if you separate short and long) and no convention for digraphs, as English has, is pretty bad, although vowol harmono helps. But Ottoman was much less Turkic than contemporary Turkish is: in poetry you could go for many lines before encountering a native word.
Lazar says

August 24, 2016 at 9:09 pm

On the topic of Arabic, though, I’ve really taken an interest in it in the past few months for no practical reason. I didn’t feel drawn to any one dialect in particular, so my initial urge was to focus my attention on MSA – but after reading more about the sociolinguistic reality, I’ve decided that that’s maybe not such a good idea. The consensus seems to be that the use of unadulterated MSA in real conversation is socially untenable, placing it on a decidedly different footing from the standard varieties of most other big languages. In particular, the use of ʾiʿrab (case and mood endings) really seems to be a classicizing imposition that goes against the consensus of all dialects. I saw one study that found that even at the most elevated levels of conversation among educated speakers, they’re only used a minority of the time. In all the languages I learn, I aspire for a neutral variety that might lean a little toward the cultivated side – but not so much that it would isolate me from real speakers.

So I’ve become interested in a concept advanced by Georgetown’s Karin Ryding and others: so-called Formal Spoken Arabic. The basic idea is that it’s a colloquialized version of MSA, something akin to what educated speakers from different countries might use in conversation with each other. It’s admittedly somewhat… notional at this point, but at least right now I find it pretty appealing. As put forth, it would mostly keep MSA phonology and lexis, while dropping ʾiʿrab (except for a few fixed expressions) and dual conjugations and incorporating some of the most popular features from the dialects, especially Levantine – such as fī in place of hunāk for “there is”, raḥ in place of sa for the future, and metathesized object pronouns like -ak and -ik in place of -uka and -uki.

There’s really no authority for this, though, and there are some messy aspects involved in the move away from prescriptive MSA. For example, I’ve been comparing conjugations and pronouns from MSA, Levantine and Egyptian trying to arrive at a decent set of compromises. As some examples, for anyone interested, I’ve tentatively decided on the pronouns ʾanā, ʾinta/ʾintī, huwa/hiya, ʾiḥnā, ʾintū, hum; for the verb كتب, the past forms katabt, katabt/katabtī, katab/katabat, katabnā, katabtū, katabū and the non-past forms ʾaktub, taktub/taktubī, yaktub/taktub, naktub, taktubū, yaktubū; and for the doubled verb رد, the past forms raddayt, raddayt/raddaytī, radd/raddat, raddaynā, raddaytū, raddū and the non-past forms ʾarudd, tarudd/taruddī, yarudd/tarudd, narudd, taruddū, yaruddū. I’m still not sure to what extent I’m really going to pursue this, but even a rough familiarity with practical Arabic is certainly a useful thing.

I’d also note that I’ve found the phonetic description of Arabic by our favorite renegade Italian linguist, Canepari, to be invaluable in approximating a nativelike pronunciation: I haven’t been able to find a comparably detailed account anywhere else.
languagehat says

August 25, 2016 at 9:05 am

I’d also note that I’ve found the phonetic description of Arabic by our favorite renegade Italian linguist, Canepari, to be invaluable in approximating a nativelike pronunciation:

Where would one find this description?
David Marjanović says

September 7, 2016 at 7:09 pm

The Uyghur Arabic alphabet is very different from the Ottoman one: it was constructed anew by Chinese officials after a failed cyrillization attempt

After a Latinization attempt that was suddenly declared failed for, presumably, some reason. Uyghur was of course Cyrillicized in the Soviet Union, but not in China.

Where would one find this description?

Intrigued minds want to know!
Lazar says

April 13, 2017 at 9:27 am

So I’ve been making more progress lately with Arabic, and with the FSA concept. I find that no one source really suffices to get a full handle on the differences between this inchoate variety and normative MSA, but based on Ryding’s Formal Spoken Arabic: Basic Course, John Mace’s Arabic Today (which is mostly searchable on Google Books), and many academic articles and other sources that I’ve scrounged up, I think I’m starting to arrive at a good workable model. I’ve generally figured out how to change conjugated verbs into a (supra-)dialectal form within an MSA phonology, and I’m working on a list of relevant lexical items and grammatical points. For example, I’m using šū for direct questions but keeping mā for indirect ones (an insight from Mace); I’m using miš for nominal negation and mā for verbal negation (though without the post-verbal -š suffix that sometimes accompanies it in dialect); and I’m adopting the present prefix bi- (which Ryding omits, but which seems to be pervasive even in semi-formal speech in the western Mashriq) and the future prefix ḥa-. And I find that the process of hammering out all these nitty-gritty details is improving my understanding of both MSA and the Levantine and Egyptian dialects, which I know is crucial if I aim to do much with the language.

Canepari’s description is here, by the way. (Sorry, I missed those comments.) I don’t want to sound like a booster (confession: I kind of am one, though), but like several of his others I’ve found that it just trounces anything else that I can find. I’ve come across many other accounts of Arabic phonology, and so far none of them have mentioned the laxing of short i and u in checked syllables – and in fact most of them don’t mention any high vowel allophony at all. The other most useful thing that I’ve found is listening to recordings by native speakers on Forvo.
Lazar says

April 13, 2017 at 9:47 am

[Comment awaiting moderation above.] That confirms a hypothesis of mine: one link followed by one edit will put a comment in the spam filter. There was a broken tag, though, and I just couldn’t resist.
languagehat says

April 13, 2017 at 9:50 am

Thanks for the update, and the link!
languagehat says

April 13, 2017 at 9:50 am

Sorry about the moderation filter — it has a mind of its own.
David Marjanović says

April 13, 2017 at 6:19 pm

Thanks for the link!
minus273 says

April 14, 2017 at 6:22 am

ḥ and ʕayn draw a towards [æ], while emphatic sounds and r draw a towards [ɑ]. I don’t think Canepàri has grokked the correct difference.
Lazar says

April 14, 2017 at 9:46 am

Really? Most of the recordings that I hear on Forvo of, say, “حالك”, “محمد” or “عاش” seem to have a central [ä], intermediate between the frontish unmarked [ɛ̈] and the emphatic [ɑ̈]. Wikipedia, for their part, equate the pharyngeal approximant [ʕ̞] with [ɑ̯], which would pull in a back direction.
David Marjanović says

April 14, 2017 at 10:05 am

Canepari uses confusing terminology. He calls [ʡ] “a pharyngealized laryngeal stop”, which is almost the Moscow School term. He calls both ḥ and ʕayn “pharyngeal”, but uses a pharyngeal symbol only for ḥ, [ħ], while using the epiglottal symbol [ʢ] for ʕayn.

Pharyngeal(ized) consonants pull vowels toward [ɑ], epiglottal ones toward [æ]. The obligatory link to soundfiles of both in the same language is here.

In the Maghreb, both ḥ and ʕayn are epiglottal [ʜ ʢ̞]. I’m pretty sure I’ve heard a pharyngeal [ħ] in some kind of Arabic; most likely, pharyngeal vs. epiglottal is a geographic difference.

Wikipedia has abandoned the distinction of pharyngeal vs. epiglottal places of articulation, or had last time I checked. The argument for this, however, seems to boil down to the fact that the epiglottis is in the pharynx. The fact remains that the epiglottis or something close to it is involved in some “pharyngeal” consonants but not in others, and that the effects on vowels are very different.

Most or all kinds of Arabic except the standard distinguish /r/ from /rˤ/, the latter corresponding to /r-ʔ/ sequences in the standard (with or without a vowel in between): standard /raʔs/, “Syrian” /rˤas/ “head”. Link to soundfile in the next comment.
David Marjanović says

April 14, 2017 at 10:08 am

All the emphatics including /rˤ/ and /rˤː/.
Lazar says

April 14, 2017 at 10:13 am

He calls both ḥ and ʕayn “pharyngeal”, but uses a pharyngeal symbol only for ḥ, [ħ], while using the epiglottal symbol [ʢ] for ʕayn.

Well, he uses the symbol [ʢ] for a pharyngeal (page 168 here, halfway down).
minus273 says

April 14, 2017 at 10:24 am

Lazar, David: Thanks! It seems that the question is more complex than I had thought. I have listened a bit more on Forvo, and it confirms an intuition that I had for a moment: initial ʕayn is often very different from medial ʕayn. The medial one is almost always [æ]-ish (“epiglottal”?) and the initial one is often an ɑ-colored glottal stop. There’s one paper of a certain age that described uvularization ≠ pharyngealization in a rural Palestinian dialect, but this is not always supported by the remaining literature.

I have consulted about the correct names for ɑ-color and æ-color with a Caucasologist (Gilles Authier) once. He told me that the pharyngeal-epiglottal distinction doesn’t exist, and æ-color should be called pharyngealization tout court. Indeed, at least for the Agul data, I don’t think that the a is colored differently in muʕar and jaʡar; calling the ʕ a ʕ and the ʡ a pharyngealized/epiglottalized glottal stop ʔˤ is for me at least as elegant as the treatment of postulating a pharyngeal-epiglottal distinction.
David Marjanović says

April 14, 2017 at 10:43 am

Aghul is an East Caucasian language and as such has a large vowel system. It can’t afford to color its vowels much. What happens is, AFAIK, the opposite: [ħ] and [ʜ] are allophones that depend on the surrounding vowels.

In Lakhota, /ʁi/ comes out as [ʀi].

[ʔˤ] would sound like any other “emphatic” in coloring vowels toward [ɑ] rather than [æ].
minus273 says

April 14, 2017 at 10:55 am

Some forvo’ing from the same speaker (UAE fuṣḥā):
rašīd, rakabihā
ṭarīd, ṭarīdan
ʕaṭasat, ʕabīdatu
ḥasanan, ḥadað
minus273 says

April 14, 2017 at 11:08 am

[Some forvo’ing pending moderation, showing that at least in some pronunciations, r sides with emphatics not ʕ]

David: I tried to pronounce an ɑ-coloring ʕayn and an æ-coloring ʕayn. Both are eminently pronounceable. If ʕaynishness is just pharyngeal = æ-coloring, it’s hard to see how it’s possible to have an ɑ-coloring ʕayn. So now you made me tilt a bit towards pharyngeal-epiglottal splitterism.
Lazar says

April 14, 2017 at 11:25 am

@minus273: What I’ve noticed in religious, tajweed-oriented videos is that the emphatics and rāʾ, xāʾ and ġayn seem to cluster together with a very retracted, sometimes even rounded vowel, and everything else with the unmarked frontish vowel. (They’re also very insistent about using [dʒ] for jīm, whereas I lean toward the Levantine [ʒ].)
David Marjanović says

April 14, 2017 at 12:47 pm

Well, he uses the symbol [ʢ] for a pharyngeal (page 168 here, halfway down).

Ah yeah, and [ʕ] for a “prepharyngeal” approximant… and [ʜ] for a “prepharyngeal” fricative…

Thanks for the Forvo links, I’ll listen at the first opportunity, which could be tomorrow or several days away.
John Cowan says

April 14, 2017 at 1:42 pm

As the hero of An Elephant for Aristotle says in his Thessalian Scots: “Forbye, where Greek gets along with four guttural sounds, the gamma, kappa, chi, and rough breathing, Syrian has twice as many: a battery of gasping, coughing, retching, and gargling noises.”
ə de vivre says

April 14, 2017 at 3:18 pm

For the same reason I often wonder if we overestimate the difficulty of Ottoman script practices back in the day.

I’m late to the party on this one, but I think cultural and ideological factors are largely responsible for elevating the common perception (insofar as anyone perceives it in the first place) of Ottoman writing to almost paradigmatically bad. In a way, Turkish’s vowel harmony gives it an advantage with respect to an Arabic-based script. In most words, as long as you know the last vowel in the root, anything to the right is predictable.

For one thing, the Kemalists’ ideology of modernity had an interest in portraying it as an archaic jumble of irrational tradition (homologous to the administration of the Ottoman Empire) in order to promote the rationality of the Latin-based alphabet. Sure, literacy took off once they implemented the Latin alphabet, but that was accompanied by a massive reorganization and expansion of the education system. And this overhaul of the education system made the Ottoman script less legible in other ways too. The elite of the Turkish Republic weren’t being educated essentially tri-lingually in Turkish, Arabic, and Persian anymore, which made the reference-heavy style of older genres of Ottoman writing harder to read simply because the readers didn’t know the language encoded by the script. And as literacy spread in the late Ottoman period, the polyglot language games tended to go out of style as well.

Part of the difficulty with the Ottoman script is the lack of standardization. There were many ways to compensate for the lack of consistent vowel marking (especially in the first two syllables of a word), but since none of them (to my knowledge at least. I am, alas, not an expert in Ottoman orthography) ever became standard, from the reader’s point of view having multiple possible conventions was just barely more helpful than having no conventions at all. They did however come up with a near-universal convention of re-using superfluous Arabic consonants to indicate the frontness/backness of the following vowel. So while ت was used for [te, ti, tö, tü], ط was used for [ta, tı, to, tu]. Unfortunately, however, they didn’t have doubles for all the letters, and it leaves plenty of room for ambiguity elsewhere. They were also really inconsistent about distinguishing voicing in their stops, but there may be historical phonetic reasons for that.

I’m not saying the Ottoman Arabic script was a great system, but it’s not the Escher-esque irresolvable illusion it’s sometimes made out to be.
languagehat says

April 14, 2017 at 3:51 pm

ə: I restored your comment from the e-mail the software automatically sent to me, but I don’t know where you might have wanted blockquotes; let me know and I’ll add ’em.
ə de vivre says

April 14, 2017 at 4:37 pm

Just that first line, “For the same reason I often wonder…” Çok teşekkür ederim sayın şapka efendi!

Also, I realize now that there are also a bunch of typos in that post. You can correct them if they bother you, otherwise I will bear them as a mark of shame.
languagehat says

April 14, 2017 at 4:53 pm

I corrected the typos that manifested themselves by red underwiggles, and put the quoted bit in itals rather than blockquotes — my hattic powers, my habits.
David Marjanović says

April 14, 2017 at 5:37 pm

a battery of […] retching, and gargling noises

Canepari seems to concur when he calls the “velar fricatives” of Arabic uvular fricative-trill intermediates.
Rodger C says

April 15, 2017 at 12:19 pm

Stridentia anhelantiaque verba.
Lazar says

April 15, 2017 at 1:40 pm

On the topic of back consonants, one question that I’ve wrestled with is what to do with hamzas (i.e. glottal stops). I’m with Ryding on the idea of maintaining a basically MSA phonology (especially since a dialectal pronunciation would require an intuitive grasp of acrolectal vs. basilectal status – to know where to use a sound like [q], and where not to – which I definitely lack), but at the same time, a preservation of classical norms in words like raʾs or biʾr seems a little affected. And from what I’ve read, it’s likely that hamzas were already being dropped in the Koranic-era Hijazi varieties from which koineized Arabic derives, so it’s hard to object to it 1400 years later. But on the other hand, a totally basilectal treatment of hamzas would probably represent too great a divergence from MSA: words like masāʾ (> masā) would undergo stress shifts, and words like saʾalt (> sālt) would require new phonotactic rules.

But what I noticed in Egyptian sources is that their national standard includes rās alongside bidʾ. So I’ve tentatively decided on this idea: hamzas will be dropped where they do not or cannot start a syllable. This allows us to keep the hamzas in saʾalt, or in badʾ (=bidʾ), or masāʾ (if those last two are followed by al, then the hamza will move to the following syllable), but lets us dispense with it in rās and bīr. (Mace’s treatment of weak verbs also removes hamzas in final unstressed position – e.g. qaraʾ > qarā – which is analogous to the dropping of unstressed final h in words with tāʾ marbūṭah.) By the same token, though, I think it makes sense to strengthen hamzat al-waṣl in front of a stressed vowel – so, for example, šū ʾismak instead of the pseudo-classical šu smak (which I’ve never heard anyone say).
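For anyone who wants to play with the hamza rule above, here is a rough Python sketch of one way to read it: a hamza is dropped, with lengthening of the preceding short vowel, only when a consonant follows it inside the word, since that is where it cannot start a syllable. The function name, the use of U+02BE for hamza, and the vowel inventory are assumptions made for illustration. Mace’s removal of final unstressed hamzas and the strengthening of hamzat al-waṣl are not modeled.

```python
import unicodedata


def nfc(text):
    """Normalize to NFC so that each macron vowel is a single code point."""
    return unicodedata.normalize("NFC", text)


HAMZA = "\u02be"  # half-ring hamza of this transliteration (an assumption)
VOWELS = set(nfc("aiuāīūēō"))  # assumed vowel inventory
LENGTHEN = {"a": nfc("ā"), "i": nfc("ī"), "u": nfc("ū")}


def drop_coda_hamza(word):
    """Drop a hamza that sits between a short vowel and a consonant.

    The preceding short vowel is lengthened (aʾC -> āC, and so on).
    Hamzas before a vowel or at the end of the word are left alone.
    """
    chars = list(nfc(word))
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        nxt = chars[i + 1] if i + 1 < len(chars) else ""
        after = chars[i + 2] if i + 2 < len(chars) else ""
        follows_consonant = after.isalpha() and after not in VOWELS
        if c in LENGTHEN and nxt == HAMZA and follows_consonant:
            out.append(LENGTHEN[c])  # lengthen the vowel, skip the hamza
            i += 2
        else:
            out.append(c)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for w in ["ra\u02bes", "bi\u02ber", "sa\u02bealt", "bad\u02be"]:
        print(w, "->", drop_coda_hamza(w))
```

On the forms mentioned above this turns raʾs and biʾr into rās and bīr while leaving saʾalt, badʾ and masāʾ untouched. It is a toy over transliterated strings, not a claim about how any dialect actually derives these forms.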

(This is a little rambly, but I like to get my thoughts out there.)
John Cowan says

April 15, 2017 at 2:44 pm

Alas for de Camp, Thessalian was apsilotic from the earliest records, so Leon’s own Greek would have only three gutturals. Still, one of Alexander’s officers, an educated man, would certainly recognize every kind of Greek. Indeed, his narrative voice is not nearly as Scots as the dialogue lines of his men, and when speaking he code-switches between gude braid Thessalian and an entirely unmarked variety according to his company.
Lazar says

May 7, 2017 at 5:50 pm

Learning (middle) Arabic is really proving to be a unique experience for me. When I started I was naively aligned with the “MSA without case and mood endings” idea; later, as I saw that that approach yields a somewhat unnatural predominance of a, I adopted the common vowel harmonizations in verbs and personal pronouns described by Mace (e.g. yaktub → yuktub, yaqrā → yiqrā, ʾanta → ʾinta), while following him in leaving nouns and adjectives basically unchanged. But since then I’ve become increasingly tempted by some shifts in non-verbs that seem very common among dialects, a handful of which – like miṣr → maṣr and ḥimmiṣ → ḥummuṣ – are even given as alternate forms by Wehr in his classic dictionary. (Some other very common ones: xinzīr → xanzīr, ḥiṣān → ḥuṣān, ḥulw → ḥilw, qiṭṭ → quṭṭ, kabīr → kibīr.) So I’ve adopted a strange method of using Wehr, Wiktionary, the Egyptian site Lisaan Masry (which gives both Egyptian forms and Egyptian-accented MSA forms), the Levantine site Living Arabic, and an Iraqi dictionary that I’ve found to suss out a) which of competing MSA (and occasionally purely dialectal) words to prefer, b) which of competing broken plural forms to prefer, and c) where and where not to shift short vowels in a dialectal direction. I’m working through words by broad semantic category, and have about 500 done so far. Once my list is a little more polished, I could share it with anyone who’s interested.

I’ve also adopted a version of the vowel deletion tendency common to many dialects, which I find makes my speech seem a lot less stilted (and also, long story short, fixes a problem that I was facing of having to use different phonotactic rules within and between words): unstressed short i and u are deleted under the condition VCVCV, yielding (e.g.) mā biyuktub → mā byuktub, biyirudd → biyrudd (bīrudd), ḥayirudd → ḥayrudd, sayyāritī → sayyārtī, ražulēn → ražlēn, tūnisī → tūnsī. And for conversational usage, I’m being helped a lot by an FSI booklet that I’ve found that details the main differences between Egyptian and Levantine. (For coherence’s sake, I’ve shifted from a pick-and-choose approach toward more consistently favoring the latter.)
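The VCVCV deletion rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the `stressed` parameter are hypothetical, stress is not computed (the caller may pass the string positions of stressed vowels to protect them), every non-vowel letter counts as one consonant, and spaces are skipped so that the rule can apply across a word boundary, as in mā biyuktub.

```python
import unicodedata


def nfc(text):
    """Normalize to NFC so that each macron vowel is a single code point."""
    return unicodedata.normalize("NFC", text)


VOWELS = set(nfc("aiuāīūēō"))  # assumed vowel inventory
SHORT_HIGH = {"i", "u"}


def delete_high_vowels(phrase, stressed=()):
    """Delete short i/u in the context V C _ C V, scanning left to right.

    `stressed` holds positions (in the NFC-normalized phrase) of vowels
    that must not be deleted. Spaces are ignored when matching.
    """
    chars = list(nfc(phrase))
    keep = [True] * len(chars)
    protected = set(stressed)
    while True:
        # Positions of the letters that are still present, spaces skipped.
        idx = [i for i, c in enumerate(chars) if keep[i] and not c.isspace()]
        for k in range(2, len(idx) - 2):
            v1, c1, mid, c2, v2 = (chars[i] for i in idx[k - 2:k + 3])
            if (
                idx[k] not in protected
                and mid in SHORT_HIGH
                and v1 in VOWELS and v2 in VOWELS
                and c1 not in VOWELS and c2 not in VOWELS
            ):
                keep[idx[k]] = False
                break  # rescan, since the deletion changes the contexts
        else:
            break  # no deletable vowel left
    return "".join(c for c, k in zip(chars, keep) if k)


if __name__ == "__main__":
    for p in ["mā biyuktub", "biyirudd", "sayyāritī", "tūnisī"]:
        print(p, "->", delete_high_vowels(p))
```

Rescanning after each deletion means that a newly created consonant cluster blocks a neighboring vowel from being deleted as well, which seems to match the examples given. Whether it matches real usage in every case is another question.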

Ultimately, I think my target variety is seeming more and more like Levantine with a few classical influences on word choice and phonology – which I’m pretty okay with. (On the latter point, I’ve read that even a full preservation of qāf can be found, for example, in Druze communities and parts of rural Lebanon.) I’ve never linked my language learning too strongly with practical concerns, but there is a pretty big Lebanese community in my area, and it would be nice for my speech not to be too far removed from theirs.