We’ve discussed the notorious Pahlavi script before (e.g., last December), but I’m not sure people realized quite how bad a script it is. Now Ben Joeng has a Twitter thread/rant explaining why:
I often describe Pahlavi as the worst writing system ever invented.
Let’s take a perfectly-serviceable writing system for a completely different language (Aramaic) and adapt it for our language (Middle Persian). We’ll call this new script “Pahlavi”.
Aramaic doesn’t really write all the vowels, only consonants, but that’s OK for Aramaic, because through a quirk of Semitic grammar, consonants carry most of the semantics. But our language, Middle Persian, is Indo-European and *does* carry a lot of semantic weight in vowels.
This is why when Greek (IE) borrowed writing from Phoenician (Sem.), it repurposed a bunch of the consonants that Greek didn’t have to use as vowels.
Pahlavi didn’t do that, though, it just carried on not writing vowels.
But what about consonants that Middle Persian has that Aramaic doesn’t? For Greek they made up some fancy new letters like ‹Φ› and ‹Χ›. Or for Dutch they used digraphs like ‹ch›. Pahlavi didn’t do that, no. Aramaic has no /f/, so Pahlavi writes /f/ identically to /p/.
There are a lot of Aramaic consonants that aren’t present in Middle Persian. This means, yes, about half of the 22 Aramaic letters are unused by Pahlavi (sort of, more on that later), but the letters that *are* used are *still* ambiguous.
But let’s make things worse. Let’s start writing some of the letters so similarly that you can’t tell the difference between them. In Book Pahlavi (the most common form), they shaved Aramaic’s 22 consonants down to 13. This means, e.g., /g/, /d/, and /y/ are all written with the same letter, despite them being different in Aramaic.
But let’s make things worse. Book Pahlavi is a cursive script. When two letters come together, they can look identical to a completely different letter. Think about how cursive “iu” might, without the dot, look like “ui”. Book Pahlavi has no such dots. Good luck. All this means that the Middle Persian word for God, “Ohrmazd”, could be just as easily read (and occasionally mispronounced as!) “Anhoma”.
But let’s make things worse. Like all languages, the sounds of Middle Persian changed over time. But Pahlavi didn’t change, which means you write words like “šab” (“night”) as “špa”, because it *used* to be pronounced with a /p/.
But let’s make things *far* worse. So I lied up there. You don’t write “šab” as “špa”. No, you write it “LYLYA”. What. Well, that’s because “lēləyā” is the Aramaic word for “night”, and much like how Japanese borrows Chinese characters to write Japanese words, Pahlavi borrows whole Aramaic *words* to write Middle Persian words. How do you know to pronounce it “šab” instead of “līlīa” (or even “rīlīa”, or “ragulda” or… ugh) or something? You don’t, you just have to know that.
How do we know this isn’t just a borrowing from Aramaic? Because we have Middle Persian dictionaries that say “remember that when you see ‘lylya’, you have to read it as ‘šab’.” Pahlavi is *full* of these “aramaeograms”, even the word for “az” (“from”) is written “MN”.
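For the programmers in the audience, here is a toy sketch of the two problems described so far: merged letters and aramaeograms. It is only an illustration, not a real transliteration scheme. The placeholder symbols "G" and "P" and the function names are invented for the example; the only data in it are the mergers and the two aramaeograms mentioned in the thread.

```python
from itertools import product

# Mergers named in the thread: one letter covers /g/, /d/ and /y/,
# and /f/ is written identically to /p/.
# "G" and "P" are invented placeholder symbols for those shared letters.
MERGED = {"G": ["g", "d", "y"], "P": ["p", "f"]}

# Aramaeograms named in the thread: Aramaic spelling -> Middle Persian reading.
ARAMAEOGRAMS = {"LYLYA": "šab", "MN": "az"}


def consonant_readings(letters):
    """Return every consonant string a sequence of written letters could stand for."""
    options = [MERGED.get(letter, [letter]) for letter in letters]
    return ["".join(combo) for combo in product(*options)]


def read_word(spelling):
    """Look the spelling up as an aramaeogram first, else enumerate the readings."""
    if spelling in ARAMAEOGRAMS:
        return [ARAMAEOGRAMS[spelling]]
    return consonant_readings(spelling)


print(read_word("LYLYA"))             # the reader just has to know this one
print(len(consonant_readings("GP")))  # two written letters, several candidates
```

The point of the sketch is that the lookup table does the real work: nothing in the spelling itself tells you which entry applies, and every merged letter multiplies the number of candidate readings.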
But the real stinker is, after the Persian literati realized this mess would cause people to horribly mispronounce the sacred texts, they invented a new script based on Pahlavi we call “Avestan” that wrote *all* of the sounds, consonants & vowels, to extreme phonetic detail.
Seriously, it goes so far that we don’t really know why certain letters are distinguished, but they must have been pronounced subtly differently in the past. Why wouldn’t they use this clearly superior system to write Middle Persian instead of friggin’ Pahlavi?
Avestan was invented primarily to write their sacred language (also called Avestan), not the everyday language, so maybe you’re thinking it’s too sacred for plebeian Middle Persian. Nope! There’s a tradition of writing Middle Persian using Avestan called “Pazend”, but it was primarily only used to write commentaries on the sacred texts, and… remember how I said there were dictionaries that told you how to pronounce the aramaeograms? The pronunciations? They were written in Pazend!
WHY DIDN’T THEY JUST SWITCH TO USING PAZEND ALL THE TIME‽‽
In the past, I’d thought that the introduction of the Arabic script to write Persian was an imposition that occurred due to the introduction of Islam, but it wasn’t just that, it was, despite still not writing those short vowels, an actual step up!
Pahlavi does have two things going for it though.
One, it is really pretty.
Two, whenever you write “Ahriman” (the Zoroastrian personification of evil), you always write it upside-down
That’s pretty awesome.
let’s just say there’s a reason why despite Book Pahlavi being an incredibly important historical script, it *still* isn’t encoded in Unicode. because we’re still not totally sure how to do it! despite many of our experts being Persians!
Thanks, Y!
Why Pahlavi is so awful
Cui bono?
We’ve discussed the notorious [..X..] before (e.g., last December), but I’m not sure people realized quite how bad a [..Y..] it is. Now [….] has a Twitter thread/rant explaining why:
I often describe [..X..] as the worst [..Y..] ever [..Z..].
A nice beginning. You can never be reminded how bad something is too often.
(but with this specific value of X it is of course more like a confession of love. I don’t know if it also can work with people (spouses, children…))
So true!
I deny the adequacy of abjads to write Semitic languages absent supplementation with vowel pointing or other workarounds … Maybe it’s less bad than using the same inadequate abjad to write an IE language, I suppose. But that doesn’t mean it’s good. Of course, down at the Ge’ez end of the Semitic world, they figured out how to turn an abjad into an abugida. Problem solved!
But does not the inclination of native speakers to write them so indicate that it is convenient?
And worse, Arabic rasm without dots: here even vowel patterns don’t explain why people wrote n, t, th, b, i the same way. But they DID.
I mean, you could say they did because they didn’t like disconnected writing, but why did they make letters identical in the first place?
Interesting story with similar beginning but different ending? “There were numerous problems in writing Malayalam using the Syriac alphabet, which was designed for a Semitic language. Only 22 letters were available from East Syriac orthography to render over 53 phonemes of Malayalam. Both the languages are not related to one another in any way except for religious causes. These problems were overcome by creating additional letters.” https://en.wikipedia.org/wiki/Suriyani_Malayalam
I deny the adequacy of abjads to write Semitic languages absent supplementation with vowel pointing or other workarounds
No writing system fully represents all the contrasts present in the spoken language. It’s all just a question of degree. They’re all of them exercises in just being good enough.
Speaking of INadequacies, I often read that learning to read in their native language is so difficult for speakers of languages like Baloch, even when they are literate in Urdu, because books are in Arabic script and the language is IE and vowels work differently.
No explanation of how they manage to learn to read in Urdu:)
I suppose, it must be “it is a different language, it is easier!”
Sure, re “good enough.” But the historical development of add-on/supplemental features to the Semitic abjads to indicate the vowels that the “pure” abjad doesn’t reflects a view by users of the language(s) that the “pure” abjad is not good enough, at least for some sorts of texts in some sorts of contexts.
Well, they use them here and they don’t use them there.
Now I think we are supposed to suppose that both times it is “convenient” and then examine the situation more closely:
– if it is convenient, what factors could contribute to that?
– was there masochism among those factors? Possible, but not a hypothesis one would want to start with.
– did they accidentally arrive at a system which is difficult to change even though it is less convenient than some other system (in other words, transition requires crossing a barrier)?
Syriac, by and large not an awful script, does have its messy corner (here, pp. 6–8). A certain dot, or several dots, somewhere above or somewhere below the word, don’t have any particular phonetic value, but mean, “not that pronunciation—the other one.”
Actually, one reason why English orthography looks less messy than it is is that you either know how words are pronounced, or can look them up in a dictionary with IPA.
And of course that we use it, not just talk about it.
Doing to English what we usually do to ancient languages would be a lot harder.
Or maybe you know, they would have adopted Classical English pronunciation. “As we know from contemporary texts, by 20th century you couldn’t find two English speakers that pronounce vowels the same way”
Or maybe you know, they would have adopted Classical English pronunciation.
I suspect that they would have probably adopted a pronunciation closer to Middle English than to any modern form. Depends on how many sources on phonetics they could find, I guess…
(Either that or they’d have insisted on using contemporary-to-them Late English pronunciation, with correspondingly hilarious results. Gaglia è ogne divisa in parte tre, quaro una incolono Belge, aglia Achitani…)
A certain dot, or several dots, somewhere above or somewhere below the word, don’t have any particular phonetic value, but mean, “not that pronunciation—the other one.”
Modern Hebrew sometimes doubles yod for a similar purpose, e.g. יצא yatsa, ייצא yetse.
“WHY PAHLAVI IS SO AWFUL.”
Also the English name is a hybrid of пахлава “baklava” and пехлеви!
a system which is difficult to change even though it is less convenient than some other system (in other words, transition requires crossing a barrier)?
the evolution of yiddish spelling could be looked at as a slow march along this path, but in both directions.
i don’t have the expertise to describe the early sequence well, but characters for vowels appear pretty early on within the basic aramaic abjad, and coexist with diacritic vowels til the 20th century (my favorite versions are the abugida-ish ones that use both, because they give more sonic information). but, simultaneously, most words with hebrew & aramaic etymologies become more consistently unpointed over time, making part of the writing system fairly impenetrable to anyone without substantial literacy in the liturgical languages*, active component-consciousness, and a solid sense of how the etymons are transformed within yiddish – all in all, not so different from aspects of pahlavi.
the one serious** attempt to make the system functional, by spelling hebrew/aramaic-derived words as they are pronounced, very quickly*** increased literacy and fed the proliferation of yiddish writing of all kinds in the early u.s.s.r. but, of course, it was rejected and denigrated on ideological grounds that had little to do with language (allegedly “historical continuity”, but primarily anti-communism and anti-secularism). it’s not entirely clear to me how much of that hostility was itself about the rapid expansion of yiddish literacy to an increasingly politicized jewish working class (see notes below), but it’s hard for me to imagine that wasn’t a significant element.
which makes me wonder how much the uselessness of pahlavi was an actively preferred thing for the literate classes.
. (mild rant:)
* there’s a persistent myth that this level of loshn-koydesh literacy was historically common in the yiddish-speaking communities of eastern europe. this is sheer bunk, as the slightest effort to take seriously what is actually said about literacy and knowledge of loshn koydesh in contemporary texts makes clear. to give one example: people who oughta know better really love to pretend that sholem aleykhem’s tevye not only existed, but was typical. but the entire point of the character is that balegoles [teamsters] were notoriously the furthest people from the well-off class strata that received meaningful religious education, as well as being notoriously/stereotypically profane and crude in language (when they didn’t just speak their trade’s cant), and aggressively un- or anti-religious****. without that, the characterization of tevye as subtly thoughtful about interpersonal relationships, thoroughly (if often inaccurately) familiar with the rabbinic literature, and deeply invested in a personal relationship with the divine, loses both its comedic punch and its political import.
** which is to say, not including the various romanization schemes, which (unlike the romanization of ladino following the romanization of istanbul-turkish under atatürk) never had substantial encouragement from either a state or major cultural institutions.
*** especially among women: only 18% of yiddish-speaking jewish women in ukraine aged 50-59 and 45% of those aged 20-29 were literate in 1897; by 1926 (after less than 5 years of peace and public education), those numbers were 25% and 49%.
**** this is all bound up, of course, with the bloodline (“yikhes”) caste system of the shtetl. not only would no respectable person agree to a wedding match with a balegole’s child, but they considered them even more not-really-jewish than most poor or working-class jews (non-jewishness is one of the less-advertised connotations of “amorets”, which contrasts with various terms for ‘a genuine/proper jew’). the quasi-racialized aspect of this is made clear in the various places where balegoles were referred to as “jewish gypsies”.
(/rant)
which makes me wonder how much the uselessness of pahlavi was an actively preferred thing for the literate classes
AFAIK, there were professional scribes in Persia during that period, for whom a difficult to read or write script would have been an important trade secret, keeping away competition.
This thread does lead me to wonder (and to ask assorted fellow hatters here the question): Why is there such a blatant contrast between Pahlavi and Brahmi?
Both are adaptations of Aramaic script used to write an Indo-Iranian language, both arose at roughly the same time (second-third century BC), and yet unlike Pahlavi, Brahmi makes no use whatsoever of Aramaeograms, has a complete set of QUITE distinctive symbols that represent ALL of the segmental consonant phonemes (including ones which corresponded to nothing in Aramaic, i.e. aspirates and retroflexes) and most of the vocalic phonemes of early Prakrit, and as a special bonus it innovated by creating a special set of purely numerical symbols (something no Phoenician-derived script at the time had created).
In short, Brahmi may be said to be a near-perfect mirror image of Pahlavi in terms of its simplicity/efficiency as a script.
I personally find it difficult not to think that the contrast must be due in part to the indigenous Indian linguistic tradition: Panini’s grammar (whose composition took place at about the same time, nota bene!) is obviously the product of a very sophisticated tradition of linguistic analysis, including phonological analysis, and whoever adapted the Aramaic script to Early Indo-Aryan must have been (a) first-rate phonologist(s).
The lack of prestige of Aramaic in India (as opposed to its high prestige in Persia) may also have played a role too (such a lack of prestige would have made Indians far more willing to tinker/experiment with the script and modify it than Persians would have been).
P.S. “Ben Joeng” is Ben Yang’s Twitter handle.
TR: I never noticed that! But then, my tendencies are toward more defective spelling, and some newfangled uses of double yods are distasteful to me.
I’ve often wondered whether the weird “inherent vowel” thing with Brahmi (and its less successful contemporary Kharosthi), where the plain symbol for the consonant C actually represented Ca and plain C required an extra diacritic, was inspired by Persian cuneiform, which did a similar thing but less consistently.
(TIL, by wiki-walk from the linked Twitter thread, that Old Persian cuneiform was still used at least into the reign of Artaxerxes III, i.e. nearly to the end of the Achaemenid dynasty. I thought it was a brief innovation of the Darius/Xerxes era that was then entirely replaced by Aramaic. Most of the known inscriptions are monumental, though.)
…though, admittedly, most of them are a, like in Old Persian and basically all of Indic.
But yeah. India had phonology and mathematics before it had writing, and it shows.
The monopoly of Classical Chinese as the language of serious literacy persisted in Korea for over four hundred years after the invention of Hangul, the Korean alphabet, allowing an entrenched class of scholar-officials to keep hold of their power over the state. Preventing widespread access to literacy was a feature, not a bug.
only 18% of yiddish-speaking jewish women in ukraine aged 50-59 and 45% of those aged 20-29 were literate in 1897; by 1926 (after less than 5 years of peace and public education), those numbers were 25% and 49%.
Bbbut surely women aged 20-29 in 1897 were bound to become women aged 50-59 in 1926 (give or take one year) even in Ukraine? And yet their literacy rate fell from 45% to 25%? Vas iz gevarn mit di idn?
The literate women emigrated?
@D.O. @Vanya
yes, emigration is the best explanation i’ve been able to come up with. and it seems like a pretty good one, since those most able to leave – especially during the 1914-22 war years – would’ve been the better-off families, whose daughters would be the most likely to be literate. over the same period, the literacy rate for men aged 50-59 actually fell from 66% to 60%, which seems like a reflection of the higher emigration rate for men (again skewed towards the better-off, who’d be more likely to be literate), especially with conscription as a push factor. war and pogrom deaths don’t seem likely as central explanations, since on the one hand the over-50-year-olds of 1926 wouldn’t’ve been very likely to be on the front at 40+, and on the other, non-battlefield deaths would be skewed towards the poorer and less literate.
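The cohort arithmetic behind the question above can be checked in a few lines. This is only a sketch of the reasoning: the function names are invented for the example, and the percentages are the ones quoted in the comments, not independently verified census data.

```python
def age_band_later(band, start_year, end_year):
    """Shift an inclusive age band forward by the years elapsed between two censuses."""
    low, high = band
    elapsed = end_year - start_year
    return (low + elapsed, high + elapsed)


def shared_ages(band_a, band_b):
    """Count the single-year ages that two inclusive age bands have in common."""
    low = max(band_a[0], band_b[0])
    high = min(band_a[1], band_b[1])
    return max(0, high - low + 1)


# Women aged 20-29 in 1897, as they would appear in the 1926 census.
aged_cohort = age_band_later((20, 29), 1897, 1926)
print(aged_cohort)                         # ages of the 1897 cohort in 1926
print(shared_ages(aged_cohort, (50, 59)))  # overlap with the 1926 band of 50-59

# Literacy figures as quoted in the thread (percent).
literate_1897_aged_20_29 = 45
literate_1926_aged_50_59 = 25
print(literate_1926_aged_50_59 - literate_1897_aged_20_29)  # change in points
```

So the two bands are very nearly the same group of people, which is why the drop needs an explanation such as selective emigration or a change in how the literacy question was asked and recorded.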
According to the instructions for the 1897 census (as reproduced by an image in Wikipedia), for the literacy question, you answered:
Yes-you are literate in Russian
Yes, in X-you are literate in X, not Russian
No-you are illiterate
I consider this as likely to lead to errors (better to ask two questions) and to misquoting of the results (the Yes numbers).
The question in the 1926 census is not single; you fill in one blank if you are illiterate (although the way it is stated you could answer with all the languages in which you are illiterate!) and two other blanks, viz., what languages you can read and write in, and what languages you can only read in, if you are literate. This grouping is still likely to lead to misquoting of the “literate” numbers.
The source I found for the 1926 form is
http://www.demoscope.ru/weekly/2006/0267/arxiv04.php
I’d be careful with connecting literacy and writing systems. The Chinese write.
And see my complaint above about Baloch – people keep insisting that Baloch speakers can’t read Baloch in Arabic script “because it is IE” even when they are literate in Urdu.
fill in one blank if you are illiterate
My brother claims to have seen a sign by the doorbell of a local branch of the British Deaf Association: “Please ring and wait.”
I don’t know anything about the history of Russian-then-Soviet census methodology in that era, but I would not necessarily assume that the forms were typically filled out by the actual person (or head of that person’s household) being enumerated as opposed to by the low-level government-employed enumerator who knocked on the door, asked the questions, and wrote down the answers received.
Please ring and wait.
Quoth my daughter: “Maybe it lights up inside somewhere.”
I remember a Sesame Street bit, where Big Bird got really interested in how Linda’s doorbell actually controlled lights.
I agree that pahlavi is quite bad as a writing system.
Other terrible ones would be
– Hittite: all the multiple readings of original cuneiform signs + akkadograms + sumerograms + no word division + lack of standard spelling and systemic sound correspondence as was usual before widespread public education
– old Roman cursive: even the Romans themselves complained about the illegibility of contemporary handwriting. Many letters were visually similar, e.g. E could be written as ll, M could be written as llll, leading to sequences such as llllll. Some letters looked very similar, e.g. A & R….
– Japanese
…and still is as part of ä ö ü.
The Hardest Writing System! – an animated rant about learning Japanese
Kanji Story – How Japan Overloaded Chinese Characters
Hittite is known to have done most of these things, too – and quite a bit worse in some cases.
…and still is as part of ä ö ü.
I’m still reeling from this revelation. Those little everyday dot workers have a heritage going back to Cicero, yet I’ve been treating them as if they just got off the bus from Albuquerque!
That’s why the Kurrent e is identical to n except narrower. It’s a whole separate tradition…
@PP @JWB
i’m quite confident those instructions are not for the people whose answers are being recorded, that not being how most censuses [latinists, please correct me!] are done. and yes, there’s certainly a lot of ambiguity in the data – including the issue of whether the 1926 soviet census questions are at all aligned to the 1897 tsarist ones. but they’re the best material we’ve got, so it’s on us to try to pull what we can out of them.
llllll
and thus the battle between YIVO’s melupm-vov וּ and everyone else’s shtumer alef א, as in צוװוקס!
, but now there is a cat
i had a little more
Latin plural cēnsūs (4th declension, singular cēnsus). I don’t know of any language that has managed to borrow declensional length distinctions from Latin (or maintain inherited ones), but I’m pretty sure English didn’t even try.
(Danish has folketælling and before that mandtal–even in Luke 2:1–so no help).
Censussesses, Precious.
(Like syllabussesses and sinussesses.)
Platypuzzesses, says my most reliable Australian source.
I would tend to reword to avoid censuses, but not syllabuses, sinuses, platypuses, presumably because the former has /s-s-s/. The U.S. Census Bureau certainly says censuses, though; one can get used to anything, even hanging. In any case, English tends to tangle its speakers in excessive (front) fricatives. “That that is, is; that that is not, is not. Is not that it? It is.”
It would never even occur to me to use anything but censuses.
Syllabi is common and doesn’t seem objectionable. Sure, the word arose from a typo in the first place, but wasn’t it second declension once formed?
I’ve seen archeology books that use “cursūs” as the plural of “cursus” (a circular track that looks like a racetrack but surely wasn’t). How it’s supposed to be pronounced, I don’t know. “Cursooz”?
Surely in that case it’s used as a Latin word and expected to be pronounced accordingly (“cursoos”).
It would never even occur to me to use anything but censuses.
Nor I, but I would rather write “the 1930 census and the 1940 census” (perhaps omitting one instance of census or the other), despite the verbosity, than “the 1930 and 1940 censuses”.
“Censuses” doesn’t have /s/-/s/-/s/, it has /s/-/s/-/z/. Sounding awkward could be an issue, but I hardly think the orthography makes it look awkward.
Tell that to Kaa.
Shelley preferred the “cenci” plural in his tragedy about head-counting.
And Japanese prefers “sensei.”
Mehraban Book Pahlavi was announced just a few days ago, claiming to be the first typeface designed for Book Pahlavi:
So apparently Book Pahlavi is finally about to be added to Unicode?
Wow, that must have been a lot of work.
Nah. You only need five new squiggles to cover the whole alphabet …
The upside-down ‘Ahriman’ seems to be available as a separate glyph, though the most recent Unicode proposal document doesn’t seem to assign it its own code point. Presumably, the font layout table is engineered so that the right sequence of letters gets converted automatically to the upside-down glyph, or if your software doesn’t support this functionality you can at least input it manually.
I doubt Unicode will allow the ‘Ahriman’ as its own character given their aversion to precomposed characters except in legacy cases. There is a single Unicode Character ‘﷽’ (U+FDFD) for the Bismillah, but I think it was only permitted for legacy reasons (something about Pakistan requiring the Bismillah on all official documents if I remember correctly).
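For anyone who wants to check the Bismillah precedent themselves, here is a small sketch using Python’s standard `unicodedata` module. The helper name `describe` is made up for the example. The second code point, U+10B60, is assumed here to be the start of the Inscriptional Pahlavi block, a different and already-encoded variety of the script, included only for contrast with Book Pahlavi.

```python
import unicodedata

# The single code point for the Bismillah mentioned above.
BISMILLAH = "\uFDFD"

# Assumed first letter of the Inscriptional Pahlavi block (not Book Pahlavi).
INSCRIPTIONAL_PAHLAVI_ALEPH = "\U00010B60"


def describe(ch):
    """Return the code point and the official character name, if the local database has one."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


print(len(BISMILLAH))  # one code point, although it renders as a whole phrase
print(describe(BISMILLAH))
print(describe(INSCRIPTIONAL_PAHLAVI_ALEPH))
```

What gets printed depends on the Unicode version bundled with your Python, so a very old interpreter may report `<unnamed>` for the second character.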
There’s a tweet missing above — after “Anhoma” comes:
And one last tweet after “awesome”:
Thanks, I’ve added them in.
The font looks nice, in individual words (I’m less sure about text).
“Persians”
I won’t try to guess what set of people is referred to here:)
(thought… perhaps just the immigrant community with the highest percentage of PhD holders…)
censuses [latinists, please correct me!]
I was tempted to say “Why would you need to consult Latinists? There is such a thing as an English dictionary.” But the difficulty is that some English dictionaries don’t give a plural for census, and users may not realize that that’s not because there isn’t one, it’s because print dictionaries didn’t list regular plurals. They didn’t have enough space. You were just supposed to know that if they didn’t give a plural, then it was formed with ‑s or ‑es. Or else there was no plural because it wasn’t a count noun; you were just supposed to know that, too. (Maybe dictionaries for children or foreign students gave more explanation?)
Some print dictionaries did recognize that readers might want reassurance about the plural of census, and gave censuses explicitly: Funk & Wagnalls, Random House, Collins, Encarta, New Oxford Dictionary. Good for them. The online versions of MW, AHD, and Macmillan still don’t give an explicit plural.
Wiktionary never had the space constraints of paper, and it gives the plural for every noun, regular or irregular (at least, generally it does; I haven’t checked exhaustively). The Oxford Languages definitions provided by Google (if you use the keywords “define” or “meaning”) also give plurals for all nouns, if you click through to “see more” or “more definitions”.