Arabic Typography: An Interactive History.

Vita Nouva has a remarkable interactive introduction to the terrific experience of rendering Arabic typography and its technical debt:

Once upon a time, a frontend ticket landed on my queue which was not properly mine, but the only other Arabic reader on the team was on leave. It went roughly as follows: a block of mixed-content Arabic prose on the customer-facing dashboard was rendering with a ragged left edge (the rag falls on the left in Arabic, since the lines set out from the right margin; the ticket said “ragged right”) when the design team had explicitly specified justified text. Attached were three screenshots from three browsers and a polite note from the product manager observing that the Latin-script version of the same block looked, I quote, “fine.”

In the same six months I had closed three other tickets against the same product, each of which had presented to its filer as the only bug. A customer’s name had appeared with its letters unjoined on a printed agreement, the way a sign-painter would have laid them out in 1962, because the PDF library on the receipt server pre-dated the existence of a shaping engine in its language runtime. A search index had been returning empty for accounts the customer service team could see in the database because a 2017 import had encoded twelve thousand names using fossil Unicode codepoints from 1991 instead of regular ones from 1995, and the index, very reasonably, treated the two encodings as different strings. So that ragged-left ticket was the smallest of the four; however, it sat on top of the same iceberg and pointed at the same thing. […]
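The search-index failure described above can be reproduced in a few lines. This is only a sketch, and it rests on an assumption the excerpt does not confirm: that the "fossil" codepoints are Arabic Presentation Forms (pre-shaped glyph codepoints such as U+FEE3) rather than the plain letters (such as U+0645). The `search_key` helper is hypothetical; it uses the standard library's NFKC normalization, which folds presentation forms back to their base letters.

```python
import unicodedata

# Assumption: the legacy import stored already-shaped presentation forms.
# meem (initial), hah (medial), meem (medial), dal (final)
legacy = "\uFEE3\uFEA4\uFEE4\uFEAA"
# The same name written with the plain letters meem, hah, meem, dal
modern = "\u0645\u062D\u0645\u062F"


def search_key(text: str) -> str:
    """Hypothetical index key: fold compatibility forms to base letters."""
    return unicodedata.normalize("NFKC", text)


# A naive index compares raw codepoints, so the two spellings never match.
raw_match = legacy == modern
# After compatibility normalization both spellings yield the same key.
normalized_match = search_key(legacy) == search_key(modern)

print(raw_match, normalized_match)
```

Normalizing both the stored names and the incoming query with the same function is one plausible way to make such an index treat the two encodings as one string; whether the product in the story did this is not stated.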

It did look fine. I spent about half an hour with it: I walked the rendered DOM, I set text-align: justify in many different combinations of font-family and direction declarations, and at the end of the exercise I wrote a reply explaining, more or less honestly, that the problem was not a bug in our stylesheet but the state of Arabic typography on the web.

The reply and the closure of the ticket took half an hour or so. The reasons behind it took five hundred years to pile up, and they involve a twice-mutilated vizier, a Qurʾān that vanished for four centuries, a Beirut newspaperman with a deadline, and an Egyptian physician who taught himself font engineering for fun (or that is what I imagine about him). Walking through these ended up being the most enjoyable couple of weeks in that job, and I want to go through it here too.

Trust me, the resultant story is worth your while, especially if you know or care anything about Arabic and/or coding. I got it via Lycaste’s MeFi post, where there are some good comments:

This was really interesting! It makes Chinese typography look dead simple.
posted by zompist at 6:27 PM on June 11

The bit about the “Arabic numerals” confusion reminded me of when I worked on the World Digital Library project, which was localized into Arabic and had member libraries in all three of the regions mentioned. This was back before things like the JavaScript Intl API existed so we had custom server and client-side formatting functions which used the Arabic-Indic digits, and considered using the extended form for the handful of countries where the extended variants were common (this is even harder than mentioned if you want to support the various Indian subcontinent variations which fortunately we did not).

Because this was back in the dark ages, this was a surprising time-sink because you had to support old browsers and operating systems with terrible text rendering support as well as navigate the minefield of fonts which shipped with limited language ranges to save space, causing fallback to other fonts for those specific characters in a way which broke diacritics and ligatures. If you had a global audience and targeted libraries or schools, you could reliably count on someone having a hard time reading text in a non-Western European language because some part of their stack was both old and hard to fix. I opened more bug reports with Microsoft, Apple, Mozilla, and Google about text rendering than is healthy because in some cases this completely changed the meaning or made languages effectively unusable (e.g. Javanese is read by ~100M people but, similar to Arabic, if it’s not rendered with the connecting ligatures it’s incoherent).

About a decade into the project, there was a partners meeting in Egypt and this came up in the context of a site redesign. The Arabic speaking partners apparently decided relatively quickly to use the Western 0-9 numerals pervasively because while scholars didn’t consider it proper, those were the only set of numerals which were universally familiar to Arabic computer users because so many things weren’t fully localized even if they otherwise had full Arabic UI text. This made life easier on the redesign but I remember it feeling somewhat sad as I removed that code because we’d actually tried and it had ended up being less useful than doing nothing because so many other programmers never tried.
posted by adamsc at 9:16 PM on June 11
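
The kind of custom digit formatting adamsc describes might look something like the sketch below. It is not the World Digital Library code: the function name `localize_digits` and the locale sets are assumptions made for illustration. Only the two digit ranges are fixed facts: Arabic-Indic digits start at U+0660 and the extended Arabic-Indic digits start at U+06F0.

```python
def digit_table(zero: int) -> dict:
    """Map ASCII digits 0-9 onto the ten digits starting at codepoint `zero`."""
    return {ord("0") + i: chr(zero + i) for i in range(10)}


ARABIC_INDIC = digit_table(0x0660)           # U+0660..U+0669
EXTENDED_ARABIC_INDIC = digit_table(0x06F0)  # U+06F0..U+06F9

# Illustrative assumption: which language tags get which digit set.
ARABIC_INDIC_LANGS = {"ar"}
EXTENDED_LANGS = {"fa", "ur"}


def localize_digits(text: str, locale: str) -> str:
    """Replace ASCII digits in `text` according to the language of `locale`."""
    lang = locale.split("-")[0].lower()
    if lang in EXTENDED_LANGS:
        return text.translate(EXTENDED_ARABIC_INDIC)
    if lang in ARABIC_INDIC_LANGS:
        return text.translate(ARABIC_INDIC)
    # Any other language keeps the Western digits unchanged.
    return text


print(localize_digits("2024", "ar-EG"))
print(localize_digits("2024", "fa-IR"))
print(localize_digits("2024", "en-US"))
```

The partners' eventual decision amounts to always taking the final branch, which is why the code could simply be removed.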

It’s a literal endnote to this fascinating post, but I was happy to read at the end that Brill spent $750,000 to make fonts for Semitic philology and then gave them away for free. My first association with Brill is “extractively expensive for-profit academic publisher” and this raises them in my estimation.
posted by sy at 9:21 PM on June 11

Now I know why, to my knowledge, no one has yet tried rendering any of the maghribi (Moroccan / North African) Quranic lettering styles online. At least not as far as I know.
posted by rabia.elizabeth at 4:44 AM on June 12

> My first association with Brill is “extractively expensive for-profit academic publisher” and this raises them in my estimation.

If you think of it as a one-time investment towards perpetual capture of a publication market, it’s not as nice even if it provides tangential benefits.
posted by at by at 9:49 AM on June 12

On the “sultan bans printing” thing, see Did the Ottomans Ban Print?

Comments

  1. David Eddyshaw says

    Very interesting. I’ve often wondered about the cursor thing.

    Also struck by

    Behdad Esfahbod, who wrote much of HarfBuzz before Hosny, is Iranian-Canadian. In 2017 he was detained for ten hours at the US border on suspicion of being Iranian, which he was. He was working at Google at the time. The shaping engine running in your browser at this moment, which paints every Arabic letter you see correctly, was for years carried by an engineer the US government considered a security risk.

    מַה־שֶּֽׁהָיָה֙ ה֣וּא שֶׁיִּהְיֶ֔ה וּמַה־שֶּׁנַּֽעֲשָׂ֔ה ה֖וּא שֶׁיֵּעָשֶׂ֑ה וְאֵ֥ין כָּל־חָדָ֖שׁ תַּ֥חַת הַשָּֽׁמֶשׁ׃ (what’s the cursor doing here?)
