Fwent: A Mystery.

March 3, 2023 by languagehat 37 Comments

John Cowan wrote me as follows:

I found the following sentence in the Kindle edition of a story by Josephine Tey: “To my unbounded relief, however, Lizbeth lapsed suddenly from the borders of hysteria to her normal fwent calm.”

“Fwent”?

Googling the first part of the sentence shows simply “her normal calm” in other editions, so “fwent” is probably not a typo for some other word (and if it were, what word could it be?). Googling for “fwent calm” shows no other instances. Bizarre.

Any guesses what went wrong? My only guess is that it is some kind of markup (not HTML) that infiltrated the text, like the 1805 KJV Bible edition where “to remain” (apparently being used in place of “stet”) wound up being printed in Gal 4:29, making it read “But as then he that was born after the flesh persecuted him that was born after the Spirit to remain, even so it is now.” It *almost* makes sense.

I for one am baffled, and I’m curious what the assembled Hatters make of it.

Comments

prase says

March 3, 2023 at 2:12 pm

It is surely a typo. F is next to G, and Gwent is a Welsh preserved county (presumably well known for the stoic calmness of its inhabitants).

One of my friends once used the word “parostroj” (Czech for steam engine) when writing his diploma thesis to mark sections he intended to revisit later. (The thesis was about astrophysics, not steam engines.) Quite naturally one of the steam engines evaded final editing and survived till the submitted version. The committee members thought it was there to test whether someone actually reads the thesis at all.
Stu Clayton says

March 3, 2023 at 2:40 pm

covfefe. Even a typo can get you into the headlines these days.
languagehat says

March 3, 2023 at 3:16 pm

But it’s clearly not a typo, since the intended sentence has “Lizbeth lapsed suddenly from the borders of hysteria to her normal calm,” with nothing that could be twisted into “fwent.” If someone were literally typing in the text for this edition, one might guess their fingers slipped and produced a little gibberish without their noticing, but surely that’s not how things happen these days.
Ryan says

March 3, 2023 at 3:16 pm

It’s lenition with epenthesis.

Okay, I’ve given you a hint. You guys should be able to take it from there…

More seriously, I wonder whether it’s the kind of typo that occurs when someone is typing elsewhere and then bumps the cursor, doesn’t recognize what happened and goes back to where they meant to be–“I fucking went… (whoops, what happened?) Anyway, I went to three stores and couldn’t find the right model.” A common laptop error for those without the good sense to disable touchpad tapping.
Jen in Edinburgh says

March 3, 2023 at 3:44 pm

It’s like a bit of a chapter heading got in when a page break in the original was removed, or the stray words they used to print at the bottom of the page to match them up.

I’ve done a bit of DP Canada proofreading, and the oddest things do get into the scans, although they try to make sure they all come out again.
mollymooly says

March 3, 2023 at 4:51 pm

Maybe the editor meant to hit CTRL+F followed by “went”, but missed CTRL?
languagehat says

March 3, 2023 at 5:22 pm

It’s like a bit of a chapter heading got in when a page break in the original was removed, or the stray words they used to print at the bottom of the page to match them up.

I’ve done a bit of DP Canada proofreading, and the oddest things do get into the scans, although they try to make sure they all come out again.

Thanks, I guess that’s the best we’re going to do unless someone involved with the production of this edition drops by and says “Shit, I can’t believe that stayed in there! Here’s how it happened…”
J.W. Brewer says

March 3, 2023 at 5:42 pm

Perhaps hat is a little old to put on his dancin’ shoes for this, but I present you with “La Fwent,” an actual musical composition from circa 2003. (This is the original mix; there are remixes out there.) https://www.youtube.com/watch?v=pL63HkDn9KU
languagehat says

March 3, 2023 at 6:15 pm

Good lord! That raises two questions: was Pedro Delgardo involved in the production of this book, and where did *he* get the name from? (And yes, I’m too old for that techno stuff.)
J.W. Brewer says

March 3, 2023 at 6:43 pm

FWIW one generally-quite-reliable music-reference website reports that “Pedro Delgardo” is a stage name for Pete Gawtry, who is said to be a “Techno/Electro/Electronica DJ & producer from Leeds, UK.” “Gawtry” isn’t quite an anagram for “Gwent” but it’s in the neighborhood?
languagehat says

March 3, 2023 at 6:53 pm

“Gawtry of Gwent” sounds like a medieval romance.
cuchuflete says

March 3, 2023 at 8:06 pm

One day I saw upon the stair
an errant fwent that wasn’t there.

It fwasn’t there again today.
Oh how I wish it fwent away.

The ostrich and the merkin
soared up high in fwenglish phoome.

We fwent astray again, Deare Fwends,
Parr for the coarse?
John Cowan says

March 3, 2023 at 9:31 pm

Or a modern historical novel.
AntC says

March 3, 2023 at 9:33 pm

I present you with “La Fwent,” an actual musical composition …

Name clearly a tribute to Discos Fuentes, which “has often been described as Colombia’s version of ‘Motown’ …”, and for which I’d happily put on my dancing shoes/Buena Vista Social Club vibe.

BTW how long am I supposed to put up with that “La Fwent”’s intro before it actually starts?
david says

March 4, 2023 at 9:14 am

That long introduction is meant to bring you back from the borders of hysteria to a place where you can hear the subtler variations.
J.W. Brewer says

March 4, 2023 at 10:51 am

I think maybe the practical answer to AntC’s “how long” question is that with a sufficient dosage of C11H15NO2 [sorry for lack of subscript numbers] it just wouldn’t bother him?
John Cowan says

March 4, 2023 at 12:05 pm

Maybe the editor meant to hit CTRL+F followed by “went”, but missed CTRL?

This seems to me like the best suggestion yet. And it’s true that books are rarely typed in by hand (other than by the original author) any more, but the final stage of OCR correction still has to be done by hand.

an errant fwent that wasn’t there

It was perhaps chasing a rapidly vanishing fnord. In any case, a lovely verse. The original of Wendy from Peter Pan seems to have gotten her nickname from fwiendie < friend+ie.

C11H15NO2

C₁₁H₁₅NO₂, using Unicode subscript digits, typed as AltGr+q followed by the digit on the Moby Latin keyboard and its UK relatives. AltGr+q in general provides smart quotes, em and en dashes, and other such punctuational rarities; the superscript digits are just AltGr+digit.
Kenzo says

March 4, 2023 at 1:24 pm

My first thought is that it’s a glitch produced by OCR software. If the original is scanned at an angle, or has a crease in the page, or has columnated text or some other unusual formatting (this last one being unlikely in the case of a novel), then a word can easily jump to the wrong place in the OCR-produced text, especially using cheap software.

The misplaced word in this case would also have been misread by the software. Could there be a nearby passage missing the word “fluent”, for example?
Brett says

March 4, 2023 at 2:17 pm

John Cowan: …the final stage of OCR correction still has to be done…

I’m going to stop you right there. Lots of commercially produced e-books have clearly never had a human read-over between the OCR and publication.* Works by major authors actually get the OCR mistakes corrected, but I would not put Josephine Tey in that category.

* I recently paid a couple of dollars for a digital copy of Kothar, Barbarian Swordsman by Gardner Fox—supposedly one of the best pastiches of Robert E. Howard’s Conan stories. (Fox makes no attempt to hide the fact that Kothar is a reskinned version of Conan. Plenty of the proper names are clearly chosen to sound similar to Howard’s. Moreover, decades later, when the Conan stories were adapted into a comic book series by Marvel, Fox—who was a famous comic writer himself, creator of The Flash, Hawkman, and Doctor Fate—allowed his Kothar stories to be adapted as Conan stories when Marvel needed material for additional issues.) There are glaring OCR errors on practically every page, and at one crucial point several whole lines are missing. No human ever checked the text before it was made available for sale.
Jen in Edinburgh says

March 4, 2023 at 4:25 pm

I suspect the various Gutenberg projects are much better checked than some commercial versions of out-of-copyright texts – it takes a lot of volunteer hours, and you’re not going to pay for the work if you’re going to end up selling it for 49p the lot. Although some of them just use the Gutenberg text, of course!
John Cowan says

March 4, 2023 at 6:35 pm

This text, though commercial, is fairly clean — I would notice.
Brett says

March 4, 2023 at 7:26 pm

@Jen in Edinburgh: Yes, thanks to Distributed Proofreaders (which I used to be pretty involved with too, specializing in technical works), the texts at the Project Gutenberg sites tend to be very clean. I don’t know why anyone releasing a (superfluous) e-book edition of an out-of-print work would use any other version, except perhaps due to a misunderstanding of copyright laws. It’s e-books for things that are still in copyright (like Kothar, Barbarian Swordsman, since Fox only died in 1986) but which are not expected to sell many copies that really tend to be the pits. The quality of the scan from which the text was extracted, and the specific software used, can make for big differences in the qualities of the e-book products.
Y says

March 4, 2023 at 8:12 pm

It’s a wonder to me how OCR got stuck where it has been for years now. An undistracted human reading printed text of reasonable quality will have a Zero Point Zero error rate, including identifying different scripts. The best commercial OCR programs will boast something like maybe one error per page. I imagine that the market for OCR these days is not scholars, but people digitizing office materials and legal evidence that nobody will read anyway.
rozele says

March 4, 2023 at 10:26 pm

[fnord]
John Cowan says

March 4, 2023 at 10:29 pm

The best commercial OCR (for legal contracts, e.g., which lawyers certainly do read) is improving steadily. An example is ABBYY. I haven’t tested any of the OCR readers that claim to be AI-based, but I expect that if they aren’t that great now, they certainly will be.
David Eddyshaw says

March 4, 2023 at 10:32 pm

@rozele:

There’s a typo in your comment.
Y says

March 4, 2023 at 10:32 pm

I have used ABBYY quite a lot. It’s the best, and it’s so-so. And it’s a pain to get it from so-so to a little better than so-so.
David Eddyshaw says

March 4, 2023 at 10:40 pm

I was just trying out the entirely non-commercial ocrmypdf on my non-searchable pdf of Lukas Neukom’s grammar of Nateni. It managed the French text pretty well – enough to make that part reliably searchable, anyhow – though (forgivably) it seems to have given up altogether on the diacritic-heavy Nateni.
John Cowan says

March 4, 2023 at 10:50 pm

Y: It’s certainly much better in some languages than others. What languages were you using? And had you paid the extra $$$$ to unlock the higher levels?

DE: OCRing depends entirely on having a predictive model of the language being OCRed (and the same is true for speech recognition). Without that, the results are not even so-so.
rozele says

March 5, 2023 at 10:18 pm

@DE: but you can see it!
Y says

March 6, 2023 at 4:32 pm

JC: I used ABBYY 14, I think the $$$ version (i.e. $130 ten years ago, not quite $$$$), which lets you train it if it has problems with a particular font. I have used it mostly for linguistic texts written in European languages about obscure languages, some using a variety of ad-hoc diacritics. For English/French/German it’s “acceptable”, meaning that if you search the text, you are likely to find what you are looking for. For anything else it’s hopeless, unless, for each book, you spend many hours fighting the adversarial training user interface; if you do, it gets closer to “acceptable”.

A human, for comparison, could transcribe any of these texts, diacritics and all, with 100.0% accuracy, without any language model.
January First-of-May says

March 6, 2023 at 6:12 pm

with 100.0% accuracy

I wouldn’t put 100.0% on anything human; typos exist. Though I guess 99.97% (corresponding to roughly one typo per 1-2 pages, depending on font size) is quite achievable with some care, and that would round to 100.0%.

A more important consideration is that in human-transcribed texts the typos are usually less frequent in the weird bits (e.g. foreign text with diacritics), because those are more carefully looked over. Old-style OCR would have a lot less of an idea of what to do with that sort of thing; modern “AI”-based OCR would probably be prone to straight-up inventing stuff that vaguely looks like it could be there.

which lets you train it if it has problems with a particular font

I think this concept goes back to at least the 1990s. (It wasn’t perfect back then either.)
John Cowan says

June 17, 2023 at 12:14 am

It occurs to me that the title of this post is (most likely accidentally) parallel to the full title of Bram Stoker’s novel, which is Dracula: A Mystery.
David Marjanović says

June 17, 2023 at 6:05 am

A more important consideration is that in human-transcribed texts the typos are usually less frequent in the weird bits (e.g. foreign text with diacritics), because those are more carefully looked over.

Quite the opposite. German and French references in scientific papers in English are almost invariably misspelled unless of course enough of the authors speak one of these natively.
Brett says

June 17, 2023 at 5:10 pm

I want to add a caveat to my comment about how Project Gutenberg generally has good quality texts. This is not so much true for the books that were posted in the early days of the project, before a proofreading system was established. Unfortunately, this includes a lot of the most interesting works. I have recently been reading the Memoirs of General W. T. Sherman (document number 4361 on the site), which is riddled with errors. Most of the problems are obvious OCR mistakes, but there are formatting and other kinds of errors as well. Moreover, while most of the mistakes are easy to spot and mentally correct, not all of them are. I am genuinely unsure, from what I have been reading, whether there was a brigadier general named “Smart,” or whether that is just an occasional error for “Stuart.”
Y says

June 17, 2023 at 5:45 pm

It’s Stuart. Search in the archive.org version (or in GBooks, which had digitized it).
J Pystynen says

June 17, 2023 at 6:57 pm

BTW how long am I supposed to put up with that “La Fwent”’s intro before it actually starts?

It’s a 12-inch single and as such gets in full flow immediately — or about as full as it gets, anyway (techno really isn’t known for its fullness). But they do pretty often also build the song up somewhat gradually to help with live mixing, e.g. here subtle bass (re)drop around 1:20, hihats drop at 1:40.
