Thesaurus Linguae Sericae.

From the About TLS section [2025: Not archived, but here’s the main page] (I’ve collapsed the one-sentence paragraphs because they annoy me):

TLS explores the conceptual schemes of pre-Buddhist Chinese on the basis of a corpus of translated texts interlinked with an analytic dictionary. Text and dictionary are constantly held up against one another. Our understanding of the texts and the Chinese system of meanings can be refined through this close confrontation. TLS associates Chinese concepts with concepts from the European antiquity, aiming to make the classical Chinese evidence comparable to that of other cultures. TLS seeks to make more precise the criteria used in translating classical Chinese, through a detailed description of the semantic relations that obtain among Chinese words.

TLS is the first synonym dictionary of classical Chinese in any Western language; it attempts to state as clearly as possible the semantic nuances that distinguish words close in meaning. TLS is the first dictionary which systematically organises the Chinese vocabulary in taxonomic and mereonomic hierarchies, thus exploring the topology of the Chinese mental space. TLS is the first dictionary that systematically registers lexical relations like antonym, converse, epithet, etc., thereby aiming to define the Chinese conceptual space as a relational space. TLS is the first dictionary of Chinese which incorporates detailed syntactic analysis, thus enabling the systematic study of basic phenomena as e.g. the history of abstract nouns in China. TLS is the first corpus-based dictionary which records the history of rhetorical devices in texts, making it possible to study crucial matters such as the history of irony in China.

I guess most people who would be interested already know about it, since it’s been around for a while, but you never know. Lukas Zadrapa, in Word-Class Flexibility in Classical Chinese: Verbal and Adverbial Uses of Nouns (Brill, 2011), calls it “the most extensive and definitely the most sophisticated interactive encyclopedia of (primarily ancient and mediaeval) Chinese language and Chinese concepts at hand, which I have had the chance to exploit since the time almost ten years ago when it was not even remotely accessible via the internet” (pp. 12-13). Kudos to General Editor Christoph Harbsmeier, who apparently got it started, and a tip o’ the Languagehat hat to Trevor for the link.

Update (Oct. 2025). The TLS site is now here.

Iona.

I came across a reference to the island of Iona, looked it up, and in that Wikipedia article discovered some startling information about its name (I’ve bolded the startling part):

The earliest forms of the name enabled place-name scholar William J. Watson to show that the name originally meant something like “yew-place”. The element Ivo-, denoting “yew”, occurs in Ogham inscriptions (Iva-cattos [genitive], Iva-geni [genitive]) and in Gaulish names (Ivo-rix, Ivo-magus) and may form the basis of early Gaelic names like Eogan (ogham: Ivo-genos). It is possible that the name is related to the mythological figure, Fer hÍ mac Eogabail, foster-son of Manannan, the forename meaning “man of the yew”.

Mac an Tàilleir (2003) lists the more recent Gaelic names of Ì, Ì Chaluim Chille and Eilean Idhe noting that the first named is “generally lengthened to avoid confusion” to the second, which means “Calum’s (i.e. in latinised form “Columba’s”) Iona” or “island of Calum’s monastery”. The possible confusion results from “ì”, despite its original etymology, becoming a Gaelic noun (now obsolete) meaning simply “island”. Eilean Idhe means “the isle of Iona”, also known as Ì nam ban bòidheach (“the isle of beautiful women”). The modern English name comes from an 18th-century misreading of yet another variant, Ioua, which was either just Adomnán’s attempt to make the Gaelic name fit Latin grammar or else a genuine derivative from Ivova (“yew place”). Ioua’s change to Iona results from a transcription mistake resulting from the similarity of “n” and “u” in Insular Minuscule.

For some reason (probably having to do with my work as an editor) it irritates me when words get changed in this way, whereas ordinary sound change, even via analogy or folk etymology, doesn’t bother me at all. (See this 2003 post for another example, the verb collimate.)

Lexical Distance Among the Languages of Europe.

Just a map, but a nicely done one, with some interesting discussion (and explanation of obscure abbreviations) in the comments. Thanks, Trevor!

Is Siri Killing the Twang?

Well, not really. And this Guardian piece by Tom Dart should be taken with several grains of salt, like all journalism about language. Still, some interesting stuff there; it starts with an anecdote about a Texan trying to communicate with Siri and failing, and goes on:

The upshot of this brief and decidedly unscientific experiment is that Siri is at her best when addressed in standard English, with accents toned down and slang avoided where possible.

The writer Julia Reed came to a similar conclusion in an essay for the latest issue of the southern lifestyle magazine Garden & Gun, when she turned to dictation apps after breaking her left elbow in New Orleans. She wrote:

Like the iPhone’s highly temperamental Siri, Dragon and the rest of the dictation apps I tried steadfastly refused to understand pretty much everything I had to say. Apparently none of [Dragon’s] coders have spent a natural minute below the Mason-Dixon Line. A smart person could make a lot of money by inventing a Siri for Southerners.

[…] “Most people have what we would call a telephone voice, so they actually change away from their local family accent when they’re speaking on the telephone to somebody they don’t know,” said Alan Black, a Scottish computer scientist who is a professor at the Language Technologies Institute at Carnegie Mellon University in Pittsburgh.

They also have a “machine voice”, he said. “People speak to machines differently than how they speak to people. They move into a different register. If you’re standing next to somebody in an airport or at a bus stop or something, you can typically tell when they’re talking to a machine rather than talking to a person.”

Black speculated that “one of the reasons they designed Siri to be fundamentally a polite, helpful agent who isn’t your friend but works for you, is to encourage people to be somewhat polite and explicit to her, rather than being very colloquial. Because speech recognition is always hard when you drop into colloquialisms.” […]

Black thinks that in coming years, programs such as Siri will go from being aloof in style to more familiar, understanding your language patterns as if they were a close friend rather than a casual acquaintance.

Who knows? The future’s not ours to see; que sera, sera! But it’s fun to think about. (Thanks, Kobi!)

Clinamen.

Via wood s lot, I got to The Poetry of Osip Mandelstam: A Radio Play by Paul Celan (complete), posted by Jerome Rothenberg and “Translated from Celan’s German by Pierre Joris.” Now, I’ve long been a fan of Rothenberg’s (see, e.g., this post), but I found several things about this annoying. In the first place, there is no link to Celan’s German original, or even a title by which it could be googled. (I’ve tried [Celan Mandelstam Hörspiel] and [Celan Mandelstam radio], with no results except this Rothenberg post.) It’s clearly from a German original, because it includes German forms like Pawlowsk (rather than Pavlovsk) and Swesda (for Zvezda, ‘Star’), and those non-Englished forms are another annoyance (see my similar complaint in this ancient post). But the original annoyance, the thing that caught my eye and made me want to find the German original, was “The poem in this case is the poem of the one who knows that he is speaking under the clinamen of his existence…” Clinamen? I have a pretty extensive vocabulary, but that rang no bells except as a Latin term I was vaguely aware of (turns out it’s, in Wikipedia’s words, “the Latin name Lucretius gave to the unpredictable swerve of atoms”). The OED’s entry (unrevised since 1889) is short and simple:

An inclination, bias.
1704 Swift Tale of Tub ix. 166 The Round and the Square, would by certain Clinamina, unite in the Notions of Atoms and Void.
1823 T. De Quincey Lett. Young Man in London Mag. July 91/2 An insensible clinamen (to borrow a Lucretian word) prepares the way for it.
1838 J. C. Hare & A. W. Hare Guesses at Truth (ed. 2) 1st Ser. 296 No old word; which, with a slight clinamen be given to its meaning, will answer the purpose.

I have no idea what would lead someone translating from German to use such an obscure and impenetrable word unless the object is to reproduce a corresponding obscurity in the original, hence my quest. (Of course, even if you substitute “bias” or “inclination,” the phrase “speaking under the clinamen of his existence” is still impenetrable.) At any rate, if anyone knows what Celan wrote, or has things to say about the word clinamen, the comment box awaits.

Labels as Ideograms.

Another interesting passage from Franklin’s Writing, Society and Culture in Early Rus (see this post):

Turning to the name-labels on portraits [in a Kiev church], we find that the significance of the pictorial, of the visual, is if anything even more pronounced. The first impression is of a kind of linguistic anarchy. The names themselves were written in Greek, or in Slavonic, or in their Greek forms using Cyrillic letters, or in any number of hybrid combinations. The reasons vary from object to object, and may include ignorance, miscopying, hypercorrection, perhaps mere aesthetic preference, but the variability of forms is not necessarily a barrier to the verbal reading of the labels. The more revealing elements, paradoxically, are those which are relatively stable: the standard abbreviations labelling Christ and the Mother of God, the epithet for ‘saint’ or ‘holy’, the legend ‘Jesus Christ is victorious’ which accompanies the image of the Cross. Even where the standard language of the graphic environment is Slavonic, these standard forms remain consistently Greek: (ΙϹ ΧϹ; ΜΡ ΘΥ; Ο ΑΓΙΟϹ or the monogram of an alpha within an omicron; ΝΙ ΚΑ). Thus the most common of all image-labels retain their nonnative forms. By verbal logic the words signified by these abbreviations ought to have been appropriated into native usage, along with large numbers of other words and terms specific to the imported faith. Instead, these graphic formulae in effect cease to function as alphabetic script and turn into ideograms. A wholly unscientific survey in a modern church shows that a large proportion of viewers have no difficulty affirming that the graphic sign ΜΡ ΘΥ ‘means’ Mother of God, but that very few could decode it as an abbreviation of the Greek Meter Theou. Familiarity cancels out difficulty. 
Images of Christ and the Mother of God were familiar enough to be recognised along with their appropriate graphic emblems of identity, and the gap in perception between our notional ‘lettered’ and ‘unlettered’ viewers is diminished almost to nothing, since both might ‘read’ these inscriptions in the same way. The correct writing is that which is correct as part of the picture, not necessarily that which gives the ‘correct’ letters for the words as articulated in the native language.

I’m fascinated by these situations on the margins of literacy, and how people interpret signs in different ways.

Aphercotropism.

HaggardHawks (“Words, language, & etymology”) posted last July about a word that is just barely hanging on at the fringe of the English wordhoard: aphercotropism. As many readers pointed out when HaggardHawks first tweeted about it, it’s not in the dictionaries, but neither did HaggardHawks make it up — its first appearance seems to be in this 1899 note in The Selborne Magazine, where it is defined as “Turning away from an obstruction.” Since then it has failed to catch on to such an extent that it is not even in the Third Edition of the OED. HaggardHawks explains its etymology thus:

First of all, the prefix aph– derives from a Greek word, apo, meaning “off” or “away from”. It’s the same root we see in words like apocalypse (which literally means “uncovered” or “disclosed”), apocryphal (literally “hidden away”), and even apology, which originally referred to a formal defence or justification, or to a personal account of a story (and so literally means “from speech”).

Secondly, the –erco– part comes from another Greek word, erkos, referring to a fence, a barrier, or some kind of surrounding wall. It only has a handful of offspring in modern English, the majority of which are fairly obscure, long-forgotten terms (the kind that HaggardHawks devours) that have found their way into the dustier corners of the OED: hercotectonic (“pertaining to the construction of walls”), poliorcetic (“relating to the besieging of cities”), and hercogamous, a botanical term describing plants that grow “barriers” between their male and female parts in order to prevent self-fertilization. Apparently.

So that only leaves the suffix –tropism, which you’ll likely recognise from words like heliotropism (“turning towards the sun”) and phototropism (“growth towards a light source”).

While I admire the pedagogical spirit and lively style, I can’t help but feel a site focused on words and etymology should have done a better job. There is no “prefix aph–”; there is a prefix ap–, which indeed derives from apo. When you put it in front of a morpheme beginning with h–, naturally you wind up with aph–. The problem following on from this is “another Greek word, erkos”; as can be seen from the examples adduced, the word is herkos (ἕρκος) with an h– (reflected in spelling by a rough breathing). One can and should be aware of this stuff even without knowing Greek; studying the etymologies in any good dictionary should do the trick. [N.b.: The post has been amended to correct the error; thanks for the heads-up, Suse!] Still, I’m glad to know about HaggardHawks (via MetaFilter), and I recommend checking it out — from this entry I learned that the planet Uranus was originally going to be called George.

Introducing CHIRILA.

Anyone interested in Australian linguistics will be gratified by this Anggarrgoon post:

I am very pleased to announce that the first phase of CHIRILA (Contemporary and Historical Resources for the Indigenous Languages of Australia) has been released. This represents approximately 180,000 words from 155 different Australian languages. It is a subset of the full database (of approx 780,000 items); eventually I hope to be able to release most of the data. Currently, the first phase is that for which we have explicit permission, or which is already in the public domain.

The material is hosted at pamanyungan.net/chirila; please see the web site for more information about the contents of the database, how to download data, what formats are available, and the like. We do not provide a web interface to the data; you download it and use excel or a database program to read the files. We hope the data will be useful to researchers, community members, and others with an interest in Australia’s Indigenous language heritage. pamanyungan.net/chirila also includes access to the preprint of a paper describing the database (both the online and full versions).

I’m not sure why they don’t provide a web interface, but I imagine there are good and sufficient reasons.

Franks.

Another good passage from Bartlett’s The Making of Europe (see this post), this time excerpts from the section titled “Naming” (pp. 101-105):

The final gift of conquest to the western European aristocracy was a name. For it was in the process of the dramatic expansionary enterprises of the eleventh, twelfth and thirteenth centuries that a shorthand term was popularized that had the connotation of ‘aggressive westerner’. That term was ‘Frank’.

[Bartlett gives examples of the use of the term to cover various groups — German, Flemish, Norman, etc. — from “the De expugnatione Lyxbonensi, a rousing account of the capture of Lisbon in 1147 by a crusading army of seamen and pirates from north-west Europe,” and a text by Affonso I of Portugal, who referred to “the agreement between me and the Franks.”]

Thus there are two closely related circumstances in which the general label ‘Frank’ was convenient. One was when a member of a body composed of various ethnic groups from western Europe wished to employ a label for the whole of this body; another when someone who conceived of himself as outside of that body […] wanted to give a group name to the foreigners. Thus both as self-appellation and as designation by others, ‘Frank’ was associated with the ‘Frank away from home’. […]

The classic enterprise which stimulated the use of this term was the crusade, the ‘Deeds of the Franks’ as its earliest chronicler called it, and it seems to have been the First Crusade that gave the term a general currency. Prior to that period, of course, it already had a long history, first as an ethnic designation, later in association with a particular polity, the ‘realm of the Franks’ (regnum Francorum). The generalization of the name to cover all westerners was a fairly natural result of the virtual equivalence of the Carolingian empire and the Christian West in the ninth century and, also logically enough, seems to have been used in this way first by non-westerners. The Muslims denominated the inhabitants of western Europe Faranǧa or Ifranǧa. […]

It seems to be the case that the vast and polyglot armies of the First Crusade picked up the term ‘Frank’ as a self-appellation from the non-westerners who already employed it in this general way. Eleventh-century Byzantine writers customarily referred to Norman mercenaries as ‘Franks’, and there was a natural case for applying the name to the western knights, including Normans, who arrived in Constantinople in 1096. The Muslims used the term so generally that Sigurd I of Norway, who came to the Holy Land in 1110, could be described as ‘a Frankish king’. […]

The Celtic world also felt the impact of the Franks. Welsh chroniclers refer to the incursions of Franci or Freinc from the late eleventh to the early thirteenth centuries, and the Anglo-Norman enterprise in Ireland was, as we have seen, termed ‘the coming of the Franks’ (adventus Francorum).

For the rulers of the Celtic lands the Franks were not only rivals to be confronted but also models to be emulated. The O’Briens of Munster expressed their claim to dynastic supremacy by calling themselves ‘the Franks of Ireland’. In Scotland the name had a similar resonance. […] ‘The more recent kings of the Scots,’ observed one early-thirteenth-century chronicler, ‘regard themselves as Franks (Franci) in stock, manners, language and style, they have pushed the Scots down into slavery and admit only Franks into their household and service.’ In the twelfth and thirteenth centuries to be a Frank implied modernity and power.

The term can be found at every edge of Latin Christendom. The trans-Pyrenean settlers who came into the Iberian peninsula in the late eleventh and twelfth centuries were Franks and enjoyed ‘the law of Franks’. […] In eastern Europe immigrant settlements in Silesia, Little Poland and Moravia were endowed with ‘Frankish law’ or might use field measurements ‘of the Frankish type’.

The term ‘Frank’ thus referred to westerners as settlers or on aggressive expeditions far from home. It is hence entirely appropriate that when the Portuguese and Spaniards arrived off the Chinese coasts in the sixteenth century, the local population called them Fo-lang-ki, a name adapted from the Arabic traders’ Faranǧa. Even in eighteenth-century Canton the western barbarian carried the name of his marauding ancestors.

Long ago — before Languagehat, maybe before the turn of the century — I ran across a wonderful list (on a history listserv?) of all the various permutations of this term, right down to the Ferengi of Star Trek, but I’ve never found it since.

Se(d)lo.

I just ran across the archaic Russian (really Church Slavic) phrase крины сельные [kriny sel’nye] ‘lilies of the field’; the ‘lily’ part is straightforward (крин = Greek κρίνον; the modern Russian word is лилия), but the adjective сельный looks like it should be derived from село [selo] ‘village,’ which is very strange from the semantic point of view. So I looked up село in Vasmer and discovered a simple but instructive explanation: the Russian noun is the result of the falling together in East and South Slavic of two different Slavic words, *selo ‘plowed field’ (cf. Lith. salà ‘island,’ Lat. solum ‘soil’) and *sedlo ‘settlement’ (from PIE *sed- ‘sit’: cf. Goth. sitls ‘seat,’ Lat. sella ‘chair’ < *sedlā; West Slavic preserves the -dl-, cf. Czech sídlo ‘settlement’). In Old Russian, село could mean ‘dwelling,’ ‘settlement,’ or ‘field’; it eventually specialized to its modern sense ‘village,’ but the old sense ‘field’ left behind this stranded adjective. (The modern adjective for село is сельский: сельская жизнь ‘village life.’) Note that sound change produced a confusingly multivalent word (the horror! language corruption! degeneration!), but people dealt with it and everything eventually settled down. Sic semper mutatis mutandis.