David Shariatmadari has another interesting linguistics-related piece in the Guardian that begins:
Scientists have just published a startling analysis of commonly used words in 4298 languages (62% of all those spoken). They wanted to find out if there were associations between particular sounds and meanings that couldn’t be put down to the fact that the languages were related, are used close to one another, or to chance.
As it turned out, they detected strong correlations between sounds and meanings that were independent of genetic relationship, borrowing or coincidence. For example, words for “small” often contained high front vowels (roughly, “ee” as in “peak” or “see”); words for “round” and “red” were linked to “r” sounds; words for “star” to “z” and words for “full” to bilabial consonants (“p” and “b”). Associations were found for body parts: “tongue” was correlated with “l” and “nose” with “n”. Remember, these similarities were found in languages as distant from one another as English and Tagalog, Yoruba and Mandarin.
Why does this matter? One of the first things a student of linguistics learns is that the relationship between the signifier (the sound of a word) and the signified (the concept it represents) is arbitrary. We use the word “tree” to signify a plant with a trunk and leaves, but there’s nothing particularly tree-like about the combination “t-r-e-e”. If a law was passed saying we had to call it “frave” instead, that word would gradually become normal, just like “ki” is for Japanese speakers and “umthi” for Xhosa. […]
Damian Blasi and his colleagues focussed on 30 fundamental concepts – none of which represent loud or distinctive noises, often fertile ground for onomatopoeia. These came from the famous “Swadesh list” of 100 basic words, and included “bite”, “drink”, “ear”, “leaf”, “we”, “tooth”, “skin”, “one” and “stone”. Incredibly, as well as positive links, they uncovered sets of sounds these words seem to “avoid” – ones that appear much less often than you would expect if it were down to chance. “Water” (strangely enough for English speakers) seems to avoid the “t” sound. Words for “tooth” avoid “b” and “m”. The “a” “h” and “r” sounds are found less commonly in words for “breasts”.
The study builds on earlier research which hinted at non-arbitrary relationships between sound and meaning. For example, people have been able to successfully pair up words that have opposite meanings in languages they don’t know. One study showed that English speakers could make better-than-chance guesses at the concreteness of unfamiliar foreign words – that’s to say, whether a word might mean something like “car” versus something like “happiness”. Intimations, if you like, of a universal language of sound.
I don’t know how seriously to take any of this, but it’s certainly food for thought. Thanks, Trevor!
The “avoiding” stuff rings alarm bells straight away. Just a couple of days ago I was presenting a talk to some of our trainees about statistical traps in research; a very important one is that the more correlations you look for simultaneously in a body of data, the more you will find – by pure chance. Including *absences* like this must surely greatly increase the chance of purely accidental correlations. This is particularly dangerous in cases where you find your “correlations” retrospectively, *after* you’ve run your experiment; the *only* way to avoid this is to specify just what correlations you’re hypothesising *beforehand.*
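To put a rough number on that trap, here is a minimal Python sketch. It assumes the tests are independent and that no real effect exists anywhere, which is an idealisation for illustration and not a description of the paper's data or method. The function name `familywise_error_rate` is made up for this example.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among `num_tests` independent
    tests when every null hypothesis is actually true."""
    return 1.0 - (1.0 - alpha) ** num_tests


# How quickly "something significant" becomes near-certain by chance alone.
for num_tests in (1, 20, 100):
    print(num_tests, round(familywise_error_rate(num_tests), 3))
```

Under these assumptions a single test at the usual 0.05 cutoff misfires one time in twenty, but a hundred such tests will almost always turn up at least one spurious "finding".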
That kind of meta-statistical design fault vitiates a horribly large number of published papers in many fields. Peer review is, alas, no guarantee that such errors will be avoided.
Looking for multiple correlations retrospectively in a set of just 100 words? Really?
You would have hoped that a “Discrete Biomathematics Group” would be aware of such traps, but frankly many others who ought to have known better have not.
I may well be talking nonsense, but I can’t see their actual data at present, and it’s bed time. Hope to be proven wrong.
I must also concede that some of this is pretty well known and uncontroversial anyway, of course. Edward Sapir himself published on the low-vowel high-vowel big/little thing many decades ago. The range of words which are more similar across presumably unrelated languages than can be accounted for by pure chance is assuredly much greater than obviously onomatopoeic words. We ourselves collected quite a few “butterfly” words a while back … Kin-Dza-Dza! And “tongue” words really do more often contain l-sounds than would be accounted for by chance – though I can’t say that astonishes me.
As it happens I can think of quite a few languages with “t” in the word for “water” (in Athabaskan for example) but individual counterexamples wouldn’t suffice to invalidate the general thesis, of course.
“For example, words for “small” often contained high front vowels (roughly, “ee” as in “peak” or “see”)”
The failure to provide “teeny” as an example is stunning.
“words for “star” to “z”
I know a language whose word for “star” has TWO Zs. By the way, in the Novgorodian birch bark letters it was written as gvezda.
“Words for “tooth” avoid “b” and “m”.
And teeth aren’t involved in their production.
“Associations were found for body parts: “tongue” was correlated with “l” and “nose” with “n”.
Same thing: a person whose tongue was removed for some offense, lying for example, would have trouble producing the l sound. And n is nasal.
I first learned about this phenomenon from this Wiki page about 10 years ago:
https://en.wikipedia.org/wiki/Bouba/kiki_effect
I suspect that there’s something to it.
This study’s originality is in the very large number of languages considered, but the research topic itself is hardly new. Some of the common sound-meaning correlations have long been known, such as that between high front vowels ([i, e]) and smallness, and the corresponding association between low back vowels ([a, o]) and bigness. But counterexamples like English small and big demonstrate that this correlation is far from universal. The existence both of such correlations and of counterexamples justifying the concept of l’arbitraire du signe shows that statistical significance does not mean generality.
The problem is that the known history of languages, and even the partial reconstruction of the unattested ancestors of some of them, refer to a very short period in human evolution: no known language, however old, is close to a (hypothesized) universal proto-language.
It is very likely that associations of some language sounds (and the sensations experienced during their production) are extremely old, and still valid today, but sound production has a way of evolving through time, depending on a number of circumstances, so that originally meaningful sounds can be transformed out of all recognition in the course of millennia. An example is the unobvious but demonstrated relationship between English and German warm and Greek therm-, where the sounds w and th have nothing in common but derive from an earlier *gw (reconstructed not just from these languages but from a number of other “sisters”).
Concurrently, the meanings of words containing even the most sensation-linked sounds also tend to change through time, also depending on a number of circumstances. Most words evolve from basic roots which get worked on in various ways, notably through the addition of extra material such as prefixes and suffixes. Once a word has been created in that way, it is on its own, and its meaning ends up having nothing more to do with the meaning of its root. Finally, some words get lost, usually through lifestyle changes which make them no longer relevant or even fashionable, in which case a different word will be created within the language or borrowed from another language, in which the sound-meaning correlations used may be different.
So what about small and big? Without going into too much detail, small has the same origin as German schmal ‘narrow’, a related concept, and probably also smile (the lips forming a narrow slit). As for big, it seems to be related to bag, the common meaning perhaps related to fullness rather than just size. (Some hatters will know more about the details than I do).
(Sorry for some of the typos and a few missing words – It took me a while to write and edit this post and I ran out of time for corrections)
About warm and therm-, I think the reconstructed initial is *gwh not *gw.
@David: I looked at the paper. It does look like they take the possibility of statistical artifacts seriously. The method they use to deal with this involves plugging it into some R package that purports to estimate the number of false vs. true positives. I don’t understand it and I’m not necessarily convinced by the analysis. The fact that they find the most associations for what are probably the shortest words (you and I) makes me suspicious.
most associations for what are probably the shortest words (you and I)
In which case, with mitian, ŋinian, ŋiŋian and nimian languages there, there isn’t much left to push the statistics back to apparent randomness.
I think I remember seeing discussion on this earlier–it looks like it was “Published online before print September 12, 2016.” Here is a Language Log post that mentions it: http://languagelog.ldc.upenn.edu/nll/?p=28328
This brings to mind the relationship between language and music.
There are connections between musical structures and emotion, music is useful in language acquisition, and the same area of the brain is used in comprehending musical structure and language, so it is not startling at all to find some connection between sound and meaning on a pre-compositional level. But any attempt to draw grand conclusions from this strikes me as philosophically naive.
Sound change would not necessarily erode the putative sound-sense relationships because new linguistic signs are constantly being created in various ways (semantic drift, neologism, borrowing…); not all are equally successful as replicators, though, and maybe sound symbolism can act as a selective force. If the alleged correlations are valid then some such story must account for them.
(Warm : therm-: yes, PIE *gʷʰ is reconstructed, but there’s actually some uncertainty about this cognate pair as *gʷʰ- has also been argued to give Germanic b-.)
Let’s start with the basic fact that 36 language families account for 95% of all languages of the world.
And language families by definition include related languages and these tend to have quite similar words in the Swadesh list.
For example, there SHOULD be statistical correlations between 1,532 languages in Niger-Congo family (that’s about 35% of the sample), simply because they are related and have similar words in Swadesh list. In fact, if the study didn’t find any such correlation it would mean that the Niger-Congo family doesn’t exist!
So the large number of languages in the sample is actually bad. To avoid mistakes caused by this problem, we actually should remove all related languages from the sample, ie, include only one language from each language family (or maybe even one language from each macrofamily).
The sample would be reduced then to, perhaps, 200 or so languages from independent language families and isolates.
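A minimal Python sketch of that "one language per family" idea, purely to make the suggestion concrete. The six-language list is a toy example, the function name `one_per_family` is invented here, and nothing below reflects how the paper actually controlled for relatedness.

```python
import random


def one_per_family(languages, seed=0):
    """Pick one language at random from each family.

    `languages` is an iterable of (language_name, family_name) pairs.
    A fixed seed keeps the draw reproducible.
    """
    rng = random.Random(seed)
    by_family = {}
    for name, family in languages:
        by_family.setdefault(family, []).append(name)
    # Sort so the result does not depend on input order.
    return {family: rng.choice(sorted(names))
            for family, names in sorted(by_family.items())}


# Toy data for illustration only; the real sample has thousands of languages.
toy_languages = [
    ("Yoruba", "Niger-Congo"),
    ("Xhosa", "Niger-Congo"),
    ("English", "Indo-European"),
    ("Tagalog", "Austronesian"),
    ("Mandarin", "Sino-Tibetan"),
    ("Basque", "isolate (Basque)"),
]
sample = one_per_family(toy_languages)
print(sample)
```

The sketch also shows the cost of the approach: two Niger-Congo languages go in and only one comes out, so most of the data is discarded.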
The sample would be reduced then to, perhaps, 200 or so languages from independent language families and isolates.
I suspect, however, that perhaps as much as half of them would come from New Guinea, assorted nearby islands, and (also nearby) northern Australia, which might well contaminate the sample by actual associations that might happen to actually exist in that particular area (but be completely absent from the rest of the world).
Meanwhile, the extremely distinct (and numerous) Austronesian languages might as well have been taken from the sample entirely.
@SFReader, are you engaging with the specific methodology they used to deal with the issue you raise? Or saying generally you’d be more convinced by a paper that tried not to need to be aware of that?
“ie, include only one language from each language family (or maybe even one language from each macrofamily).”
That “or” suggests a flaw in this approach, does it not? Unless you filter every trace of ancestral relationship out of your data set, you might do better to include it in your methodology.
That “or” suggests a flaw in this approach, does it not? Unless you filter every trace of ancestral relationship out of your data set, you might do better to include it in your methodology.
The problem is that there’s no sensible definition of “macrofamily” (and some of the existing definitions will probably get you a total sample of something on the order of two dozen languages – maybe three dozen including the famous isolates like Basque – and might well still be biased in favor of the New Guinea area).
EDIT: actually, with the possible exception of Afro-Asiatic and maybe Austronesian, I can’t think of any generally accepted macrofamily. Most “macrofamilies” are proposals – many of them incompatible with each other.
@F:
Interesting. Thanks.
I can’t get at the actual paper; behind a paywall.
SFReader’s point about cognates strikes me as very pertinent.
And finding purported significance mainly in the shortest words is a worrying symptom. Lyle Campbell has (of course) been all over this in the context of Greenberg’s imaginary Amerind, including specifically the question of pronouns and words like ‘nose’ and ‘tongue.’ While in one sense that supports the paper’s general thesis, it surely undermines the claim to have demonstrated anything new or noteworthy.
This business of counting *absences* as correlations seems very suspect to me. It surely means awarding yourself dozens of extra chances for a ‘match’ in every single case. If this was done by looking for ‘matches’ retrospectively it’s statistically ludicrous. No amount of R packages could compensate for such a radical design error, if that is in fact how they worked. It would be astonishing *not* to come up with some apparent correlations with a methodology like that, even if the data were actually random.
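A small simulation makes the worry concrete. It is a sketch under loudly stated assumptions: the "data" are pure noise (each sound turns up in a word with the same fixed probability regardless of meaning), the test is a simple normal-approximation z-test with no correction, and all the numbers (600 sound-concept pairs, 200 languages, a 20% base rate) are invented. It is emphatically not the paper's procedure.

```python
import math
import random


def spurious_hits(num_pairs=600, num_languages=200, base_rate=0.2,
                  z_cutoff=1.645, seed=1):
    """Count 'significant' sound-concept pairs in purely random data.

    Returns (over, over_or_under): pairs flagged for over-representation
    only, and pairs flagged when under-representation ("avoidance") is
    counted as a finding too. z_cutoff=1.645 is the one-sided 5% point
    of the normal distribution.
    """
    rng = random.Random(seed)
    expected = num_languages * base_rate
    spread = math.sqrt(num_languages * base_rate * (1 - base_rate))
    over = under = 0
    for _ in range(num_pairs):
        # Number of languages whose word happens to contain the sound.
        count = sum(rng.random() < base_rate for _ in range(num_languages))
        z = (count - expected) / spread
        if z > z_cutoff:
            over += 1
        elif z < -z_cutoff:
            under += 1
    return over, over + under


over, over_or_under = spurious_hits()
print("over-represented only:", over)
print("over- or under-represented:", over_or_under)
```

With no real signal at all, a few percent of the pairs still come out "significant", and allowing avoidance as well as association adds further hits on top. Whether that matters in practice depends on whether the analysis corrects for it.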
Is English alone in having vowel shifts? A close vowel/small size relation should come round every 1000 years or so.
‘“Water” (strangely enough for English speakers) seems to avoid the “t” sound.’
Apart from those who say “waʔer”?
🙂
I have institutional access to PNAS and will download the paper later today.
Corrections for multiple testing of the same hypothesis have long been available; I’d be pretty surprised if none of them were implemented. The one I think I understand is very simple: in the situation depicted in this XKCD, you’re testing for the same thing 20 times, so you divide your cutoff for the p value (usually chosen as 0.05 by tradition) by 20, and you declare a result significant only if its p value is below this new cutoff value of 0.0025.
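In code, that correction (usually called the Bonferroni correction) is only a couple of lines. This is a minimal sketch with made-up p-values, and the function name `bonferroni_significant` is invented for the example. Whether the paper uses this or some other correction is not something this sketch claims.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Divide the cutoff by the number of tests and flag each p-value
    that falls below the adjusted cutoff."""
    if not p_values:
        raise ValueError("need at least one p-value")
    cutoff = alpha / len(p_values)
    return cutoff, [p < cutoff for p in p_values]


# Hypothetical results of 20 tests: one small p-value, nineteen middling ones.
p_values = [0.001] + [0.03] * 19
cutoff, flags = bonferroni_significant(p_values)
print("adjusted cutoff:", cutoff)
print("tests still significant:", sum(flags))
```

Here the nineteen results at p = 0.03 would each have passed the traditional 0.05 cutoff on their own, but none of them survives the adjusted cutoff of 0.05 / 20.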
Huh. Is [z] even common enough worldwide that such an association can be discovered?
Finally, the truth about the High German Consonant Shift. 😉
(Now exacerbated by Trump’s big-league [ˈbig̚lig̚].)
Etymologically, big is an n-stem nickname, and what it’s derived from remains mysterious. Just like pig actually.
It seems to give both under different conditions which have been hard to figure out. I have a paper on this that I’ll look up later.
In biology there are ways of separating correlations like the ones sought here from phylogenetic signal. They require well-resolved trees with branch lengths, though, so they may be hard to implement in linguistics…
By no means.
“Is English alone in having vowel shifts? ”
In Maghribi Arabic original a: has become e: or even i:
Perhaps North Africa seemed a bit poky after the wide open spaces of Arabia.
The Spanish ‘olé’ is apparently from Arabic ‘walla:h’ ‘by God!’
In no particular order:
Jelly beans and correlations.
Beyond English, we have the Cantonese vowel shift, the (now rejected) Korean vowel shift, the Menominee (or Central Algonquian) vowel shift, the Nordic vowel shift, etc. etc. Indeed, chain shifts are more likely to involve vowels rather than consonants.
In general, a macrofamily is a set of well-established families that seem to be related, but where the relationships have not been worked out in detail. There is a cline between the two terms: where the membership of the macrofamily is known beyond doubt, as in Indo-European, we just call it a family.
English CVC words where both Cs are voiced stops generally have no etymology, or in the case of dog an unusual one.
For paywalls, append “sci-hub.cc” to the domain name, solve an easy CAPTCHA, and Bob’s your uncle.
/z/ in ‘star’ words: the Haitian Creole for ‘star(s)’ is zetwa < les etoiles (stars being more often referred to in the plural), so etymologically the /z/ comes from the incorporated French article.
Polynesian consonant chain shift.
bed, god.
One question is what mechanism would create the associations between meaning and sound? If a language has a word that doesn’t correspond to or even contradicts those associations (e.g. due to sound changes), would it then be more open for replacements that do correspond to the associations? Or are words that correspond to the associations less likely to be replaced? I don’t know whether we have sufficient data on the replacement of words to test that.
As far as the sounds in words for “breast” are concerned, it seems possible that words for things important to babies would disproportionately have the sounds and syllables used earliest by babbling infants, with parents eager to interpret random reduplications of /ma/ /pa/ /ka/ /ga/ etc. as meaning mother, father, breast, milk, up, more, feces, and so on (e.g. English “mama,” Latin “mamma”). That might be one reason for the avoidance of “h” and “r,” if there’s anything there to explain that’s not an artifact of looking for correlations after the fact, though it would make the avoidance of “a” more surprising.
JC: chain shifts are more likely to involve vowels rather than consonants.
The production of individual consonants involves some interruption of the air flow coming from the lungs to the mouth and nose, while the production of vowels involves the modification of the shape of the mouth cavity by the placement of the tongue and lips and the motion of the lower jaw, without any interruption of the air flow. As a result, specific vowels are only arbitrary stopping points along what could be a continuum of sound. For example, saying meow to imitate a cat involves a vocalic continuum through several mouth positions, similar to the colour continuum observed in a rainbow, where there are observable peaks of colour but no specific transitions between one colour and the next. Given these physical features, it is very easy to shift a vowel just a little bit off from the desired point. If more and more speakers do so, and the distance between two vowel points becomes more and more reduced, there could be a merger of that vowel with its nearest neighbour, or, if vowels must remain distinct for the sake of intelligibility of the words in which they occur, the shift of one can cause its neighbour(s) to shift also, possibly resulting in a “chain shift” such as the English Vowel Shift, a double shift (affecting the front and the back vowels separately) which preserved most of the pre-shift vowel distinctions but with vowel positions shifted.
The case of consonants is different, since there are a number of physical constraints on where (lips, tongue contact, etc) and in what manner (energy, complexity, involvement of the nose, of the vocal cords or pharynx) the interruption of the air flow takes place. There are many ways for consonants to change, more than for vowels. Consonantal chain shifts (such as the Germanic ones) involve manner rather than place.
I’ve read the paper and the supp. inf.; basically, all potential sources of error mentioned above – and then some – are accounted for in ways that seem adequate.
The one potential point of criticism I can see is the tests for areal effects involving very large linguistic areas – all of mainland Eurasia, all of Africa from Cape to Cairo – which should perhaps have been defined differently, but there’s probably no objective way to do that, at least not easily.
Creoles (and pidgins) are explicitly excluded from consideration.
Likely both?
That could be accounted for by non-baby words like tit and boob.
Except in exceptional cases like Polynesia (mainstream Hawaiian /k/, /ʔ/ < */t/, */k/).
Erik M: As far as the sounds in words for “breast” are concerned, it seems possible that words for things important to babies would disproportionately have the sounds and syllables used earliest by babbling infants, with parents eager to interpret random reduplications of /ma/ /pa/ /ka/ /ga/ etc. as meaning mother, father, breast, milk, up, more, feces, and so on (e.g. English “mama,” Latin “mamma”).
This is true in principle as far as baby talk is concerned (the register used by adults to talk to babies), and one interesting characteristic of baby talk is that it does not evolve much at all: similar CV or CVCV sequences occur in a vast variety of languages, while the sounds of adult words derived from the same roots have evolved in exactly the same manner as those of any other words in the relevant languages. A striking example is caca, a baby talk word used in Latin and still used by all its descendants, including French. From this baby word Latin had derived a verb cacare. This ancestral verb is still recognizable in Spanish cagar but its French “cousin” is chier (currently a vulgar word), where the initial Latin sequence c [k] has become the sound now written ch as in a large number of French words, such as château (lat. castellum) ‘castle’, cheval (lat. caballus) ‘horse’, chien (lat. canis) ‘dog’, and many others, according to a regular evolution from Latin into French.
Actually, I have an example of this.
Once upon a time, when adults had potties under their beds, those potties were boat-shaped (or so the story goes). Latin-speaking students called them navicula (“boat”), and when they needed to use them, they announced naviculare necesse est, literally “shipping is necessary”. This “shipping” was eventually rendered in German as schiffen, and one of my sisters likes to use that word when she has need of a restroom.
The other sister regularly tells her to stop using that word – not because of any dislike for dysphemisms, far from it, but because she interprets it as referring specifically to male-style urination: surely [ʃ] and [fː] are meant to depict the sound a standing man makes?
Now that I think of it, other people may have had the same idea. The Classical Viennese song about the awesomeness of skiing (especially while drunk) has been parodied (alas, I can’t find the parody) by replacing schifahren and vorstellen in “skiing is the most awesome thing you can imagine” by schiffen and hinstellen in “pissing is the most awesome thing if you can just stand there”, sung (like the original) by a man.
A particularly nice instance is Welsh, which has gone around the cycle twice: the inherited Indo-European words are lost, and the baby-talk words mam and tad are the formal words for ‘mother’ and ‘father’. New baby-talk words have consequently been created: mama and dada.
Modern Khalkha Mongolian has undergone a remarkable vowel “rotation”, which has turned an original front vs back vowel opposition into an ATR contrast.
https://en.wikipedia.org/wiki/Mongolian_language
Words originating from onomatopoeias similarly often end up changing with the language even as the original onomatopoeias remain the same.
Two nice examples I can think of are Russian мычать and (somewhat obsolete) English mew (now probably best known as the Pokemon name).
There’s also Erasmus’ famous point that sheep in Ancient Greece presumably did not actually go [vi].
Modern Khalkha Mongolian has undergone a remarkable vowel “rotation”
Which, according to a semi-recent thesis (Seongyeon Ko, 2012: Tongue root harmony and vowel contrast in Northeast Asian languages), did not happen either — though several other Mongolic varieties, as well as likely the entirety of the Turkic family, underwent instead the opposite rotation. Perhaps this is what John also refers to.
For less disputed examples of interesting vowel chainshifts, there are still lots of options though, e.g. Ryukyuan, Samic, Slavic, Koine Greek. Arguably French, though most of what went down there has been conditional changes (mare > mer, but /a/ still remains in e.g. chat).
— One day I’d actually like to see a phonaesthetics study to actually survey the historical angle. If given a list of statistically predicted sound-meaning associations, then given any sufficiently major historical tumult in phonology, we would expect some amount of exterior developments to also happen, to enforce the “real” associations. Words whose favored associations get broken might end up themselves lost, replaced by more mellifluous newcomers; or maybe irregular sound changes will kick in to save the day…?
@j:
Very interesting Seongyeon thesis. Thanks!
I suppose that the once mainstream idea of Ural-Altaic (still learnt as fact by ordinary Turks, I’ve discovered) would itself tend to produce a misinterpretation of “Altaic” vowel contrast sets in terms of the more familiar (to Europeans) palatal contrasts of Finnish and Hungarian, especially given the pivotal role that vowel harmony played as evidence for the grouping. And after “Ural-Altaic” fell apart, the Altaic hypothesis would still readily lead to Mongolian being misinterpreted in terms of the less-exotic-to-Europeans Turkish. Perhaps ATR harmony systems would have been discovered earlier otherwise. Mind you, Nigeria probably counts as less exotic than Mongolia from a Western European standpoint anyway.
In passing, I notice that Seongyeon disavows “Altaic” as a genetic group at the outset, but does smuggle it in again at the end (including Korean.) This means he has to assume that Turkic has changed from an ATR system to a palatal contrast system, which would presumably be a redundant hypothesis for disbelievers in Altaic. Or would there be a problem with Turkic loans in Mongolian/Mongolian loans in Turkic otherwise?
Huh. I have enough to read for at least the rest of the week now. 🙂
(I see I have carelessly called Dr Ko by his personal name alone, with wholly reprehensible over-familiarity. No attempt to scrape an acquaintance was intended. Ignorance, sheer ignorance.)
I had misremembered: the following paper regards *b as irregular, just like the *f in five, fifth, wolf and perhaps oven. It does, however, propose a very simple rule for explaining when *gʷʰ gives *g and when it gives *w, based on 20 examples: it’s *g before the stressed vowel, *w elsewhere, regardless of word-initial or medial position.
K. T. Witczak (2012): IE *ghʷ in Germanic. A supplement to Verner’s law. 133–142 in Н. Н. Казанский (отв. редактор): ИНДОЕВРОПЕЙСКОЕ ЯЗЫКОЗНАНИЕ И КЛАССИЧЕСКАЯ ФИЛОЛОГИЯ-XVI (чтения памяти И. М. Тронского). Материалы Международной конференции, проходившей 18–20 июня 2012 г. Наука.
PDF with 920 pages (!!!) here, I think.
Unrelatedly, I have a comment from 2:56 pm in moderation; don’t overlook it once it comes out. 🙂
@Eli Nelson Here is a Language Log post that mentions it:
And that links to a LLog posting a few days earlier (Victor Mair) http://languagelog.ldc.upenn.edu/nll/?p=28143 — in which Hat makes a couple of comments.
I also vaguely recall a paper by Mark Liberman, commenting on the likelihood of “false cognates”: words in different languages having vaguely similar meaning and vaguely similar pronunciation, but no actual etymological connection. This was to counter some of the crackpot theories about universal language, like Edo Nyland.
If there’s some intrinsic connection between sound and meaning, we’d expect to see a correlation if we took (say) Latin as she was spoke 2,000 years ago with a Sinitic topolect spoken 1,000 years ago with a native American language spoken today with an Austronesian language. Vowel shifts shouldn’t matter.
Vowel-shifted words that end up with the ‘wrong’ vowel for their meaning should die out and get suppleted. (And/or perhaps their meaning will change, so sucking a new word into the semantic ‘gap’.)
DM’s earlier comment is now out of durance vile.
Slightly off-topic, though related: many years ago I came across a paper, I think in “Language”, documenting the systematic replacement of perfectly innocent words in Mandarin as, over the centuries, phonetic attrition caused them to become homophonous with obscenities.
Can’t remember the title or author, unfortunately.
marie-lucie says: the sounds of adult words derived from the same roots have evolved in exactly the same manner as those of any other words in the relevant languages.
Example in Croatian: “otac” (= father) apparently derives from a proto-Slavic form attiku, or attiki, where “atta” is the baby talk for father. Apparently, “atta” is the same as in Gothic and in the name Attila. The ending -ik represents a diminutive suffix which underwent palatalisation k -> c [ts].
The Croatian equivalent of “dad” or “daddy” is “tata”. In some dialects this is “ćaća” – perhaps indicating a palatalisation of t to t͡ɕ.
The baby version of alveolar consonants ([t d n l]…) are palatal ones ([c ɟ ɲ ʎ]…), and the closest that Croatian has to [c] happens to be [t͡ɕ].
See also: Russian дядя /dʲadʲa/ “uncle” and тётя /tʲotʲa/ “aunt”, where [dʲ] and [tʲ] are even closer to [ɟ] and [c].
Basque forms diminutives by turning alveolar consonants into palatal ones! Two of them, [c] tt and [ɟ] dd, don’t seem to occur anywhere else in the language.
If you don’t want to read the whole thesis, read this 21-page paper from 2011, which explains the idea very well and attributes it to this book chapter:
BTW, Moscow School Proto-Altaic has no vowel harmony at all, but its authors don’t seem to have considered tongue-root harmony in the first place.
I’ve read the whole thing now. Totally worth it. 🙂 It’s not “several other Mongolic varieties”, just Oirat including Kalmyk.
I think it’s the other way around: Mongolia doesn’t count as exotic enough to have something as intrinsically African as tongue-root harmony. 🙂
Interestingly, Ko argues with good support that in all the Altaic cases, retracted tongue root is the marked feature, while advanced tongue root is in the African cases. That’s why, when there’s a neutral vowel, it’s /a/ in Africa but /i/ in Asia: the RTR counterpart of ATR [a] is [ə ~ ɜ], which is difficult to articulate when the tongue is advanced, while the RTR counterpart of ATR [i] is [ɪ], which is difficult to articulate when the tongue is retracted…
non-baby words like tit and boob
Boob is definitely baby-talk; cf. French poupe, German dialectal Bubbi. The older form in English is boobies, pluralia tantum.
Tit is a doublet of teat. The first is OE, the second is French < Germanic, although modern tit may be a reinvention rather than a survival. Again, titties is much older than tits. Etymologically, tits are small (as in the bird names) and boobs are by exclusion big, which agrees with the bouba/kiki rule.
JC: Boob is definitely baby-talk; cf. French poupe
French poupe???? The only (la) poupe I know refers to the stern of a ship. I guess poupe here is not a full word but the stem of la poupée ‘doll (resembling a little girl)’ and le poupon ‘swaddled baby, doll (resembling a swaddled baby)’? Both derived from Latin puppa ‘doll’, as I seem to remember.
About boob, perhaps it went from baby-talk to mom-talk to woman-talk, as it seems much more common in casual women-only conversations than breast, except in the context of feeding babies or of medical problems.
The only (la) poupe I know refers to the stern of a ship.
Same here.
French poupe????
Etymonline defines it as ‘teat’ [nipple], but Wiktionary does not confirm and neither does the TLFI. However, the OED s.v. bubby (the oldest form in English, now obsolete except dialectally) says: “Connection with French poupe ‘teat of an animal’ (formerly also of a woman), Provençal popa, Italian poppa ‘teat’, is very doubtful.” The OED entry is from 1888, so the form may be quite obsolete in French. Wiktionary does list the Italian word, however < Late Latin puppa < Classical pupa, defined as ‘little girl, doll’.
Poop ‘stern’ in various languages is from Latin puppis ‘id.’, whose etymology is not known. In Spanish and Portuguese, though, the word is proa, apparently from confusion between the stern and the prow. “The bowsprit got mixed with the rudder sometimes / A thing, as the Bellman remarked / Of frequent occurrence in tropical climes / When a vessel is, so to speak, snarked.”
Rare even in Middle French, but real as a word for ‘nipple’.
No. pupp “tit, boob; breastmilk”, pupper “tits, boobs”.
Nope. Spanish proa means ‘prow’; ‘stern’ is popa < Latin puppis as above. As far as I know, the case is the same in Portuguese.
Piotr: Rare even in Middle French, but real as a word for ‘nipple’
I think you need the word “there” after “real”, since otherwise your sentence seems to mean that the word is still current with the meaning ‘nipple’.
But thanks for quoting this interesting dictionary, which lists MF ‘scientific’ terms and is apparently a complement to the TLFI, as it lists words which do not appear in the more comprehensive work.
bed, god
The idea that Germanic words for ‘bed’ are < PIE *bhedh- ‘dig’ is rather shaky: I doubt that speakers of Proto-Germanic were so primitive that their beds were holes in the ground. The connection may come through bed as in flower bed, I suppose.
As for god, it could be from either of the almost-homonyms *ǵʰewH- ‘call’ and *ǵʰew- ‘pour’. While it’s possible to construct semantic bridges from either meaning to ‘god’, it’s still a stretch.
So I stand by my basic claim: English words in B₁VB₂ (where B is a voiced stop and the B’s may or may not be the same) tend not to have sturdy etymologies.