I’ve just discovered Etymologeek, “A free multilingual dictionary that not only shows word histories but also draws them.” From the About page:

Etymologeek shows you the origins of the words you search for. However, next to the textual explanation, we also include an etymology tree (directed graph) to show graphically how the word is derived and to what other words it is related. Moreover, we also aim to include word definitions and other relevant information. […]

Our data is derived from open sources, primarily from the Wiktionary (licensed under the CC BY-SA license) or other public domain etymology data repositories. Much of the data has been automatically extracted: we have used tools such as Etytree by Ester Pantaleo to do that. However, we have also been gradually refining the data, making corrections, modifications, and manually reviewing some of the etymology entries.

I like the answer to “Can I trust your etymologies?”:

No. Etymology is inherently speculative and uncertain. Moreover, some of the automated data extraction we have performed to build Etymologeek has resulted in errors or inaccuracies. We encourage you to independently verify any data you see on our website, and we disclaim any responsibility for your use of or reliance on it. We also encourage you to submit corrections and report mistakes.

The website was built by “Linas, the founder of Interlinear Books.” I discovered it by searching for the etymology of Latvian padome ‘council, board; (historical) soviet’; Wiktionary was no help (“This etymology is missing or incomplete”), but I saw that Google also offered “Padome etymology in Latvian,” where I learned that “Latvian word padome comes from Latvian dome,” and clicking on dome “(often plural) council (legislative or administrative organ)” got me the information that it “comes from Proto-Indo-European *dʰeh₁-” along with a nice visualization and a list of related words. I’ll have to start checking it regularly, and I hope it thrives and keeps improving.


  1. David Eddyshaw says

    Equally admirable:

    Q: How do I support Etymologeek?
    A: You can support the project by using it, sharing it with friends, and helping us correct mistakes.

  2. The graphs are a nice way to visualise word evolution, but the automated processes that create them still have a long way to go. Looking up “prase,” I learned that the PIE root comes from Proto-Germanic. It also proposed a loop: Proto-Slavic > Old East Slavic > Ukrainian > Proto-Slavic.

    Of course, one cannot expect perfection from automated data extraction algorithms. However, many of these mistakes could be avoided if the author implemented some simple chronological checks that prohibit drawing arrows from later to earlier languages.
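For what it's worth, the kind of check suggested in this comment might look roughly like the sketch below. It is not how Etymologeek works, as far as anyone here knows: the `STAGE_RANK` table, its values, and the function names `anachronistic_edges` and `has_cycle` are all hypothetical, and real language chronology is far messier than a single integer per language.

```python
from typing import Dict, List, Tuple

# Hypothetical, very rough chronological ranks; a smaller number means older.
# These values are illustrative assumptions, not data from Etymologeek.
STAGE_RANK: Dict[str, int] = {
    "Proto-Indo-European": 0,
    "Proto-Germanic": 1,
    "Proto-Slavic": 1,
    "Old East Slavic": 2,
    "Ukrainian": 3,
}

Edge = Tuple[str, str]  # (source language, derived language)


def anachronistic_edges(edges: List[Edge]) -> List[Edge]:
    """Return edges whose source language is ranked later than its target."""
    return [
        (src, dst)
        for src, dst in edges
        if src in STAGE_RANK
        and dst in STAGE_RANK
        and STAGE_RANK[src] > STAGE_RANK[dst]
    ]


def has_cycle(edges: List[Edge]) -> bool:
    """Detect a loop in the derivation graph with a depth-first search."""
    graph: Dict[str, List[str]] = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])

    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {node: UNSEEN for node in graph}

    def visit(node: str) -> bool:
        state[node] = IN_PROGRESS
        for nxt in graph[node]:
            # Reaching a node that is still on the current path means a loop.
            if state[nxt] == IN_PROGRESS:
                return True
            if state[nxt] == UNSEEN and visit(nxt):
                return True
        state[node] = DONE
        return False

    return any(state[n] == UNSEEN and visit(n) for n in graph)


# The loop reported above for "prase".
loop = [
    ("Proto-Slavic", "Old East Slavic"),
    ("Old East Slavic", "Ukrainian"),
    ("Ukrainian", "Proto-Slavic"),
]
print(anachronistic_edges(loop))
print(has_cycle(loop))
```

Equal ranks are deliberately allowed, so borrowing between roughly contemporary languages would not be flagged; only an arrow from a later stage to an earlier one is.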

  3. Good suggestion!

  4. Since we recently debated whether ‘cock’ (rooster/male domestic fowl) arrived into English direct from Germanic, or via Latin/French — conclusion Old Norse into both English and French in parallel, Latin nothing to do with it — I chased that up.

    Etymologeek alleges PIE *geugh- ‘swelling’; whereas Etymonline says imitative, back to Sanskrit kukkuta.

    Etymologeek also connects ‘kitchen’ “*h₃p-ékʷ-, Proto-Germanic *kukkaz (Cock, rooster, chicken.), Latin cucina” whereas Etymonline traces back Proto-Germanic *kokina … Vulgar Latin *cocina … PIE root *pekw- “to cook, ripen”.

    Did their algorithm sweep up *kukkaz because of the sound-alike *kokina? It would have to be very late and very Vulgar Latin to get influenced by Proto-Germanic.

    I take their point that “Etymology is inherently speculative and uncertain”; but does “We also encourage you to submit corrections and report mistakes” actually mean they’re crowdsourcing (like Wiktionary), so entirely unreliable when you get down to the nitty-gritty?

    I’ll stick with Etymonline thank you, and my dead-trees dictionary that I gladly paid good money for.

    For a language I don’t know well (like Latvian) I have no intuition on which to sift out algorithmically-derived nonsense.

  5. Etymologeek alleges PIE *geugh- ‘swelling’;

    Ah, I see other sources [even wiktionary] give that etym for ‘cock’ as in hay-cock and attested in place names, unconnected with domestic fowl. (Hmm hmm or do the fowl roost in the hay? Anyway “unconnected” etymologically.)

    Old Norse kǫkkr (“lump”) not the same as Old Norse kokkr (“cock” … [the fowl].

    So … is Etymologeek aware homonyms are as old as language itself?

  6. David Marjanović says

    Proto-Germanic *kokina

    That was clearly *kukīna, borrowed from cocīna before /o/ had become phonemic in Northwest Germanic*. The reflexes all have [u] or [y] (kitchen < cycene, Küche, Upper German Kuchl and similar).

    *…minus Gutnish, it seems. Y’know, of Gotland.

  7. For a language I don’t know well (like Latvian) I have no intuition on which to sift out algorithmically-derived nonsense.

    Yeah, it’s too bad it’s so unreliable — I hope they fix some of the problems. But you’re right: for the present, better to stick with Etymonline and actual dictionaries.

  8. it’s too bad it’s so unreliable

    It’s almost comical: it’s mixed up ‘reed’ with ‘read’ (present participle); ‘read’ (past participle) with ‘red’ with ‘rid’.

    It knows there’s two senses of ‘cleave’ — but choosing either takes you to only the adhere/stick-to etymon. Then clove of garlic it can only trace back to O.E. clufu.

    (I’m wondering where they’re getting their sources? They seem to have plenty of lewd/sexual slang from the words I’m bumping into. ‘Clove’ leads to ‘clover’ leads to ‘clover clamp’, which I didn’t know was a thing. UrbanDictionary?)

  9. January First-of-May says

    Old Norse kǫkkr (“lump”) not the same as Old Norse kokkr (“cock” … [the fowl]

    …though IIRC the word for “cock” the body part originates from the former, at least in Scandinavian. In English it’s usually said to be a metaphor from the fowl somehow.

    (I believe we discussed this relatively recently, but I forgot where.)

  10. David Marjanović says

    (I believe we discussed this relatively recently, but I forgot where.)


  11. It derives “ragazzo” from Arabic through Malayalam to Italian, which is fun

  12. Oy. That is a miscoding. *Ragatius is Medieval Latin, not Malayalam.

  13. *Ragatius is Medieval Latin

    ragazzo (“boy; boyfriend”) from Medieval Latin *ragatius, most probably ultimately from Arabic رَقَّاص‎ (raqqāṣ, “messenger, courier”), or alternatively from Ancient Greek ῥάκος (rhákos, “rag, tatter”, suggesting the clothing). [wiktionary]

    Looking at Wiktionary because the geek’s blurb mentions it as a source, and because many of the geek’s derived words are the same as on Wiktionary — including ragazza squillo, ever-useful for the tourist on the Riviera.

    wiktionary also has Italian ‘raga’: Clipping of ragazzi (“guys”)

    and Italian ‘raga’: raga (melodic mode used in Indian classical music) — also borrowed into English.

    Wiktionary’s ‘raga’ from Sanskrit, meaning both the musical form and ‘dye, colour’ and ‘passion, love, lust’ (presumably cognates).

    ‘raga’ borrowed into many languages in the vicinity including Indonesian/Old Javanese and … Classic Malay.

    Speaking as a geek of algorithms and data science, this seems to be your classic ‘machine learning gone berserk’/ garbage in-garbage out. Sound-alike and worse spelling-alike across multiple languages/writing systems is not how to draw derivations.

    (Of course I’m speculating as to what the algorithm is up to. It might be stupider than I’m imagining.)

  14. Wow. Thanks for doing all that investigation. Too bad it’s not a better site, but I guess we can all be happy with Wiktionary.

  15. Why was *ragatius reconstructed?

    if the only reason was to provide a link between Arabic and Latin, why was one needed?

    A form reconstructed for Sicilian Arabic, which was alive and well in the Emirate of Sicily from the ninth to the thirteenth centuries, would be likelier than one reconstructed for Latin.

  16. ktschwarz says

    The wiktionary entry for ragazzo gives two sources: (which derives it from Arabic via medieval Latin, without indicating any reconstruction: “[dall’arabo raqqāṣ «fattorino, corriere», passato già nel lat. mediev. ragatius e varianti]”) and (which derives it directly from Arabic). There are also some sources indicating that Ragatius existed at least as a personal name in the 1300s, and was the source of modern Italian surnames Regazzo, Regazzetti, etc.

    The etymon *ragatius was added here by user Word dewd544, who is a heavy contributor in Romance etymologies and seems legit, at least at first glance at his conversations. Maybe ask him why he put in the asterisk; it could just be a mistake.

  17. Italian Wiktionary says the same thing, more or less: “con il plurale antiquato ragaceni (1408) o ragazzini (1492), dall’arabo magrebino raqqās ‘corriere, messaggero’, al plurale raqqāsīn, passato già nel latino medievale ragatius e varianti.” Its source is the comprehensive etymological dictionary of Cortelazzo and Zolli, which I haven’t checked myself.

    The Tesoro della Lingua Italiana delle Origini gives early attestations of the word in the ‘servant’ sense. For what they are worth, the early attestations suggest a spread from north to south: Tuscany (c. 1280), Venice (c. 1320), Rome (c. 1360), Sicily (c. 1350).
