This amazing site calls itself an Etymological Database Project and says:

The main goal of the project is to join efforts in the research of long range connections between established linguistic families of the world. Internet is a brilliant way to combine our attempts and to build up a commonly accessible database of roots, or etyma reconstructed for the World’s major (and minor) linguistic stocks.
Every person or organization interested in this noble task is invited to join.

As regular readers of Languagehat know, I’m deeply skeptical of attempts to connect all the world’s languages into one big happy family, but I have nothing but praise for this project. I haven’t really started investigating it yet, but the idea of a site joining etymological databases of North Caucasian, Sino-Tibetan, Yenisseian, Chinese and Chinese Dialects, Altaic, Chukchee-Kamchatkan, Dravidian, Semitic, Bahnar, North Khoisan, South Khoisan, and Central Khoisan has me salivating. This link is from the invaluable PF, who explains:

Let’s say you know the Tibetan word a~khrog-pa, to roar, rush. You enter in the word here, and you get this page, which gives you the protoform *r[a:]kw|, which meant noise or roar, and words from the same root in Chinese, Burmese, Kachin, Lushei, Lepcha and Kiranti, along with their meanings.
Or let’s say you know the Turkic root *bora-, in the Turkish word for north wind (bora(k)) and the Kazakh word to snow heavily (bora-), you look it up in the Turkic database, and it gives you the conjectured proto-Altaic root of the word (*po> u, to snow or rain) and the parallel roots in other Altaic language groups. In Mongolian languages, the root is *borug|a, meaning 1) heavy rain or 2) to snow or sleet; in Middle Mongolian the word was boro’an, in Khalkha (Mongolian Mongolian) it’s 1) boro:(n) for heavy rain and 2) burgana- for snow or sleet, in Buryat it’s 1) boro: 2) burga- and so on. And you get the same for the Tungus group, for the Turkic group (of course), and for Japanese and Korean! Neat, no?

Neat, yes!


  1. Strange that they chose to use images rather than Unicode for the Chinese …

  2. Not really on topic, but the word “kayak” has been well-established to be a Turkish word brought to the New World by the Inuit. Or maybe an Inuit word transmitted to the Turks before the Inuit migrated to the New World. The word is seen on the Adriatic in the word “caique”, which is a kind of boat (not a kayak though).
    I’ve speculated that the N. Atlantic / Irish words carrack, coracle, and cog are related to kayak. Pure speculation, but phonetically all the words have a k-k pattern, and with a y/r/null substitution you have a pretty close fit (assuming perhaps a ca’ak stage on the way to cog).
    Now when the Norse reached Greenland they met Inuit who had only recently arrived there from the opposite direction. Did the Norse have a word like carrack? If so, a word had travelled half-way around the world in two different directions and met itself.
    Like I said, speculation.

  3. It’s interesting, but this sort of work can be very easily done wrong if attention isn’t paid. Japanese is used to create the proto-Altaic which is then used to prove Japanese is Altaic. Japanese provides data used to reconstruct older forms of Chinese which are then used to inform classical Japanese. Sloppy work becomes circular reasoning. (I mean, there’s an Altaic reconstruction that posits that Japanese had a d->y shift, when the evidence for that is found on an isolated example of a Ryukyu dialect, which doesn’t seem to make sense as far as distribution goes: a y->d shift in that dialect seems simpler. I can’t speak for an earlier shift of d-> y, of course, and I’m merely a student of these things. But all this has made me wary of the proto-reconstructions, very wary.)
    … My piece having been said, it’s kind of neat to look at, and have reconstructions convenient at hand, no?

  4. Japanese is used to create the proto-Altaic which is then used to prove Japanese is Altaic.
    I had wondered about that. I don’t know the first thing about Japanese nor historical linguistics but the few Japanese and Korean roots I looked at seemed, if I squinted, to resemble the other roots in the Altaic group, which excited me.
    Here’s a Ural-Altaic-Sumerian-Dravidian comparison list, and here’s Peter Chong’s Ural Altaic Etymology Dictionary, which I found at looking for Hungarian, sadly excluded thus far from buiding the Tower of Babel.

  5. buiding
    That’s building. The preview button is there for a reason.

  6. Not all proto-Altaic reconstructions incorporate Japanese, but I believe most do, at least of those I’ve seen (and the one I’ve seen that didn’t was soundly attacked in a review by someone of the “Japanese is Altaic” school). It’s a messy thing to deal with. If you’re not careful, you will end up with circular reasoning like that above (the moral is that it’s important to know the details of your reconstruction before using it).
    Korean suffered from a massive vocabulary replacement through the Chinese influence (and Japanese vocabulary, also, has a huge Sino-Japanese component); plus, considering the time depth for a proposed Altaic split, which would have had to have taken place before the Indo-European split… I don’t know that the question will ever be settled.
    In the West, Japanese is often thought of as Altaic. The most common view in Japan right now (I believe… I haven’t done a survey, really, but this is what struck me) is the mixed-language hypothesis. Which is an interesting thing in and of itself….
    And then, of course, Ainu.

  7. It would be astonishing if you couldn’t find statistical coincidences between any two languages, such as the word “kayak” (not that I know anything about that particular example). It’s even easier to imagine connections when one doesn’t really understand how a given language and its unique phonology work.
    Attempts to formulate universal proto-languages are generally seen by linguists as amateurish (if well meaning) silliness, with good reason. The rigors of proving any such links (beyond the odd borrowed word) cannot be easily met.

  8. This was all news to me, though probably not to others:
    “I suggest in this paper that the Japanese language was formed from an original ‘immigrant’ or ‘boat people’ language. The Yayoi period was characterized by substantial migration of early peoples to Japan, landing in northern Kyushu from the Asian continent via the Korean peninsula. Kyushu was undoubtedly the original territory of the Yayoi migrants and their languages. These Yayoi migrants established themselves in moated townships of various sizes in the flat coastal areas of northern Kyushu. As agricultural trade and cultural contact increased the language stabilized. However, these Yayoi immigrant communities were sufficiently powerful in number as well as economically and culturally to spread throughout Jomon Japan which was itself composed of languages from the north (Palaeosiberian), the south (Malayo-Polynesian) and the West from China and Korea (Proto-Altaic). The function of this expanding Yayoi language which I shall later describe as a Creole (North-Kyushu Creole) was that of a lingua franca among the various Jomon Japanese absorbing and homogenizing many of the original languages in the regions of Honshu and Kyushu. In particular, the language of the existing Kyushu Jomon inhabitants, a Malayo-Polynesian variety, had a profound effect upon the Yayoi settlers’ language. A creolization continuum therefore developed in which the existing Jomon languages (the vernaculars) stood at various ‘distances’ from the standard (and rapidly standardizing) Yayoi language. Some of the older Jomon languages (Ainu and Ryukyuan) continued to survive but were pushed back to their original entry points in the North (Tohoku and Hokkaido) and the South (Ryukyu islands).”

  9. “It would be astonishing if you couldn’t find statistical coincidences between any two languages…”
    As a non-linguist I would like to ask if there is any quantification of what constitutes coincidence and what constitutes evidence of familial ties. I assume that the number of common/similar roots between, say, Sanskrit and Ancient Greek is non-coincidental, while an odd common word between e.g. Basque and Tagalog is probably a coincidence. But how many coincidences (an order-of-magnitude estimate, not a specific number, of course) does it require to legitimately raise the issue of relatedness? Has there ever been any attempt to evaluate the statistical probabilities of “false” common root occurrences between (specific) languages?

  10. PF: The Ryukyu languages are pretty clearly related to Japanese (now we get to the language-or-dialect question, which is all politics, really, but). I think that’s pretty much incontrovertible (as I understand it, of course; but having looked at the language with my own eyes, either it borrowed a lot of morphology and vocabulary from Japanese and then split into the various Ryukyu languages).
    I do have to say I love it when people tell us what the Jomon languages were…. I read quite a few books on that when looking into the possible survival of Ainu placenames in Honshu (not that I’m an expert yet, by any means: I had to abandon the project due to lack of ability in Ainu; some of the proposed place names are very… baroque), so it’s not uncommon in the literature these days. I would, however, like to point out that the Yayoi period traditionally starts in 300 BCE, and the first examples of writing produced in Japan, some swords from tombs, are usually dated to the 470s CE, which is an awfully long gap in the records. (And that inscription wasn’t even in Japanese, save for some name spelling, but Chinese.) Ainu itself doesn’t get written down until the 16th century, I believe.
    I have to stand with what a former professor of mine said on this: it’s kind of naive to assume that the Jomon population only spoke varieties of Ainu (long migration period, long settlement period)… unless there was a very interesting period of domination in pre-Yayoi Japan, which I won’t rule out just yet. However, what the languages of the entire Jomon population were isn’t something that will ever be provable, I believe.
    (The reason to associate Jomon with Ainu mainly has to do with two things: First, that the Jomon culture survived in the northern reaches, moving into Epi-Jomon, with Satsumon and Okhotsk after that, and I’m told that they’re a continuation with continental influences, but I’m not an expert on that, by any means. Second, the case of the “northern barbarians” for the early Japanese state, named the Emishi, which was later read Ezo, which was the older name for Hokkaido. The Emishi were said to have an incomprehensible tongue… but then, so were the peasants in the Tale of Genji–personally, I don’t think that’s necessarily enough to prove anything. What’s more convincing is records of interpreters in dealing with (some of) the Emishi. However, it is said that Ainu words linger in some specific hunter terminology in the Tohoku, which is to say northern Honshu, regions, and in placenames; it was the Emishi connection that got me started on the placename research a little. Placenames, however, are very tricky to study for a number of reasons, and that, together with a lack of Ainu knowledge and a lack of time, meant that I had to abandon the project for the time being.) I am under the impression that the current running theory is that the Emishi were a mix of languages and ethnicities (although, is it an ethnicity if it isn’t recognized? Well, anyway.)
    If you’re interested in the Japanese linguistic situation, Languages of Japan is a very good book. If you’re interested in the Japanese ethnic situation in its early history, Ruins of Identity by Mark Hudson, and parts of To the Ends of Japan by Bruce Batten would be worth reading.

  11. PF, kristine: Thanks for the fantastic comments! If I keep this blog up long enough, I may actually learn something.
    Zizka: the word “kayak” has been well-established to be a Turkish word brought to the New World by the Inuit. Or maybe an Inuit word transmitted to the Turks before the Inuit migrated to the New World.
    By “well established” I assume you mean “hypothesized.” I’m afraid that like many other people you vastly underestimate the degree of coincidental resemblances out there. English bad has almost identical meaning and pronunciation to Persian bad, yet there is no historical connection, either through etymology or borrowing. Look at any amateur language website (like the one purporting to prove that Basque is related to Sumerian) and you’ll find scads of them. Means nothing. The only way to establish etymological connection is consistent patterns of parallel developments; the only way to establish borrowing is to show the giver language exercising historically attested influence on the borrowing language at the right time. Resemblances between two languages with no attested connections at all are meaningless. (Or, what Paul said.)
    talos: Well, there’s the swampland of glottochronology; there may be something more directly relating to your question, but it’s not occurring to me at the moment. But normally one has more than isolated sets of words to work with, so you don’t need to rely on pure statistics.

  12. “Established” is a wee bit too strong, but “conjectured” is too weak. As I understand it, early on Turkish speech was found well up in NE China / Siberia, where Mongol / Tungus languages later became dominant. As far as I know it’s also well-established that the Inuit crossed the Bering Strait very late and had relatives on the Asian side, and I think also that they could cross the strait. The link between the Inuit word and the widespread, very-well-attested Turkish word has not, to my knowledge, been firmly established, but it involves finding the same (phonetic) word with the same meaning in two languages which were not very far separated geographically.
    Source: Boyle, John Andrew, The Mongol World Empire 1206–1370, Variorum, 1977: includes an essay on Turkish names of watercraft.

  13. Turkish speech was found well up in NE China / Siberia

