Some useful sites: Unicode character table (great layout), shapecatcher (draw your own characters), amp-what (type a description). Via MetaFilter (where people will doubtless post other links that are useful and/or fun).
My name is Steve Dodson; I’m a retired copyeditor currently living in western Massachusetts after many years in New York City.
Copyright © 2025 · languagehat.com
Richard Ishida is also a good go-to person for Unicode, especially if you want the nitty-gritty of how different scripts work with Unicode: http://rishida.net/
This is the most comprehensive Unicode site I’ve found:
http://www.fileformat.info/info/unicode/index.htm
And why on earth does Unicode not include Cyrillic vowels with stress marks? I know it’s possible to create them using ́ (COMBINING ACUTE ACCENT), but few fonts give an acceptable result.
I guess shapecatcher is a great source for riddles. Here’s one. I gave it a reasonably well-drawn character to recognize and received back a long list of possibilities, which did not include the intended character (I’m not sure it is in Unicode at all) but did have some nice variants. Partial list:
Latin capital letter l with middle dot: Ŀ
Reverse solidus preceding subset (Unicode hexadecimal: 0x27c8)
Latin small letter k with acute: ḱ
Vai syllable la: (Unicode hexadecimal: 0xa55e)
Canadian syllabics taa: ᑖ
Musical symbol c clef: (Unicode hexadecimal: 0x1d121)
Cyrillic small letter i with grave: ѝ
Greek capital dotted lunate sigma symbol: Ͼ
Hiragana letter ni: に
Try to guess what the intended symbol was.
For those using Emacs, the One True Editor allows you to insert any Unicode character by hitting C-x 8 RET and then typing in the name (with tab-completion speeding that up). It’s a lot faster to do e.g. C-x 8 RET LATIN SMALL LETTER DELTA than to open a character map, click around to find what you want, and copy and paste.
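For readers without Emacs, the same lookup-by-name idea can be tried in Python. This is only a sketch using the standard `unicodedata` module; the helper name `char_from_name` is made up for illustration, and which names resolve depends on the Unicode version bundled with your Python.

```python
import unicodedata

def char_from_name(name: str) -> str:
    """Return the character whose official Unicode name is `name`.

    Hypothetical helper; it simply wraps unicodedata.lookup, which
    raises KeyError if the name is unknown to this Python build.
    """
    return unicodedata.lookup(name)

delta = char_from_name("LATIN SMALL LETTER DELTA")
print(delta, f"U+{ord(delta):04X}")  # the character and its code point
```

Unlike Emacs there is no tab completion here, so the name has to be typed in full.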
For Windows: BabelStone.
Unicode: The Movie.
Alex: Because precomposed characters are, in general, only provided in Unicode when existing character sets already had them, so that 1-1 round-trip conversions were possible. That was not the case for any existing Cyrillic character set. Exceptions were sometimes made when the letter-with-diacritic is considered a distinct letter of the alphabet and/or the language in question never had computer support before, neither of which is the case for Russian vowel letters marked for stress.
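One way to see which letter-plus-accent pairs have a precomposed form is to ask for NFC normalization, which composes a pair only when Unicode defines a single character for it. A rough sketch with Python's standard `unicodedata` module follows; the helper `nfc_length` is an illustrative name, not anything from the thread.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"  # COMBINING ACUTE ACCENT

def nfc_length(base: str) -> int:
    """Number of code points left after composing base + acute under NFC.

    1 means a precomposed character exists; 2 means the accent has to
    stay a separate combining mark.
    """
    return len(unicodedata.normalize("NFC", base + COMBINING_ACUTE))

print(nfc_length("e"))       # Latin e + acute
print(nfc_length("\u0430"))  # Cyrillic а + acute (a Russian stressed vowel)
print(nfc_length("\u0433"))  # Cyrillic г + acute (Macedonian ѓ, a distinct letter)
```

The Russian vowel stays as two code points, while Latin é and Macedonian ѓ collapse to one, which matches the exceptions described above.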
And also for Windows, don’t forget about my Moby Latin and Whacking Latin keyboards, which only handle about 1% of Unicode, but most likely the most important 1% for people using U.S. or UK physical keyboards.
Suppose you work under Windows, and only occasionally want to insert text in non-English characters – as I do, say when commenting here with letters from a European language. Then there is an easy way to do this: use the Windows on-screen keyboard.
I mostly use a standard character set – the one provided by my physical keyboard – and can switch in various “virtual” keyboards when I need them for text in different languages.
What I get from this is what I wanted to get. I don’t have to struggle with Unicode to get it. That’s the great advantage.
Stu, what does your physical keyboard look like? Is it an American QWERTY, or a German QWERTZ?
John: QWERTZ. After making those bold claims about ease in use, I am now investigating which languages are actually non-problematic, given the way I work. Russian is not one of them, but I could pretend that it is not a “European language” …
I use UltraEdit to create blog comments outside the blog editor. This works fine with English, Spanish and French, where the codepoints I need are in the standard upper-ASCII set. When I want to copy some Russian word from another comment into the text, I must work directly in the blog editor.
In UltraEdit with the RU on-screen keyboard, to type Russian I had to change the charset to ISO8859-3 (Latin-3) and the font to “Arial Unicode MS”. The hex mode shows each letter as a single byte – some kind of upper-ASCII mapping – so clearly I couldn’t copy this text into the blog editor.
So what I claim boils down to this – if you work with languages with ASCII codepoints, you don’t need Unicode. Who’da thunk it?
“ASCII mappings”, not “ASCII codepoints”.
I never understood what that Unicode stuff was for, and how it worked. Having a qwerty keyboard on a laptop gives me a bit of a headache while writing in French. Pressing “Alt 130”, “Alt 147” or “Alt 0156” usually does not improve typing speed, especially when you are left wondering whether the one you are looking for is 140, 141, 150 or 151. So how could this improve matters (given that I can’t install new software on that computer)?
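Part of the confusion over those Alt numbers is that they index into legacy code pages rather than Unicode. As a hedged illustration: on a Western European Windows setup, a code typed without a leading zero is usually looked up in the OEM code page and one typed with a leading zero in the ANSI code page. The sketch below assumes those pages are cp850 and cp1252; other locales use different ones, and `alt_code` is a made-up helper.

```python
def alt_code(digits: str, oem: str = "cp850", ansi: str = "cp1252") -> str:
    """Character produced by typing Alt+<digits> on the numeric keypad.

    Assumption: a leading zero selects the ANSI code page, otherwise the
    OEM code page is used. The default code pages are assumed Western
    European values, not something every machine shares.
    """
    codec = ansi if digits.startswith("0") else oem
    return bytes([int(digits)]).decode(codec)

for code in ("130", "147", "0156"):
    print(f"Alt {code} -> {alt_code(code)}")
```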
Stu, I’d like to develop a QWERTZ version of my keyboard driver. If you’re interested in beta-testing such a thing (it supports vast quantities of Latin-script letters, lots of symbols, and math-Greek, but not Cyrillic yet), drop me a note at cowan@ccil.org.
UltraEdit supports Unicode. If you set the character set to UTF-8 or UTF-16, you can represent all characters. You can install a Russian keyboard driver (I use Russian Phonetic YaWERT) and then type Russian as well, switching keyboard drivers using the Windows Language Bar.
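The one-byte-per-letter view in the hex editor versus a Unicode encoding can be compared directly. A small sketch, assuming Python's built-in codecs `iso8859_5` and `cp1251` as stand-ins for "some single-byte Cyrillic mapping" (the thread does not say which mapping UltraEdit actually used):

```python
def encoded_sizes(word: str,
                  codecs=("iso8859_5", "cp1251", "utf-8", "utf-16-le")) -> dict:
    """Map each codec name to the number of bytes it needs for `word`."""
    return {codec: len(word.encode(codec)) for codec in codecs}

sizes = encoded_sizes("язык")
for codec, size in sizes.items():
    print(f"{codec:10} {size} bytes")
```

The single-byte code pages store each Russian letter in one byte but cannot hold most other scripts; UTF-8 and UTF-16 spend two bytes per Cyrillic letter and can represent any Unicode character.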
Siganus: Yeah, if you can’t install a better keyboard driver, you are out of luck.
Sig: Unicode is merely a system in which (binary) numbers are assigned to “glyphs”. A glyph is a letter or symbol in a writing/printing system.
A computer “text file” is a sequence of bytes, i.e. binary numbers, stored on a medium. A display program (such as Word in Windows) reads those numbers from the medium and presents the corresponding sequence of glyphs on your monitor (another medium).
That’s the basic principle. Unicode is a convention for translating back and forth between numerical and visual representations of letters.
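The bytes-to-numbers-to-letters chain can be watched in a few lines of Python. This is a sketch only; the sample bytes and the helper name `code_points` are chosen for illustration, and the bytes are assumed to be UTF-8.

```python
import unicodedata

def code_points(raw: bytes, encoding: str = "utf-8") -> list:
    """Decode `raw` and return the Unicode number of each character."""
    return [ord(ch) for ch in raw.decode(encoding)]

raw = b"\xc5\x93uf"          # four bytes as they might sit in a text file
text = raw.decode("utf-8")   # the display program's job: bytes -> characters
for ch in text:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), ch)
```

Four bytes come out as three characters, because in UTF-8 the first two bytes together encode a single number, U+0153.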
The Unicode idea is extremely old. You find it in gematria, the Hebrew descendant of Assyro-Babylonian numerology. According to the German WiPe, gematria is based on the fact that special numeric symbols were a later addition to writing systems using letters. Before the numeric symbols were invented, already existing letters were used to represent numbers.
I mention the German WiPe on Gematrie because the English one says nothing about the prior use of letters to represent numbers as a “hack” due to the absence of special numeric symbols. The English article rushes right into Rabbinic and Kabbalistic hermeneutics. If learned Jews had not been gobbled up by all that silliness, they might have found time to invent Word for Windows before the Baby Jesus burst on the world.
Thanks Stu, but I’m left wondering what practical use the whole thing might have if you are not a programmer, i.e. for a layman like me. (Incidentally, I loved the gematria “games” in Potok’s novel The Chosen.)
Sig, when you drive a car and it just stops, rudimentary knowledge of how a car works helps you to identify whether you’ve merely run out of gas, or need to contact a car mechanic.
Knowing the Unicode principle should help you to identify certain problems on your computer as Unicode/keyboard/font mismatch problems, so that you know to contact a Unicode mechanic to fix them.
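A typical mismatch of this kind is text saved in one encoding and read back in another. The sketch below reproduces it in Python, assuming the common case of UTF-8 bytes misread as Windows-1252; `misread` is a hypothetical helper, and real-world breakage can involve other pairs of encodings.

```python
def misread(text: str, written_as: str = "utf-8", read_as: str = "cp1252") -> str:
    """Show what `text` looks like when its bytes are decoded with the wrong codec."""
    return text.encode(written_as).decode(read_as)

garbled = misread("café")
print(garbled)

# When nothing was lost along the way, reversing the two steps repairs it.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)
```

Recognizing the tell-tale "Ã" followed by another symbol is usually enough to tell the mechanic what went wrong.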
Stu, if there were funny signs suddenly appearing on my computer screen, like skulls and bones, smileys or ampersands, I would certainly not start to unscrew the back of the damn machine to feed it some unicode from a character table that might or might not put it back on the right track. I would certainly leave it to mechanics and their greasy hands!
Well, even for people with no interest in Unicode as a coded character set, Unicode as a vast repertoire of characters can still be compelling. Go to the code charts and check out the stark angularity of Old South Arabian, the Greekness in disguise of Gothic, the still-mysterious pictographs of the Phaistos Disc, the bald heads of Oriya, the whorls of Saurashtra, the misleading familiarity of Cherokee, the Braille dots, the Yijing (I Ching) hexagrams, the dingbats, the emoticons. I can admire as well as anybody the powerful sweep of mighty generalizations in physics or real analysis that bring skrillions of separate examples under their control, explaining much with little. But my heart is given to the complicated domains of learning, the natural numbers and discrete mathematics generally, natural and constructed languages in all their diversity, writing systems.
John, you are right, the possibilities are mind-boggling. By lifting your eyes to the top of this page maybe you could have also taken Steve’s banner into account: we could also write in cuneiform thanks to Unicode:
http://www.unicode.org/charts/PDF/U12000.pdf
Now how one would physically do that here without a proper calamus remains a mystery to me. Could it simply be by typing U12038 or U1203A here?
No, it’s not. Unicode is even more mysterious than I thought. Maybe it comes from Rapa Nui as well.
U12038 or U1203A
That’s almost right. In fact, we must write 𒀸 or 𒀺 respectively to produce 𒀸 and 𒀺. In that way, the cuneiform characters become part of this very comment. If you try this yourself, don’t forget the semicolon at the end of each.
Now whether these characters actually appear to you or me as cuneiform characters, outlined blocks, little groups of numbers, or “last resort” glyphs depends on what fonts we have installed on our computers, and not at all on what Steve or his blog software does. If you see something other than cuneiform, you can install a proper cuneiform font and then redisplay this page, and the Right Thing will appear. (Some older operating systems may not be able to handle characters with five-digit Unicodes such as these, however.)
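A likely reason older systems stumble on five-digit code points is that such characters do not fit in a single 16-bit unit and are stored in UTF-16 as a pair of "surrogates". The following Python sketch shows the split; `utf16_units` is an illustrative helper, and whether a given old system fails for this reason is an assumption.

```python
def utf16_units(ch: str) -> list:
    """Return the 16-bit code units that encode `ch` in UTF-16."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

sign = "\U00012038"  # one of the cuneiform signs mentioned above
print([f"{unit:04X}" for unit in utf16_units(sign)])  # two units: a surrogate pair
print([f"{unit:04X}" for unit in utf16_units("A")])   # one unit
```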
Maybe it comes from Rapa Nui as well
“Nay,” said I, “I come not from heaven, but from Essex.”
Arrgh, I made a Balls of it. We must write 𒀸 and 𒀺.
Now this is going too far. We must write (but without spaces) & # x 12038 ; and & # x 1203A ;.
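The notation spelled out above is an HTML numeric character reference: ampersand, hash, x, the hexadecimal code point, semicolon. As a sketch, Python's standard `html` module can expand one, and building the reference for a character is a one-liner; `to_reference` is a name invented for this example.

```python
import html

def to_reference(ch: str) -> str:
    """Build the hexadecimal numeric character reference for one character."""
    return f"&#x{ord(ch):X};"

ref = to_reference("\U00012038")
print(ref)                 # the text to type into a comment box
sign = html.unescape(ref)  # what a browser turns it into
print(f"U+{ord(sign):04X}")
```

Printing the reference rather than the character sidesteps the problem seen in this thread, where the blog software kept converting the reference into the sign itself.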
In which case, perhaps the Cuneiform Digital Library Initiative may be of interest.