Phonetic Word Search.

Janis Krumins writes with a description of a tool he’s created:

It converts IPA symbols to English word(s). Familiarity with the International Phonetic Alphabet is required, but in return, it offers far more flexibility than a regular rhyming dictionary. Using wildcard symbols (any, consonant, vowel), you can find a wide variety of similar-sounding words – rhymes, consonances, assonances, alliterations, pararhymes, and more.

The Phonetic Word Search is here; enjoy!
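To make the wildcard idea concrete, here is a minimal sketch of how such a search might work. It is not the tool's actual implementation: the wildcard spellings (`*` for any phoneme, `C` for a consonant, `V` for a vowel), the tiny `LEXICON`, and the function names are all assumptions made up for illustration, and each wildcard is assumed to match exactly one phoneme.

```python
# Hypothetical vowel inventory; anything not listed here is treated as a consonant.
VOWELS = {"aɪ", "eɪ", "i", "ɪ", "ə", "ʌ", "ɑ", "æ", "oʊ", "u"}

# Toy lexicon for illustration only: each word maps to a list of IPA phonemes.
LEXICON = {
    "time": ["t", "aɪ", "m"],
    "tide": ["t", "aɪ", "d"],
    "lime": ["l", "aɪ", "m"],
    "team": ["t", "i", "m"],
    "thigh": ["θ", "aɪ"],
    "item": ["aɪ", "t", "ə", "m"],
}


def symbol_matches(pattern_symbol, phoneme):
    """Return True if one pattern symbol (literal or wildcard) matches one phoneme."""
    if pattern_symbol == "*":   # assumed wildcard: any phoneme
        return True
    if pattern_symbol == "V":   # assumed wildcard: any vowel
        return phoneme in VOWELS
    if pattern_symbol == "C":   # assumed wildcard: any consonant
        return phoneme not in VOWELS
    return pattern_symbol == phoneme


def search(pattern, lexicon=LEXICON):
    """Return the words whose whole transcription matches a space-separated pattern."""
    symbols = pattern.split()
    return sorted(
        word
        for word, phones in lexicon.items()
        if len(phones) == len(symbols)
        and all(symbol_matches(s, p) for s, p in zip(symbols, phones))
    )


print(search("C aɪ m"))  # rhymes with "time": any consonant, then "aɪ m"
print(search("t V m"))   # pararhyme-style match: same consonants, any vowel
```

With this toy data, `search("C aɪ m")` finds "lime" and "time", while `search("t V m")` finds "team" and "time". A real tool would need a full pronunciation dictionary and probably variable-length wildcards as well.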

Comments

  1. It’s got bugs.

    I tried searching for words containing “θ aɪ” and got back “atheism” and “thyme” among other erroneous matches.

  2. It looks like it’s supposed to represent standard AmE, so /hɒt/ for ‘hot’ isn’t right.

  3. It credits three open-source dictionaries: cmudict, Moby Pronunciator, and WordNet. The first two at least have had little development in the last 20 years.

    No distinction between stressed /ɜːr/ and unstressed /ər/?

  4. Craig, thank you for your comment. This tool uses the CMU Pronouncing Dictionary:

    http://www.speech.cs.cmu.edu/cgi-bin/cmudict

    It is an American English dictionary, and it uses ARPAbet’s phonetic symbols. The word “atheism” transcribes there to “AH TH AY S AH M”. Since ARPAbet’s “AY” equals IPA’s “aɪ” the match you provided is correct in THIS pronunciation database.

    Since there are many dialects of English, the transcription of words to phonetic symbols can be ambiguous. Unfortunately, besides the “Moby Project”, which is somewhat outdated and inconsistent, the aforementioned dictionary is the only open-source pronunciation dictionary that I know of – it is used by most (if not all) online rhyming dictionaries.

    I hope that despite these limitations this tool can be of use, and if you have any further suggestions for improvement, please let me know.

    Thanks.

  5. Jen in Edinburgh says:

    Oddly, it does seem to know how to pronounce both theism and atheist.

  6. David Marjanović says:

    It is an American English dictionary, and it uses ARPAbet’s phonetic symbols. The word “atheism” transcribes there to “AH TH AY S AH M”. Since ARPAbet’s “AY” equals IPA’s “aɪ” the match you provided is correct in THIS pronunciation database.

    …but [ɑθaɪ̯sɑm] is neither recognizable as atheism nor recognizable as English. If I heard it, I’d be like “what is this, Burmese?”

    How about [ˈeɪ̯θiɪz(ə)m]? (Note the stress mark preceding the stressed syllable. The CMU dictionary seems to pretend that stress is predictable in English, like in French. It’s not.)

  7. I concur with David. No American dialect pronounces “atheism” as “ʌθaɪsʌm”. Nor is “thyme” anything other than /taɪm/.

    Those data are garbage, and I suspect the whole dataset is unreliable.

  8. Jen in Edinburgh says:

    M-W gives thyme with th- as a variant, although the OED doesn’t – I wonder if you would *also* get it if you searched for initial t-.

    But atheism looks like we’ve caught out someone who has only read the word.

  9. @Jen in Edinburgh, “thyme” doesn’t appear in this set when you search with initial t-.

  10. Here’s Mark Liberman on cmudict in 2009:

    One trouble with the CMU dictionary is that its transcription practices are not at all consistent, as you’d expect for a resource put together by many people, from many sources, over a period of time. That’s irrelevant for many purposes, but the inconsistencies are going to make it challenging as a source for this kind of investigation [finding rhymes]. (CELEX is by no means free of such inconsistencies, but they’re an order of magnitude worse in cmudict.)

    Since then cmudict has gone from version 0.7a to 0.7b; I doubt that represents much of an improvement. The link to CELEX2 needs updating to here, but it still isn’t free to access, much less open-source. You get what you pay for.

  11. John Cowan says:

    CAAPR is an excellent pronouncing dictionary, based on the EPD for its BrE pronunciations, but also includes high-quality AmE pronunciations ultimately from M-W, AHD, and RHD. (Note: the linked dictionary files are ISO 8859-1 plain text, so they may show up incorrectly in your browser.)

  12. David Marjanović says:

    But atheism looks like we’ve caught out someone who has only read the word.

    Ah, now I get it. Looking it up, I see that AH is indeed supposed to be /ʌ/ as in hut; we’re looking at the output of an American who equates stressed [ʌ] with unstressed [ə] and who managed to segment the word into three syllables as a-theis-m.

  13. An overworked compiler in a state of brainfart?
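The ARPAbet-to-IPA correspondence discussed in comments 4, 7, and 12 can be sketched in a few lines. This is only an illustration of why the entry quoted in the thread matches a search for “θ aɪ”; it is not the tool's code. The mapping table covers just the symbols needed here, the function names are hypothetical, and rendering `AH0` (AH with a 0 stress digit) as schwa is a common convention assumed for this sketch rather than something stated in the thread.

```python
# Partial ARPAbet-to-IPA table: only the symbols needed for this example.
ARPABET_TO_IPA = {
    "AH": "ʌ", "AY": "aɪ", "EY": "eɪ", "IY": "i", "IH": "ɪ",
    "TH": "θ", "T": "t", "S": "s", "Z": "z", "M": "m",
}


def arpabet_to_ipa(transcription):
    """Convert a space-separated ARPAbet string to a list of IPA symbols."""
    ipa = []
    for symbol in transcription.split():
        base = symbol.rstrip("012")   # drop a trailing stress digit, if any
        stress = symbol[len(base):]   # "" when the symbol carries no digit
        if base == "AH" and stress == "0":
            ipa.append("ə")           # assumption: unstressed AH is schwa
        else:
            ipa.append(ARPABET_TO_IPA[base])
    return ipa


def contains_sequence(phones, query):
    """Return True if the list `query` occurs contiguously inside `phones`."""
    n = len(query)
    return any(phones[i:i + n] == query for i in range(len(phones) - n + 1))


# The transcription quoted in comment 4, taken as given.
atheism = arpabet_to_ipa("AH TH AY S AH M")
print("".join(atheism))
print(contains_sequence(atheism, ["θ", "aɪ"]))
```

Run on the transcription quoted in comment 4, this produces the “ʌθaɪsʌm” of comment 7, and the “θ aɪ” query matches it. The search behaves consistently with its data; the problem raised in the thread lies in the data itself.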
