Building a Database for Fongbe.

We talked recently about the Fon language, known as Fongbe, and when I ran across the Knowledge 4 All project “Building a database for Fongbe language in Africa,” I thought it was interesting enough to post:

This dataset is part of a 3-4 month Fellowship Program within the AI4D – African Language Program, which was conceptualized as part of a roadmap to work towards better integration of African languages on digital platforms, in aid of lowering the barrier of entry for African participation in the digital economy. This particular dataset is being developed through a process covering a variety of languages and NLP tasks, in particular Machine Translation of Fongbe. […]

The standardized Fongbe language is part of the Fongbe cluster of languages inside the Eastern Gbe languages. In that cluster, there are other languages like Goun, Maxi, Weme, Kpase which share a lot of vocabulary with the Fongbe language. Standard Fongbe is the primary target of language planning efforts in Benin, although separate efforts exist for Goun, Gen, and other languages of the country. To date, there are about 53 different dialects of the Fon language spoken throughout Benin.

Fongbe holds a special place in the socio economic scene in Benin. It’s the most used language in markets, health care centers, social gatherings, churches, banks, etc. Most of the ads and some programs on National Television are in Fongbe. French used to be the only language of education in Benin, but in the second decade of the twenty first century, the government is experimenting with teaching some subjects in Benin schools in the country’s local languages, among them Fongbe.

I hope it turns out to be useful (note to DE: it’s not done by those people you can’t stand).


  1. David Eddyshaw says

    ‘Tsall good. Though what’s with all this “AI” stuff again? These are not the skills you need for language planning and promotion.

    Ewe is part of the same dialect continuum, but apparently not mutually comprehensible with the languages/dialects at the eastern end.

    The picture is weird: blue shading for all of Nigeria, Benin and Togo.
    If you’re going to be that maximalist, they should have gone ahead and shaded all of Ghana too … then Poland … then Russia …

  2. J.W. Brewer says

    Do any of the Fongbephones support a political faction seeking to change the name of the political entity back to “Dahomey,” to show the Communists who changed it to “Benin” in the Seventies what’s what? The pre-colonial Kingdom of Dahomey was historically dominated by Fongbephones; the ditto Kingdom of Benin (located I believe entirely within what is now Nigerian territory) not.

  3. David Eddyshaw says

    None of it has quite the chutzpah of Nkrumah deciding to call the Gold Coast “Ghana.” It’s on a level with calling Wales “the Roman Empire.” More so: after all, what’s now Wales actually was once at least a part of the Roman Empire, whereas …

    I’ve also just noticed that the designers of the website think that Gbe is Bantu. Perhaps that’s what their “AI” told them.

    I’m beginning to suspect that this is the same sort of boondoggle as the Mali-based thing we discussed not long ago.

  4. I was afraid AI4D might mean “four dimensional AI”. Apparently, it means “Artificial Intelligence for Development” where “development” is economics rather than computing.

  5. Where does this dataset come from, exactly? Is there some governmental entity that produces bilingual Fongbe / French texts?

  6. “the ditto Kingdom of Benin (located I believe entirely within what is now Nigerian territory) not.”

    But why did they rename it?
    My favourite renaming is that of Iran:

    “Persia” is associated with poverty and backwardness, and Iran is an ancient romantic name (also Aryas are popular in Europe).
    Now Iran is “the nuclear deal” etc. and as I mentioned before I – to my surprise – observed hostility of some people (I mean Westerners, not people from middle eastern countries hostile to Iran*) to Iranians themselves rather than just their country. And Persia is classic poetry:)

    “Pakistan” is also a failure in this respect, but this one is obvious. They disposed of everything associated with “India” (lots of cool things) and are yet another stan.

    * I reiterate that because apparently portraying a country as a Mordor may have some unpleasant consequences. The main problem with foreign opposition to crazy regimes is different though: namely how to oppose them in such a way that you encourage rather than discourage change. Sometimes I wonder if anyone (outside of the country) really wants the regimes to change, or conversely finds it convenient.

  7. Wikipedia says
    “There are over 525 native languages spoken in Nigeria.[1][2][3] ”

    and lists 389 of them. I don’t see Fon. Their link [3] is Blench, “Atlas of Nigerian Languages” and I also can’t find anything called “Fon” (there is a “Gbe cluster” though).

  8. PlasticPaddy says

    In that Wikipedia article, the table is inconsistent with the map, which shows a Fon swathe extending across the Nigerian border with Benin. Anyway the table is stated to be “non-exhaustive”.

  9. Yes, but I wonder if it (or a dialect they count as Fon) is in the table, just named differently.

    (Also, numbers like 525 mean, I think, that some partly mutually intelligible varieties are counted as distinct. Which makes one hope that something less intelligible for speakers of listed languages will also be included…)

  10. PlasticPaddy says
    This is more detailed and contains gun-gbe in the right area.
    Fon seems to be used as a group name for closely related languages including Gun.

  11. PP, thanks. Indeed WP has
    108 Egun Gùn Lagos, Ogun
    I suppose the stage name (Ogun) has something to do with Gun.

  12. I also have a better proposal than “blue shading” for Poland and Russia.

    Just mark continents, not political entities like countries. Paint the whole of Africa.

    (Also I think, it would be right to rename Tunisia as Africa and move the capital to its Fatimid capital, Africa (the European name. Arabs know it as Mahdia)).

  13. ktschwarz says

    the Fon language, known as Fongbe

    TIL, from Wikipedia on Gbe languages:

    Since the establishment of a working group at the West African Languages Congress at Cotonou in 1980, H. B. Capo’s name suggestion has been generally accepted: ‘Gbe’, which is the word for ‘language/dialect’ in each of the languages.[5]
    [5] In daily use, individual Gbe languages are referred to by their speakers as X-gbe, e.g. Ewegbe for Ewe, Fongbe for Fon, etc.

    Fon is one of the languages added to Google Translate last week; Ewe has been there since 2022. I wonder if it’s any good, and if it’s better at Fon/French than Fon/English.

    The OED’s earliest quotation for Fon in English is from Richard F. Burton in 1864. Burton, and several other writers of the 19th and early 20th centuries, spelled it “Ffon”—why a double Ff? I couldn’t find any explanation of that. Also, they overlooked an earlier citation that they already have under Dahoman:

    1793 The Dahomans were formerly called Foys, and inhabited a small territory, on the north-east part of their present kingdom.
    A. Dalzel, History of Dahomy i. i. 1

    (even though they do have the spelling “Foy” in the form list and one of the quotations).

  14. Foy

    Misreading of a handwritten “n” with a longish tail?

  15. ktschwarz says

    I don’t think so, just an earlier attempt (I think by French speakers) at spelling it. “Fouin” was another. The n is not pronounced, it just indicates nasalization.

  16. Also Fogbe

  17. Cust:

    There are according to Schlegel five Dialects, whose Fields are of very considerable extent. Many of them are divided politically into small Provinces, which has to a certain extent produced Sub-Dialects, or at least the semblance of them in the lax parlance of casual writers. I. Following the upper course of the River Volta into the Interior, and passing along the boundary of Dahomé, we come to the Mahe, or Makhi Dialect. This is the furthest to the North-East, but its extent into the Interior is unknown to Europeans; as far as we can judge, this is the purest Dialect. II. The second is the Dialect of the Province of Dahomé. III. The third Dialect is spoken in the Weta Province, reaching to the Sea, entered on the Maps as Whidah. This Dialect resembles very much that of Dahomé, and the Province is subject to the King of that Country. Whidah is written Hwidah, Judah, Fidah: the people are called Ffon, Popo or Papaa: the Language is called by one French Missionary, Courdioux, Fogbe, by another, Bouché, it is called Jeji: another term, Ardrah, is used, but it may be presumed that one of the two last-named Dialects is intended. IV. The fourth Dialect is that of Anfúe, and is spoken in that part of the Field, which stretches from the South boundary of Dahomé towards the Atlantic, having the Weta Dialect on the East, and the Ashanti Language-Field on the West, divided from it by the River Volta. This tract was divided into several States, which have now been joined to the Kingdom of Peki. The South-West portion of Anfúe is called Krepi on the Map, but the real name is Peki, or Pedsi, in a North-West direction from Keta, the residence of the Mission. From this word Krepi, or Kerrapi, in itself fertile in confusion, has sprung another name, Wegbe. V. The fifth Dialect is the Anlo on the West side of the Slave Coast, bordering on Adampi of the Akrá Language-Field, but having the River Volta betwixt them, with the Atlantic on the South, the Anfúe Field on the North, and the Weta Field on the East. Schlegel has compiled his Grammar in this Dialect, and the nearest to it is the Anfúe. In spite of all the risk of admixture of foreign elements, the Anlo is the purest of the Dialects.

    Many Vocabularies have been supplied: the one in Koelle’s Polyglotta Africána of Adampi, a Dialect of Akrá, is in fact the Anlo Dialect of Ewé. Courdioux, a French Missionary, supplies one of Dahomé in the Journal of the French Philological Society. F. Müller gives a Grammatical Note of this Language, together with that of Ashánti, Akrá and Yariba in his Outline of Philology. …

    Ardrah:/ Pseudo-Berber.

  18. Did anyone spell it Ffon before Burton in the 1860s? Dalzel (1793) used the French “Fouin, or Foy.” Forbes (1851) “Dahomian language.” It seems to have been Fon in German, at least, around 1850.

    Burton criticized Zimmermann (1858) for calling it “Ewe,” I guess because the geography was wrong? Oddly enough, Ewe does distinguish two labial fricatives, f and ƒ. Could Burton have been trying to emphasize that Fongbe has one particular one?

  19. I mean, I once complained about R’Kiz and Mederdra in Trarza (which is bordering Adrar).

    Ardrah sounds much like these.

  20. ktschwarz says

    Could Burton have been trying to emphasize that Fongbe has one particular one?

    In that case why would he spell words of the language with a single f, like Afa ‘divination’? He doesn’t give an explanation.

    Anyway, I like this footnote:

    The queer chirp of these modern pterodactyles [i.e., bats], and the melodious gazouillement† of birds in the brake, awoke us at the earliest dawn.

    † This is a French word, but I cannot help it—let reviewers say what they will. The sound of z in the song of West African birds is salient; our insipid “warbling” is tolerable and not to be endured. I distinctly deny that English or any other language contains all the desirable shades of expression; and I cannot see why, in these days, when French is familiar to us as in the times of William the Conqueror, we should be condemned for borrowing from it. “Rot your Italianos; I loves a simple English ballad,” appears to underlie the feeling.

  21. Indeed. It seems so obviously deliberate, yet there is no explanation. Even if it were borrowed from someone earlier, you could expect a short reference. It’s as though he expected the reader to recognize what was intended. But I can’t think of anything that ff vs. f indicates other than voicing, for which v serves most everywhere.

    He does call out what would seem notable features to European speakers. That the language is tonal. “Like the Chinese it depends greatly upon accent and the stranger’s ear has hard work.” And kp and gb, “at first inaudible to our ears, and difficult to articulate without long practice.”

    Does Burton give any Ffon words starting with just f?

  22. David Eddyshaw says

    I find kp and gb pretty easy, at least compared with Hausa glottalised consonants and implosives, let alone clicks. I suppose it’s all a matter of what you’re used to, though. It seems that Burton was complaining about gb specifically, rather than kp, and I do myself find gb trickier, for reasons I can’t really pin down. I don’t think it’s just me, even: there are a fair number of actual West African languages that only have the voiceless labial-velar stop. I don’t think the actual articulation of kp and gb is always parallel: there are certainly several phonetically distinct versions of kp, too.

    (In Kusaal, apart from tone, I think the most challenging sounds are the glottal vowels. Like clicks – or tones, come to that – it’s not so much that they’re hard to produce: it’s producing them in the right contrastive contexts.)

    [Up late for the election results. Trying not to gloat … nasty Reform buggers doing better than would happen in a just world, though.]

  23. When I hear a k͡p, it often sounds to me like pk. That way I know that if I ever need to produce it, make the two articulations really simultaneous, never mind the orthography.

    (And congratulations on the elections, condolences on having to see NF’s face again.)

  24. David Eddyshaw says

    Waama actually has a synchronic rule *pk -> k͡p, as in kɔukpa “hundred”, corresponding to Kusaal kɔbiga (the devoicing bit is an Eastern Oti-Volta thing.)

    [I suspect that Farrago will bugger off to his fascist soulmates in the US unless his scheme to assimilate the Tory remnants pays off. One can confidently assert that he has no interest whatsoever in actually representing the deluded xenophobic voters of Clacton in Parliament in any way at all.]

  25. [The fascist soulmates in the US will never accept someone named Nigel as quite equal. They’ll be happy to give him a spot on TV, though.]

  26. [All in all, much of the electoral success of Labour came from Nu-kip eating a big chunk of the Tory vote in conservative districts. The increase in the national vote for the non-vile parties is modest. But—that’s how politics is.]

  27. In regard to the interesting topic of Burton’s possible use of ff for [f] and f for [ɸ]…

    I was curious and looked into the historical phonology of this famous Ewe contrast. It was interesting to learn that the Fon group of Gbe varieties does not have the bilabials [ɸ] and [β], only [f] and [v]. According to Hounkpati B.C. Capo (1991) A Comparative Phonology of Gbe, the bilabials are characteristic of the Vhe (Ewe) group only, where they evolved from Proto-Gbe *χʷ and *ʁʷ. The [f] and [v] of both Fon and Vhe are apparently inherited from Proto-Gbe. I hope LH readers can see the comparative tables 103 and 104 on page 110 of Capo’s work here. (For example, Proto-Gbe *-χʷe ‘festival’ > Vhe eƒe (with ƒ [ɸ]), Fon -χʷe.)

    (I gather the è- in Eʋe, and in Effon as in A.B. Ellis (1890:229 and elsewhere) is a lexically determined nominal prefix that can be deleted if compounding, phrasal morphosyntax, etc., otherwise allow the satisfaction of a disyllabic minimal word condition.)

  28. I do myself find gb trickier, for reasons I can’t really pin down.

    ducttape (?ductape ?ducktape). dogbiscuit. Dogberry.

    Māori has syllable-initial ng-, which is quite tricky for English speakers. Ngauruhoe would be more talked about if not for that.

    [And congratulations on the election result. Yeah where will all that outrage go when Farrago finds he’s only 4 amongst 650?]

  29. ducttape (?ductape ?ducktape). dogbiscuit. Dogberry.

    You do realize, I trust, that “gb” does not represent a sequence of g and b, as in English.

  30. Another interesting thing in this area is that Fongbe doesn’t really have /p/. Except in loanwords: pádrì ‘priest’ (from Portuguese), pápà ‘pope’ (from Portuguese or French), pápá ‘papa’, pɛ́ɛ̀n ‘bread’ (from French, alongside blɛ̌ɖì). But some earlier(?) ones chose differently: pósù ‘post’ (from French) is also kpósù. kpánì ‘pan’ (from English), kpɔ́nwùn a 25 francs CFA coin (from English ‘pound’). Or cɔ́fù ‘shop’.

    This is only snippets, unfortunately.

  31. David Eddyshaw says

    Yoruba is the same (written p represents /k͡p/.) And Hausa, of course. And Manding …

    Nawdm has no /p/ either (it has shifted proto-Oti-Volta *p to /f/, as in fɔ́gá “woman” beside Mooré pága, Gulmancema púa etc.) Its close relative Yom vacillates, showing /p/ in some words, /f/ in others, according to no regular principle that I have been able to discover. Closer to the Gbe area, Kabiye does have /p/, but it’s always the result of devoicing of *b (there is no /b/); proto-Grusi *p has become /h/, as in háʋ “give” beside e.g. Kassem pa.

  32. PlasticPaddy says

    Kpɔ́ùn : Pound (anglais)

    Kpɔ́ùnatòn : Soixante-quinze [atòn = 3]

    Kpɔ́ùnɖokpó : Vingt-cinq francs (une livre anglaise (“pound”)) [ɖokpó = 1]

    Kpɔ́ùnwè : Cinquante francs cfa (2 x 25)

    Could the Kpɔ́nwùn from that dictionary be a circumlocution (“the equivalent of a pound”?–NB wùn = prendre une quantité and tàwùn = précisément)?

  33. “gb” does not represent a sequence of g and b, as in English

    Yes I do realise. That’s why I talked about ng- in Māori, which does not represent a sequence -ng, as in English ‘singing’. My example had ng- word-initial, but even in say Whanganui that’s hwa-nga-nui.

  34. Yeah, but your dogbiscuit examples are not to the point.

  35. David Eddyshaw says

    Kpɔ́ùn : Pound (anglais)

    (Hausa fam.)

    It’s all a bit reminiscent of Irish borrowing Brythonic /p/ as /kʷ/.

    I have my suspicions that proto-Oti-Volta *f goes back to an earlier *k͡p.
    It seems to have had a weird distribution, being found in clitics and affixes but not in full-word stems (quite unlike *v, which is reconstructable way back, even before POV.)

    I was just looking at a Mbembe grammar (by the excellently-named Doris Richter genannt Kemmermann*) which seems to have initial /f/ in at least some words that have initial *k͡p in Oti-Volta, like “die”, cf Kusaal kpi (and proto-Bantu *kú-.) Unfortunately I don’t know anything much about proto-Jukunoid …

    * I gather that German names on this pattern are an actual Thing, though I don’t recall encountering any before.

  36. “*p has become /h/,”

    In normal languages /t/ does this:-/ (I wonder if p>h is just lenition or something else)

  37. David Eddyshaw says

    “Normal” nothing! It happened in Celtic. Languages don’t get any normaller than that. (Armenian, too.)

    Lots of languages in that part of West Africa (and beyond) seem to be allergic to /p/, as MMcM pointed out. And f/h alternations are pretty familiar territory (Japanese, Hausa …)

    The only cross-linguistically unusual thing here is the using of /k͡p/ for /p/ in loanwords, but then that is probably only unusual because /k͡p/ itself is so uncommon outside West Africa (where, in contrast, it’s a boringly usual part of the phonemic inventory of practically every other language.)

    Kusaal uses kp to render Hausa /kʷ/ in bakpae “week” (from Hausa bakwai “seven.”) It’s just an everyday sound for Kusaasi … not marked or anything.

  38. “Normal” nothing! It happened in Celtic. Languages don’t get any normaller than that.

    You seem to be in violent agreement with drasvi.

  39. David Eddyshaw says

    Not for the first time …

  40. David Marjanović says

    I gather that German names on this pattern are an actual Thing, though I don’t recall encountering any before.

    Theo Vennemann genannt Nierfeld.

    Eventually he stopped putting that on his papers because, I suppose, nobody actually calls him Nierfeld.

    Maybe this is the northern version of the southern phenomenon of (reportedly) calling people by the name of the original owners of their farm. I don’t know.

  41. David Eddyshaw says

    Talking of unexpected surnames, I recently had cause to consult some publications by

    daughter of É. Kiss Sándor. Clearly the É. is an integral part of the surname, and it doesn’t actually seem to stand for anything.

    I am entirely confident that some Hatter will be able to explain this.

  42. LH, DE, well, I must agree that Celtic languages are the norm*.

    I can of course say that I was exposed to English before I decided to learn Irish and Breton, but then I was “exposed” to all sorts of languages well before English…. It was very basic.

    *especially when we’re discussing lenition**, if p>h is about “weakening” and not just replacing a phoneme with an equally strong phoneme.

    **FUCK!!!!!!!!!!! ONLY now I realised that lenition is from ЛЕНЬ!!!!! “laziness”
    They were fooling me, dressing up something SOOO simple as “Latin”. Fuck.

  43. The Celtic languages have grown lazy. Some have grown completely lazy.

  44. David Eddyshaw says

    I wonder if the É. in É. Kiss is anything like the (spurious) V affected by this unpleasant fellow?

    Some Hatter will know.

  45. From Kálmán Béla (1978) The World of Names: A Study in Hungarian Onomatology, p. 78 :

    Double surnames appeared quite early. As early as the 16th century we find names (they are given in their natural, Hungarian order) like Tót Gál Benedek, Kalmár Szabó János, Kis Fodor István, Nagy András Ferenc. In later times, especially in the territory east of the river Tisza, these surnames were very common: Balás-Szabó, Jankó-Nagy, Hajdú-Sípos, Kecskés-Pap, Könyves-Tóth, Debreceni-Kis, Vályi-Nagy, Csokonai-Vitéz, Tápai-Szabó, Otrokocsi-Nagy, Váradi-Szabó, etc.

    Already in the 18th century it occurred sporadically in the registers, that one of the double surnames (perhaps a nickname) was marked only by an initial. In the last century this practice became even more wide-spread. Very often the particular family may not even have known any longer what the actual name marked by the initial may have been. Names of this kind are B. Nagy Mihály, É. Kiss Sándor, K. Kovács László. Less commonly, the initial is in a middle position, as in Szabó T. Attila.


  46. PlasticPaddy says

    Are you sure this fully explains the particular case?
    Hungarian Wikipedia has
    É. Kiss Sándor Kis József és Salánki Róza gyermekeként, [“É. Kiss Sándor, as the child of Kis József and Salánki Róza,”]
    So the father at least used a different spelling and no prefix, unless the prefix disappears in contexts like the above.

  47. David Eddyshaw says

    Thanks, Xerîb!

    So that would be like our own (now defenestrated) synthetic person R. Mogg Jacob, or (to turn to a real politician) the excellent L. George David (who knew my father.)

    I’m feeling a bit outclassed by these names. I shall henceforward go by the name Dafydd ab Iago genannt Eddyshaw. (This should get my name cited first in publications, too, apart from a few Aardmans and the like.)

  48. Theo Vennemann genannt Nierfeld.
    I didn’t know that about Vennemann.

    Maybe this is the northern version of the southern phenomenon of (reportedly) calling people by the name of the original owners of their farm. I don’t know
    Despite growing up in the North, this discussion is the first time I have encountered this construction as a surname; it’s unremarkable as a description (“Al Capone, genannt Scarface”).

  49. David Marjanović says

    Vennemann’s page; “gen. Nierfeld” is legible in the thumbnails of the two publications right on top, and it’s spelled out in some others.
