I don’t know enough to even begin to evaluate Mena B. Lafkioui’s Rif Berber: From Senhaja to Iznasen. A qualitative and quantitative approach to classification (Dialectologia et Geolinguistica 28 [2020]: 117-156), but there are those who do (hi, Lameen!), and it looks very interesting, so I’m posting it. Here’s the Abstract:
By combining qualitative (synchronic and diachronic) and quantitative (algorithmic) approaches, this study examines the nature, structure, and dynamics of the linguistic variation attested in Berber of the Rif area (North, Northwest, and Northeast Morocco). Based on a cross-level corpus of data obtained from the Atlas linguistique des variétés berbères du Rif (Lafkioui 2007) and from numerous linguistic, sociolinguistic, and ethnographic fieldwork investigations in the area since 1992, this study shows that these Berber varieties form a language continuum with the following five stable core aggregates, which cut across administrative and political borders: Western Rif Berber, West-Central Rif Berber, Central Rif Berber, East-Central Rif Berber, and Eastern Rif Berber. Furthermore, data mining studies made it possible to objectively identify the principal aggregate discriminators of the Rif Berber continuum, which are dealt with in the study. A special focus in the article is put on the interplay between system-internal and system-external parameters for the selection, diffusion, and transformation of variants in Rif Berber.
(Hat tip to John Emerson for the link.)
Hi!
Useful context for this one is Kossmann 2017, La place du parler des Senhaja de Sraïr dans la dialectologie berbère. There is some debate on the place within Berber of the so-called “Senhaja de Sraïr” dialects spoken around Ketama: are they dialects of Rif Berber (Tarifiyt), as Lafkioui argues using purely synchronic data mining methods, or the partly assimilated remnants of a separate Berber variety more closely related to Tamazight and Tashelhiyt further southwest, as Kossmann argues using more traditional historical linguistic methods? (The two positions are of course strictly speaking perfectly compatible – Senhaja’s recent history may well be one of convergence towards Tarifit, rather like that of Tasahlit to Kabyle.) Being mainly interested in the historical development – and generally considering that the notion of Berber being a dialect continuum has been greatly exaggerated in the literature, and we would all benefit from thinking in terms of trees more often – I personally find Kossmann’s argument more helpful in understanding what’s going on with that set of dialects. However, so far data on the varieties in question has been rather sparse, limited largely to a practically oriented colonial-era Spanish dictionary and to Lafkioui’s own questionnaire results (they were even falsely reported as extinct for several years by Ethnologue). Fortunately, Jenia Gutova will (at long last) be defending her thesis on Ketama Berber in a couple of weeks, after which one may hope the situation will be clearer. (Fieldwork around Ketama is uniquely challenging, since the region is reputedly the hub of Morocco’s still-technically-illegal marijuana industry.)
I’m not well versed in quantitative dialectology, but the main thing I wonder about when I read papers of that kind is feature weighting. Obviously not all features should be given equal weight; how much does the choice of relative weight affect the results? And how do we guarantee that the features chosen are representative?
I almost forgot to mention: Lafkioui’s landmark dialect atlas of Rif Berber is also freely and legally downloadable, at Atlas linguistique des variétés berbères du Rif. So any interested person can have a look at the patterns and see for themselves which isoglosses look more significant.
Very cool! What times we live in…
I met a couple of friends of a friend with Moroccan Berber backgrounds during a night out in Singapore when I visited a few years ago, and when they discovered their shared background they started comparing the varieties of Berber they spoke (I might have been partly responsible for steering the conversation this way). It was mostly comparing words and expressions and noting the similarities and differences, conducted in French for the benefit of the rest of us. I don’t remember if they mentioned their exact backgrounds though, so I don’t know just how far apart their respective varieties were.
Not quite – page 15 is immediately followed by page 92, page 239 is immediately followed by page 274, and I expect there are more such Google-Books-like phenomena.
brb, flying to Singapore lah
Gute Reise, bon retour! (Have a good trip, and a safe journey home!)
Win na ta’asif!
Also interesting:
In French except for what I quoted.
page 15 is immediately followed by page 92, page 239 is immediately followed by page 274
try HAL
I am an old friend of Mr. Languagehat’s. Most of what you folks talk about regarding languages is way beyond me, but I happen to know a lot about statistics, data mining, cluster analysis, multidimensional scaling, and so forth. I skimmed the pdf for things that made sense to me. I did not find a description of the methods used for data analysis. Perhaps that is in one of the other 37 pdfs? But I can make some comments based on the names of things. Of course, names and their meanings change.
K-means clustering is useful for summarizing data. It is usually not appropriate for scientific analyses.
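For readers who, like me, want to see what the method actually does: a minimal pure-Python sketch of textbook k-means (Lloyd's algorithm). It is not necessarily the variant or implementation used in the paper, and the points are made up, not linguistic data. One plausible reason it suits summarizing better than inference: you choose k in advance, it always carves the data into (up to) k groups whether or not that structure is really there, and the result can depend on the random starting centroids.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Textbook k-means (Lloyd's algorithm) on a list of numeric tuples.

    Returns (assignment, centroids), where assignment[i] is the cluster
    index of points[i].
    """
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids


# Made-up 2-D points forming two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, _ = kmeans(points, k=2)
print(labels)
```

Calling `kmeans(points, k=3)` on the same six points would still hand back three groups, splitting one of the obvious ones; nothing in the algorithm says "there are really only two".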
Classical MDS is 1950s technology. Much better algorithms have been available for 50 years. But perhaps it is adequate for these data.
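Classical (Torgerson) MDS turns a distance matrix into coordinates by double-centering the squared distances and keeping the top eigenvectors. A pure-Python sketch, assuming a small toy matrix; real work would use a linear-algebra library, and the power-iteration eigensolver is my simplification, not anything taken from the paper. One limitation is worth seeing: if the distances are not Euclidean (edit distances often are not), the centered matrix can have negative eigenvalues, and classical MDS simply discards that part of the structure.

```python
import math
import random


def classical_mds(dist, dims=2, iters=1000, seed=0):
    """Classical MDS from a symmetric distance matrix (list of lists).

    Returns an n x dims list of coordinates. Eigenvectors come from plain
    power iteration with deflation, adequate only for small toy inputs.
    """
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]  # equals column means: sq is symmetric
    grand_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J D^2 J, with J the centering matrix.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Random unit start vector, so it is not orthogonal to the target eigenvector.
        v = [rng.random() - 0.5 for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # nothing of B left along v
            v = [x / norm for x in w]
        # Rayleigh quotient: the eigenvalue belonging to the unit vector v.
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        if lam <= 1e-12:
            break  # no positive eigenvalue left; negative ones are discarded
        for i in range(n):
            coords[i][k] = v[i] * math.sqrt(lam)
        # Deflation: remove this component so the next pass finds the next one.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Toy check: distances between made-up points on a line are recovered.
positions = [0.0, 1.0, 3.0, 6.0]
line_dist = [[abs(a - b) for b in positions] for a in positions]
coords = classical_mds(line_dist, dims=1)
print(coords)
```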
Levenshtein distance is a crude measure. But perhaps it is adequate for these data.
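To make "crude" concrete: plain Levenshtein distance counts single-character edits and charges every substitution the same. A minimal sketch using the standard dynamic-programming recurrence; the example strings are invented, not Berber forms. A phonetically close substitution (t for d) and a distant one (q for d) both cost exactly 1. Some dialectometric work refines this with segment-level weights or normalization by length; I don't know which refinements, if any, the paper applies, and the length-normalized variant below is just one common choice.

```python
def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn s into t."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def normalized_levenshtein(s, t):
    """Edit distance divided by the longer string's length (0 = identical)."""
    longest = max(len(s), len(t))
    return levenshtein(s, t) / longest if longest else 0.0


# Made-up strings: a t/d swap and a q/d swap cost the same.
print(levenshtein("tad", "tat"), levenshtein("tad", "taq"))  # 1 1
print(levenshtein("kitten", "sitting"))  # 3
```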
Lameen asked, “Obviously not all features should be given equal weight; how much does the choice of relative weight affect the results?”
Good question. Answer: Potentially enormously.
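"Potentially enormously" is easy to see in a toy example. A minimal sketch, assuming each variety is coded as a vector of binary features and the distance is the summed weight of the features on which two varieties disagree (a weighted Hamming distance; I don't know whether the paper uses this measure). The three "varieties" and four features are hypothetical. Tripling the weight of a single feature changes which variety comes out closest to A.

```python
# Hypothetical binary features (1 = variant present); NOT real Rif Berber data.
VARIETIES = {
    "A": (1, 1, 1, 0),
    "B": (0, 1, 1, 0),
    "C": (1, 0, 0, 0),
}


def weighted_distance(x, y, weights):
    """Sum of the weights of the features on which x and y disagree."""
    return sum(w for a, b, w in zip(x, y, weights) if a != b)


def nearest_neighbour(target, varieties, weights):
    """Name of the variety closest to `target` under the given weights."""
    others = [name for name in varieties if name != target]
    return min(others, key=lambda name: weighted_distance(
        varieties[target], varieties[name], weights))


print(nearest_neighbour("A", VARIETIES, (1, 1, 1, 1)))  # B
print(nearest_neighbour("A", VARIETIES, (3, 1, 1, 1)))  # C
```

With equal weights A differs from B on one feature and from C on two; with the first feature weighted 3, the A to B distance becomes 3 while A to C stays 2. With more varieties and features the same effect can reshuffle whole clusters, so one partial check is to rerun the analysis on separate feature domains and see whether the groupings agree.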
Warren: Thank you for providing a different perspective on this. Is there any resource you’d recommend for learning more about the strengths and limitations of methods like these?
“The other 37 pdfs” are academia.edu’s attempt to get you to pay them, to have their algorithm find “similar papers” in their collection. Their algorithm is terrible, and chances are many of those 37 papers have nothing to do with historical linguistics or with statistics or with Berber.
Obviously not all features should be given equal weight; how much does the choice of relative weight affect the results?
From what I can tell, she does run the experiment separately in several domains.