David Robson writes for BBC Future (!) about a new analysis of nearly a billion tweets that makes a nice follow-up to yesterday’s post; this one is based on SCIENCE:
The researcher behind the study, Jack Grieve at the University of Birmingham, UK, analysed more than 980 million Tweets in total – consisting of 8.9 billion words – posted between October 2013 and November 2014, and spanning 3,075 of the 3,108 US counties. […] The result was a list of 54 terms […]
Having compiled this new lexicon, Grieve next used Twitter’s geocoded data to track its origins and spread across the USA. Baeless [‘single’], for instance, appeared to crop up in a few different counties across the south, before building in popularity and then spreading north and west.
In total, Grieve identified five hubs driving linguistic change. In order of importance, they were:
The West Coast
[…]
Notable terms: amirite (Am I right?); baeritto (a lover you’d like to wrap your arms around like a burrito); figgity (intoxicated/very); slayin (looking great) and waifu (wife).
The Deep South
[…]
Notable terms: baeless (single); boolin’ (chilling), famo (family and friend); traphouse (drug house).
North East
[…]
Notable terms: balayage (a hairstyle); litt or litty (good); lituation (a ‘litt’ situation)
Mid-Atlantic
[…]
Notable terms: on fleek (on point/flawlessly styled); shordy (short); wce (woman crush everyday)
Gulf Coast
[…]
Notable terms: bruuh (bro’); idgt (I don’t get tired); lordt (Oh Lord!)
Grieve says he was surprised by the results. According to linguistic theory, you would expect new words to arise in areas with the highest population density – but this could not explain all the variation. Grieve’s data confirms that the cultural (and linguistic) importance of a region is only loosely connected to its actual size.
More details and analysis at the link, of course. Thanks, Trevor!
After all the still-smoldering fuss over companies mining Facebook for data, it’s surprising that the article says not a word about the propriety of getting one’s hands on 980 million tuits (Spanish).
# each Tweet is timestamped and geocoded, offering precise information on the time and place that particular terms entered conversations. #
Only time and geolocation ?
Tweets are public by default.
waifu (wife)
I don’t think I’ve ever seen waifu used in the literal meaning “wife”, but the metaphorical meaning (summarizing from Wiktionary, “a fictional female character to which one is attracted”) feels old enough to predate Twitter.
(The oldest citation on Wiktionary is from 2008, but their main source for online terms is Usenet, which had become somewhat unpopular by the late 2000s, so they can easily miss early attestations. Know Your Meme dates it to 2006, and attributes the origin to Azumanga Daioh.)
Incidentally, I don’t think amirite is as young as Twitter either. [Sure enough, it goes back to at least 2005, and probably earlier.]
(I am entirely unfamiliar with the rest of the listed terms, however.)
Tweets are public by default.
Having just taken another cursory look at the privacy policies of Twitter and Facebook, I find them to be pretty much the same. I don’t want to deflect this comment thread, though, so I’ll just end by remarking that, currently, either I don’t understand the fuss about Facebook and Cambridge Analytica, or I can’t understand why there has not been a Twitter fuss. Maybe both !
Yes, you can make things private, but if you don’t, they are as public and available for quoting and other use as if you had published them in the newspaper. This is not some nefarious ploy, it’s what “publishing” means. Many people don’t seem to understand what they’re doing when they publish online, but that’s on them.
Since deflecting, and indeed derailing, threads is pretty much a tradition around here, I will geeksplain to the best of my ability.
FB maintains a lot of information on its users, and will provide it to third-party app developers if the users give their consent to its use by that app. It also provides information on the (unconsenting) users reachable via FB’s social graph from the consenting users, but solely for the purpose of enhancing the consenting users’ experience of the application. (At least, all this was true in 2015; by now it may have changed, and presumably under the GPDR it must now change.)
CA, on the other hand, took the data from both consenting and unconsenting users and used it for its own political purposes. This was a breach by CA of FB’s terms of service, but the general view is that FB was excessively trusting of the pure motives of its app developers, and took far too long to act after the misuse was brought to light, so they are by no means without spot and void of culpability.
Twitter, on the other hand, maintains relatively little non-public information about its users as far as is known, and releases it only in partly aggregated form: for example, they may use IP addresses for (imprecise) geolocation, but release only the geocodes, not the IP addresses. As already noted, tweets themselves are public information, and posting them is implicit consent to their publication.
Thanks JC, to continue in that fine tradition …
Google mail this month has “Improvements to our Privacy Policy and Privacy Controls”. The verbiage is voluminous. The “improvement” seems to be chiefly to Google’s benefit in ability to harvest data. Are they pulling a fast one?
Facebook has moved its data out of reach of the new European Privacy Law (GDPR; I presume that’s what you meant): https://www.theguardian.com/technology/2018/apr/19/facebook-moves-15bn-users-out-of-reach-of-new-european-privacy-law. Those are hardly the actions of an organisation trying to clean up its act. So, no, they’re not bound to change anything.
I am so far ahead of the #deleteFacebook curve that I never Facebooked in the first place.
The perils of connecting-only-long-words-title-hyphens-skipping-punctuation style URLs: how can there be 15 billion FB users outside the EU, I thought; it turned out to be 1.5bn. (Actually the Guardian seems to keep the function words, unlike lots of other places. More power to them.)
What are “function words” ?
They are like fungus, popping up in the cracks between the words that you actually want to have in your sentence. Or so I was told.
Ah, FUNGION words ! How easy it is to be mis-led by a mere typo.
Am I the only one bothered by the analysis describing Memphis as in the “Gulf Coast” region? Now, if the data shows that New Orleans and Memphis cluster together for purposes of certain linguistic innovations but neither clusters closely with Atlanta, that’s an interesting and worthwhile observation, but less confusing names than “Gulf Coast” for the NO-Memphis region and “Deep South” for the Atlanta-but-maybe-not-too-far-west-of-Atlanta region would be helpful.