AI, Language Learning, and Grammar.

Bathrobe sent me this post at The Conversation by Morten H. Christiansen and Pablo Contreras Kallens; after talking about the Chomskyan insistence on a grammar template wired into our brains, they continue:

But new insights into language learning are coming from an unlikely source: artificial intelligence. A new breed of large AI language models can write newspaper articles, poetry and computer code and answer questions truthfully after being exposed to vast amounts of language input. And even more astonishingly, they all do it without the help of grammar.

Even if their choice of words is sometimes strange, nonsensical or contains racist, sexist and other harmful biases, one thing is very clear: the overwhelming majority of the output of these AI language models is grammatically correct. And yet, there are no grammar templates or rules hardwired into them – they rely on linguistic experience alone, messy as it may be.

GPT-3, arguably the most well-known of these models, is a gigantic deep-learning neural network with 175 billion parameters. It was trained to predict the next word in a sentence given what came before across hundreds of billions of words from the internet, books and Wikipedia. When it made a wrong prediction, its parameters were adjusted using an automatic learning algorithm.

Remarkably, GPT-3 can generate believable text reacting to prompts such as “A summary of the last ‘Fast and Furious’ movie is…” or “Write a poem in the style of Emily Dickinson.” Moreover, GPT-3 can respond to SAT level analogies, reading comprehension questions and even solve simple arithmetic problems – all from learning how to predict the next word. […]

Research published in Nature Neuroscience demonstrated that these artificial deep-learning networks seem to use the same computational principles as the human brain. The research group, led by neuroscientist Uri Hasson, first compared how well GPT-2 – a “little brother” of GPT-3 – and humans could predict the next word in a story taken from the podcast “This American Life”: people and the AI predicted the exact same word nearly 50% of the time. […]

We are not suggesting that GPT-3 or GPT-2 learn language exactly like children do. Indeed, these AI models do not appear to comprehend much, if anything, of what they are saying, whereas understanding is fundamental to human language use. Still, what these models prove is that a learner – albeit a silicon one – can learn language well enough from mere exposure to produce perfectly good grammatical sentences and do so in a way that resembles human brain processing.

For years, many linguists have believed that learning language is impossible without a built-in grammar template. The new AI models prove otherwise. They demonstrate that the ability to produce grammatical language can be learned from linguistic experience alone. Likewise, we suggest that children do not need an innate grammar to learn language.
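To make the training regime they describe concrete, here is a toy sketch of next-word prediction in PyTorch-style Python. Everything here – the tiny recurrent model, the vocabulary size, the random stand-in “text” – is invented for illustration; GPT-3 itself is a vastly larger transformer, not this little GRU:

    # A toy sketch of the loop described above: predict the next token,
    # penalize wrong predictions, adjust the parameters. Purely
    # illustrative; not GPT-3's actual architecture or code.
    import torch
    import torch.nn as nn

    VOCAB = 50_000   # hypothetical vocabulary size
    DIM = 512        # hypothetical embedding width

    class TinyLM(nn.Module):
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(VOCAB, DIM)
            self.rnn = nn.GRU(DIM, DIM, batch_first=True)
            self.head = nn.Linear(DIM, VOCAB)

        def forward(self, tokens):            # tokens: (batch, seq)
            h, _ = self.rnn(self.embed(tokens))
            return self.head(h)               # logits for each next token

    model = TinyLM()
    opt = torch.optim.Adam(model.parameters())
    tokens = torch.randint(0, VOCAB, (8, 64))    # stand-in for real text

    opt.zero_grad()
    logits = model(tokens[:, :-1])               # predict from each prefix
    loss = nn.functional.cross_entropy(          # the "wrong prediction" signal
        logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))
    loss.backward()
    opt.step()                                   # parameters adjusted

Whatever such a model ends up “knowing” about grammar is implicit in those adjusted parameters, which is precisely the authors’ point: no grammar template is wired in anywhere.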

Bathrobe also sent a link to “a very dismissive thread on the article” at r/linguistics where redditors say things like “linguists are mostly die-hard Chomskyans. We beat that strawman into submission really hard.” Well, they would say that, wouldn’t they, deluded creatures that they are?

Comments

  1. David Eddyshaw says

    learning networks seem to use the same computational principles as the human brain

    Really? They know what the computational principles used by the human brain are, now? Why was I not informed of this? Find me an underling to punish …

    Yet again, I must cite the Hausa proverb: Kama da Wane ba Wane ba: “Like John Smith doesn’t mean that he is John Smith.”

    This doesn’t actually seem to have any particular relevance to Chomsky’s errors: the system is hardly suffering from poverty of stimulus. Quite the opposite.

    Once again: a bad argument for a true proposition (here, “Chomsky is fundamentally wrong”) is still a bad argument.

  2. Moreover, GPT-3 can respond to SAT level analogies, reading comprehension questions and even solve simple arithmetic problems – all from learning how to predict the next word. […]

    Can it pass an exam and study in a university?

  3. David Eddyshaw says

    Nowadays, probably yes … (I presume it could stump up the fees …)

    Indeed, it could probably get papers published in Nature Neuroscience … seems to have the necessary skills …

  4. Why, people back in the day thought that ELIZA was pretty much indistinguishable from a real shrink…

  5. David Eddyshaw says

    Christiansen and Kallens seem in fact to be unfamiliar with the actual state of play regarding Chomskyanism (and who shall blame them? Unless, of course, they choose to write articles on the topic …)

    They refer to (presumably) the Principles and Parameters incarnation of the Great Truth, long since junked by the Great Man on the (not unreasonable, though characteristically unacknowledged) grounds that it Doesn’t Bloody Work, as if it was still current wisdom, and are evidently also unaware that the Poverty of the Stimulus thing has long since been – well – debunked, anyway. With no AI needed.

  6. cuchuflete says

    Quite interesting. If I’ve understood correctly, the next time someone exercises their “second amendment rights” and kills a bunch of schoolchildren, the local NRA-owned politician will be spared the need to intone “thoughts and prayers.” They can just feed the report of the massacre to the computer, and it will use its speech synthesizer to say “thoughts and prayers” in the voice of that very same selected official.

  7. A camera renders anatomy more precisely than Leonardo or Degas. Is a camera an artist?

  8. David Eddyshaw says

    In fairness, passing the Turing Test is probably a bad career move for a politician. It doesn’t do to frighten the horses.

    https://www.smbc-comics.com/comic/p-bot

    (Why not?)

  9. David Eddyshaw says

    It occurs to me that this experiment could in fact be interpreted as supporting what is actually the foundational Chomskyan error, viz that syntax can be satisfactorily analysed without invoking meaning.

    This experiment, where meaning has been totally excluded (the machine being altogether incapable of attaching meaning to constructions), yet the generated verbiage is apparently grammatically acceptable, seems to be a sort of proof of concept.

    In fact, it fails to prove the point because the technique is highly parasitic on the imitation of vast numbers of texts generated by agents who do associate meaning with form, and thereby smuggles an intermittent imitation of meaning into its output. Still, it gives a powerful illusion of proving the point.

    As it happens, I was just reading a Construction Grammar account of English which points out that the Poverty of the Stimulus argument, especially with some of the particular constructions favoured by Chomskyite pushers of the argument, actually assumes that syntax is a separate thing from meaning, thereby greatly increasing the supposed difficulty of the learner’s task.

  10. Y: A camera renders anatomy more precisely than Leonardo or Degas. Is a camera an artist?

    There was a debate, after the invention of photography, as to whether a photograph could be copyrighted. Is taking a photograph an act of creative invention, or just a mechanical process? Thanks to a campaign by professional photographers, it was eventually decided that a photograph can be copyrighted.

    Not by the person who designed the camera, though.

    The research group, led by neuroscientist Uri Hasson, first compared how well GPT-2 – a “little brother” of GPT-3 – and humans could predict the next word in a story taken from the podcast “This American Life”: people and the AI predicted the exact same word nearly 50% of the time.

    This somehow reminds me of the Monty Python sketch proving that penguins are as intelligent as people, because the comprehension level of a group of penguins to a spoken passage in German was identical to that of a group of humans selected for an inability to speak German. (I think it was German.)

  11. syntax can be satisfactorily analysed without invoking meaning.

    This should at least be adequate for translating such humdrum texts as bureaucratese. Which similarly rarely involves meaning.

    (The Decher critique hints darkly that there are limits, without giving examples. My experience is that ‘supporters/researchers’ (Liberman, for example) have lowered their expectations so much that “can write newspaper articles” means articles no editor would dare charge money for.)

  12. I totally agree with the criticisms voiced here.

    I did find the reddit thread pretty ridiculous, though. As the second commenter said:

    And the article they cite to summarize “Chomskyism” is twenty years old. In his more recent work on the minimalist program, Chomsky notes that “developments in linguistic theory over the past two decades have greatly clarified aspects of language’s origin. In particular, we now have good reasons to believe that a key component of human language—the basic engine that drives language syntax—is far simpler than most would have thought just a few decades ago.”

    Good reasons to believe? Chomsky only justifies his latest theory by appealing to some concept of “computationally simple”. When hard-core Chomskyans write things like “far simpler than most would have thought just a few decades ago”, I’m tempted to ask “How many times did you perform Merge when you wrote that sentence?”

    Pushback from non-Chomskyans is also satisfying. As cat-head said:

    I think there is a double standard in how models are evaluated. Mathematically informal and computationally unimplementable models are considered ‘fine’ but computational models are not considered interesting unless they fully solve language. (My bolding)

  13. maidhc: From the little I’ve read, it seems that undirected photography, e.g. by webcams or security cameras, is probably not copyrightable, by US law anyway.

  14. Y: You may be right. Copyright requires a creative step. But it may be that, analogous to a work made for hire, the output of an undirected camera belongs to the owner of the camera.

    It’s my understanding (based on something I heard on NPR) that currently the most common use of automatic text generation software is generating tedious and predictable newspaper stories such as accounts of high school sports events. These may get tidied up a bit by an editor later. I imagine copyright issues are covered in the software license.

  15. I misremembered the Monty Python sketch. The English comprehension of the penguins was compared to humans who didn’t speak English. At least one of them was German.

  16. Jen in Edinburgh says

    At least one of them was German.

    The humans or the penguins?

  17. “This experiment, where meaning has been totally excluded (the machine being altogether incapable of attaching meaning to constructions), yet the generated verbiage is apparently grammatically acceptable, seems to be a sort of proof of concept.”

    @DE, not totally. Where meaning comes from is a philosophical question, but context is at least part of what creates meanings in our minds.

  18. What makes them say language models don’t have grammars? They’re certainly built on an architecture that has language structure in mind, and many recent studies are finding evidence of implicit compositional syntax within their parameter layers. Inductive bias doesn’t have to come in the form of trees.

  19. David Eddyshaw says

    what creates meanings in our minds

    The context doesn’t create meaning: we assign meaning to the patterns we see.

    There isn’t any way, conceptually, of getting from pure syntax to semantics: from links between signs to links between the signs and what they signify. It’s like an inhabitant of Flatland trying to move in three dimensions.

    The machine here is an inhabitant of Flatland.

  20. @DE, I don’t know what “meaning” means. I think we see it as anchored outside of language (in our mental and external realities), and yet we learn it from context too. When you leave only contexts for all texts, the situation becomes different. True. But can we say that semantics disappears, if contexts are still here?
    And it is not just “from syntax”.
    “we assign meaning” – do we already have meanings before assigning them?

  21. David Eddyshaw says

    I think we see it as anchored outside of language

    Inevitably, surely? If no word has any real-world referent, how can language relate to anything at all, except itself? Indeed, in what sense is it even a “language”, as opposed to a (perhaps very elaborate) internal mind game of some sort?

    The machine has no access to any context whatsoever outside of the texts themselves. It is logically impossible for it to be able to relate signified to signifier: it has no knowledge of anything signified. If its algorithms are sufficiently elaborate, it may be able to imitate the behaviour of agents which do relate signified to signifier: but imitation is all that it can ever achieve. Kama da Wane ba Wane ba.

    https://en.wikipedia.org/wiki/Signified_and_signifier

    My argument does not depend on any assumptions about whether a machine is actually or potentially capable of thought or consciousness or anything like that. It would apply just as much to a human being whose entire life experience consisted of reading texts (as a result of some ghastly dystopian experiment.) It seems to overlap a bit with Searle’s Chinese Room thought experiment, but I don’t actually find that very persuasive: I don’t think Searle has ever come up with a good answer to the idea that the Room as a whole system in fact does understand Chinese, even though its hapless operator does not. After all, my individual neurons are presumably not conscious …

  22. David Eddyshaw says

    (The Chomskyan view, of course, from the outset, was that actual human language really is the same kind of thing as an abstract mathematical structure, metaphorically called “language”, where the question of external reference does not arise and the only questions are how the elements of the “language” relate to one another; it is hardly surprising that this ludicrous idea has run into the sand, only that it’s taken so long.)

  23. @DE, I need “consciousness” here, because the fact is I believe you have it, so I list it too.
    I don’t know what it is, so maybe you can take some meanings from there:)
    But I just said “mental reality” which must include many things.

    Yes, the crude model is “some words/concepts are anchored in [the source of all meanings] and others are either explicitly defined (X is Y) or less explicitly learned from contexts. Or both anchored in [the source] and in the contexts where you observe them. And there is also grammar that helps us with definitions; we learn it in a similar manner”.

    What I mean is that if you remove this external anchor, what you or your machine have is still far from “nothing”:/
    She is in the situation of a scientist who discovered a Klingon inscription.
    How can I claim that it does not contain semantics? What does it contain then? Definitely not only grammar…

  24. Also consider a blind person. We receive a lot of data in the form of visual input, and we don’t know what of it matters and what of it does not. But it is enough to consider a blind person to understand that this data is not critical.

    She still has many other things distinct from verbal communication. Movements and shapes, touches and smells. This is all right, but are those really building blocks that form the WHOLE of our experience, our thinking, our feelings and communication?

  25. David Eddyshaw says

    What I mean is that if you remove this external anchor, what you or your machine have is still far from “nothing”:/
    She is in the situation of a scientist who discovered a Klingon inscription.

    I agree with your first statement: if you strip away the annoying hyperbole and the cod philosophising, what the researchers have achieved is actually quite interesting.

    I disagree strongly with your second. Your scientist is not at all in the position of the machine. She has discovered the inscription herself in some particular real world context, or heard about it from those who did. She knows that it is an inscription, and what inscriptions are. She has recognised that it is in a language, and that it is not a language that she knows. Your scenario inevitably implies that the researcher brings a huge amount of real-world knowledge to the interpretation of the text before she even begins. She will, moreover, not conclude that she has actually succeeded in interpreting the inscription until she can relate its contents to her preceding knowledge of the external world. She would certainly not regard it as “success” just to be able to imitate its contents plausibly (even well enough to fool actual Klingons) regardless of any supposed meaning (though that might be a clever thing to be able to do.)

    Also consider a blind person. We receive a lot of data in the form of visual input, and we don’t know what of it matters and what of it does not. But it is enough to consider a blind person to understand that this data is not critical.

    Smell, touch etc are also meaningful concepts only insofar as they relate to an external world. (Otherwise, one is talking about hallucinations, and even the word “hallucination” implies the existence of sensations which are, in contrast, not mistaken.)

    If anything, this is surely even more obvious than with words?

    This reminds me a bit of Wittgenstein’s celebrated Private Language Argument …

  26. Yes, this claim about Klingons was not really accurate. But I realised that I can’t accurately describe what’s wrong with it, and it still illustrates the point about “far from ‘nothing’”, so I allowed myself this false comparison.

    As for blind people, what I mean is that either our inborn mental reality plays a very large role, or verbal communication itself is very rich, or, if it is “anchored” in touches and shapes, then all our conversations are made of building blocks as simple as touches and shapes.

  27. all our conversations are made of building blocks as simple as touches and shapes.

    What’s simple about them?

  28. @LH, what I meant is that it is remarkable.

    Visual input is so much data every second (and you don’t even know what of it your brain processes and what it ignores) that even thinking “do I mostly learn from visual or verbal input?” is very difficult. This is why I thought about blind people (also, the fact that blind people are not too different from other people and don’t behave like extraterrestrials means something – but it is not why I spoke about them).
    I do not mean that tactile input is poor.

    I think all three options are true to some extent.

  29. David Eddyshaw says

    St Ludwig’s

    https://en.wikipedia.org/wiki/Private_language_argument

    is basically an attempt to show that it is not actually logically possible that the basis of our thoughts is to be found in qualia like simple touches and shapes. (It’s an unfortunate and misleading name for the argument, really, but the argument itself seems entirely valid to me.)

    The argument also undermines the idea that the connection of signifier and signified is straightforward and unproblematic. I was thinking that this is a potential weak spot in the (generally persuasive) Construction Grammar notion that form and meaning are associated from the get-go in “constructions” which go all the way down to morpheme level. “Meaning” (as drasvi rightly implies) is not a straightforward concept at all.

    Be that as it may, that is one of the handicaps that our unfortunate machine is labouring under. It’s not capable of participating in any language game (In Witter’s sense.) And it’s not part of a language community.

  30. As a result of the above discussion about the necessity of creative input in making photographs copyrightable, I had a look at famous examples of “animal-produced art” on Wikipedia. I knew there would be ample cases of artistic monkeys and elephants, but I also found an example of a rabbit who paints.

    However, that’s not why I’m leaving this comment. I followed the Wikipedia link to the page for the “Domestic rabbit,” which begins:

    A domestic or domesticated rabbit (Oryctolagus cuniculus domesticus)—more commonly known as a pet rabbit, bunny, bun, or bunny rabbit—is a subspecies of European rabbit, a member of the lagomorph family.

    My instinct was that the inclusion of “bun” in the list of synonyms was probably a sly reference to the use of that term in xkcd.

  31. the Principles and Parameters incarnation of the Great Truth, long since junked by the Great Man on the (not unreasonable, though characteristically unacknowledged) grounds that it Doesn’t Bloody Work

    Actually, uptake of the new Minimalism appears to be rather poor; Principles and Parameters still seems to be the standard for many, being the last incarnation of Chomskyan theory that has a body of concrete analyses that can actually be referred to. Minimalism is just a “program”.

  32. David Eddyshaw says

    Good grief, it was announced by the Fount of Wisdom Himself in 1993!

    Get with the Program, people!
    Publish those insights!

  33. Thirty years is not too long for a revolution to gather pace. I’m only afraid Noam will be pushing up daisies by the time his program comes to fruition….

  34. I would like to see automated text generation try its hand imitating this type of dreary polemic. It all looks the same anyhow.

  35. David Eddyshaw says

    It’s what Orwell calls “duckspeak” in Nineteen Eighty-Four.

    Interesting standing locution “communists and workers.” Presumably this is supposed to be construed like “lord and master” with “and” joining two words referring to the same entity, but the implicature for an English L1 speaker is that the communists and the workers are distinct groups. Heresy!

    The title “Information Bulletin” reminds me of the old joke that there is no pravda in “Izvestiya”, and no izvestiya in “Pravda.”

  36. Presumably this is supposed to be construed like “lord and master” with “and” joining two words referring to the same entity

    Not at all. To be a communist, it is not enough simply to be a worker; you must be a “conscious” worker with enough партийность (“Party-mindedness,” to use the expressive Russian term) to deserve and be granted the infinitely desired Party card (the loss of which is so catastrophic in Simonov’s novel). And of course not only workers can be communists; you have forgotten the sacred union of workers and peasants, comrade! I am afraid some reeducation will be required…

  37. David Eddyshaw says

    Ah! You passed my little test, Hat! For a moment I thought I might trap you into expressing bourgeois revisionism …

  38. Trond Engen says

    David E.: The machine here is an inhabitant of Flatland

    Which one?

  39. David Eddyshaw says

    The question here is: do Norwegians have the capacity to assign meaning to utterances?

    (I leave, for the present, the question as to whether Norwegians are conscious.)

  40. languagehat: …you have forgotten the sacred union of workers and peasants, comrade!

    But that is just another kind of cant. Taken as an exclusive or, it literally implies that peasants don’t work! (For similar reasons, I don’t use the term “working class,” as it implies that the upper middle class doesn’t work.)

  41. Stu Clayton says

    The question here is: do Norwegians have the capacity to assign meaning to utterances?
    (I leave, for the present, the question as to whether Norwegians are conscious.)

    Sometimes they are, sometimes not – like everybody. To sleep, perhaps to dream of assigning meaning !

  42. David Eddyshaw says

    Point …

  43. Trond Engen says

    David E.: The question here is: do Norwegians have the capacity to assign meaning to utterances?

    Good question, but maybe we should narrow the scope. The toponym Flatland is mostly confined to the region of Telemark, which was still a blank slate in the 16th century, so all meaning must have been assigned later.

  44. David Eddyshaw says

    Ah. So it would have been the Danes that assigned the meaning, then?

    https://en.wikipedia.org/wiki/Norway#Union_with_Denmark

  45. Trond Engen says

    Norwegian national romanticism, rather.

    One wonders why. But I’ll leave that to the teleologists.

  46. it literally implies that peasants don’t work!

    Nor did they! In pre-industrial times, peasants would rise late, linger over an ample breakfast of bread and jam, wander out to say hello to the cows and sheep and goats, then return by early evening to indulge in bucolic festivities of singing and dancing, before heading off to the pub to get sloshed on beer or cider. Such was life in the pre-modern age.

  47. Even if their choice of words is sometimes strange, nonsensical or contains racist, sexist and other harmful biases, one thing is very clear: the overwhelming majority of the output … is grammatically correct.

    Proven: the AI is Republican! (It is mean, but I think still funny; be sure, for your mental health, to change some key words to make it a Democrat. Disregard if you are not a USian.)

  48. David Eddyshaw says

    Insofar as the notion of “grammatical correctness” is at all objective, is it not (ipso facto) a Radical Socialist concept?

    Real Republicans know that truth is entirely a matter of the Will to Believe.

  49. No, I haven’t seen ‘conservative’ publications complaining about a bias in Google images.

    P.S. “nonsensical, but grammatically correct” is indeed applicable to political discourse in general.

  50. January First-of-May says

    One wonders why. But I’ll leave that to the teleologists.

    Are teleologists the people who study Telemark, or is Telemark the place where teleologists live?

    P.S. “nonsensical, but grammatically correct” is indeed applicable to political discourse in general.

    Chomsky himself, reportedly, is not too far from this; cf. the Chomskybot.
    (I forgot: did he become a politician before or after he became a linguist?)

  51. Telemark is a skiing style that can still, to an extent, be practiced with NN bindings (but not with those stupid modern bindings…). But of course there is a specialised binding too.

  52. Trond Engen says

    All modern bindings are specialized, but some are more specialized than others. Modern cross-country bindings are mostly useless outside groomed tracks, but that’s OK, for so are modern cross-country skis. I have a pair of fjellski (back country skis) with wooden core and curved steel edges (51-60 mm wide), bindings like this (except the cable), and boots like this. The main use is for off-track cross-country skiing, but they work decently for Telemark as well, at least in natural snow outside of the prepared slopes. In slopes you’ll need better skills than mine. When I go cross-country skiing in well-groomed trails, the bindings and boots rub against the sides of the track, so I got myself a pair of trail skis with a modern binding as well.

  53. Trond Engen says

    Telemark style skiing is named for the pioneers of modern skiing from Telemark. Slalom (Norw. slalåm) is a compound of Telemark dialect words (maybe even coined by these pioneers). The last element is låm “track left by a pulled or gliding object”. The first element is said to be the adjective sla(d) meaning “weakly sloped, almost flat”; which bothers me, and I suspect it could instead be slag in the sense “fold; straight movement between sharp turns”, so “zigzag track”.

  54. Trond Engen says

    Teleology is the study of Telemark. The people living in Telemark are Telemarketers.

  55. All modern bindings are specialized

    For a moment I thought you were talking about generative grammar

  56. Stu Clayton says

    All modern bindings are specialized
    For a moment I thought you were talking about generative grammar

    I made a WiPe-mediated attempt to discover what that’s all about. I found:

    A governs B if and only if

    1. A is a governor and
    2. A m-commands B and
    3. no barrier intervenes between A and B.

    and

    A barrier is any node Z such that

    1. Z is a potential governor for B and
    2. Z c-commands B and
    3. Z does not c-command A

    and

    An element α binds an element β if and only if α c-commands β, and α and β corefer.

    No thanks. Such use of notation to simulate clarity is just plain tacky.

    I am reminded of RL Moore’s Axiom 1:

    Axiom 1: There exists a sequence G(1), G(2), G(3), … such that (1) for each n, G(n) is a collection of regions covering S [the set of all points], (2) for each n, G(n+1) is a subcollection of G(n), (3) if R is any region whatsoever, X is a point of R and Y is a point of R either identical with X or not, then there exists a natural number m such that if g is any region belonging to the collection G(m) and containing X then g is a subset of (R-Y)+X, (4) if M(1), M(2), M(3) … is a sequence of closed point sets such that, for each n, M(n) contains M(n+1) and, for each n, there exists a region g(n) of the collection G(n) such that M(n) is a subset of g(n), then there is at least one point common to all the point sets of the sequence M(1), M(2), M(3) …
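    (For anyone who actually wants to parse that rather than just admire its opacity, here is one possible restatement in modern notation – my reading of the quoted prose, not Moore’s own symbolism:)

        There is a sequence G_1, G_2, G_3, \dots such that:
        (1) each G_n is a collection of regions covering S;
        (2) G_{n+1} \subseteq G_n for every n;
        (3) for every region R and points X, Y \in R (Y = X allowed), there is an m
            such that every g \in G_m with X \in g satisfies
            g \subseteq (R \setminus \{Y\}) \cup \{X\};
        (4) if M_1 \supseteq M_2 \supseteq \cdots are closed point sets with, for each n,
            M_n \subseteq g_n for some g_n \in G_n, then \bigcap_n M_n \neq \varnothing.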

  57. @Trond, yes. It seems there is a misconception that one of those bindings is “old” and the other is “modern”, but the “modern” ones (likely) increase your speed on a prepared trail and are less convenient when you need to make turns.

  58. or maybe I just never tried “SNS pilot, SNS–BC, NNN–BC” with appropriate books from here.

    P.S. booTs (usually I make this typo in books.. booBs. Sigh).

  59. Lars Mathiesen (he/him/his) says

    @Stu, I’m sure you could package all that in a single concept and be much briefer. Like the Axiom of Infinity: “An inductive set exists.” Sure, but you have to set up all sorts of machinery to interpret it, and getting from there to “finite induction works” is not trivial either, but that’s what people want it to mean.

  60. Stu Clayton says

    @Lars: the WiPe article from which I took those “definitions” gives no intuitive motivation. I’m fed up with notation-shuffling. I liked it many decades ago. I’m also put off by concept-shuffling in philosophy. Today I decided that “necessary” is yet another concept I don’t need.

    Moore in his book gives no motivation for his Axiom 1. He was infamous for refusing to give explanations. You had to find your own.

    I took a topology class with him because everyone said he wouldn’t let me in, so I lied about my age and was accepted.

  61. @DE, are “signs” so different from “signs, touches and shapes”?

    We can, of course, decide that we consist of two subsystems: one processes “touches and shapes”, the other processes “words”, and meanings are arrows from words to senses. We can even declare that “touches and shapes” are what we actually think about, while “words” are just an arbitrary extra layer (and it is arbitrary in that words are different in different languages).

    The problem is that whether your input wholly consists of texts or also contains touches and shapes, either way it is “some data”….

  62. David Eddyshaw says

    So your idea is that the machine develops “meaning” by associating text items with other text items?

    That would mean that there was still an unbridgeable gap between its universe of “meaning” and the actual non-textual world; but, more importantly, the point of the Blessed Ludwig’s Private Language Argument is that your scenario (that we ourselves assign meaning by, as it were, creating arrows from words to sense inputs) is logically impossible. (This is basically the “picture theory” expounded so beautifully in the Tractatus; Wittgenstein’s later work is essentially all dedicated to showing that it doesn’t work.)

    https://en.wikipedia.org/wiki/Picture_theory_of_language

  63. Rather, I am not sure that lack of input other than verbal means that “meaning” is meaningless for our machine (if it is meaningful at all – it is difficult to discuss “meaning”).

    Or that the system “AI and a text” is incomplete, or that the missing component is touches and shapes and not something else.

  64. Denying that “meaning” exists other than as interrelationships among verbal tokens seems akin to denying that consciousness exists — a pointless game for philosophers.

  65. If I understand correctly, the situation currently is that AI can learn to produce a text about cats when prompted with the word “cat”, and it has learnt to produce such texts in a way that fits with how people write about cats. But while it can write “The boy fed the cat this morning”, it has no concept of boys, or cats, or feeding, or mornings; it only knows that these words can be combined in a way that looks like the texts it has learnt on.
    But would you say that a system that could observe a boy feeding a cat in the morning and, based on that observation, would produce that sentence later in the day had a concept of meaning?

  66. January First-of-May says

    “Здравствуй, здравствуй, кот Василий,
    Как идут у вас дела?”
    Дети козлика спросили…
    Зарыдала камбала.
    И малюткам кот ответил,
    Потрясая бородой:
    “Отправляйтесь в школу, дети!
    Окунь плачет под водой.”

    [“Hello, hello, Vasily the cat, how are things with you?” the little goat’s kids asked… The flounder burst out sobbing. And the cat, shaking his beard, answered the little ones: “Off to school, children! The perch is weeping underwater.”]

    [from Shefner’s Girl by the Cliff, depicting the in-story state-of-the-art situation of AI verse writing in the early 22nd century]

  67. David Eddyshaw says

    But would you say that a system that could observe a boy feeding a cat in the morning and, based on that observation, would produce that sentence later in the day had a concept of meaning?

    I think the key question there is not the text production, but the “observe.”
    If the machine could reliably identify

    (a) that there was a young human male involved (not a young girl, not a chimpanzee, not a consultant ophthalmologist)
    (b) that there was a cat involved (not a ferret, a genet or a foxcub)
    (c) that the boy was actually causing the cat to eat (and that that is a thing cats do, at least with some things, though not others)
    … “in the morning” is dead easy for our machine, of course …

    and could consistently perform similar but different feats (say, identifying correctly that an old man was watering plants in the afternoon), so that this was not a matter of merely fitting whatever was happening into a limited set of preset tick-box categories of possible scenarios, then I’d say Yes (being, as I said above, unimpressed by Searle’s Chinese Room argument.)

    Note that huge amounts of real-world background information would be necessary in advance for the machine to make these “observations” correctly.

  68. David Eddyshaw says

    (c) is the difficult one, I think (after all, even mobile phone apps are quite good at identifying things as being cats.) Along with the requirement for being able to perform other feats of a similar but different nature, which, to be honest, is a way of smuggling in a requirement that the system actually be intelligent, in the normal sense, as opposed to the watered-down pretend marketing-slogan sense used in modern “Artificial intelligence.”

    During the Heroic Age of AI, before the current Age of Cheating with Statistics, quite a lot of work went into working out how a machine might interpret situations by, for example, comparing them with various prescripted situations. It turned out to be horribly impractical for all but toy problems.

    It’s analogous to the way that attempts to do machine translation by teaching machines about syntax have been comprehensively overtaken by brute-force statistical methods involving training on huge amounts of input, and to hell with getting machines to “interpret” anything. In practical terms, the results are much more impressive, but the philosophical questions about what is really going on have merely been solved by the time-honoured method of pretending that they don’t exist.

  69. David Eddyshaw says

    It’s conceivable, I suppose, that we ourselves parse sentences in the same way that these neural networks do, a claim artlessly implied by TFA’s “learning networks seem to use the same computational principles as the human brain.” I prefer to think that meaning enters into my own language-processing strategies, but I only say that because I am a philosophical zombie and I don’t want to feel left out. (Humans can be so cruel about us.)

    https://en.wikipedia.org/wiki/Philosophical_zombie

  70. @DE: Now we have a set of criteria. Only one remark on this:
    that the boy was actually causing the cat to eat
    I’d be wary of bringing causality into this; philosophers might accost you in a dark alley…
    And for me, it would be sufficient to observe that the boy puts food where the cat is supposed to eat it, whether the cat accepts the offering or not; I would be confident in answering “yes” to the question “Did you feed the cat?” even in that case.

  71. David Eddyshaw says

    What I mean (if I, given my ontological status, can truly be said to mean something) is that the correct use of the word “feed” entails a particular interpretation of the situation. For example, if the boy dropped the food accidentally, and the cat leapt on it, did the boy feed the cat? If the boy dropped the food, and the cat happened to find it an hour later?

    The details in this particular case are not vital: it’s easy to come up with other apparently unproblematic descriptions of what is in front of one’s very eyes which in point of fact entail whole layers of assumed classification of, and identification of, participants and causal relationships, to say nothing of the background knowledge needed to make the scene describable in precisely that way at all.

  72. David Eddyshaw: It’s analogous to the way that attempts to do machine translation by teaching machines about syntax have been comprehensively overtaken by brute-force statistical methods involving training on huge amounts of input, and to hell with getting machines to “interpret” anything.

    Similarly, the key conceptual development that made it possible for computers to beat the best humans at chess was giving up on trying to teach the computers how to play chess. People like Hans Berliner (who had been a world champion in correspondence chess) tried to teach computers to play the way they themselves played. The Deep Thought team at IBM instead hit upon the strategy of teaching the computer how to learn to play chess, then feeding it a huge number of grandmaster games to learn from.

  73. David Eddyshaw says

    I wonder if Google has an in-house Philosophy Team? (It should have a quick-response unit for dealing with ontological emergencies.)

  74. Trond Engen says

    I’d prefer a paramedical unit staffed by the Department of Clinical Ontology.

  75. “Excuse me, ma’am, I’m an ontologist; what is the nature of your current problem with Being?”

  76. David Eddyshaw says

    “I think, Doctor, but I’m not.”

  77. If an ontologist could speak, could we understand him?

  78. Trond Engen says

    “Doctor, I have no continuity of self.”

    “The treatment will make you a selection of episodes with a loose thematic connection.”

    “That isn’t really helpful, doctor.”

    “There’s only so much antology can do.”

  79. January First-of-May says

    “I think, Doctor, but I’m not.”

    A common problem over on one of my Discord servers, where the newly formed headmates frequently doubt whether they in fact exist.

    (Continuity-of-self troubles are also fairly common. Comes with the DID.)

  80. January First-of-May says

    The title “Information Bulletin” reminds me of the old joke that there is no pravda in “Izvestiya”, and no izvestiya in “Pravda.”

    This, and (later) the recent discussion in the phrasebook thread which also touched on the same words, reminded me of Tom Lehrer’s Lobachevsky, where a scene near the end goes “Pravda said [funny but irrelevant Russian phrase] – it stinks, but Izvestiya said [another funny but irrelevant Russian phrase] – it stinks”.
    And I thought, rule-of-funny Russian phrases aside, it seems clear that Lehrer expected his presumably-American audience to recognize the names “Pravda” and “Izvestiya”.

    But how? Where would Americans in the 1950s (or ’60s or ’70s for that matter) have heard the names of the Soviet newspapers? Presumably Soviet newspapers did not actually publish in the USA. Would there be sufficiently frequent reports of Soviet news attributed to those newspapers?
    Or is it some kind of TV thing? Having not lived through the US side of the Cold War – or the Soviet side either, technically – I have no idea where it could ever come up, but presumably it somehow did or the reference would have fallen flat.

  81. – “Правда” есть?
    – Нет.
    – “Россия” есть?
    – “Россию” продали.
    – А что осталось?
    – “Труд” за три копейки…

    [“Is there Pravda (Truth)?” – “No.” – “Is there Rossiya (Russia)?” – “Rossiya has been sold.” – “And what’s left?” – “Trud (Labor), for three kopecks…”]

    “Россия” here is Советская Россия [the newspaper Sovetskaya Rossiya].

  82. Would there be sufficiently frequent reports of Soviet news attributed to those newspapers?
    I don’t know about the U.S., but in German news broadcasts and newspaper articles it was usual to quote Soviet sources for news from the USSR in the 70s. Frequently enough that even as a teenager I was familiar with names like Pravda and TASS.

  83. I don’t know about the U.S., but in German news broadcasts and newspaper articles it was usual to quote Soviet sources for news from the USSR in the 70s. Frequently enough that even as a teenager I was familiar with names like Pravda and TASS.

    Same in the US; I was familiar with all those names in the ’60s and maybe even in the ’50s (I say “maybe” because I was just a kid and didn’t pay as much attention to the news; I’m sure my parents knew them).

  84. David Eddyshaw says

    Same in the UK too.

  85. Trond Engen says

    And in Norway. I knew Pravda and TASS from a very early age. NPABAA and TACC even earlier.

  86. Бравада (“bravado”) would make a good name for a newspaper…. Morning Bravade.

  87. This has been niggling at me for a long time.

    (David Eddyshaw on Chomskyan theory) Doesn’t Bloody Work

    For me, this is grammatically substandard. In good English we need an adverbial form here, not an adjectival one. ‘Bloody’ is an adjective; ‘bloody-well’ is the adverbial form. The correct form is:

    Doesn’t Bloody-well Work

  88. David Eddyshaw says

    I was quoting a member of the Royal Family!

    None of your antipodean republicanism here!

  89. PlasticPaddy says

    May I speak in support of the August Personage, but note that a participle, e.g. bleeding/fucking, could be effectively substituted for “bloody”. The construction [NEG AUX VERB] [EXPL ADJ] [INFINITIVE] has an emotionally intensified force which would not be the case if [EXPL ADJ] were to be replaced with an adverb like really/unfortunately/actually or a salutation (after the infinitive) like old girl/man/chap/boy.

  90. Antipodean republican, huh?

    Well how do you explain this? Bloody Well Right

  91. David Eddyshaw says

    Creeping Australianisation, I calls it.
    I blame Neighbours.

  92. John Cowan says

    Doesn’t Bloody-well Work

    That, comrade, is Oldspeak. In Newspeak one says “Doesn’t Bloodywise Work”, as “-wise” is the universal termination for adverbs.

    Sign: “One man, one vote.”

    Worker A: “What does that mean now?”

    Worker B: “Why, it means, ‘one bloody man, one bloody vote’. See?”

  93. “Pravda said [funny but irrelevant Russian phrase] – it stinks, but Izvestiya said [another funny but irrelevant Russian phrase] – it stinks”.

    Why irrelevant? The first was the first line of the Song of the Flea and the second was “that I should go where the czar himself went on foot” (with mild mistakes, I leave the meaning as a riddle here)

  94. PlasticPaddy says

    Irrelevant = “average listener does not know any phrases in Russian apart from do svidanja or na zdorovje”

  95. January First-of-May says

    Why irrelevant?

    Irrelevant in context: it’s a real Russian phrase but it has nothing to do with the context or the purported translation.

    (I’ve actually read somewhere that when Lehrer performed that song to an audience that would be expected to actually know Russian he substituted those phrases with gibberish – they weren’t intended to be understood.)

  96. he substituted those phrases with gibberish

    Interesting. But he did choose bits of Russian which make sense in the context if you understand them.

  97. January First-of-May says

    But he did choose bits of Russian which make sense in the context if you understand them.

    The only way I could think of remotely quickly in which it would work – and it took me several minutes even to think of that option – is that it’s trying to imply that the newspapers are also plagiarizing their reports and trying to pretend that the result makes sense.

    I guess there could be some other way I’m missing in which it’s somehow legitimately relevant. It’s a little less completely absurd because 1) in context it’s a direct quote and those can be weird, and 2) it’s in the right language to be a direct quote from the specified source, which already counts for a lot.

  98. January First-of-May says

    I’ve actually read somewhere that when Lehrer performed that song to an audience that would be expected to actually know Russian he substituted those phrases with gibberish

    I’m not sure where I’ve read that (it was well before 2022), but Lehrer’s own version of the lyrics on his website has the following description:

    “At each of these two junctures one should insert some phrase in Russian (if the audience does not speak Russian) or some Russian double-talk (if it does). The author’s own choices varied from performance to performance, ranging from the merely inappropriate to the distinctly obscene.”

    I don’t recall having ever heard a version that didn’t match the sentences described in the corresponding Wikipedia article (namely the Song of the Flea and the riddle about the czar), but it’s possible that the online versions are mostly derived from just a few performances that happened to use those.

  99. I think there are only two recordings of Lehrer performing “Lobachevsky.” One is the original 1953 studio version. The other is a live 1960 concert recording. He didn’t perform it at his 1967 concert filmed in Copenhagen, and there just aren’t a lot of other recordings of him, period. In Lehrer’s whole career, he played less than a hundred shows.

  100. RL Moore’s Axiom 1

    It’s stuff like this that makes me have no use for axiomatic logic, whereas I love natural deduction, in which there are no axioms, just the rule of inference called (by Hofstadter) the Fantasy Rule, informally: “If we assume A is true, and on the basis of that and everything else we know we can prove B, then ‘if A then B’ is a theorem.” Truths bootstrapped out of nothing.

  101. David Eddyshaw says

    natural deduction, in which there are no axioms

    Not so fast:

    https://en.wikipedia.org/wiki/What_the_Tortoise_Said_to_Achilles

  102. My response to the Tortoise is basically to short-circuit him at the first step by saying “If you reject the Fantasy Rule, you’re an idiot.” Consider Franklin (an imaginary person), who rejects the Conjunction Rule (“If A is a theorem, and B is a theorem, then ‘A and B’ is a theorem”). He too is an idiot. More politely, Franklin and I have (at best) different understandings of the word and. In either case there is nothing to talk about. Similarly, the Tortoise is not going to actually reject the Fantasy Rule, but he is going to postpone its application indefinitely, which is in itself a kind of rejection. He pretends that he doesn’t know what “if … then” means, when of course he does.

    Formally, the rules of inference are just as abstract and “undefined” as the axioms in axiomatic logic. But the interpretation schema for them is a great deal more intuitive, and that’s a Good Thing. It’s nice to have a machine for cranking out theorems (it can’t find them all, but nothing can). However, it is good if each step in the process makes sense to us, rather than dragging in incomprehensible axioms and leaving us bewildered.

    On a historical note, Hofstadter first gives us this dialogue, and then proceeds to use natural deduction in all his following dialogues. Crafty man.
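    For the curious, both rules are easy to state in a proof assistant, where the Tortoise’s regress cannot even get started. A minimal sketch in Lean 4 (the propositions A and B are arbitrary placeholders):

        -- Conjunction Rule: from a proof of A and a proof of B, conclude A ∧ B.
        example (A B : Prop) (ha : A) (hb : B) : A ∧ B := And.intro ha hb

        -- Fantasy Rule (implication introduction): assume A, derive something
        -- from it, then discharge the assumption. Here the theorem A → A ∧ A
        -- is "bootstrapped out of nothing": it needs no premises at all.
        example (A : Prop) : A → A ∧ A := fun a => And.intro a a

        -- Modus ponens – the step the Tortoise keeps deferring – is just
        -- function application, built into the very notion of proof rather
        -- than stated as one more premise.
        example (A B : Prop) (ha : A) (hab : A → B) : B := hab ha

    That, in effect, is the short answer to the Tortoise: the rules of inference live at a different level from the propositions they manipulate.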
