David Oks’s essay on citations is not central to my interests, but I know there are lots of Hatters who do science and will probably have things to say; I myself found it extremely enlightening. I’ll quote the start and let you click through for the rest:
Here are a few headlines from the world of science. […] So scientists are submitting AI-generated papers; reviewers are using AI to assess them; obviously some amount of low-quality AI-generated content will end up getting approved and published. Well-regarded journals have been caught publishing papers with classic ChatGPT-isms like “here is a possible introduction for your topic” or “as of my last knowledge update.” But that’s not all. Many of those AI-generated papers are being cited by articles in other peer-reviewed journals: and many of those articles, unsurprisingly, appear to be AI-generated themselves.
It’s pretty well-known now that science is “drowning in AI slop.” In that regard, it’s not alone: AI slop is steadily infiltrating every school and workplace in the country. But there’s something about all of this that puzzles me.
I get why students, for example, would want to avoid doing homework. But I don’t really understand why scientists would want to avoid doing science. Or, rather, why they’re so eager to use AI to produce a huge number of shoddy papers. No one forced them to become scientists. I imagine that most people who work as scientists chose to do so out of something like love for the subject. So why are scientists using AI to produce and submit so much garbage?
I don’t think that the answer actually has much to do with AI. It has to do, instead, with the incentives that govern scientific institutions. You could boil it down to one word: citations.
Over the last few decades, science has undergone a “citation revolution.” Scientific life used to be structured by personal reputation and mutual acquaintance; now it is defined by quantitative assessments derived from citations.
And this reward system has warped scientific life in dramatic ways. It has resulted in the obvious and widespread gaming of citation metrics; but, more insidiously, it has pushed scientists toward risk-averse, incremental, and above all unambitious research. The logic of institutional science has become increasingly divorced from actual knowledge and discovery. In a system governed by these perverse incentives, the inevitable endpoint is simply AI-generated slop at scale. […]
But we should start, first of all, with a moment very much like our own, the origins of the citation revolution: the “information crisis” of the 1960s.
He talks about the idea of precedent, and says:
In the 1870s, a salesman of legal books named Frank Shepard realized that this represented a good business opportunity. Lawyers always needed to trace the subsequent history of a ruling. So Shepard started producing books with gummed strips of paper, listing every subsequent case that cited a given decision. With Shepard’s books—called Shepard’s Citations—you could quickly learn whether a given case was still good law. Shepard’s innovation was tremendously successful. It did so well, in fact, that his name became a verb: “to Shepardize” meant to consult Shepard’s Citations to check on the status of a precedent.
In 1953, long after Shepard had died, a retired vice president of Shepard’s Citations named William C. Adair, living at his ranch in Colorado Springs, was reading a newspaper article about scientific documentation. Science, the article said, was “swamped in a sea of literature,” and a group of researchers at Johns Hopkins wanted to see how machine methods could fix that.
Adair’s curiosity was piqued. The answer seemed obvious. Why not just apply citation indexing to science?
There’s lots more, and I knew nothing about any of it.
Just: no. There are no “AI scientists”, and we are not heading to a world with any, let alone “millions.” And they will not “eventually own such integrally ‘human’ parts of the scientific process as hypothesis generation.” This is just swallowing the marketing hype.
The article is correct that “AI” is (as in other areas) exposing serious defects in the existing knowledge ecosystem; as well, of course, as grievously polluting it further.
“Bibliometrics” is what suits prefer to actual judgment of quality, and its proliferation is due to politically driven changes by which university education has been reframed as mere technical education, worthy of state support only insofar as it can be shown to lead to more or less immediate financial benefits.
Nature shows some chutzpah in publishing a paper complaining about this pollution of the noosphere. As a publisher, they’re enabling it. For money.
There are no “AI scientists”
Yes, that was poorly expressed, but I don’t think it’s central to his point.
No, he means it, as his further remarks clearly show. It’s not just a clumsy statement. He really does think that LLMs will (eventually) be able to do real science, not just imitate academic papers well enough to get published in Nature. This is pure fantasy.
“AI” merely exposes the existing perverse incentives here; about those, I do largely agree with Oks: though his focus on “citations”, specifically, as the problem strikes me as peculiar. There’s nothing wrong with citations: it’s the whole political environment that academia is having to function in that’s the problem.
Oks seems to be a political activist (on the side of the angels) rather than someone with any particularly relevant scientific experience.
https://en.wikipedia.org/wiki/David_Oks
Ah, I didn’t check on his background — thanks.
That said, what I found interesting was not the AI catastrophizing (which there’s no shortage of) but the history of citations and how they migrated from law to science.
Ah, the legacy of Frank Shepard! I was in the very last cohort of American lawyers (the generation admitted to the bar in the early/mid Nineties) who had learned how to Shepardize cases the old-fashioned way with hard-copy reference volumes. Soon enough, though, most of the Powers That Be were convinced that online software had automated the process reliably enough that they became eager to stop paying for constantly updated sets of hard-copy reference volumes – a conviction reinforced by the more general sense that your office rent could be spent more productively if fewer square feet were devoted to an analog law library with shelves full of hard-copy volumes.
I personally don’t think the way in which earlier judicial decisions (in an Anglo-American system) relate to more recent judicial decisions is a particularly compelling analogy for the way earlier scientific papers relate to more recent ones, but no one in the science biz asked me about that.
I had heard of “shepardizing” a long time ago and had forgotten about it until now. I never knew before what it meant.
Re “why scientists would want to avoid doing science,” I should think it would be pretty obvious that the pro forma “literature review” part of a published scientific article that is often expected by genre conventions is not what many working scientists consider the fun part of doing science or perhaps what they even subjectively think of as “doing science” at all. Outsourcing that boring genre-convention part of the finished project to a hallucination-prone chatbot seems to me a conceptually different sort of shortcut (not least in terms of who might be tempted to do it) than actually faking data or faking the mathematical analysis thereof.
Yes. Then they get one of the many, many positions for doctoral students. And then, once they have their doctorate, many of us get nothing, because there aren’t enough jobs. I can rattle off a list of people in my narrow field who did great work and then had to drop out for that reason. Plus one colleague who did get positions, one time-limited postdoc position after another, nine in a row, who gave up after the ninth because there still wasn’t a permanent position available. And myself, I was (marginally) gainfully employed for half a year because that was a prize I won, not a position I got.
I stress that I don’t mean there are too many people in science. No, the opposite: there still aren’t enough. The amount of science that remains to be done and can be done with currently existing methods and currently available data – the known unknowns – is enormous. There are not too many people; there are too few jobs.
We’ve previously talked about the strain of bacteria that can tolerate extremely high arsenate concentrations and was, in a fit of wishful thinking, misinterpreted as being able to use arsenate instead of phosphate (which is chemically impossible) and published in Science, the second most prestigious applicable journal, under the name GFAJ-1, which means “give Felisa a job”. Not fame, not riches, not something abstruse like revenge – a job. That’s how desperate we are. We try to believe we’ve found a literal miracle and hope it’ll give us a job.
The introduction of the impact factor was widely taken, including by scientists, as a great improvement: instead of being hired for “personal reputation and mutual acquaintance” as the article puts it, we were going to be hired for actual qualifications – for what we knew instead of who we knew. No more old-boy networks! Or at least fewer of them and less blatant ones…! (How bad it is varies among countries etc. etc., but there absolutely are still cases where an institution has its candidate first, then tailors the job ad to that candidate, publishes it, and to everybody’s surprise finds the only candidate who fits it is the one it was written for. I’m quite sure I’ve uselessly applied to a bunch of those.)
The portrayal of how the impact factor of journals changed from a measure into a perverse incentive is accurate. Fifteen years ago I had to put the impact factors of the journals I had published in into my publication list for one particular application.
The portrayal of how the impact factor was replaced by actual counts of citations, plus the h index (and, not mentioned, the i10 index), leaves out that this, too, was widely seen as an improvement: no longer could bad scientists piggyback on the good papers by other people in the journals they happened to publish in (…or published in because they knew the editor…), no longer could they be dragged down by other people’s bad papers in the journals they happened to publish in. The portrayal of how that turned into a perverse incentive, however, is accurate.
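For anyone who hasn’t met these numbers: both are trivial to compute from a list of per-paper citation counts, which is part of why they spread so easily. A minimal sketch (the citation counts below are invented):

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    cited = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(cited, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

def i10_index(citations):
    """Number of papers with at least 10 citations (Google Scholar's i10)."""
    return sum(1 for c in citations if c >= 10)

# A hypothetical career: per-paper citation counts.
papers = [312, 120, 45, 30, 12, 9, 6, 3, 1, 0]
print(h_index(papers))    # 6: six papers have >= 6 citations each
print(i10_index(papers))  # 5: five papers have >= 10 citations
```

Note how crude the compression is: one blockbuster paper and ten solid ones give the same h as eleven middling ones, which is precisely the kind of thing that invites gaming.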
Then it all falls apart. More quotes from the article:
Awards? Seriously? How many awards you get correlates with which, and how many, societies you have joined. (And, for students, which country you’re in.) At worst, they’re a popularity contest – a measure of personal, not of scientific, reputation (two things I note Oks seems not to distinguish at all).
It gets worse. Applications for most grants, especially the larger and more prestigious ones, are only accepted if they’re “innovative”, meaning: using the latest toys to follow the latest fashions.
For a decade after 9/11, all biologists (in a wide sense) who were applying to the NIH or the NSF twisted themselves into knots so they could squeeze the word bioterrorism into their grant proposals.
Not at all. Not remotely. We’re living in the Shiny Digital Future; if your field isn’t particularly gigantic, it is pretty hard to overlook any new paper for long.
…except in historical linguistics, where there are still, in this year 76 After Present, journals that don’t have an online presence beyond “yep, the journal exists, write here to subscribe to some dead trees”.
Short answer: No.
Long answer: Not in the foreseeable future.
There is indeed wet-lab benchwork that can be automated (add 33 microliters of a new substance to samples 1 through 25 million and measure how much they change color), and some of it already has been (people have built robots that do such things). AI – not LLMs! – is in current use for high-throughput image recognition; I haven’t done it myself, but I know people who work on sand grains – radiolaria in particular – and have done that. But the rest of the paragraph, in particular “synthesize and analyze literatures”, is about as close to reality as Lt. Cmdr. Data.
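The automatable part really is as mechanical as it sounds. A minimal sketch of the loop, with a purely invented robot/plate-reader interface – none of these class or method names come from any real lab-automation library:

```python
import random

# Hypothetical instruments, invented for illustration only.
class FakeRobot:
    def dispense(self, well, volume_ul):
        pass  # a real robot would pipette here

class FakePlateReader:
    def absorbance(self, well, wavelength_nm):
        return random.uniform(0.0, 1.0)  # a real reader would measure here

def screen(robot, reader, n_samples):
    """Add 33 uL of reagent to every sample and record the color change."""
    results = []
    for well in range(n_samples):
        before = reader.absorbance(well, 450)
        robot.dispense(well, volume_ul=33.0)
        after = reader.absorbance(well, 450)
        results.append(after - before)
    return results

changes = screen(FakeRobot(), FakePlateReader(), n_samples=96)
print(max(changes))
```

Nothing in that loop requires understanding; that’s exactly why it can be handed to a machine, and why the rest of science can’t.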
I don’t understand what, if anything, “legitimacy” means here.
And there the article ends, without any hint of what “something better” might look like.
I have an idea, though:
Create jobs. Create minimal researcher jobs that pay the cost of living plus research trips or lab access and, say, two conferences a year – no career, no riches, no fame that pays. Make them lifelong* under reasonable but low conditions (say, output of a certain number of peer-reviewed pages per year averaged across five years, plus entry conditions like a PhD). You will be fucking astonished by how much the quality and, minus AI slop and Least Publishable Units, the quantity of published research is going to increase.
Or, y’know, universal basic income, but I suppose that’s a step or two harder to imagine for too many people.
* Retirement optional. Scientists who don’t get dementia can generally work till they drop. Many would happily do so – indeed there used to be a whole phenomenon of scientists who reached the local retirement age moving to the US because retirement is optional there.
~~~~~~~~~~~~~~~~~~~~~
No, the analogy is “find all papers/cases that cite this paper/case”.
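Which is just an inverted index over outgoing citations – trivial to sketch. (The document names below are invented for illustration.)

```python
from collections import defaultdict

# Forward citations: each paper/case -> the earlier documents it cites.
cites = {
    "Smith_v_Jones_1994": ["Roe_1973", "Doe_1981"],
    "State_v_Brown_2001": ["Roe_1973"],
    "Doe_1981": ["Roe_1973"],
}

# Invert the relation: each document -> everything that later cites it.
# This lookup is what "Shepardizing" (or a citation index) provides.
cited_by = defaultdict(list)
for citing, targets in cites.items():
    for target in targets:
        cited_by[target].append(citing)

print(cited_by["Roe_1973"])
# ['Smith_v_Jones_1994', 'State_v_Brown_2001', 'Doe_1973'... rather:
# ['Smith_v_Jones_1994', 'State_v_Brown_2001', 'Doe_1981']
```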
From the comments:
Yep.
It’s called tennessine now.
I suppose that’s what you get for making admission competitive.
Open access; check out Corollaries 5 and 6.
I think that stewardship of the system is the most important element. Scientific papers on experimental results etc. will/should be reviewed by reviewers who are competent and can ferret out the AI submissions from the real ones. I’m not saying that some/many of the papers won’t be assisted by AI, but rather that competent reviewers will be able to sort the wheat from the chaff. As for AI reviews of scientific papers, they need to be checked for veracity by other editors/reviewers.
And the reviewers need to be not only able but willing to spend the time and effort to enforce high standards, even when there are a lot of shoddy papers of one kind or another because generating them is easy.
DM: Thanks for those knowledgeable and informative comments; it was worth posting what seems to be a half-baked article to get them. And you’re right, of course: more jobs!
@DM [quoting from comments] Back in 2005, John Ioannidis published an article …
I’m afraid for me Ioannidis has cooked his goose. He turned out to be a COVID-denier, using very similar ‘proofs’ to those in his 2005 article. (He has since conceded he was over-cautious, but that doesn’t let him off the hook: there’d have been many more avoidable deaths if governments had listened to him. And of course various anti-public-health loonies ran with the early headlines from a “highly cited medical researcher”.)
My take: Ioannidis uses “… are False” to mean not proven beyond a reasonable doubt. This is not the same as ‘proven False beyond a reasonable doubt’. Inevitably with leading-edge research (especially using small samples) there’s a large margin for error/more research needed yada yada.
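For concreteness – my gloss, with illustrative numbers rather than Ioannidis’s own – the arithmetic behind the 2005 paper is just Bayes’ rule applied to significance testing. If a fraction \(\pi\) of tested hypotheses are actually true, tests have power \(1-\beta\), and the false-positive rate is \(\alpha\), then the probability that a published “positive” finding is true is

\[ \mathrm{PPV} = \frac{(1-\beta)\,\pi}{(1-\beta)\,\pi + \alpha\,(1-\pi)} \]

With, say, \(\pi = 0.1\), \(1-\beta = 0.8\), and \(\alpha = 0.05\), that gives \(0.08 / (0.08 + 0.045) \approx 0.64\) – so even with no bias at all, roughly a third of “positive” findings would be false in the weak sense above, i.e. unsupported rather than disproven.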
Of course bogus AI-generated citations neither prove nor disprove anything. Getting a paper zombie-cited in a gazillion online comments doesn’t make it true either.
Create minimal researcher jobs that pay the cost of living plus research trips or lab access and, say, two conferences a year – no career, no riches, no fame that pays. Make them lifelong* under reasonable but low conditions (say, output of a certain number of peer-reviewed pages per year averaged across five years, plus entry conditions like a PhD). You will be fucking astonished by how much the quality and, minus AI slop and Least Publishable Units, the quantity of published research is going to increase.
That’s an interesting suggestion, but it seems to me that it just provides an incentive, though a lower one, for quantity over quality. And how is lab access going to work? Is the government going to provide all the equipment and supplies requested by anyone who has the qualifications?
there absolutely are still cases where an institution has its candidate first, then tailors the job ad to that candidate, publishes it, and to everybody’s surprise finds the only candidate who fits it is the one it was written for. I’m quite sure I’ve uselessly applied to a bunch of those.
That’s by no means restricted to science, of course. In America we may say the job is “wired”. Other terms are probably available. I remember one candidate whose first question was whether the job was wired and he was wasting his time. It wasn’t, but if it had been, no one would have told him.
I had this with the first interview I had when applying for NHS consultant posts.
I had discreetly enquired about possible inside candidates before applying, but there had been no helpful information; I only realised that I’d wasted my train fare when actually talking to the other candidates pre-interview.
However, it was a surprisingly encouraging experience, because it took the interview board two hours of debate to agree to appoint the insider rather than me.
the pro forma “literature review” part of a published scientific article that is often expected by genre conventions is not what many working scientists consider the fun part of doing science or perhaps what they even subjectively think of as “doing science” at all.
I can’t speak for other domains, but in linguistics I would consider such an attitude blinkered, not to say arrogant. The literature review is not meant to be a pro forma exercise in citing papers you never wanted to read; it’s supposed to be a record of you learning more about the subject you’re discussing and its broader context. Of course, relevant prior work may turn out to be terrible – but that’s case by case.
All this confirms an observation from Economics and Management theory – when you base incentives and decisions on promotions on an indicator, it will be gamed and become useless for measurement. There is a limited period until a significant number of players has found how to game the indicator; after that, you have to find a new one.
@Lameen: I think the arrogant (but not bad-faith) attitude might be something like “of course I know the relevant prior literature because I keep up with all relevant stuff in my field as it happens and all of what one could usefully learn from that was already conveniently arranged in my subconscious and taken into account when I embarked upon this particular bit of research — but writing all of that out in the conventional form doesn’t itself add anything to my understanding and is thus a waste of my valuable time.” One of the arguments in favor of the convention, of course, is that sometimes you don’t actually know the relevant prior literature (or don’t have it organized in your mind in the most useful way) quite as well as you assume you do, and having to do the write-up may force you to do the work you had wrongly supposed you had already done.
I haven’t actually encountered the concept of a literature review as being, per se, an obligatory part of the paper. Rather, there’s an obligatory Introduction that presents the background/context for your work, and much of that will usually be prior work that you cite; but, depending on the exact topic, it may go back just a few years.
Over “quality” in the sense of “innovative” or “breathtakingly groundbreaking”, yes, to some extent – and that’s a good thing. There’s lots of fairly boring work that has to be done before you can build the groundbreaking stuff on it. In my field there have been papers that basically assumed the basic stuff that hasn’t been done could be replaced by a few quick-and-dirty averages, and then built on that; that’s GIGO.
(For example, there was a whole fashion of “matrix-representation supertrees”: instead of making your own big phylogenetic analysis, which takes years if you do it right, take a bunch of published small ones, make a mathematical representation of the trees, and calculate a “supertree” from those, all objective and all. Some people actually tried to believe, and published, that supertrees can show new facts that were previously inaccessible. Eventually the evidence became overwhelming that they just multiply and compound the inadequacies of the datasets that the input trees were derived from.)
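For anyone wondering what “a mathematical representation of the trees” looks like in practice, the usual coding (matrix representation with parsimony, MRP) is easy to sketch: every clade of every input tree becomes one 0/1 column, and taxa absent from that tree are scored “?”. The toy trees and taxon names below are invented:

```python
# Each input tree reduced to its clades (sets of taxon names).
tree1_clades = [{"A", "B"}, {"A", "B", "C"}]
tree2_clades = [{"B", "C"}, {"B", "C", "D"}]
tree1_taxa = {"A", "B", "C"}
tree2_taxa = {"B", "C", "D"}

all_taxa = sorted(tree1_taxa | tree2_taxa)

def mrp_columns(clades, tree_taxa):
    """One binary character per clade; '?' for taxa the tree lacks."""
    cols = []
    for clade in clades:
        col = {}
        for taxon in all_taxa:
            if taxon not in tree_taxa:
                col[taxon] = "?"  # taxon not sampled in this tree
            else:
                col[taxon] = "1" if taxon in clade else "0"
        cols.append(col)
    return cols

matrix = mrp_columns(tree1_clades, tree1_taxa) + mrp_columns(tree2_clades, tree2_taxa)
for taxon in all_taxa:
    print(taxon, " ".join(col[taxon] for col in matrix))
# A 1 1 ? ?
# B 1 1 1 1
# C 0 1 1 1
# D ? ? 0 1
```

The combined matrix then goes into an ordinary parsimony analysis. Note that the original character data never enter the matrix – only the published topologies do – which is why the inadequacies of the input datasets get multiplied rather than corrected.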
But, absolutely, I haven’t thought the whole thing through! I’m not in politics.
Yes, you’re right, but as most hatters are not chemists (or if they are they keep very quiet about it) it may be helpful to explain how we know it’s chemically impossible and why arsenic is a favourite weapon of murderers. The problem is that arsenate is similar enough to phosphate to get into places where it shouldn’t be, but not similar enough to do the job when it gets there. The enzyme glyceraldehyde 3-phosphate dehydrogenase catalyses the conversion of glyceraldehyde 3-phosphate to 1,3-bisphosphoglycerate, using inorganic phosphate as one of its substrates. This is an absolutely essential reaction in bacteria, humans, sequoias and all other living organisms, and we cannot live without 1,3-bisphosphoglycerate. The enzyme will accept arsenate as substrate in place of phosphate, but the resulting 1-arseno-3-phosphoglycerate is unstable and decomposes immediately, releasing arsenate. As the arsenate is recycled only catalytic amounts of it are needed to screw up the whole system. Notice that this is, as David said, a chemical problem; it’s not a problem of lacking an enzyme we need.
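In equation form – my summary of the chemistry above, using the standard textbook reaction – the normal step is

\[ \text{glyceraldehyde 3-phosphate} + \mathrm{NAD^+} + \mathrm{P_i} \longrightarrow \text{1,3-bisphosphoglycerate} + \mathrm{NADH} + \mathrm{H^+} \]

whereas with arsenate in place of phosphate

\[ \text{glyceraldehyde 3-phosphate} + \mathrm{NAD^+} + \mathrm{AsO_4^{3-}} \longrightarrow \text{1-arseno-3-phosphoglycerate} + \mathrm{NADH} + \mathrm{H^+} \]
\[ \text{1-arseno-3-phosphoglycerate} + \mathrm{H_2O} \longrightarrow \text{3-phosphoglycerate} + \mathrm{AsO_4^{3-}} \]

The arsenate comes straight back out, the energy-yielding step downstream of 1,3-bisphosphoglycerate is skipped, and the same catalytic dose of arsenate keeps cycling – which is exactly the point made above.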
That applies not just to linguistics but to all disciplines that I’m familiar with. One of the most famous papers in biochemistry was published by L. Michaelis and M. L. Menten in 1913. Nowadays many people seem to know that M. L. Menten was a woman, in part because I and a few others like John Lagnado have gone out of our way to make it common knowledge, but that wasn’t the case 30 years ago, when it was obvious that many of the people who cited the paper had not actually looked at it: if they had they would have known that it was written by “Leonor Michaelis und Miß Maud L. Menten”.
Another very well known paper was published by Hans Lineweaver and Dean Burk in 1934. It so happened that their reference to Michaelis and Menten contained an error (wrong page number, if I remember rightly, but I haven’t checked recently), and guess what: numerous later references to Michaelis and Menten contained the same error.
I didn’t know 1,3-bisphosphoglycerate was the limiting factor; I was told in school it’s polyarsenates that fall apart in water unlike polyphosphates, so if you get too much arsenate into your ATP synthase, you die.
Anyway, GFAJ-1 survives in high arsenate concentrations, even when there’s much more arsenate than phosphate, because it has managed to evolve enzymes that prefer phosphate much more strongly over arsenate than the homologous enzymes of other organisms. Which enzymes exactly I have no idea.
there absolutely are still cases where an institution has its candidate first, then tailors the job ad to that candidate
and the opposite (perhaps more commonly found in the humanities), where either nigh-unbreakable tradition or explicit policies (as at yale, though i think the rules there may have recently changed) forbid tenuring anyone from the ranks. which then sets up a situation where the faculty involved in the decision-making don’t have any direct experience of the candidates unless someone Very Well Known applies, which makes the process more reliant on allegedly impartial quantitative metrics like pages published or numbers of citations.
…or, I suppose, a lot of lengthy interviews that nobody in the faculty has time for.
In which case you get the candidate who’s the best at lengthy interviews.
That would be me. I radiate spurious plausibility.
Oks is apparently one of the teenaged doofuses who put together Mike Gravel’s bathetic 2020 presidential “campaign.”
More “AI” hype from Anthropic:
https://www.washingtonpost.com/technology/2026/04/11/anthropic-christians-claude-morals/
I wonder if these “Christian leaders” quite understand the role they are being gulled into here …
Not only are LLMs going to be proper scientists, they are going to have morality too …
The autocomplete function on my phone was clearly Satanic (but I exorcised it.)
I remember some US missionaries who I knew in Nigeria saying that their office computer was evidently oppressed by a demon, but they were joking. I think …
Clippy was obviously hellspawn, though, now I think of it. One of those petty demons, like Merax or Mullin.
It’s OK to consort with demons if they are cute.
A fellow I know was just a few weeks ago calling for the return of Clippy as a benign-by-comparison alternative to the newer crop of possibly-demonic chatbots. He might have been joking, of course, but I don’t think that’s the only possibility.