David Oks’s essay on citations is not central to my interests, but I know there are lots of Hatters who do science and will probably have things to say; I myself found it extremely enlightening. I’ll quote the start and let you click through for the rest:
Here are a few headlines from the world of science. […] So scientists are submitting AI-generated papers; reviewers are using AI to assess them; obviously some amount of low-quality AI-generated content will end up getting approved and published. Well-regarded journals have been caught publishing papers with classic ChatGPT-isms like “here is a possible introduction for your topic” or “as of my last knowledge update.” But that’s not all. Many of those AI-generated papers are being cited by articles in other peer-reviewed journals: and many of those articles, unsurprisingly, appear to be AI-generated themselves.
It’s pretty well-known now that science is “drowning in AI slop.” In that regard, it’s not alone: AI slop is steadily infiltrating every school and workplace in the country. But there’s something about all of this that puzzles me.
I get why students, for example, would want to avoid doing homework. But I don’t really understand why scientists would want to avoid doing science. Or, rather, why they’re so eager to use AI to produce a huge number of shoddy papers. No one forced them to become scientists. I imagine that most people who work as scientists chose to do so out of something like love for the subject. So why are scientists using AI to produce and submit so much garbage?
I don’t think that the answer actually has much to do with AI. It has to do, instead, with the incentives that govern scientific institutions. You could boil it down to one word: citations.
Over the last few decades, science has undergone a “citation revolution.” Scientific life used to be structured by personal reputation and mutual acquaintance; now it is defined by quantitative assessments derived from citations.
And this reward system has warped scientific life in dramatic ways. It has resulted in the obvious and widespread gaming of citation metrics; but, more insidiously, it has pushed scientists toward risk-averse, incremental, and above all unambitious research. The logic of institutional science has become increasingly divorced from actual knowledge and discovery. In a system governed by these perverse incentives, the inevitable endpoint is simply AI-generated slop at scale. […]
But we should start, first of all, with a moment very much like our own, the origins of the citation revolution: the “information crisis” of the 1960s.
He talks about the idea of precedent, and says:
In the 1870s, a salesman of legal books named Frank Shepard realized that this represented a good business opportunity. Lawyers always needed to trace the subsequent history of a ruling. So Shepard started producing books with gummed strips of paper, listing every subsequent case that cited a given decision. With Shepard’s books—called Shepard’s Citations—you could quickly learn whether a given case was still good law. Shepard’s innovation was tremendously successful. It did so well, in fact, that his name became a verb: “to Shepardize” meant to consult Shepard’s Citations to check on the status of a precedent.
In 1953, long after Shepard had died, a retired vice president of Shepard’s Citations named William C. Adair, living at his ranch in Colorado Springs, was reading a newspaper article about scientific documentation. Science, the article said, was “swamped in a sea of literature,” and a group of researchers at Johns Hopkins wanted to see how machine methods could fix that.
Adair’s curiosity was piqued. The answer seemed obvious. Why not just apply citation indexing to science?
There’s lots more, and I knew nothing about any of it.
Just: no. There are no “AI scientists”, and we are not heading toward a world with any, let alone “millions.” And they will not “eventually own such integrally ‘human’ parts of the scientific process as hypothesis generation.” This is just swallowing the marketing hype.
The article is correct that “AI” is (as in other areas) exposing serious defects in the existing knowledge ecosystem; as well, of course, as grievously polluting it further.
“Bibliometrics” is what suits prefer to actual judgment of quality, and its proliferation is due to politically driven changes by which university education has been reframed as mere technical training, worthy of state support only insofar as it can be shown to lead to more or less immediate financial benefits.
Nature shows some chutzpah in publishing a paper complaining about this pollution of the noosphere. As a publisher, they’re enabling it. For money.
There are no “AI scientists”
Yes, that was poorly expressed, but I don’t think it’s central to his point.
No, he means it, as his further remarks clearly show. It’s not just a clumsy statement. He really does think that LLMs will (eventually) be able to do real science, not just imitate academic papers well enough to get published in Nature. This is pure fantasy.
“AI” merely exposes the existing perverse incentives here; about those, I do largely agree with Oks: though his focus on “citations”, specifically, as the problem strikes me as peculiar. There’s nothing wrong with citations: it’s the whole political environment that academia is having to function in that’s the problem.
Oks seems to be a political activist (on the side of the angels) rather than someone with any particularly relevant scientific experience.
https://en.wikipedia.org/wiki/David_Oks
Ah, I didn’t check on his background — thanks.
That said, what I found interesting was not the AI catastrophizing (which there’s no shortage of) but the history of citations and how they migrated from law to science.
Ah, the legacy of Frank Shepard! I was in the very last cohort of American lawyers (the generation admitted to the bar in the early-to-mid Nineties) who learned how to Shepardize cases the old-fashioned way, with hard-copy reference volumes. Soon enough, though, most of the Powers That Be became convinced that online software had automated the process reliably enough, and they grew eager to stop paying for constantly updated sets of hard-copy reference volumes; there was also the more general sense that your office rent could be spent more productively if fewer square feet were devoted to an analog law library with shelves full of hard-copy volumes.
I personally don’t think the way in which earlier judicial decisions (in an Anglo-American system) relate to more recent judicial decisions is a particularly compelling analogy for the way earlier scientific papers relate to more recent ones, but no one in the science biz asked me about that.
I had heard of “shepardizing” a long time ago and had forgotten about it until now. I never knew before what it meant.