As anyone who has been following LH for any length of time will be aware, I am no fan of “AI,” but this seems like a situation in which large language models could be of great use; the Austrian Academy of Sciences reports:
The Austrian Academy of Sciences (OeAW) is collaborating with Mistral AI and Sail Reply, a Reply Group Company, on the development of a Large Language Model (LLM) for Ancient Greek: Apollo, named after the Greek god of light and patron of the arts and sciences, will propel research on ancient Greek texts. The model supports advanced searching and automatic text restoration in hundreds of thousands of undeciphered papyri and inscriptions, making it possible to accurately capture content in a matter of hours rather than years. The OeAW and its partners are doing pioneering work, as LLMs have not yet been developed for a historical language evolving over many centuries or the reconstruction of heavily damaged ancient texts.
On behalf of the OeAW, the project is led by Anna Dolganov, an ancient historian and papyrologist at the Austrian Archaeological Institute of the OeAW, who provides field-specific guidance, oversees the integration of relevant sources, and guarantees scientific quality. Through her expertise, Dolganov ensures that historical contextualization and methodological standards are upheld. […]
Anna Dolganov: “Our project with Mistral AI and Sail Reply is building the world’s first advanced multimodal Large Language Model for an ancient language, trained on the largest digital corpus of historical Greek to date. This AI system can be developed in many directions for a wide range of research tasks, from reconstructing fragmentary inscriptions and papyri to conducting semantic and thematic searches across the entire Greek textual tradition to deciphering handwritten texts. For example: there are one million Greek papyri worldwide that have never been read, tens of thousands of which are held by the Papyrus Collection of the Austrian National Library. Such treasures of historical knowledge are our target. This LLM marks the beginning of an exciting journey in the study of antiquity.”
I didn’t realize there were so many unread papyri — if this works as advertised, it could be a boon. Thanks, Martin!
In editions of fragmentary papyri, it is important to distinguish what can be read and what is restored by conjecture. I hope the AI publications, labeled as such, will be clear about which is which. If so, it will be helpful.
She doesn’t look ancient to me, judging by her biog photo. B.A. in Classical Philology, Harvard University (magna cum laude, 2005)
How did these tens of thousands of papyri end up in Austria?
@SG even if the AI gives no better than a fair guess, that’ll at least make a searchable resource for humans to go and eyeball a specific text amongst the tens of thousands.
How did these tens of thousands of papyri end up in Austria?
Probably excavated and brought home by Austrian (or maybe even Austro-Hungarian) archaeologists? Or bought from local antique traders for Austrian museums? Both practices became frowned upon only at some point in the second half of the last century.
https://de.wikipedia.org/wiki/Papyrussammlung_und_Papyrusmuseum_Wien
—
Die Sammlung verdankt ihr Entstehen in erster Linie dem Professor für Geschichte des Orients an der Universität Wien, Josef Karabacek. Mit Hilfe des Teppich- und Kunsthändlers Theodor Graf konnte dieser in den Jahren 1881 und 1882 die rund 10.000 Papyri des 1. Fayyumer Funds nach Wien bringen…. Bis 1899 erweiterte Erzherzog Rainer die Sammlung laufend durch neue Ankäufe und machte sie im selben Jahr seinem Onkel Kaiser Franz Joseph I. zum Geschenk, der sie als Spezialsammlung in die k.k. Hofbibliothek (die heutige Österreichische Nationalbibliothek) eingliederte. Sie erhielt neue Räumlichkeiten am Josefsplatz. Zusätzliche Ankäufe von hauptsächlich Ostraka erfolgten 1899 und 1911.
—
To paraphrase, an Austrian professor of Oriental History at Vienna University brought 10,000 papyri to Vienna in 1881-82. New purchases were added to the collection until 1899 by Archduke Rainer [he had bought the collection in 1883], who then gave it to his uncle the Emperor, who had it housed as a special collection in the imperial court library (now called the Austrian National Library). Additional purchases (mostly of ostraka) were made in 1899 and 1911.
The corresponding English Wiki article is shorter and says less about when and how the papyri were acquired.
How did these tens of thousands of papyri end up in Austria?
Probably excavated and brought home by Austrian (or maybe even Austro-Hungarian) archaeologists? Or bought from local antique traders for Austrian museums? Both practices became frowned upon only at some point in the second half of the last century.
or… as Mr Kipling might have had it, “Take up the brownish man’s property.”
“with the help of” a “carpet and art dealer”. Yay, free market.
In editions of fragmentary papyri, it is important to distinguish what can be read and what is restored by conjecture. I hope the AI publications, labeled as such, will be clear about which is which.
Hopefully the AI will automatically distinguish them via the existing notation.
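For readers unfamiliar with that notation, here is a minimal sketch of how it can be separated mechanically. It assumes only the square-bracket convention from the Leiden system used in papyrological editions, where text inside [ ] is the editor's restoration of a lost passage. The function name and the sample line are made up for illustration; nothing is known here about what format Apollo will actually emit, and other signs (such as underdots for doubtful letters) are not handled.

```python
import re

def split_read_and_restored(text):
    """Split an edited line into (kind, text) segments.

    Assumption: text inside square brackets is editorial restoration;
    everything outside them is what the editor could actually read.
    """
    segments = []
    # The capture group makes re.split keep the bracketed parts in the result.
    for part in re.split(r"(\[[^\]]*\])", text):
        if not part:
            continue  # skip empty strings produced at the edges
        if part.startswith("["):
            segments.append(("restored", part[1:-1]))  # drop the brackets
        else:
            segments.append(("read", part))
    return segments

# Hypothetical sample line, not taken from any real edition.
segments = split_read_and_restored("ἐν ἀρχ[ῇ ἦν] ὁ λόγος")
for kind, piece in segments:
    print(f"{kind:>8}: {piece!r}")
```

If the software's output keeps to this convention, a reader or a search tool can always tell conjecture from reading; whether it will is exactly the open question.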
Hopefully the AI will automatically distinguish them [what can be read and what is restored by conjecture] via the existing notation.
Hopefully the AI will automatically and honestly say whether it is doing that. In my limited experience, however, AI will fawn and attempt to ingratiate itself, or simply lie, when it encounters ambiguity.
The fawning and mendacity are already in the source materials, of course, so no intelligence or creativity is needed to bring them to light.
Maybe they’ll have trained it better than that.
Do LLMs distinguish between certainty and conjecture? That may have to come in when the human is in the loop, as AntC suggested.
As already suggested, we await whether a specific AI edition will include indication or discussion of whether a letter is only partially visible, and tendencies if known of that scribe, and whether a proposed reconstruction relies on, say, a formulaic legal phrase known from the same time and location, and paleographic dating, and accounting for known provenance and/or provenience, and so on. An experiment.
Here’s hoping that you feel comfortable in doing away with the disclaimer in the future. Knee-jerk Luddism (or appeasement of such types) is just as harmful as knee-jerk Kool-Aid drinking, and it doesn’t take much to examine individual uses of LLMs or other such technologies on a case-by-case basis with a level head.
If it were the case, Yuval, that you were addressing me, about which I may be mistaken, I would have a twofold response:
Show me an AI-assisted example
and
Why hope to do “away with the disclaimer”?
Maybe they’ll have trained it better than that.
Do LLMs distinguish between certainty and conjecture?
they not only do not, they cannot.
“training” just means feeding in more human-coded material for pattern extrapolation, nothing more – and often less, when in practice it means feeding in LLM-coded material instead.
in theory, an LLM could label a mark as (say) “conjecturally a mem”*. but it will only apply that label to marks that resemble the ones that are labeled that way in its feedstock. so the possibilities abound for both false positives (a mark that’s clearly an unusually compressed ayin being labeled as a mem, say) and false negatives (a definite mem with an unusually long rightmost stroke being labeled as a nun with a squiggly tail, say). and novel marks (i don’t think i’ve ever read a yiddish letter without finding at least one) will be labeled by their similarity to already-encoded ones, which is laughable even just thinking about the langer mems i have encountered, let alone the langer feys or even lameds.
and that’s not even getting into, to take an example from a family letter i’m translating for a friend, the fact that for a particular writer the same two short vertical marks can be an alef, a tsvey-vovn, a vov-yud pair, a yud-vov pair, or a tsvey-yudn. for that writer, there’s also the added twist that she uses those marks in places that in almost any standardized system would call for a single vov, a melupm-vov, a single yud, a yud and a shtumer alef, a vov and a shtumer alef, or a pasekh-tsvey-yudn. which also highlights the limited utility of using longer sequences of marks to help: her “di” (article), “du” (pronoun), “tsi” (“whether”, “or”), and “tsu” (preposition) are often realized with the same marks (and by extrapolation, perhaps also “tsi” (verb stem)). it has been a fun** ride; i do not think it is an automatable one.
there’s a reason why the largest recent manuscript-deciphering project in the yiddish world (the first stage of the KMDMP), which is also probably the most important yiddish digital humanities project around, relied entirely on multilingual groups of human decipherers.
and all of this is with a living language written in a known and living script by comfortably literate writers, using high-quality scans of original sheets that are in quite good condition.
.
* i’m using this as my example because the first lesson of deciphering yiddish handwriting is “if it’s just a blob, it’s a mem” – and you quickly learn that there is a nearly infinite variety of blobs that are mems, that lots of yiddish manuscripts have blobs that aren’t mems (or even letters at all), and that lots of other kinds of enigmatic marks can be mems but can also be other letters.
** in part because it’s teaching me how thoroughly it’s possible for a person’s writing to conceal their spoken lect – the ambiguities i’ve mentioned (plus never using nekudes at all) mean that the key vowel differences aren’t legible, or aren’t readily identifiable if they are indicated in some way.
If it were the case, Yuval, that you were addressing me
No, Yuval was addressing me (see the beginning of my post). I am unrepentant.
@rozele: Machine learning algorithms that are not language models can actually be designed to be much better at quantifying uncertainty about their responses to inputs than the LLMs. So I don’t think that identification of letter forms is necessarily going to be plagued by the kind of overconfidence we see in ChatGPT.
i think “overconfidence” unjustifiably attributes agency to something that would accurately be called “inaccuracy” (overconfidence is what the hucksters want us to have about the inaccurate results), but i do take your point about the difference in flavors of software!
On the subjects of the LLM explaining its reasoning and stating uncertainties, does anyone know whether this is a place where the LLMs known as reasoning models can help?
@rozele: AIUI, LLM chatbots are sycophantic—sorry, they produce sycophantic texts—because their trainers and users give good ratings to such texts. I’m suggesting that if you start your model with clean silicon and give the trainers careful instructions, maybe you can avoid that.
I don’t think we even know whether the OeAW’s LLM will be able to chat. Maybe it will just produce ancient Greek texts, with or without paleographic symbols and other annotations, as Stephen Goranson and Rodger C may be expecting.
But I trust that any “editions” it produces will be accompanied by the digital images they’re based on and information about provenience (a word I just learned), so users can decide for themselves whether any letters are only partially readable, etc.
But I trust that any “editions” it produces will be accompanied by the digital images they’re based on and information about provenience (a word I just learned), so users can decide for themselves whether any letters are only partially readable, etc.
Exactly. Anytime the machine output produces something interesting, the first step will be to look at the original document to verify that it does indeed say what the machine claims that it says.
The real risk from the inaccuracy of the machine is not that it will produce interesting content which is inaccurate, but rather that it will fail to correctly recognize interesting content, so the researchers will never find it. But they’re not finding it now, because they don’t have the time to look for it. So the new technology should definitely provide a massive net benefit.
But they’re not finding it now, because they don’t have the time to look for it. So the new technology should definitely provide a massive net benefit.
That’s my take. It’s not a matter of trusting the AI (which no one should ever do), it’s a matter of letting it do some gruntwork to free up human labor. And I certainly hope nobody involved expects the software to “chat.”
Why wouldn’t it be able to chat? I’d think users would like to be able to get answers to questions such as “Why did you conjecture this instead of that?” and “Was this formulaic legal phrase [as Stephen Goranson mentioned] used at this time and place?” as long as the answers actually reflected the software’s data and process.
get answers to questions such as “Why did you conjecture this instead of that?”
that is exactly the thing that LLMs (et al) cannot do. when those kinds of questions are the input, the output is – like all of their output – based on what is statistically likely to follow such an input in their feedstock – it doesn’t have a damn thing to do with “why”s. this software doesn’t “answer questions” for any sense of the phrase beyond “produces an output when you give it an input” (which is true of haruspicy, too – and any reasonably competent haruspex can explain why when the input is a knife, the output will be intestines).
the problem with the second question should be obvious from the (remarkably consistent) results when u.s. government lawyers ask LLMs to provide them with precedents to include in briefs.
Maybe it will just produce ancient Greek texts, with or without paleographic symbols and other annotations
i am extremely skeptical about the human cross-checking of such results, even in the very narrow best case you’re describing. we know a lot about how deeply humans’ assessments of ambiguous images are shaped by their expectations for what they’ll see. and it’s hard to imagine a more compelling form of expectation than the combination of promised infallibility and elliptically-stated “if you want funding, you’d better show us you’re using the software our major donors have bet billions on” threats.
(i’m basing that last point partly on recent conversations with a generally very sharp and critical retired city lawyer who cannot seem to take in that LLMs are pattern-extrapolators, not search engines, despite reading the coverage of fake case citations, and a renowned philosopher of consciousness (the family friend who’s the reason i got to play with ELIZA in the early/mid-1980s) who has been using high-end LLMs to summarize papers rather than reading them.)
God, that’s depressing.
@rozele: that is exactly the thing that LLMs (et al) cannot do. when those kinds of questions are the input, the output is – like all of their output – based on what is statistically likely to follow such an input in their feedstock – it doesn’t have a damn thing to do with “why”s.
Please see the “reasoning models” I mentioned above. That’s what they’re doing, but they’re applying it step-by-step to complex problems, and they report the intermediate steps.
And it seems to me to be a reasonable way for software to work. Suppose there’s a mark on a papyrus, and it’s statistically likely in edited Greek texts that similar marks are interpreted as sigmas, omicrons, or somewhat defaced rhos. (I’m just guessing about what letters are likely to be confused. People can substitute others if I guessed wrong.) Suppose further that given the preceding and following letters, an omicron gives a word and neither a rho nor a sigma does. It will be statistically likely that the training data will contain that sequence of letters with the omicron, so the software will come up with that word.
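A toy version of that idea, to make it concrete. This is a plain frequency lookup, not an LLM, and it is only a sketch: the word counts are invented, the function name is hypothetical, and the Greek is written without accents to keep the strings simple. It scores each candidate letter by how often the resulting word occurs in the (made-up) counts.

```python
from collections import Counter

# Invented counts standing in for a real corpus of edited Greek texts.
WORD_COUNTS = Counter({"λογος": 120, "νομος": 80, "λογον": 40})

def rank_readings(before, after, candidates, word_counts):
    """Rank candidate letters for one damaged position.

    Each candidate is scored by the corpus count of the word it would
    produce; scores are normalised so they sum to 1 when any word matches.
    """
    scored = {c: word_counts.get(before + c + after, 0) for c in candidates}
    total = sum(scored.values())
    ranked = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
    return [(c, n / total if total else 0.0) for c, n in ranked]

# The damaged mark could be an omicron, a rho, or a sigma.
readings = rank_readings("λογ", "ς", ["ο", "ρ", "σ"], WORD_COUNTS)
for letter, share in readings:
    print(letter, round(share, 2))
```

With these counts only the omicron yields an attested word, so it takes the whole share. Note that the numbers say how well a reading fits the counts, not how legible the mark is, which is part of what is being argued about in this thread.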
This is a narrow domain, and the trainers and users will be specialists to some extent. Not, of course, infallible. I think there’s reason to hope for more useful results than the general-purpose chatbots give.
i am extremely skeptical about the human cross-checking of such results, even in the very narrow best case you’re describing.
Many of us seem to be imagining different things. What you call the “very narrow best case” is what I’d expect to be the typical case. What other cases do you expect?
I trust no one will claim the OeAW’s LLM will be infallible. I too find your stories of credulous people who should know better depressing. But soon enough, they’ll learn painful lessons. And other people will start from a very skeptical position.
@rozele:
You and your lawyer friend (or the LLMs used) may be a bit behind the curve. The LLMs that we currently use at my work provide sources for their assertions, which are actually existing (not “hallucinations”) and checkable. Checking that this is the case is of course a step that you need to do, but these LLMs now are actually useful for finding real information.
Just as a small trial, here’s Claude on this handwritten Yiddish document, in response to this prompt: “Can you transcribe this manuscript and indicate points of uncertainty and degrees of uncertainty in the transcription?”. Apologies in advance if formatting/characters go wrong. I have no Yiddish and no idea if this performance is accurate and/or is the kind of thing that would be useful to scholars in the field. I’d be interested to hear from those who are able to assess it.
Interesting. It can certainly talk plausibly about uncertainties. Funny that it transcribes “November” but doesn’t “realize” it [edit: till later in the answer].
What might it mean by “diplomatic”?
I can barely recognize a single letter of the manuscript, but based on my less than rudimentary Yiddish, I have the gravest possible doubts about Claude’s transcription after the date. We’ll see what rozele and anyone else who knows Yiddish says*. However, I’m prepared to whole-heartedly support the recommendation to find a human reader.
*ETA: Should that be “say”? Usually I’m big on plural verbs with compound subjects, but the “knows” may be throwing me off. I think I should change it to “Looking forward to comments by rozele and anyone else who knows Yiddish.”
Two more examples, with different degrees of uncertainty, from Claude again. First this.
And this:
“Diplomatic transcription” is a technical term; I don’t know what it means, though.
Or do they make them up afterwards, basing the choice of words on what’s likely to be in such a report?
The header line of the Yiddish ms. begins with טאראטא ‘Toro[n]to’, if that helps, for a start.
The AI produced somewhere between 99% and 100% gibberish, now New and Improved with gibberish error bars, by request.
A diplomatic transcription means one that aims to reproduce every aspect of the written manuscript, for example line breaks, cross-outs, insertions, etc. It’s a variable concept, depending on how closely the transcription follows the manuscript.
A (non-diplomatic) transcription of the Old English above:
Full text here, lines 35-45.
Thank you, Hans, for the report of experience, and Y, for information including “diplomatic”.
Today’s SMBC. It all makes sense now!
What I want to know is where they got “600 million words from historical Greek texts.” There is a similar project at Princeton called Logion (https://huggingface.co/princeton-logion/logion-bert-base), which only aims to provide a tool that guesses the next word. That one was trained on some 50 million words, which is supposed to be the entirety of TLG.