Part 36 of 58
The Confident Liar
By Madhav Kaushish · Ages 12+
The pre-trained and fine-tuned language model was impressive. Hjentova used it to answer questions about her archive, and the answers were usually accurate and well-formed. Then she asked about the Battle of Kronthjel Pass.
The Invented History
Hjentova: Tell me about the Battle of Kronthjel Pass in the Third Dynasty.
The model produced a detailed response: "The Battle of Kronthjel Pass took place in the fourteenth year of Chieftain Gromvald's reign, when forces from the northern provinces attempted to seize the mountain trade routes. The defending garrison, led by Commander Thrysska, held the pass for eleven days despite being outnumbered three to one. The battle ended when reinforcements arrived from the coastal city of Plonvek, forcing the northern forces to retreat through the Hrinjak Valley."
Fluent. Detailed. Authoritative. And completely false.
Hjentova: There was no Battle of Kronthjel Pass. There was no Commander Thrysska. There was no siege of eleven days. Gromvald did reign during the Third Dynasty, and Kronthjel Pass is a real place, and there were northern border conflicts — but this specific battle never happened. Your model invented it.
Trviksha: Are you certain?
Hjentova: I am the chief archivist of the royal library. I am certain. There are three recorded battles at Kronthjel Pass, all in the Fifth Dynasty, none matching this description. Your model has fabricated a plausible but entirely fictional historical event, complete with invented names and invented details.

Why It Lies
Trviksha examined what had happened. The model had been trained to predict the next word. When given the prompt about the Battle of Kronthjel Pass, it generated the most probable continuation — the sequence of tokens that was most consistent with the patterns in its training data.
The training data contained many descriptions of battles. They followed a pattern: a location, a chieftain's reign, opposing forces, a duration, a resolution. The model had learned this pattern and generated a battle description that fitted it perfectly. It had filled the pattern with real names taken from the text (Gromvald, Plonvek) and with invented names that merely sounded plausible (Thrysska, Hrinjak).
Trviksha: The model does not know whether the Battle of Kronthjel Pass happened. It knows what a description of a battle looks like. When asked about a battle, it generates something that looks like a battle description. It fills in the template with plausible details.
Blortz: It learned the form of historical text — how battle descriptions sound — without learning the content — which battles actually occurred. The form and the content are different things, and the model is much better at form.
Trviksha: The prediction task rewards generating plausible text. "Plausible" and "true" overlap heavily — most plausible continuations are also true, because the training data is mostly factual. But when the model encounters a prompt about something that is not in the training data, it generates a plausible continuation anyway. And plausible-but-false is indistinguishable from plausible-and-true in the model's output.
Hjentova: It writes fiction with the confidence of history.
Trviksha: It does not know the difference. From the model's perspective, there is no difference. Every output is the continuation that maximises the prediction score. Whether the output corresponds to reality is outside the model's awareness.
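A toy version makes Trviksha's point concrete. The sketch below is not her model: it is a minimal next-word predictor that looks only at the previous word, trained on three made-up sentences. The place names in the corpus (Aldren Ford, Merrow Ridge, Vorn Pass) and the function names are invented for this illustration and do not come from Hjentova's archive.

```python
from collections import Counter, defaultdict

# Toy "archive": invented sentences for this sketch, not text from the story.
CORPUS = [
    "the battle of aldren ford lasted eleven days .",
    "the battle of merrow ridge lasted eleven days .",
    "the battle of vorn pass lasted three days .",
]

def train_bigrams(sentences):
    """Count, for each word, which words follow it in the corpus."""
    successors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            successors[current][following] += 1
    return successors

def continue_prompt(successors, prompt, max_words=10):
    """Greedily append the most frequent next word until a full stop."""
    words = prompt.split()
    for _ in range(max_words):
        options = successors.get(words[-1])
        if not options:
            break  # the last word was never followed by anything in training
        next_word = options.most_common(1)[0][0]
        words.append(next_word)
        if next_word == ".":
            break
    return " ".join(words)

model = train_bigrams(CORPUS)
answer = continue_prompt(model, "the battle of kronthjel pass")
print(answer)
print("appears in archive:", answer in CORPUS)
```

The toy archive contains no sentence about Kronthjel Pass, yet the predictor completes the prompt with a fluent duration anyway, because "pass" was followed by "lasted" and "lasted" was most often followed by "eleven". Nothing in the procedure ever checks whether the finished sentence is recorded anywhere.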
The Scope of the Problem
Hjentova: How often does this happen?
Trviksha: For common topics — events well-represented in the training data — rarely. The model has seen many descriptions of the actual Third Dynasty battles, so it tends to reproduce those accurately. For rare topics, obscure events, or questions that require combining facts in new ways, it happens frequently.
Hjentova: So I cannot trust the model's answers unless I can verify them independently.
Trviksha: Correct. The model produces plausible text. Verifying truth requires checking the text against the actual archive — which is the task you wanted the model to help with in the first place.
Glagalbagal: A tool that requires the expertise of the person using it. If Hjentova knows the history well enough to verify the model's answers, she may not need the model's answers.
Trviksha: That is overly harsh. The model is accurate on the vast majority of queries, because it retrieves real information from the patterns it has learned. The problem is that you cannot tell which answers are accurate and which are fabricated. They look identical.
This was, Trviksha realised, a fundamental feature of the prediction-based approach, not a bug to be fixed. The model learned to produce text that was maximally consistent with the patterns in its training data. On common topics, this consistency aligned with truth. On rare or novel topics, it aligned with plausibility — which was not the same thing.
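The gap between consistency and truth can be shown with a small sketch. It assumes a toy predictor that looks only at the previous word, trained on three made-up sentences; the corpus, its place names, and the function names are invented for this illustration. The score it computes is the probability the toy predictor assigns to a continuation, which measures how well the text matches the training patterns and nothing else.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Toy "archive": invented sentences for this sketch, not text from the story.
CORPUS = [
    "the battle of aldren ford lasted eleven days .",
    "the battle of merrow ridge lasted eleven days .",
    "the battle of vorn pass lasted three days .",
]

def train_bigrams(sentences):
    """Count, for each word, which words follow it in the corpus."""
    successors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            successors[current][following] += 1
    return successors

def continuation_score(successors, prompt, continuation):
    """Probability the toy model assigns to `continuation` after `prompt`."""
    previous = prompt.split()[-1]
    score = Fraction(1)
    for word in continuation.split():
        counts = successors.get(previous, Counter())
        total = sum(counts.values())
        if total == 0 or counts[word] == 0:
            return Fraction(0)  # the model never saw this word pair
        score *= Fraction(counts[word], total)
        previous = word
    return score

model = train_bigrams(CORPUS)

# A continuation that matches a sentence in the toy archive.
recorded = continuation_score(
    model, "the battle of vorn pass", "lasted three days ."
)
# A continuation about a battle the toy archive never mentions.
invented = continuation_score(
    model, "the battle of kronthjel pass", "lasted eleven days ."
)
print("recorded:", recorded)
print("invented:", invented)
```

In this toy setting the invented continuation scores 2/3 and the recorded one scores 1/3, because "eleven days" is the more common pattern. The score rewards the typical, so a fabricated sentence that follows the usual pattern can outrank a true one that does not.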