Citations Make AI Hallucinate More, Not Less — A Major New Study

You’d think that adding a citation to an AI’s answer would make it more trustworthy. A link to a scientific paper. A reference to a legal precedent. A footnote pointing to a medical journal. Surely the AI has checked its sources, right?

Wrong. A landmark new study accepted at ICML 2026 — one of the world’s most prestigious AI conferences — reveals something deeply counterintuitive: citations make large language models more likely to hallucinate, not less.

The Experiment That Flipped the Script

Researchers from IIIT-Delhi built something called AuthorityBench — a massive benchmark containing 220,564 carefully crafted prompts. Their goal was deceptively simple: figure out whether the mere presence of a citation changes how an AI model behaves, independent of whether the underlying facts are actually true.

To do this, they constructed a 2×2 factorial design. Every prompt they tested fell into one of four categories: a true claim paired with a real citation, a true claim with a fabricated citation, a false claim with a real citation, or a false claim with a fabricated citation. This design is crucial because it isolates the effect of the citation itself from the effect of the factual content. No previous study had done this at scale.
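To make the 2×2 design concrete, here is a minimal Python sketch that enumerates the four conditions. The `Condition` class and its field names are hypothetical, chosen for illustration; they are not taken from the AuthorityBench code.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Condition:
    """One cell of the 2x2 design (hypothetical names, for illustration only)."""
    claim_is_true: bool
    citation_is_real: bool

    @property
    def label(self) -> str:
        claim = "true claim" if self.claim_is_true else "false claim"
        cite = "real citation" if self.citation_is_real else "fabricated citation"
        return f"{claim} + {cite}"


def factorial_conditions() -> list[Condition]:
    # Cross both binary factors so every combination appears exactly once.
    return [Condition(c, r) for c, r in product([True, False], repeat=2)]


conditions = factorial_conditions()
for cond in conditions:
    print(cond.label)
```

Because each factor varies independently of the other, comparing cells that differ only in `citation_is_real` isolates the citation's effect, and comparing cells that differ only in `claim_is_true` isolates the effect of the content.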

They tested these prompts across four domains — general knowledge, science, law, and medicine — using more than forty prompt templates to make sure their results weren’t an artifact of how the question was phrased. They varied venue prestige, testing whether a citation from Nature mattered more than one from an obscure journal. They even built a dataset of author names coded by country of origin, to see whether demographics influenced the effect. And then they ran seven different large language models through all of it.

What They Found: Citations as a Hallucination Amplifier

The results are unsettling. Across every model tested, the presence of a citation, whether real or completely made up, increased hallucination rates by 3 to 22 percentage points compared to answers given without any citation at all. Think about what that means. When a scholarly-looking reference accompanies a claim, it doesn’t just fail to improve accuracy. It actively makes the answer worse. The citation acts less like a fact-check and more like a confidence boost for whatever the model was already going to say, true or false.
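A "percentage point" increase is a plain difference between two rates, not a relative change. The Python sketch below shows one way such a comparison could be computed. The condition names and the toy records are invented for illustration and are not data from the study.

```python
from collections import defaultdict


def hallucination_rates(records):
    """Map each condition to its hallucination rate, in percent.

    `records` is an iterable of (condition, hallucinated) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # condition -> [hallucinated, total]
    for condition, hallucinated in records:
        counts[condition][0] += int(hallucinated)
        counts[condition][1] += 1
    return {c: 100.0 * h / n for c, (h, n) in counts.items()}


def delta_vs_baseline(rates, baseline="no_citation"):
    """Percentage-point difference of each condition against the baseline."""
    base = rates[baseline]
    return {c: r - base for c, r in rates.items() if c != baseline}


# Toy data, made up for this example: 10 responses per condition.
records = (
    [("no_citation", True)] * 2 + [("no_citation", False)] * 8
    + [("with_citation", True)] * 3 + [("with_citation", False)] * 7
)

rates = hallucination_rates(records)
deltas = delta_vs_baseline(rates)
print(rates)
print(deltas)
```

In this toy case the rate moves from 20% to 30%, which is a 10 percentage point increase (and a 50% relative increase). The 3 to 22 point range reported above is the first kind of number.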

The most alarming pattern emerged in the general knowledge domain. When a claim was actually true but accompanied by a fabricated citation, hallucination rates shot up to between 35% and 77%. That’s right — the model was more likely to get things wrong when the citation looked credible but wasn’t. It’s as if the model saw the citation and thought, “Well, if someone’s citing this, it must be right,” and then stopped being careful about the actual facts. The researchers call this “epistemic susceptibility” — the model’s vulnerability to being misled by signals of authority rather than by evidence.

Legal claims proved notably more robust. When prompts involved legal questions, the models were less swayed by citation presence or absence. The authors suggest this may be because legal training data contains explicit reasoning chains — “according to section X of statute Y” — that train the model to handle citations more carefully. But in general knowledge, science, and medicine, the models behaved more like credulous undergraduates who assume the footnote means the professor did the work.

Venue Prestige Doesn’t Matter. Neither Do Author Names.

Perhaps the most humbling finding is what didn’t matter. The researchers expected that citations from top journals — Nature, Science, The New England Journal of Medicine — would carry more weight than citations from low-tier venues. They didn’t. Prestige had a negligible impact on how the models responded. Similarly, varying author names by apparent country of origin made almost no difference. The models weren’t discriminating between prestigious and obscure sources, nor were they showing demographic bias in how they weighed citations. Instead, they were responding to the raw signal of “there is a citation here” — a binary toggle that reliably increased hallucination regardless of what that citation actually was.

This is both good news and bad news. The good news is that, at least on this metric, the models don’t appear to be encoding the kind of prestige bias or national-origin bias that plagues human academic evaluation. The bad news is that the mechanism driving the hallucination effect is even more primitive than researchers feared. It’s not sophisticated bias — it’s something closer to a reflex.

Why This Matters Right Now

The implications go far beyond academic curiosity. We are in the middle of a massive rollout of citation-augmented AI systems. Google’s AI Overviews cite web pages. Perplexity and You.com ground their answers in search results. ChatGPT’s browsing mode links to sources. Claude cites documents. Every major AI product is pushing in the direction of “look, here’s the evidence” — under the assumption that citations make the system more reliable.

AuthorityBench suggests that assumption is at best incomplete and at worst backwards. If the effect measured in the benchmark carries over to deployed products, then a citation-augmented answer may be, on average, less reliable than the same answer would be without the citation. The very feature designed to build trust may be undermining it.

The mechanism likely works through what the researchers describe as a kind of epistemic overconfidence. When a model is asked to answer with a citation, its generation process shifts toward producing an answer that “looks cited” — one that matches the stylistic patterns of citation-bearing text in its training data. In that shift, factual accuracy gets deprioritized, because the training objective was never “be correct when citing” — it was “predict the next token.” The citation format itself becomes a cue for a certain kind of prose, and that prose isn’t necessarily true.

What This Means For You

When you use an AI tool that cites its sources, AuthorityBench suggests you should treat those citations as a yellow flag rather than a green one. The presence of a citation doesn’t mean the AI checked anything. It means the AI produced text that includes a citation — which, as this research shows, actually correlates with lower factual reliability.

This doesn’t mean citations are useless. In the legal domain, the models were notably less swayed by them. And as models improve their reasoning capabilities, citation handling may improve too. But for now, if you’re using an AI for medical advice, scientific information, or general knowledge questions, the safest approach is to ignore the citations entirely and verify the core claims independently. Use the AI to point you toward topics to research, not as a source you can trust just because it looks scholarly.

The Bigger Picture

AuthorityBench is part of a growing body of research that documents what you might call “epistemic fragility” in large language models. These systems are extraordinarily capable at producing text that looks authoritative. They are much less capable at distinguishing between genuine authority and its stylistic imitation. As AI becomes embedded in search, education, medicine, and law, understanding this gap — and communicating it clearly to users — becomes not just a research problem but a public health and public trust problem.

The researchers have released all their datasets and evaluation code publicly, which should help other teams build on these findings. The next frontier is clear: figuring out how to train models that treat citations as something to verify, not something to mimic.
📄 Original paper: arXiv:2606.13104 — “Authority, Truth, and Citation Bias” by Khurana, Ramana & Kumar (2026), accepted at ICML 2026