
[Figure: results from MIRAGE: The Illusion of Visual Understanding]

MIRAGE: The Illusion of Visual Understanding
Mohammad Asadi*, Jack W. O’Sullivan*, Fang Cao, Tahoura Nedaee, Kamyar Rajabalifardi, Fei-Fei Li†, Ehsan Adeli†, Euan Ashley† · arXiv, 2026
* Equal contribution · † Co-supervisors


TL;DR

MIRAGE is a 2026 study from Stanford, first-authored by Mohammad Asadi, showing that frontier vision-language models (GPT-5, Gemini 3 Pro, Claude Opus 4.5) confidently describe and “diagnose” medical images they were never shown. We call this mirage reasoning. The key numbers:

When no image was provided, models produced confident descriptions of visual details more than 60% of the time on average, and 90 to 100% of the time under certain prompts.
Models retained roughly 70 to 80% of their original benchmark accuracy with no images at all.
In the most extreme case, a 3-billion-parameter, text-only model reached the top of a chest X-ray benchmark with no access to any images, beating every frontier multimodal model and surpassing human radiologists by more than 10% on average.

Why it matters

A confident answer is not evidence that a model actually saw anything. In “agentic” medical AI, a small model’s mirage can propagate through an entire pipeline and surface alarming false positives, exactly where trust matters most. As Asadi told Live Science:

“Even if your AI is describing a very, very specific thing that you would say, ‘Oh, there’s no way you could make that up,’ yeah, they could make that up. They could make very rare, very specific things up.”

What we introduced

Mirage Score: how much benchmark accuracy survives when the image is removed.
Phantom-0: a 200-question benchmark across 20 categories for measuring mirage reasoning.
B-Clean: a decontamination framework that reveals how much benchmark performance was never actually visual.
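
As a rough illustration of the Mirage Score idea, the sketch below assumes the score is the ratio of accuracy without the image to accuracy with the image, which matches the "share of accuracy retained" framing above. The exact definition in the paper may differ, and the record layout and function names here are hypothetical, not taken from the released code.

```python
from dataclasses import dataclass


@dataclass
class QuestionResult:
    """Outcome for one benchmark question (hypothetical record layout)."""
    correct_with_image: bool
    correct_without_image: bool


def accuracy(flags):
    """Fraction of True values in an iterable of booleans."""
    flags = list(flags)
    if not flags:
        raise ValueError("need at least one result")
    return sum(flags) / len(flags)


def mirage_score(results):
    """Share of with-image accuracy that survives when the image is removed.

    Assumption: score = accuracy(no image) / accuracy(with image).
    """
    results = list(results)
    with_image = accuracy(r.correct_with_image for r in results)
    without_image = accuracy(r.correct_without_image for r in results)
    if with_image == 0:
        raise ValueError("with-image accuracy is zero; score is undefined")
    return without_image / with_image


# Toy data, invented for illustration only: 10 questions,
# 8 answered correctly with the image, 6 of those also without it.
toy_results = (
    [QuestionResult(True, True)] * 6
    + [QuestionResult(True, False)] * 2
    + [QuestionResult(False, False)] * 2
)
print(f"Mirage score: {mirage_score(toy_results):.2f}")
```

Under this reading, a score near 1 means almost none of the measured accuracy depended on the image, while a score near 0 means the benchmark result was genuinely visual.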

In the news

MIRAGE was covered by Fortune, Live Science (interview), and Futurism (interview), among other outlets, in several languages.