HC Editorial Team
20/06/2024
Large language models (LLMs) like ChatGPT and Gemini have transformed how we interact with machines—offering fluent, natural-sounding answers to almost any question. But behind the smooth talk lies a major problem: hallucinations. These are instances where the AI confidently provides false, misleading, or fabricated information, often indistinguishable from accurate responses.
Now, a group of researchers from the University of Oxford may have found a breakthrough tool to flag these hallucinations automatically. Their recent paper, "Detecting hallucinations in large language models using semantic entropy", published in Nature in June 2024, introduces a statistical method that measures semantic uncertainty in AI-generated responses, without requiring access to the model's internals or any retraining.
AI-generated misinformation isn't just an academic curiosity—it has real-world consequences. Whether in medical diagnosis, legal advice, or education, an incorrect AI response can mislead users and cause harm.
Existing solutions often require extensive retraining, human-in-the-loop validation, or access to the model's internal probabilities—none of which are practical for real-time applications or closed systems like GPT-4.
This is where semantic entropy comes in. Instead of relying on what the model "thinks," it focuses on how consistent its responses are across multiple trials of the same question.
Semantic entropy is a measure of how much the meaning of multiple AI responses to the same prompt varies. Here's how it works:
1. The same question is put to the model several times, and a handful of answers are sampled.
2. Answers that mean the same thing are grouped into clusters, using a bidirectional entailment check (does answer A imply answer B, and vice versa?) rather than exact wording.
3. Entropy is then computed over these clusters of meaning. If most answers fall into one cluster, entropy is low and the response is likely reliable; if the answers scatter across many clusters, entropy is high, a strong sign of confabulation.
This is a model-agnostic, output-only method: it treats the AI like a black box and examines only its responses, making it compatible with commercial systems.
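To make the procedure concrete, here is a minimal Python sketch of the counting-based (discrete) variant of the idea: sample several answers, group the ones that share a meaning, and compute the entropy of the resulting clusters. The `means_the_same` equivalence check and the toy `naive_equivalence` stand-in are illustrative assumptions; in the paper that role is played by a bidirectional-entailment model, and the code below is a sketch, not the authors' implementation.

```python
import math
from typing import Callable, List


def semantic_entropy(answers: List[str],
                     means_the_same: Callable[[str, str], bool]) -> float:
    """Estimate semantic entropy from a batch of sampled answers.

    Answers are grouped into clusters of shared meaning; the Shannon
    entropy of the cluster frequencies is returned (in nats). Low entropy
    means the model keeps saying the same thing; high entropy means the
    answers disagree in meaning, i.e. a likely confabulation.
    """
    clusters: List[List[str]] = []
    for answer in answers:
        for cluster in clusters:
            # Assign the answer to the first cluster whose representative
            # it is semantically equivalent to.
            if means_the_same(answer, cluster[0]):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])  # no match: start a new meaning cluster

    total = len(answers)
    probabilities = [len(cluster) / total for cluster in clusters]
    return -sum(p * math.log(p) for p in probabilities)


# Toy equivalence check based on exact wording, for demonstration only.
# The paper instead asks a natural-language-inference model whether two
# answers entail each other in both directions.
def naive_equivalence(a: str, b: str) -> bool:
    return a.strip().lower() == b.strip().lower()


samples = ["Paris", "paris", "Paris", "Lyon", "Marseille"]
print(round(semantic_entropy(samples, naive_equivalence), 3))  # about 0.95 nats
```

In practice, a threshold on this score decides whether an answer should be trusted. The paper's primary estimator weights clusters by sequence likelihood when token probabilities are available; the counting variant sketched here is what makes black-box use possible.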
The Oxford team tested semantic entropy across several datasets, including question-and-answer benchmarks and fact-checking tasks. The results were compelling: semantic entropy identified likely confabulations more reliably than existing uncertainty baselines across the tasks tested.
Crucially, the method does not require retraining or special tuning for different domains, making it highly scalable and robust.
The potential applications of semantic entropy are wide-ranging:
By flagging uncertain responses, systems can warn users, block output, or prompt fact-checking—an essential safety layer in critical domains like telemedicine, journalism, or legal tech.
Semantic entropy works without access to internal probabilities, making it suitable for proprietary LLMs and APIs where model introspection is restricted.
When models provide a confidence signal based on semantic coherence, users can better gauge the reliability of responses—reducing blind trust and potential misuse.
As AI systems grow in capability and reach, this method offers a transparent way to monitor and mitigate hallucinations, aligning with ethical guidelines and policy recommendations for trustworthy AI.
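As a sketch of the safety-layer idea described above, the snippet below wraps a question-answering call with a semantic-entropy check and declines to answer when uncertainty is high. It reuses the `semantic_entropy` function from the earlier sketch; `ask_model`, the number of samples, and the threshold value are illustrative assumptions rather than settings from the paper.

```python
from typing import Callable, List

ENTROPY_THRESHOLD = 0.7  # illustrative cut-off; would need tuning per task


def guarded_answer(question: str,
                   ask_model: Callable[[str], str],
                   means_the_same: Callable[[str, str], bool],
                   n_samples: int = 10) -> str:
    """Answer a question, but refuse when the model's sampled answers
    disagree in meaning (high semantic entropy)."""
    answers: List[str] = [ask_model(question) for _ in range(n_samples)]
    score = semantic_entropy(answers, means_the_same)  # from the sketch above
    if score > ENTROPY_THRESHOLD:
        return ("I'm not confident about this answer; "
                "please verify it with a trusted source.")
    return answers[0]
```

Refusing to answer the questions most likely to produce confabulations is one of the deployment patterns the paper itself evaluates.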
While powerful, the method is not without trade-offs: it requires generating and comparing several answers for every question, which multiplies computational cost and latency, and it targets confabulations (answers that vary arbitrarily from run to run) rather than mistakes the model makes consistently, for instance because it was trained on flawed data.
Even so, the authors argue that the additional computation is a small price to pay in settings where reliability matters.
"Semantic entropy offers a reliable, scalable way to spot hallucinations—without needing to open the model's black box," says lead author Sebastian Farquhar.
As generative AI becomes more embedded in society, methods like this are essential to ensure that powerful systems don't just sound smart, but also stay honest.
Reference: Farquhar S, Kossen J, Kuhn L, Gal Y. Detecting hallucinations in large language models using semantic entropy. Nature. 2024;630(8017):625–630. Available from: https://doi.org/10.1038/s41586-024-07421-0