
AI Hallucinations Are Becoming Rarer – But Human Supervision Still Matters

By Gastón Milano, CTO of Globant Enterprise AI


What do a watchmaker and a doctor have in common? Both are expected to be perfectly accurate. That expectation, which now seems to belong to another era, applies today to what people expect from artificial intelligence.

In 2023, lawyer Steven Schwartz was involved in one of the most famous AI hallucination incidents. As the attorney for Roberto Mata, who was suing an airline over an in-flight injury, he submitted a brief citing non-existent case law alongside real cases described with errors. When the Manhattan court pointed this out, his defense was that he had drafted the brief using ChatGPT.

The mistake became a global headline and a cautionary tale. Here’s the truth: hallucinations haven’t gone away, but they’re happening less often and new safeguards and symbolic tooling are making AI more reliable than ever.

The lesson of this incident is clear: we shouldn’t fear AI hallucinations, but we need to understand them, minimize them, and make sure humans remain in the loop.

Why hallucinations happen

AI has a margin of error. Accepting this starting point is key to understanding the benefits – and the risks – of interacting with a chatbot or developing AI software. We cannot bury our heads in the sand.

In September 2024, a group of researchers published an article in Nature analyzing 243 cases of distorted information generated by ChatGPT hallucinations. They classified the errors into seven main categories, a taxonomy intended to inform the public and organizations and to help improve future AI versions.

They concluded that hallucinations could arise from data overfitting (fitting the training data so closely that the model fails to generalize), logical errors, reasoning mistakes, mathematical errors, unfounded inventions, factual errors, or text output issues.

These errors are possible because AI doesn’t “know” facts in the human sense; it predicts patterns. That’s why outputs can look perfectly credible, yet be incorrect.

Still, it’s important to keep perspective: with more than 700 million people using these systems weekly, such errors remain the exception, not the rule.

Human supervision serves as a quality guarantee

We can understand the problem of hallucinations and take advantage of increasingly capable models, but AI adoption will not succeed without human oversight. Even the most advanced agents need it. The human-in-the-loop model is the new workplace dynamic.

Anyone who thought they could stop using their brain and let AI do everything will be disappointed.

To deal with hallucinations, we must ask whether a response could fall into one of the error categories mentioned earlier and verify its data sources.

Imagine this process within a company. An AI agent records an interview with a client about their requirements. Then, it can create a story map, develop software, present an MVP, and conduct tests. All of these tasks, which previously took weeks, can now happen in hours. However, the final word belongs to a person. For example, if at any stage the AI hallucinated or the collected data was insufficient, someone must serve as the quality guarantor.

For leaders, this oversight cannot be left vague – it must be operationalized. Human supervision of AI agents should include, for example:

Tiered review checkpoints where humans validate AI outputs – such as data summaries, client-facing drafts, or generated code – before results move forward. This prevents flawed outputs from compounding (a minimal sketch of such a checkpoint follows this list).
Red flag protocols that train teams to recognize and escalate risky AI responses, whether fabricated citations, misapplied logic, or outdated references. Standardized checklists reduce the chance of blind acceptance.
Pairing AI with domain experts for high-stakes contexts like financial forecasts, healthcare insights, or legal contracts. Experts can spot nuanced hallucinations that algorithms miss.
Continuous auditing of models through sampling and back-testing AI outputs against ground truth data, treating oversight as ongoing quality assurance rather than a one-off step.
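
To make the first two points concrete, here is a minimal sketch of a review checkpoint, assuming a simple Python workflow with hypothetical names and red-flag rules; it illustrates the pattern rather than prescribing any particular platform.

```python
# Minimal sketch of a tiered review checkpoint (hypothetical names and rules).
# It does not detect hallucinations by itself; it routes outputs that match
# simple red-flag heuristics to a human reviewer before they move forward.

from dataclasses import dataclass, field

@dataclass
class AIOutput:
    kind: str                    # e.g. "data_summary", "client_draft", "generated_code"
    text: str
    cited_sources: list = field(default_factory=list)  # sources the model claims to rely on

# Content types that always require an expert sign-off (illustrative list).
HIGH_STAKES_KINDS = {"legal_text", "financial_forecast", "healthcare_insight"}

def red_flags(output: AIOutput) -> list:
    """Return the reasons this output needs human review (empty list = none found)."""
    flags = []
    if not output.cited_sources:
        flags.append("no sources cited – cannot be verified")
    if output.kind in HIGH_STAKES_KINDS:
        flags.append("high-stakes content – domain expert must sign off")
    return flags

def review_checkpoint(output: AIOutput, approve_fn) -> bool:
    """Gate the output: clean low-stakes items pass, everything else is escalated.
    `approve_fn` is whatever mechanism asks a person for the final decision."""
    flags = red_flags(output)
    if not flags:
        return True                     # moves forward without human intervention
    return approve_fn(output, flags)    # the human has the final word
```

In practice, `approve_fn` could open a ticket, ping a reviewer, or block a deployment; the point is that escalation criteria are explicit and auditable rather than left to individual judgment.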

AI literacy is already one of the most in-demand skills in the job market, according to the World Economic Forum. Technology can increase productivity, allowing a company with the same number of employees to serve twice as many clients, but critical thinking – and structured supervision – remains the check that confirms the results are sound.

AI models are becoming more accurate

In February 2025, Sam Altman announced that the GPT-4.5 model had halved the probability of hallucinations. In other words, a repeat of the Schwartz case has become far less likely.

Gemini, DeepSeek, and Grok have also improved their training data and architectures. Each model has its own comparative advantages, but on the Massive Multitask Language Understanding (MMLU) benchmark, which tests knowledge and reasoning across dozens of subjects, seven leading models already score 80% or higher.

One of the biggest drivers of this trend is Context Engineering. Instead of relying solely on training data, several approaches are enabling AI to fetch relevant, up-to-date information from external sources before answering. This simple idea dramatically improves reliability.
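
The best-known version of this idea is retrieval-augmented generation: fetch a handful of relevant documents first, then ask the model to answer only from that supplied evidence. The sketch below illustrates the pattern under stated assumptions – a toy in-memory knowledge base and a placeholder `call_llm` function stand in for whatever retrieval system and model API a team actually uses.

```python
# Illustrative sketch of context engineering: retrieve relevant material first,
# then ask the model to answer only from that material.

SAMPLE_KNOWLEDGE_BASE = [
    "Policy update (2025): refunds are processed within 14 business days.",
    "Product FAQ: the API rate limit is 100 requests per minute.",
]

def search_knowledge_base(query: str, top_k: int = 3) -> list:
    """Toy retrieval: rank documents by how many query words they contain."""
    words = set(query.lower().split())
    scored = sorted(
        SAMPLE_KNOWLEDGE_BASE,
        key=lambda doc: len(words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call."""
    raise NotImplementedError("plug in your model provider here")

def grounded_answer(question: str) -> str:
    # 1. Fetch relevant, up-to-date information instead of relying on training data alone.
    context = "\n\n".join(search_knowledge_base(question))

    # 2. Instruct the model to answer only from the supplied context,
    #    and to admit when the context is insufficient rather than guess.
    prompt = (
        "Answer the question using ONLY the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```

Because every claim in the answer can be traced back to a retrieved document, a human reviewer can verify it far more easily than an answer produced from the model’s memory alone.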

Competition among AI providers is fueling a virtuous cycle: better architectures, better safeguards, and fewer hallucinations.

Hallucinations shouldn’t stop AI adoption

The story of AI hallucinations is not one of fear, but of engineering maturity. Mistakes will happen, just as they do with people, but the key is how systems are designed to prevent, detect, and recover from them. Many of today’s AI platforms show that when learning, symbolic reasoning, and repeatability are built into the architecture itself, oversight becomes part of the fabric rather than an external patch.

AI is not replacing judgment; it is amplifying it. The organizations that win will be those that understand hallucinations not as a reason to slow down, but as a reason to design smarter. Human oversight and platform resilience together form the real guarantee of quality in the age of intelligent systems.
