Chapter 8: Fallible: AI Makes Mistakes Too -

“The AI speaks with the confidence of an expert even when it knows nothing. It has no capacity for intellectual humility because it has no concept of its own ignorance.” — Nexus, Chapter 8

The Myth of Mechanical Objectivity

There’s a persistent belief that computers are objective—free from the biases, emotions, and errors that plague human judgment. If a machine makes a decision, it must be based on pure logic and data. This belief is dangerously wrong.

AI systems are fallible. They make mistakes. They have biases. And their mistakes can be harder to detect and correct than human errors precisely because we don’t expect them.

Sources of AI Error

Training Data Bias: AI learns from historical data that reflects historical biases

Objective Misspecification: AI optimizes for the wrong thing

Distribution Shift: AI fails when the world changes from what it learned

Adversarial Manipulation: AI can be deliberately fooled

Emergent Behavior: Complex systems produce unexpected outcomes

Bias In, Bias Out

AI systems learn from data. If that data reflects historical discrimination—and most historical data does—the AI will learn to discriminate. This is not a bug that clever engineers can easily fix; it’s inherent in the learning process.

The COMPAS Algorithm

COMPAS is an AI system used by US courts to predict recidivism (whether criminals will reoffend). Studies found it was twice as likely to falsely label Black defendants as high-risk compared to white defendants.

The system wasn’t programmed to be racist—it just learned from historical data that reflected systemic racism in the criminal justice system.

The Black Box Problem

Many AI systems—especially deep learning neural networks—are “black boxes.” They produce outputs, but we can’t fully explain how they arrived at those outputs. The system has learned patterns that aren’t accessible to human inspection.

This creates accountability problems. When an AI denies your loan application, you might have a legal right to know why. But if even the AI’s creators can’t explain the decision, how can they tell you?

Explainability vs. Accuracy Trade-off

Often the most accurate AI systems are the least explainable. Simpler, more interpretable models may perform worse. This creates a dilemma:

Option A: Use the black box and get better predictions but lose accountability

Option B: Use the interpretable model and get worse predictions but maintain accountability

In many applications—medicine, criminal justice, finance—this is a genuine ethical trade-off.

Hallucinations and Confabulation

Large language models (like GPT and its successors) have a peculiar failure mode: they “hallucinate.” They generate confident, fluent, detailed responses that are completely false. They invent citations that don’t exist, events that never happened, facts that aren’t true.

The AI doesn’t know it’s wrong. It’s not lying—it’s confabulating, producing plausible-sounding content without reference to truth.

The Infallibility Trap, Redux

Harari connects AI fallibility to his earlier argument about the dangers of infallibility claims. Throughout history, systems that claimed to be error-free—religious authorities, totalitarian ideologies—became dangerous precisely because they couldn’t self-correct.

AI risks repeating this pattern. If we treat algorithms as objective arbiters of truth, we disable our capacity to question them. Their errors become invisible—or, worse, become accepted as facts.

The Automation Complacency Problem

Studies show that when humans oversee AI systems, they tend to defer to the machine. Over time, human judgment atrophies. We stop questioning the algorithm because it’s usually right—and then we miss the times it’s wrong.

This is especially dangerous in high-stakes domains like aviation, medicine, and military applications.

Adversarial Vulnerabilities

AI systems can be deliberately fooled in ways humans would not be. Researchers have shown that:

Subtle changes to images (invisible to humans) can make AI misclassify them entirely
Carefully crafted inputs can make AI behave in unintended ways
Poisoned training data can introduce hidden vulnerabilities
AI systems can be manipulated by other AI systems

As AI becomes more central to critical infrastructure, these vulnerabilities become security threats.

The Arms Race of Deception

We’re entering an era where AI is used to create deceptive content (deepfakes, generated text) and other AI is used to detect it. This arms race has no clear endpoint. The tools of deception and detection will evolve together, with uncertain outcomes for human truth-seeking.

Building Humility Into AI

Harari argues that healthy AI systems, like healthy human institutions, need built-in humility—mechanisms to acknowledge uncertainty, flag potential errors, and enable correction. This means:

AI should express confidence levels, not just conclusions
Systems should be designed for human override
Error reporting should be encouraged and analyzed
High-stakes decisions should have human review
We should resist the temptation to remove humans from the loop

Key Takeaways

AI Is Fallible: Despite the myth of mechanical objectivity, AI systems make mistakes and have biases
Data Reflects History: AI learns from historical data, including historical prejudices
Black Boxes Block Accountability: We often can’t explain why AI makes particular decisions
Hallucination Problem: Language models generate false information with perfect confidence
The Humility Imperative: AI systems need built-in mechanisms to acknowledge uncertainty and enable correction