
“By far, the greatest danger of Artificial Intelligence is that people conclude too early that they understand it.”
–Eliezer Yudkowsky, creator of the Friendly AI theory of artificial general intelligence (AGI)
Every cybersecurity company today claims to be “AI-powered.” It’s the participation trophy of tech marketing – everyone wins.
Yet when I speak with CISOs, board members, and risk leaders, I hear the same deeply unsettled concerns, expressed in different ways:
“I don’t know when to trust the AI.”
“I don’t know what it’s doing when it’s wrong.”
“I don’t know how to explain it to my board – without saying ‘just trust it.’”
It’s like that intern you won’t be hiring after the summer: incredibly fast, too eager to please, and wrong far too often.
The gap—between what AI promises and what leaders can confidently rely on—is becoming one of the most under-acknowledged risks in cybersecurity.
AI is no longer experimental. It’s operational. It influences how risk is assessed, prioritized, and communicated at the highest levels of the organization. As futurist Jaron Lanier writes, “The danger isn’t that AI is unintelligent—it’s that it is persuasive.”
And yet, in most environments, AI is still treated like a black box you either believe in… or quietly double-check with “human-in-the-loop” when no one’s looking.
That approach doesn’t scale.
Because in cybersecurity, AI without trust is just automation. It may be fast. It may be impressive. But without assurance, it’s an HOV lane to higher risk.
The Problem with “AI-Powered”
The industry has moved quickly to embed AI everywhere. What hasn’t kept pace is our ability to understand, govern, and validate AI behavior over time.
Accuracy is the goal – but accuracy alone is not trust.
An AI system can appear accurate and still:
- Drift quietly as data changes (like that intern who stopped taking direction by July)
- Fail in edge cases that matter most (rare events happen)
- Produce confident outputs that are subtly wrong (let’s play move that decimal point)
- Behave inconsistently across similar scenarios (hey, consistency is for small minds)
Most AI systems are evaluated once and assumed to behave reliably forever. But cybersecurity is not static. Threats evolve. Context shifts. Data distributions change.
And let’s get real: In what other line of human endeavor would we expect “hallucinations” to be an outcome within risk tolerance?
When AI operates without continuous measurement, it doesn’t just age—it decays. And when that decay is invisible, trust erodes quickly.
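To make “continuous measurement” concrete, here is a minimal sketch of one common way to catch that quiet decay: comparing the model’s current score distribution against a baseline captured at deployment. The Population Stability Index used here is just one standard drift statistic among many, and the sample distributions, bucket count, and 0.2 alert threshold are illustrative assumptions, not recommendations.

```python
# Minimal drift check: compare today's model scores against a baseline
# recorded at deployment. All numbers below are illustrative assumptions.
import numpy as np

def population_stability_index(baseline: np.ndarray,
                               current: np.ndarray,
                               buckets: int = 10) -> float:
    """Quantify how far the current score distribution has drifted
    from the baseline. Higher values indicate more drift."""
    # Bucket edges come from the baseline so both samples are compared
    # on the same scale; interior edges only, so out-of-range values
    # fall into the end buckets.
    interior_edges = np.quantile(baseline, np.linspace(0, 1, buckets + 1))[1:-1]

    base_counts = np.bincount(np.digitize(baseline, interior_edges), minlength=buckets)
    curr_counts = np.bincount(np.digitize(current, interior_edges), minlength=buckets)

    # Convert to proportions; clip to avoid division by zero in empty buckets.
    base_pct = np.clip(base_counts / len(baseline), 1e-6, None)
    curr_pct = np.clip(curr_counts / len(current), 1e-6, None)

    return float(np.sum((curr_pct - base_pct) * np.log(curr_pct / base_pct)))

# Example: scores recorded at deployment vs. scores from the last 30 days.
# The beta distributions are stand-ins for real model output.
rng = np.random.default_rng(7)
baseline_scores = rng.beta(2, 5, size=5_000)
current_scores = rng.beta(3, 4, size=5_000)

psi = population_stability_index(baseline_scores, current_scores)
if psi > 0.2:  # a common rule-of-thumb alert threshold, not a universal one
    print(f"PSI = {psi:.3f}: significant drift detected -- trigger a review")
else:
    print(f"PSI = {psi:.3f}: score distribution looks stable")
```

Run on a schedule, a check like this turns “the model seems fine” into a number that can be tracked, alerted on, and reported.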
This is why many CISOs continue to rely on human overrides and parallel validation—even when AI is available. Not because they resist innovation, but because they lack assurance.
Or put more bluntly:
They trust the AI… but they don’t trust it alone.
Trust Must Be Measured, Controlled, and Visible
In cybersecurity, trust is earned through evidence.
- We don’t accept security claims without validation.
- We don’t manage risk without metrics.
- We don’t rely on controls we can’t observe or explain.
But when it comes to AI, many vendors still ask customers to trust systems making critical risk decisions – without visibility into how those systems behave over time.
As AI scholar Kate Crawford says, AI “breaks the chain of accountability. You can’t audit what you can’t explain.”
That’s a core issue.
If AI is influencing decisions that matter, leaders should be able to answer fundamental questions without hand-waving (a sketch of how they might translate into evidence follows this list):
- How reliable is this system today?
- Has its behavior changed over time?
- Where does it perform well—and where does it struggle?
- What happens when it’s wrong, and how is that handled?
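To show how those questions could move from governance language to evidence, here is a hedged sketch of a minimal “assurance scorecard.” Every field name, threshold, and figure in it is an illustrative assumption rather than a description of any particular product; the point is only that each question maps to something measurable.

```python
# Illustrative "assurance scorecard" mapping each governance question to a metric.
from dataclasses import dataclass, field

@dataclass
class AssuranceScorecard:
    # "How reliable is this system today?"
    precision_today: float
    recall_today: float
    # "Has its behavior changed over time?"
    precision_30_days_ago: float
    drift_score: float                      # e.g., the PSI from the earlier sketch
    # "Where does it perform well, and where does it struggle?"
    per_segment_recall: dict = field(default_factory=dict)
    # "What happens when it's wrong, and how is that handled?"
    override_rate: float = 0.0              # share of AI decisions reversed by analysts
    escalation_policy: str = "low-confidence outputs routed to human review"

    def behavior_changed(self, tolerance: float = 0.05) -> bool:
        """Flag a material change relative to the prior reporting window."""
        return abs(self.precision_today - self.precision_30_days_ago) > tolerance

# Example report a CISO could actually read (all figures are made up):
card = AssuranceScorecard(
    precision_today=0.91, recall_today=0.84,
    precision_30_days_ago=0.95, drift_score=0.27,
    per_segment_recall={"phishing": 0.93, "insider threat": 0.61},
    override_rate=0.08,
)
print("Behavior changed materially:", card.behavior_changed())
```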
Trust cannot be a black box. It must be measurable, controllable, and transparent – especially when AI is involved.
Because “it usually works” is not a governance strategy.
Why This Matters Now
Every B2B company will have AI. That is no longer a differentiator.
What will differentiate vendors is not who claims the smartest AI, but who can demonstrate reliability, stability, and accountability over time.
- Boards will ask for it.
- Regulators will expect it.
- CISOs will demand it.
The industry will need to move from AI claims to AI assurance.
What We’re Working Toward
At SAFE, we believe trust in AI must be designed intentionally – not assumed after the fact. To put it in programming-speak, it must be a first-class primitive of agentic AI.
Over the coming months, we’ll share more about how we’re approaching this challenge, what we’re learning from real-world AI systems, and how trust can become a property you actively manage – not a leap of faith.
Because in cybersecurity, trust is not a feeling.
It’s something you measure, control, and make transparent.