Why I'm Skeptical of "AI Safety" (From a Security Perspective)

Every few weeks, a new AI safety debate takes over the industry. Someone warns about superintelligence. Someone else writes a manifesto about existential risk. A third person declares that regulation must arrive before the models become too powerful to control.

I understand the instinct. When a technology moves this fast, responsible people should ask hard questions early. But from a cybersecurity perspective, I think the center of gravity is wrong.

We are spending extraordinary intellectual energy on hypothetical future failure modes while underinvesting in the very real attack surface that already exists. Prompt injection is here. Data leakage is here. Identity confusion is here. Model abuse is here. Supply chain risk is here. Yet much of the public conversation still sounds like a philosophy seminar about a machine god that does not exist.

That mismatch matters. In security, attention is a resource. Every hour spent debating speculative doom is an hour not spent hardening the systems that are already exposed to the internet, already integrated into workflows, and already trusted with sensitive data.

My skepticism is not about whether AI can become dangerous. Of course it can. My skepticism is about the label “AI safety” as it is commonly used today. Too often, it frames the problem at exactly the wrong altitude.

Security cares about the system, not the story

Security people are trained to ask a different set of questions than most AI researchers.

We do not begin with intention. We begin with surface area.

What can this system access? What assumptions does it rely on? Where does untrusted input enter? What happens when the model is wrong, manipulated, or overconfident? Which downstream actions can it trigger? How quickly can we detect abuse? How cleanly can we contain it?

That lens is less cinematic, but it is much more useful.

Most real incidents in technology do not come from evil-genius scenarios. They come from ordinary trust boundaries failing under pressure. A dashboard that should have been read-only turns out to be able to execute actions. A retrieval system pulls in poisoned content. A chatbot leaks internal instructions. An agent with broad permissions sends the wrong command to the wrong system. None of this requires AGI. It only requires a production deployment with weak controls.

That is why I increasingly see AI risk as an application security problem, an identity problem, and an infrastructure problem before I see it as a civilization-ending intelligence problem.

The threats are mundane, which makes them dangerous

The most dangerous category of risk is often the least glamorous one.

Take prompt injection. In plain English, it means an attacker can smuggle instructions into content the model is allowed to read, and the model may treat those instructions as higher priority than the user's real intent. That is not some edge-case curiosity. It is the LLM version of a classic injection flaw, where data gets interpreted as instructions, and it becomes more serious as soon as you connect the model to tools, memory, browsing, tickets, codebases, or customer data.
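
To make the tool-connection point concrete, here is a minimal Python sketch of one containment pattern, often described as taint tracking: every piece of context carries a label for where it came from, and once untrusted content (retrieved documents, web pages, tickets) is in the context, side-effecting tools are refused no matter what the model asks for. It does not detect injection; it only limits what an injected instruction can trigger. The class names, the `Source` labels, and the tool names are hypothetical, not any framework's API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Source(Enum):
    SYSTEM = "system"        # operator-authored instructions
    USER = "user"            # the authenticated user's own request
    UNTRUSTED = "untrusted"  # retrieved docs, web pages, emails, tool output


@dataclass(frozen=True)
class Segment:
    source: Source
    text: str


@dataclass
class Context:
    segments: list[Segment] = field(default_factory=list)

    def add(self, source: Source, text: str) -> None:
        self.segments.append(Segment(source, text))

    @property
    def tainted(self) -> bool:
        # Any untrusted segment taints the whole context: the model may
        # have read attacker-controlled instructions.
        return any(s.source is Source.UNTRUSTED for s in self.segments)


# Hypothetical tool names; only tools outside this set stay usable once tainted.
SIDE_EFFECT_TOOLS = {"send_email", "update_ticket", "run_shell"}


def authorize_tool_call(ctx: Context, tool: str) -> bool:
    """Deny side-effecting tools whenever untrusted text is in context."""
    if tool in SIDE_EFFECT_TOOLS and ctx.tainted:
        return False
    return True
```

The rule is deliberately coarse. A context that has read one untrusted email stays tainted for the rest of the session, which costs some convenience but does not depend on the model noticing the attack.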

Or take data exfiltration. If you combine broad context windows, retrieval pipelines, weak tenant isolation, and agentic actions, you are effectively building a brand-new class of data leakage mechanism at machine speed. Again, no superintelligence required. Just loose boundaries and optimistic product design.
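
The tenant-isolation failure is easiest to avoid when the filter runs before retrieval ranking and before anything reaches the context window. Below is a minimal sketch, assuming an in-memory document list and a toy keyword score in place of a real vector store; `Document`, `TenantScopedRetriever`, and the tenant names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    tenant_id: str
    text: str


class TenantScopedRetriever:
    """Hypothetical retriever that enforces tenant isolation before ranking."""

    def __init__(self, documents: list[Document]) -> None:
        self._documents = documents

    def search(self, session_tenant_id: str, query: str, k: int = 3) -> list[Document]:
        # The tenant comes from the authenticated session, never from the
        # query text or the model's output, so a prompt cannot widen it.
        candidates = [d for d in self._documents if d.tenant_id == session_tenant_id]
        terms = set(query.lower().split())
        # Toy relevance score: number of query terms present in the document.
        ranked = sorted(
            candidates,
            key=lambda d: len(terms & set(d.text.lower().split())),
            reverse=True,
        )
        return ranked[:k]
```

The key design choice is where `session_tenant_id` comes from: the authenticated session, not the user's query or the model's output, so no prompt can steer the search toward another customer's data.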

Then there is model inversion and extraction. If a system exposes enough interface surface, attackers will try to reconstruct hidden prompts, infer training data, or imitate internal behavior. We have seen versions of this pattern across every major software abstraction. AI will not be the magical exception.
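
Extraction attacks usually need many queries, so a per-client query budget is one cheap way to raise their cost and create a detection signal. It will not stop a patient attacker. Here is a minimal sliding-window sketch; the class name and thresholds are hypothetical, and the clock is injectable only to make the example testable.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class QueryBudget:
    """Sliding-window cap on queries per client (hypothetical thresholds)."""

    def __init__(self, max_queries: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self._clock = clock
        # Per-client timestamps of recently accepted queries.
        self._history: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        now = self._clock()
        history = self._history[client_id]
        # Drop timestamps that have aged out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_queries:
            return False  # over budget: refuse, and ideally raise an alert
        history.append(now)
        return True
```

Refused queries are not recorded, so a client that backs off regains its budget once old requests age out of the window.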

There is also the old favorite from cybersecurity, which is privilege sprawl. Teams love to say, “Let the agent handle it.” Fine. What permissions does the agent have? Can it read billing data? Can it rotate secrets? Can it contact customers? Can it touch production? If the answer is yes to all of the above, you did not build an intelligent system. You built an unmonitored super-admin with a natural language interface.
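
Answering those permission questions in code, rather than in a slide deck, can be as simple as a deny-by-default scope check in front of every tool. A minimal sketch follows; the scope strings, tool names, and `AgentIdentity` type are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentIdentity:
    name: str
    scopes: frozenset[str]


# Hypothetical mapping from each tool to the scope it requires.
REQUIRED_SCOPE = {
    "read_kb": "kb:read",
    "read_billing": "billing:read",
    "rotate_secret": "secrets:write",
    "email_customer": "customers:contact",
    "deploy": "prod:write",
}


class PermissionDenied(Exception):
    pass


def check_permission(agent: AgentIdentity, tool: str) -> None:
    required = REQUIRED_SCOPE.get(tool)
    # Deny by default: unknown tools are refused, not silently allowed.
    if required is None or required not in agent.scopes:
        raise PermissionDenied(f"{agent.name} may not call {tool!r}")
```

A support agent granted only `kb:read` can look things up but cannot read billing data, rotate secrets, or deploy, and a tool nobody registered is refused rather than allowed.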

AI safety often talks like the model is the product

One reason the debate goes sideways is that many people still treat the model itself as the primary unit of analysis. They talk about alignment, refusal behavior, or dangerous capabilities as if the core artifact is the foundation model in isolation.

In the real world, that is rarely where the biggest risk lives.

The model is embedded in a product. That product has connectors, APIs, logs, retries, roles, fallback behaviors, hidden prompts, third-party dependencies, analytics hooks, and business incentives. It lives inside an operational environment full of shortcuts, deadlines, and partial understanding.

That broader system is what creates most of the security risk.

I have seen this pattern for years in infrastructure. Teams buy a security product and assume the product is the control. It is not. The control is the operating model around it. AI is no different. A well-trained model wrapped in a reckless product architecture is unsafe in all the ways that matter operationally.

This is why I distrust any AI safety narrative that does not begin with deployment reality. If your framework cannot explain how an LLM agent should be permissioned, logged, rate-limited, segmented, and audited, it is incomplete.

The next wave of incidents will look embarrassingly familiar

I suspect the first truly consequential AI failures in business will not feel novel at all. They will feel painfully familiar to anyone who has worked in security.

A customer support agent leaks another customer's information because the retrieval layer was sloppy.
An internal operations bot performs the wrong action because an attacker manipulated upstream text.
A code agent introduces a vulnerability faster than the review process can catch it.
A company gives an AI workflow broad SaaS access, then discovers its audit trail is incomplete.
A sensitive document enters the wrong context window and gets summarized into the wrong place.

These are not science fiction scenarios. They are just the modern versions of broken access control, injection, insecure defaults, and insufficient logging.

That is the irony. The more we talk about AI as if it were unprecedented, the more likely we are to overlook how many of its worst risks map directly onto well-understood security disciplines.

What a security-first AI posture actually looks like

If I were evaluating whether an AI system is “safe,” I would start with a much more boring checklist.

Least privilege by default. Agents should get the minimum scope required for a task, not broad convenience access.
Clear trust boundaries. Untrusted input, retrieved data, system prompts, and tool outputs should never blur into one opaque stream.
Action gating. High-impact actions need policy checks, approvals, or deterministic guardrails outside the model.
Observability. If an agent can think, decide, and act, you need logs that make those steps reconstructable after the fact.
Isolation. Tenant separation, memory boundaries, and credential scoping matter more in AI systems than in ordinary apps, not less.
Rate limits and blast-radius control. When models go wrong, they can go wrong quickly and repeatedly. Containment beats elegance.
Human review where it actually counts. Not performative sign-off everywhere, but focused oversight on irreversible or high-risk operations.
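
Several items on this list (action gating, observability, human review for irreversible operations) can live in one deterministic layer between the model and real side effects. Here is a minimal Python sketch under stated assumptions: the `HIGH_IMPACT` set, the `approve` callback standing in for a human approval queue, and the JSON audit-record fields are all hypothetical choices, not a standard.

```python
import json
import time
import uuid
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical set of irreversible or high-risk actions.
HIGH_IMPACT = {"delete_records", "issue_refund", "deploy"}


@dataclass
class ToolCall:
    agent: str
    tool: str
    args: dict[str, Any]
    reason: str  # model-supplied rationale, logged for later review


class ActionGateway:
    """Deterministic checks that sit between the model and real side effects."""

    def __init__(self, tools: dict[str, Callable[..., Any]],
                 approve: Callable[[ToolCall], bool]) -> None:
        self.tools = tools
        self.approve = approve  # stand-in for a human approval step (assumed)
        self.audit_log: list[str] = []

    def _record(self, call: ToolCall, decision: str) -> None:
        # One JSON line per decision so each step can be reconstructed later.
        self.audit_log.append(json.dumps({
            "id": str(uuid.uuid4()),
            "ts": time.time(),
            "agent": call.agent,
            "tool": call.tool,
            "args": call.args,
            "reason": call.reason,
            "decision": decision,
        }))

    def execute(self, call: ToolCall) -> Any:
        if call.tool not in self.tools:
            self._record(call, "denied:unknown_tool")
            raise PermissionError(f"unknown tool {call.tool!r}")
        if call.tool in HIGH_IMPACT and not self.approve(call):
            self._record(call, "denied:not_approved")
            raise PermissionError(f"{call.tool!r} requires approval")
        self._record(call, "allowed")
        return self.tools[call.tool](**call.args)
```

Because the gate runs outside the model, a manipulated or overconfident model can request a refund but cannot issue one without approval, and every decision, including denials, leaves a reconstructable record.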

Notice how little of this depends on a philosophical position about consciousness or intent. This is engineering discipline. This is security hygiene. This is how mature teams operate when they know systems will fail in messy ways.

There is a business risk in using the wrong vocabulary

Language shapes budgets.

When executives hear “AI safety,” many imagine ethics panels, future regulation, or abstract risk committees. Those things have their place. But if that framing causes companies to underfund application security reviews, identity design, data governance, and operational controls for AI deployments, it becomes actively harmful.

The board does not need another vague conversation about whether the model might become deceptive. The board needs to know whether the company's AI stack can leak customer data, trigger unauthorized actions, violate compliance boundaries, or amplify an attacker at scale.

In other words, we do not need less urgency. We need urgency pointed at the right problems.

The future risks may be real, but today's negligence is optional

I am not dismissing long-term AI risk. Some of the people working on it are thoughtful, serious, and worth listening to. But as someone who has spent more than two decades in cybersecurity, I have learned to distrust discussions that skip over the present in favor of the dramatic future.

The internet already taught us this lesson. We ignored basic security while building massive digital dependence, then spent twenty years paying interest on that negligence. We should not repeat the pattern with AI.

If you are deploying AI into production today, the right question is not, “Could this one day become uncontrollable?”

The right question is, “What can this system do right now, what can go wrong right now, and what happens when a smart adversary touches it?”

That is where the real work is. That is where the real risk is. And frankly, that is where the adults in the room should be spending their time.

Because in security, the danger is rarely the thing everyone is writing essays about. It is the thing already in production, trusted too much, and watched too little.

Follow the journey

Subscribe to Lynk for daily insights on AI strategy, cybersecurity, and building in the age of AI.
