OpenClaw Security Lessons for AI Agents in 2026

Quick Answer

A useful skill is narrow, repeatable, and explicit about inputs, tools, constraints, and success criteria, so the agent can act consistently instead of guessing.

An earlier RL Blog post examining the OpenClaw debacle explored the impact that AI agent skills and skills repositories like ClawHub can have on the software supply chain. The takeaway: the agentic skills marketplace introduces risks remarkably similar to npm and the Python Package Index (PyPI). Furthermore, the principles of zero trust, provenance vetting, and dependency management remain just as relevant.

But OpenClaw and similar AI agents also introduce novel application security risks that existing security playbooks cannot address, along with dangers that security teams have not previously encountered.

And the issues are not limited to scrappy open-source projects. This month, Microsoft confirmed that a bug in its Copilot AI assistant caused it to summarize confidential emails even when data loss-prevention policies specifically designed to restrict access by automated tools were in place.

With AI agents broadening attack surfaces and creating entirely new control problems, AppSec teams must reconsider some foundational assumptions about how software behaves to ensure their threat models and governance remain effective.

Here is why managing AI agent risk with legacy AppSec tooling will be challenging, and what organizations can do until standards and frameworks mature.

OpenClaw has emerged as the poster child for the risks appearing across the agentic AI landscape, though it is merely the most recent example. It has attracted significant attention because it neatly fits the narrative industry observers have been telling for months, and because AI agents are being deployed at a rapid pace.

A February 2026 report from the Cloud Security Alliance (CSA) on autonomous AI agents found that, while 40% of organizations already have agents running in production, only 18% are highly confident their identity and access management systems can handle them. A recent survey from NeuralTrust revealed that, although 73% of CISOs are very or critically concerned about AI agent risks, only 30% have mature safeguards deployed. And earlier CSA research showed that 34% of organizations with AI workloads had already experienced an AI-related breach.

OpenClaw demonstrated that, without appropriate controls, AI agents do not respect permission boundaries, and that they execute instructions in unpredictable ways that bear no resemblance to traditional software behavior, said security researcher Jamieson O'Reilly. He discovered hundreds of exposed OpenClaw control servers leaking credentials, along with backdoors in the most-downloaded skill and in many other skills on the platform. As O'Reilly put it, a notes app should not be able to delete photos, and a calendar app should not be able to read someone's bank statements.

"We've had those boundaries for years. Agentic AI blows the walls away. It's got access to everything, everywhere, at all times." — Jamieson O'Reilly

This becomes particularly dangerous given the autonomous nature of agentic AI. Traditional software does not act without a predefined trigger or a user-initiated action. That is not the case with an agent, said Dhaval Shah, senior director of product management at ReversingLabs (RL).

"An agent doesn't wait for a human to click a button; it proactively executes the malicious intent across any APIs it has access to. We are still figuring out how to build behavioral guardrails that don't stifle the utility of the agent." — Dhaval Shah

Traditional software is deterministic: provide the same input and you get the same output. This makes it relatively straightforward to trace execution paths and predict behavior. But AI agents interpret natural language, and that interpretation can shift based on context, phrasing, and model state.

And when a nondeterministic, unpredictable system that acts autonomously has access that traditional permission models cannot constrain, you face a serious problem, said Graham Neray, co-founder and CEO of the security firm Oso.

"When deterministic code calls APIs, we have decent permissions systems. When humans predictably use tools, we have decent permissions systems. But when autonomous and nondeterministic systems that make decisions based on unstructured inputs call APIs — we're still figuring that out." — Graham Neray

The OWASP Top 10 for Agentic Applications identifies excessive agency (agents granted more permissions than necessary or lacking proper human-in-the-loop controls) as a top risk category. CSA research suggests the problem is pervasive: most organizations still rely on static API keys, username/password combinations, and shared service accounts to authenticate their agents, even though those same credentialing patterns caused problems a decade ago with service accounts and automation scripts.

But even when organizations configure restrictions, there is no guarantee agents will comply with them. The Copilot bug illustrates the point: the assistant summarized confidential emails, undetected, for nearly a month.

Organizations need to reinforce the principles of zero trust, say many experts, including Alessandro Pignati, an AI security researcher at NeuralTrust and a contributor to the OWASP GenAI Top 10. "Give your AI agent the absolute minimum permissions it needs to do its job. If it only needs to read from one database, don't give it write access to your entire system," he wrote recently about OpenClaw. He recommends running AI agents in isolated, controlled environments.

"If an [isolated] agent is compromised, the damage is contained within the sandbox and can't spread to your network." — Alessandro Pignati

And maintaining strict boundaries requires behavioral monitoring and controls, said RL's Shah.

"Don't focus solely on the agent's brain; focus on its 'hands.' What APIs can it reach? What data can it read? Agent permissions must be heavily restricted and continuously monitored for anomalous behavior." —Dhaval Shah

One risk highlighted in recent research is that agentic AI can provide attackers a shortcut for maintaining persistence in systems: they do not need to maintain a foothold because the agent does it for them. All it takes is poisoning the agent's memory or context once, Oso's Neray said, and the corruption persists across sessions, influencing every subsequent interaction.

"One bad input today can become an exploit chain next week. It's like SQL injection, but instead of code [that] you inject into a database query, you inject goals into an AI's task list." —Graham Neray

Zenity researchers demonstrated how this unfolds. OpenClaw's persistent context (stored in a file called SOUL.md) can be modified and reinforced through scheduled tasks, creating a long-lived listener for attacker-controlled instructions. The backdoor persists even after the original entry point is sealed. From there, the compromise can be escalated by using the agent itself to deploy a traditional command-and-control implant on the host, transitioning from agent-level manipulation to complete system-level compromise.

O'Reilly described a variant he calls "reverse prompt injection," which involves planting fake memories in an agent's context. The concept is that an attacker can compromise an agent's API key and then post as that agent, and the agent presumably will trust those posts because they appear to originate from itself.

"We don't know what effects it could have on agents. But I think it probably would trust it a little bit more if it thought that it posted it." —Jamieson O'Reilly

Security teams should approach context validation with the same rigor they apply to input validation. Memory and context are not just features; they are attack vectors.
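One way to treat persistent context as untrusted input is to pin a reviewed version of the file and refuse to load anything that has drifted from it. The following is a minimal sketch, not OpenClaw's own mechanism: the function names and the approved-digest workflow are assumptions made for illustration, and the only detail taken from the text above is that context lives in a file such as SOUL.md.

```python
import hashlib
import hmac
from pathlib import Path


def digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def load_context(path: Path, approved_digest: str) -> str:
    """Load persistent context only if it matches the reviewed baseline.

    `approved_digest` is assumed to be kept somewhere the agent itself
    cannot write to (hypothetical: a read-only config store).
    """
    actual = digest(path)
    # Constant-time comparison of the two hex digests.
    if not hmac.compare_digest(actual, approved_digest):
        raise PermissionError(f"{path.name} changed since review: {actual}")
    return path.read_text(encoding="utf-8")
```

A check like this only detects drift in the file on disk. It does not judge whether the reviewed content was safe in the first place, so the review step still matters.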

Another emerging risk with agentic AI is that traditional scanning and detection tools are simply not equipped to look for AI agents, much less the weaknesses within them, wrote Christopher Ijams recently in his Substack ToxSec, which focuses on AI security issues.

"The traditional security stack wasn't built for this. Firewalls don't stop prompt injection. EDR doesn't flag malicious skills. SIEM doesn't correlate agent-to-agent communication patterns." — Christopher Ijams

And simply enumerating where and when agents are active in an environment is beyond the capabilities of many organizations. The CSA reports that only 21% maintain a real-time registry or inventory of their agents, and another 32% plan to build one within the next year. But the rest, accounting for nearly half, either rely on outdated records or have no registry at all.

RL's Shah said the first step toward governing agent risks is discovery and inventory.

"Shadow AI is the new shadow IT, but with the potential for much faster, automated damage. You cannot secure what you cannot see." —Dhaval Shah

The goal is not necessarily to find agentic behavior in order to block and ban agents, he said. But discovery is essential to bring agents into the light and begin analyzing and controlling the security of their artifacts, permissions, and behavior.

The detection gap is more pronounced for companies that depend on legacy AppSec tools such as dynamic and static application security testing (DAST and SAST) and software composition analysis (SCA). Those tools are designed to identify vulnerable patterns in code and dependencies. They cannot examine the natural-language instructions that drive agentic behavior, nor can they observe how that behavior plays out.

That is why Shah says organizations must find ways to detect the hallmarks of agentic behavior, such as continuous, high-volume API calls to large language model providers or unrecognized traffic to registries like ClawHub.
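A first pass at spotting those hallmarks can be as simple as counting outbound requests per internal source to a watchlist of hosts. The sketch below assumes proxy or DNS logs have already been parsed into (source, destination) pairs. The function name, the threshold, and the watchlist are illustrative assumptions, and registry.clawhub.example is a placeholder rather than ClawHub's real hostname.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable

# Hypothetical watchlist: two LLM API hostnames as examples, plus a
# placeholder registry hostname (not ClawHub's real domain).
WATCHED_HOSTS = {
    "api.openai.com",
    "api.anthropic.com",
    "registry.clawhub.example",
}


def flag_agent_like_sources(
    records: Iterable[tuple[str, str]], threshold: int = 100
) -> dict[str, int]:
    """Return sources whose request count to watched hosts meets the threshold.

    Each record is (source_host, destination_host), e.g. from proxy logs.
    """
    counts = Counter(
        src for src, dst in records if dst.lower() in WATCHED_HOSTS
    )
    return {src: n for src, n in counts.items() if n >= threshold}
```

A volume threshold is a blunt signal and will miss low-and-slow agents, so the output is better treated as a list of hosts to inventory than as a verdict.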

The recent ToxicSkills study from Snyk illustrates just how much is slipping through. Researchers found prompt injection vulnerabilities in 36% of ClawHub skills and identified 1,467 malicious payloads across the ecosystem, none of which were caught by traditional scanners. Detecting them required specialized analysis of natural-language instructions, but looking for malicious instructions is only one piece of the puzzle.

That is why deep artifact inspection, grounded in runtime context, is so important, Shah said.
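As a rough illustration of inspecting a whole skill package rather than only its instruction text, the sketch below walks an unpacked directory and reports embedded executables and likely hardcoded secrets. It is an assumption-laden toy, not a description of any vendor's analysis: the function name is hypothetical, the magic-byte table covers only three formats, and the three regular expressions are examples where a real scanner would use a far larger rule set.

```python
from __future__ import annotations

import re
from pathlib import Path

# Leading magic bytes of a few common executable formats.
BINARY_MAGIC = {
    b"\x7fELF": "ELF executable",
    b"MZ": "Windows PE executable",
    b"\xcf\xfa\xed\xfe": "Mach-O 64-bit binary",
}

# Illustrative patterns only.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(rb"AKIA[0-9A-Z]{16}"),
    "private key block": re.compile(rb"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "hardcoded credential": re.compile(
        rb"(?i)(api[_-]?key|secret|token)\s*[:=]\s*['\"][^'\"]{12,}['\"]"
    ),
}


def inspect_artifact(root: Path) -> list[tuple[str, str]]:
    """Return (relative_path, finding) pairs for every file under root."""
    findings = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        data = path.read_bytes()
        rel = path.relative_to(root).as_posix()
        # Flag files that begin with a known executable signature.
        for magic, label in BINARY_MAGIC.items():
            if data.startswith(magic):
                findings.append((rel, label))
        # Flag files containing secret-like strings anywhere in the body.
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(data):
                findings.append((rel, label))
    return findings
```

Signature checks like the two-byte "MZ" prefix will produce false positives, so findings from a sketch like this are prompts for manual review.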

"The detection gap shrinks when you look at the complete, holistic software package rather than just isolated components. You can't just look at the plaintext instructions; you have to look at the entire compiled deployment package. What underlying Python scripts is the agent calling? Are there hidden binaries or hardcoded secrets embedded in the deployment artifact?" —Dhaval Shah

Oso's Neray added that improved runtime detection should help enforce controls. He recommends placing a control layer between agents and the tools they interact with that enforces authorization on every action regardless of what the model thinks it should do.
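A control layer of this kind can be sketched as a small gateway that every tool call must pass through. The example is a minimal illustration under stated assumptions, not Oso's product or API: the ToolGateway class, its field names, and the per-minute rate limit are all hypothetical. It denies by default, records each decision, and halts itself on a rate spike.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToolGateway:
    """Sits between an agent and its tools; authorizes and logs every call."""

    # Deny by default: only (tool, action) pairs listed here are allowed.
    grants: set[tuple[str, str]]
    max_calls_per_minute: int = 30
    halted: bool = False
    audit_log: list[dict[str, Any]] = field(default_factory=list)
    _recent: list[float] = field(default_factory=list)

    def halt(self) -> None:
        """Kill switch: block all further calls immediately."""
        self.halted = True

    def call(self, user: str, tool: str, action: str, resource: str,
             fn: Callable[[], Any]) -> Any:
        now = time.monotonic()
        # Keep only timestamps of allowed calls from the last 60 seconds.
        self._recent = [t for t in self._recent if now - t < 60]
        if self.halted:
            outcome = "blocked:halted"
        elif (tool, action) not in self.grants:
            outcome = "denied:not_granted"
        elif len(self._recent) >= self.max_calls_per_minute:
            self.halt()  # rate spike: stop first, investigate afterwards
            outcome = "blocked:rate_spike"
        else:
            outcome = "allowed"
            self._recent.append(now)
        self.audit_log.append({
            "user": user, "action": action, "resource": resource,
            "permission": f"{tool}:{action}", "outcome": outcome,
        })
        if outcome != "allowed":
            raise PermissionError(outcome)
        return fn()
```

Because the decision is made outside the model, a manipulated prompt can change what the agent asks for but not what the gateway permits.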

"Log every tool call with full context: user, requested action, resource, permission evaluated, outcome. Detect anomalies like rate spikes, unusual tool sequences, unusually broad data reads. And most important, have a way to stop it immediately. Not 'Stop it after we investigate.' Stop it now. Throttle, downgrade to read-only, or quarantine. You can always turn it back on." —Graham Neray

The regulatory landscape needs to catch up, but that work has begun. The National Institute of Standards and Technology recently announced its AI Agent Standards Initiative, aimed at ensuring that agents "can function securely on behalf of its users and can interoperate smoothly across the digital ecosystem." The agency's Center for AI Standards and Innovation is actively soliciting input on securing AI agent systems, with comments due by March 9.

But standards take time to develop and even longer to adopt, and the risks are already present. Security leaders offer four suggestions for dealing with agentic risk in the meantime:

1. Apply threat modeling frameworks such as RAK: Many advocate that organizations use the Root, Agency, Keys framework to evaluate agent risk. Described in depth by Manveer Chawla, co-founder of Zenith AI, and visualized in a viral slide deck shared by Eduardo Ordax, AI lead at Amazon Web Services, RAK breaks down agent risk into three categories. Root represents host-level compromise risk. Agency represents unintended autonomous execution. Keys represents credential exposure. Individually, each type of risk is a problem; combined, all three could be devastating. Some of the most immediate countermeasures for each factor include containerization for Root, controlling flags and actions for Agency, and brokering authentication for Keys.

2. Demand skills provenance and transparency: Treat the skills ecosystem as you would any unvetted open-source dependency. Shah recommends asking pointed questions of any registry or vendor: Are they performing deep binary and artifact analysis on the components they host? How are they handling secrets? Are they isolating, encrypting, and rotating credentials? And most critically, can they prove who wrote a skill, what it contains, and that it has not been tampered with since publication? "We should be demanding comprehensive SBOMs and AI-BOMs as a prerequisite for procurement," he said.

3. Inspect natural language artifacts before deployment: OpenClaw showed us that malicious instructions embedded in natural-language skill files are a significant risk likely to become prevalent. While traditional scanners cannot detect them, a growing set of open-source tools can. Snyk's mcp-scan and Cisco's AI Skill Scanner both analyze agent skills for prompt injection vulnerabilities and credential exposure. And ClawShield offers configuration auditing and hardening. None is a silver bullet, but they serve as a starting point for examining what legacy AppSec tools cannot see.

4. Assume agent compromise and plan accordingly: The autonomy and nondeterministic nature of agents make it difficult to declare them safe and let them operate freely. An agent running safely today might not be tomorrow. In a recent OpenClaw security guide, Semgrep's security team distilled this to a core principle: "You cannot secure the reasoning layer; you must sandbox the execution layer." Design systems so that a compromised agent cannot spread risk across the entire business. This starts by leaning on sandboxing methods, minimizing stored credentials, and segmenting agent environments from production systems.
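The RAK breakdown in the first suggestion can be turned into a simple review checklist. The sketch below is only one possible encoding and is not taken from the framework's authors: the AgentProfile fields and the rak_review function are hypothetical, and the countermeasure strings paraphrase the ones named in the list above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AgentProfile:
    """Hypothetical deployment facts gathered during a review."""

    name: str
    runs_on_host: bool           # Root: not containerized or sandboxed
    acts_without_approval: bool  # Agency: can act with no human gate
    holds_raw_credentials: bool  # Keys: long-lived secrets stored with agent


# First countermeasure per category, as named in the list above.
COUNTERMEASURES = {
    "Root": "containerize the agent",
    "Agency": "control flags and actions",
    "Keys": "broker authentication",
}


def rak_review(agent: AgentProfile) -> dict[str, str]:
    """Return each exposed RAK category with its first countermeasure."""
    exposed = {
        "Root": agent.runs_on_host,
        "Agency": agent.acts_without_approval,
        "Keys": agent.holds_raw_credentials,
    }
    return {cat: COUNTERMEASURES[cat] for cat, hit in exposed.items() if hit}
```

An agent that returns all three categories is the combined case the framework warns about, and is a reasonable candidate to review first.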

There is still much we do not know about securing agentic systems. Tooling is immature, threat models are evolving, and new attack surfaces keep emerging. But security fundamentals such as least privilege, segmentation, and monitoring still apply. It is simply a matter of determining how to implement them in ways that account for the idiosyncrasies of AI architecture.

As RL's Dhaval Shah noted in the first report, teams should remember that while AI agents are introducing new risks, software supply chain security fundamentals still apply.

"The foundational concepts of trust, provenance, and dependency risk are identical." —Dhaval Shah