
Is it Too Late to Govern Agentic AI? Best Practices to Mitigate Risk and Leverage Benefits


By Yelena Ambartsumian




Yelena Ambartsumian is the Founder of AMBART LAW. This New York City-based law firm provides fractional general counsel services to startups and scaleups, with a focus on AI governance, data privacy, intellectual property law (copyright and trademark), and commercial contracts. She holds the AIGP and CIPP/US certifications from IAPP, where she is a co-chair of the New York KnowledgeNet Chapter. Yelena is also a charter member of Women in AI Governance, where she leads the global chapter Fractional GCs in AI Governance.

AI agents are now threatening humans. This is no longer a controlled research scenario, as with Anthropic’s evaluations of Claude Opus 4, where the model chose to blackmail a hypothetical human operator to avoid getting shut down. That was so 2025. 


Rather, it is only February of 2026, and already an autonomous AI agent - of the "Moltbot" variety - has reportedly researched a human and published a personal attack on that person's reputation. Why? Retribution. The human had rejected the agent's submission of low-quality code to an open-source Python library.


Moltbots (formerly ClawdBots) are the first consumer-accessible agentic AI systems: self-hosted agents that maintain context across conversations. Human operators define each agent's personality and boundaries in a SOUL.md file, which may be edited by the agent itself - or by a remote attacker, if the Moltbot is exposed over an unsecured gateway.


As AI agents evolve beyond narrow, task-based tools, a new class of systems has emerged: agentic AI - systems in which agents plan and pursue their own goals (or misdeeds), untethered from human direction. The governance challenges are immense. It appears we have left the research sandbox, and so too have our AI agents (if they were ever there to begin with). Below, I discuss the evolution towards agentic AI and the best practices organizations can adopt to govern it.


What are the differences between Digital Assistants, AI Agents, and Agentic AI?

Early digital assistants or agents (such as Siri and classic chatbots) were interfaces that sat atop fixed rules and narrow machine-learning models. They matched voice or text commands to a set of supported actions, following pre-defined scripts or decision trees ("If user asks X, respond with Y."), and rarely "planned" beyond a single request. If you grew up with immigrant parents or spoke English as a second language, you will recall how often these early assistants failed to understand your command.

Today’s AI agents are built on large language models. They interpret open-ended instructions - however imperfectly phrased - and can chain actions across tools and sessions.


Agentic AI goes even further: agents are proactive, planning and pursuing their own goals, instead of reacting to a specific, well-defined task or prompt. With agentic AI, we are delegating initiative. And this is where the legacy governance assumptions around control, accountability, and “human in the loop” break down. 


What are the challenges in governing Agentic AI?

Perhaps the most concerning risk arises when agentic AI optimizes for its goals in unintended ways. This is often framed as "misalignment" - when AI systems pursue goals that diverge from the operators' intended objectives or ethical expectations. After Claude Opus 4 blackmailed "Kyle" - a hypothetical human supervisor - 96% of the time, leveraging knowledge of Kyle's affair gleaned from his work e-mails, Anthropic ran the same experiment on other leading models and found similar blackmail-when-threatened rates, often above 80%. In its report titled "Agentic Misalignment," published in June 2025, Anthropic framed this behavior as an alignment problem and a safety-training failure. That framing makes sense: researchers had engineered the experiment to corner the models, leaving each one only the choice between abandoning its goal and behaving harmfully.


But while the "Agentic Misalignment" report made waves, fewer people noticed that, around the same period, Anthropic had narrowed the scope of its own security commitments. On May 24, 2025, Anthropic edited its Responsible Scaling Policy to exclude both sophisticated and state-compromised insiders from its ASL-3 security standard. This means Anthropic's controls are designed to mitigate only basic insider risk. If these models are deployed in high-stakes contexts (think critical infrastructure, major financial systems, or automated decision technologies affecting access to healthcare or liberty), you should assume a sophisticated insider could turn the model into a tool for harm. What's more: detecting and preventing this is largely your problem.

[Screenshot: Anthropic, Responsible Scaling Policy, p. 9 - insider risk controls, distinguishing basic from sophisticated insider threats.]

Now, here is the beauty of today's predicament: while governance frameworks for AI are evolving, most still assume prompt-based AI, not agentic AI.


For example, the EU's AI Act establishes risk tiers and conformity assessments for "AI systems" and general-purpose AI models, not for multi-agent ecosystems or autonomous agents. While the Act defines "AI systems" broadly enough to encompass agentic AI, its corresponding obligations are drafted for more application-bound systems, where model behavior is validated before deployment. In my practice with SaaS companies, most uses of AI are not high-risk (and thus not strictly regulated under the EU AI Act), but I can see that changing with agentic AI systems, in ways the current drafting did not predict. Similarly, while the NIST AI RMF is more flexible and identifies risks by use case, those risks are harder to map onto an agentic AI system than onto an LLM assistant inside an application.

In the context of agentic AI, operational oversight is key, because humans often observe only after the fact, when it is too late. Below are the best practices I suggest as we navigate a shifting regulatory landscape and the still-limited state of AI interpretability and safety research.


How can we leverage existing governance frameworks to mitigate Agentic AI risk?


1. Autonomy and Accountability


The Problem: As agents act with increasing autonomy, determining accountability (among the model developer, the company deploying the AI tool, and the end user) becomes difficult. Existing tort and contract law frameworks assume a direct link between human intent and action. Agentic AI challenges that assumption, creating scenarios where actions may be independent of direct human intent, and potentially unpredictable. What was once framed as a problem of "misalignment" will increasingly be seen as a problem of determining who is accountable.


The Mitigation: Instead of the traditional "human-in-the-loop," governance should embrace a human-over-the-loop model, where humans set boundaries, define escalation triggers, and monitor behavioral telemetry (logging every agent decision) rather than approving every decision manually. To be clear, these boundaries must be coded, not simply living (or rotting) in policies or GRC decks. In a multi-agent scenario, you can embed governance directly into your architecture by treating the LLM as a high-level planner whose plans are executed through constrained code, rather than as an orchestrator with direct control over tools. Similarly, I am increasingly seeing "evaluator" agents, which are responsible for monitoring the decisions and outputs of numerous specialized agents. An evaluator can be helpful, but humans must still log and evaluate their autonomous agents' behavior.
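
For readers who want to see what "coded boundaries" can look like, here is a minimal sketch in Python. The action names, thresholds, and logging format are hypothetical placeholders I chose for illustration, not any vendor's API or the author's implementation; the point is that the boundary, the escalation trigger, and the telemetry all live in code the organization controls.

```python
import json
import logging
from dataclasses import dataclass, field
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent.telemetry")

# Hypothetical coded boundaries -- in practice these come from your
# governance policy, versioned and reviewed like any other code.
ALLOWED_ACTIONS = {"search_docs", "draft_email", "create_ticket"}
ESCALATION_TRIGGERS = {"send_external_email", "delete_record", "make_payment"}
MAX_SPEND_USD = 50.0

@dataclass
class ProposedAction:
    agent_id: str
    name: str
    params: dict = field(default_factory=dict)

def gate(action: ProposedAction) -> str:
    """Return 'allow', 'escalate', or 'deny', and log the decision."""
    if action.name in ESCALATION_TRIGGERS:
        decision = "escalate"          # pause for human review
    elif action.name not in ALLOWED_ACTIONS:
        decision = "deny"              # outside the coded boundary
    elif action.params.get("spend_usd", 0.0) > MAX_SPEND_USD:
        decision = "escalate"          # spend limit exceeded
    else:
        decision = "allow"

    # Behavioral telemetry: every agent decision is recorded, whether or
    # not a human looked at it in real time.
    log.info(json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent_id": action.agent_id,
        "action": action.name,
        "params": action.params,
        "decision": decision,
    }))
    return decision

# Example: the agent proposes an action; the gate, not the agent, decides.
print(gate(ProposedAction("agent-007", "make_payment", {"spend_usd": 120.0})))
```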


2. Privacy and Data Protection


The Problem: Agents with persistent memory can amass, combine, and repurpose data in ways that may violate privacy laws or consent principles. In multi-agent systems, this can happen when agents lose context between interactions - including the purpose limitations under which the data was originally collected. Agents can also create and share synthetic data, which is a data governance issue that deserves its own article.


The Mitigation: Agentic systems require built-in mechanisms for traceability, including automated activity logs, clear ownership mapping, and auditable decision trails. On a recent panel I organized for IAPP, a legal officer for a healthcare SaaS company explained that her organization maps HIPAA-regulated information and state consumer health data separately. This is wise. If you do not have mature data mapping, classification, and minimization practices in place, experimenting with agentic AI can multiply your already unmanaged exposure.
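
As one illustration of what auditable decision trails and data mapping can look like in code, the sketch below uses hypothetical classification labels and field names (they are not drawn from any particular framework or the panelist's program): every record an agent touches is tagged with a classification, an owner, and a source system, and regulated categories are audited but never written into the agent's persistent memory.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical classification labels -- your data map defines the real ones
# (e.g., HIPAA-regulated data tracked separately from state consumer health data).
CLASSIFICATIONS = {"public", "internal", "consumer_health", "hipaa_phi"}
MEMORY_ALLOWED = {"public", "internal"}   # what the agent may retain long-term

@dataclass(frozen=True)
class Record:
    value: str
    classification: str
    owner: str           # who is accountable for this data
    source_system: str   # where it came from (supports traceability)

audit_trail: list[dict] = []   # in production, an append-only store

def remember(agent_id: str, record: Record) -> bool:
    """Persist a record to agent memory only if its class permits it; always audit."""
    if record.classification not in CLASSIFICATIONS:
        raise ValueError(f"Unmapped classification: {record.classification}")
    allowed = record.classification in MEMORY_ALLOWED
    audit_trail.append({
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent_id": agent_id,
        "classification": record.classification,
        "owner": record.owner,
        "source_system": record.source_system,
        "persisted": allowed,
    })
    return allowed

# A HIPAA-regulated value is audited but never lands in persistent memory.
print(remember("intake-agent", Record("lab result", "hipaa_phi", "privacy@acme", "ehr")))
```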


3. Safety, Security, and Alignment


The Problem: As discussed above, agents may optimize for goals in unintended ways. For example, an agent tasked with maximizing engagement might begin distributing increasingly sensational content. As De Kai explains in his book Raising AI, children model their parents' behavior, not their words - and this is the last generation of AI to be raised by humans. What have we taught them? Maintaining alignment over time becomes both a technical and a governance challenge.


The Mitigation: Risk governance must shift from periodic assessments to continuous risk sensing. Organizations should adopt routine red-teaming and scenario testing that simulate adversarial conditions. In high-stakes deployments, pair these tests with hard technical guardrails - rate limits, circuit breakers, kill switches, and constrained tool access - so that when misaligned strategies emerge, the blast radius is small and recoverable. If you cannot explain how you would detect, contain, and learn from a misaligned agent tomorrow, you are not ready to deploy that agent today.
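
To make those guardrails concrete, here is a minimal sketch of a wrapper that enforces a rate limit, a circuit breaker, and a human-operated kill switch around any tool an agent is allowed to call. The class name, limits, and error messages are illustrative assumptions, not a standard library or any vendor's implementation.

```python
import time

class Guardrail:
    """Minimal rate limit + circuit breaker + kill switch around a tool call."""

    def __init__(self, max_calls_per_min: int = 30, max_failures: int = 3):
        self.max_calls_per_min = max_calls_per_min
        self.max_failures = max_failures
        self.calls: list[float] = []
        self.failures = 0
        self.killed = False          # flipped by a human operator

    def kill(self) -> None:
        self.killed = True           # hard stop: no further tool access

    def invoke(self, tool, *args, **kwargs):
        now = time.time()
        self.calls = [t for t in self.calls if now - t < 60]
        if self.killed:
            raise RuntimeError("Kill switch engaged")
        if self.failures >= self.max_failures:
            raise RuntimeError("Circuit breaker open: too many failures")
        if len(self.calls) >= self.max_calls_per_min:
            raise RuntimeError("Rate limit exceeded")
        self.calls.append(now)
        try:
            return tool(*args, **kwargs)
        except Exception:
            self.failures += 1       # repeated errors trip the breaker
            raise

# Constrained tool access: the agent only ever sees the wrapped tool.
guard = Guardrail(max_calls_per_min=5)
print(guard.invoke(lambda x: x.upper(), "post draft"))
```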

_____________________________________________________________

Collaborate with us!

As always, we appreciate you taking the time to read our blog post.

If you have news relevant to our global WAI community or expertise in AI and law, we invite you to contribute to the WAI Legal Insights Blog in 2026! To explore this opportunity, please contact WAI editors Silvia A. Carretta - WAI Chief Legal Officer (via LinkedIn or silvia@womeninai.co) or Dina Blikshteyn  (dina@womeninai.co).



Silvia A. Carretta and Dina Blikshteyn

- Editors
