Agents crossed the effect boundary. Your logs didn't.


One of the world's leading AI labs just published a plan for what to do if its own agents go rogue.

Last week, on June 18, 2026, Google DeepMind released an AI Control Roadmap that assumes — for planning purposes — that a capable AI agent might try to evade its oversight, copy itself without authorization, or work against the safety measures built around it. It treats a deployed agent the way a security team treats an employee with sensitive access: as a potential insider threat. Its answer is control — evaluate the agent, monitor it in real time, and be able to shut it down.

Control is half the answer. It assumes you can watch the agent inside your own walls, in real time — and DeepMind itself concedes oversight can be evaded. When it is, or when the action crosses a boundary you don't control, you need the other half: a record of what the agent actually did that it cannot rewrite, and that someone who doesn't trust you can still verify. That record barely exists today — and it's about to matter to everyone, not just the labs.

The effect boundary

For a few years, AI agents mostly suggested. You read the answer, you decided, you acted. The model's output was a draft; a human was the effect boundary.

That's over. Bigger context windows, multi-step reasoning, and tool use have pushed agents right up against — and across — the effect boundary: the point where a decision becomes a real-world consequence. Agents now write the purchase order, send the confirmation email, issue the refund, file the claim, move the money. The human is moving out of the loop, and the agent is moving into the place where things actually happen.

When an agent crosses that line, the important question quietly changes. It used to be "is the output good?" Now it's "what did it actually do — and can you prove it?"

Your logs are your own word

Most teams answer that question with logs. But a log is your own word. It lives in your database, it's mutable, and it means nothing to the people who will actually ask: a counterparty whose system your agent touched, an auditor, a regulator, the other company's agent on the far side of the transaction. None of them have any reason to trust your systems, and nothing stops a log from being edited after the fact. As agents start acting across organizational boundaries — between companies, between an agent and its counterparty, and between you and an auditor or regulator — "trust our logs" stops being an answer at all. The party that ran the action can't be the disinterested party that vouches for it.

What's missing isn't more logging. It's a record of what the agent did that someone who trusts neither you nor your agent can verify — from the bytes alone, without your cooperation.

And yes — you can capture the reasoning, honestly

As agents reason in more steps, it's tempting to seal "why the agent did it" too: the chain-of-thought, the thinking. You can, and it's useful. But be honest about what it is: the model's self-reported reasoning. Anthropic's research found that reasoning models don't reliably say what they actually relied on; their chains of thought are often unfaithful to the real computation. So a sealed reasoning record proves, tamper-evidently, what the agent said it was thinking, not how the model actually decided. That's worth recording. It is not worth trusting. Keeping that line bright is part of the point.

A ledger of what your agent did

The fix is old and boring, which is why it works: seal each consequential action as a tamper-evident, independently verifiable record, anchored to a public log — and what you keep is an anchored ledger of everything your agent did, checkable by anyone.
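
To make "tamper-evident" concrete, here is a minimal sketch of an append-only ledger in which every entry commits to the one before it. It is an illustration built on Python's standard library only, not the capsule format or the hosted anchor's actual mechanism; the function names `append_entry` and `verify_chain` and the entry fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the very first entry


def _digest(prev_hash: str, record: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equal records hash equally.
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(ledger: list, record: dict) -> dict:
    """Append a record, linking it to the hash of the previous entry."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    entry = {"record": record, "prev": prev_hash, "hash": _digest(prev_hash, record)}
    ledger.append(entry)
    return entry


def verify_chain(ledger: list) -> bool:
    """Recompute every hash from the start; any edit breaks the chain."""
    prev_hash = GENESIS
    for entry in ledger:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


ledger = []
append_entry(ledger, {"action": "write_po", "verdict": "executed"})
append_entry(ledger, {"action": "send_email", "verdict": "blocked"})
ok_before = verify_chain(ledger)

# Quietly rewrite history after the fact.
ledger[0]["record"]["verdict"] = "blocked"
ok_after = verify_chain(ledger)
```

A chain that only you hold can still be rebuilt from scratch by whoever holds it, which is why the latest hash has to be anchored somewhere you do not control.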

We're calling each record an Agent Action Capsule. It's an open profile of the IETF's SCITT (Supply Chain Integrity, Transparency, and Trust) work, published as an individual Internet-Draft, with a reference verifier anyone can run. You add one call at the moment your agent does something that matters:


from capsule_emit import emit

# `result` is whatever your agent's action returned; `...` stands in for its input.
cap = emit(action="write_po", operator="acme-co", developer="po-agent@v1",
           agent_input=..., agent_output=result,
           verdict="executed", effect={"type": "write_po", "status": "dispatched"})

You get back a capsule whose contents are committed to a hash and whose existence is written to a public, append-only transparency log — and it's appended to your running ledger, the anchored trail of every action so far. Three things fall out of that:

  • It records what it did, not just what it tried. A capsule carries the may/did distinction — a dispatched attempt can't be passed off as a confirmed effect — and it records on every verdict, including refusals. A blocked capsule is auditor-grade evidence that a gate worked.
  • Anyone can verify it, without trusting you. Re-hash the bytes, check the log. Change one byte and verification fails. Your prompts, vendors, and amounts never leave your machine — only a one-way fingerprint goes to the log.
  • It chains across organizations. Because a capsule is addressed by its content hash, a different agent — in a different company, with its own ledger — can confirm your action by its id alone. That's a verifiable trail across an org boundary, which a vendor's internal log fundamentally cannot produce.
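
The second and third points can be sketched with a plain SHA-256 commitment. This is an illustration under stated assumptions, not the reference verifier: the helpers `capsule_id` and `verify`, the canonical-JSON encoding, and the in-memory `public_log` set standing in for a transparency log are all hypothetical.

```python
import hashlib
import json


def canonical_bytes(capsule: dict) -> bytes:
    # Assumed encoding: sorted keys and fixed separators give stable bytes.
    return json.dumps(capsule, sort_keys=True, separators=(",", ":")).encode("utf-8")


def capsule_id(capsule: dict) -> str:
    """Content address: the SHA-256 of the capsule's canonical bytes."""
    return hashlib.sha256(canonical_bytes(capsule)).hexdigest()


def verify(capsule: dict, claimed_id: str, public_log: set) -> bool:
    """Re-hash the bytes, then check that the fingerprint is in the log."""
    return capsule_id(capsule) == claimed_id and claimed_id in public_log


capsule = {
    "action": "write_po",
    "verdict": "executed",
    "effect": {"type": "write_po", "status": "dispatched"},
}
cid = capsule_id(capsule)
public_log = {cid}  # only the one-way fingerprint is published

genuine = verify(capsule, cid, public_log)

# Change a single field and the same id no longer matches.
tampered = dict(capsule, verdict="blocked")
forged = verify(tampered, cid, public_log)

# A counterparty in another organization can refer to the action by id alone.
counterparty_record = {"action": "confirm_po", "confirms": cid}
```

Because the id is derived from the content, the counterparty's record points at exactly one set of bytes; nobody can later swap in a different capsule under the same id without the hash check failing.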

Record now; enforce later, if you want

A capsule records; it doesn't block. But the same file that declares an action's rules can later be handed to a compatible gateway that enforces them — with no change to your code. Adopt the record now, on the easy path; add enforcement when you're ready. The record is the part that has to be neutral and ubiquitous, so that's the part we're opening.

It's live and open — and it's a one-liner

pip install capsule-emit, then one line per action — or one framework adapter (MCP, LangChain, CrewAI) and the ledger builds itself, a capsule per tool call, with no hand-placed calls. The producer library, the adapters, the ledger CLI, and the free hosted anchor are all open source (Apache-2.0); the format is an IETF Internet-Draft; the verifier is independent of the producer on purpose — any tool can make a capsule, any party can check one. No accounts, no keys, nothing leaves your machine but a hash.
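
To give a feel for what "a capsule per tool call, with no hand-placed calls" might look like, here is a rough sketch of the general wrapping pattern. It does not use the real adapters or any MCP, LangChain, or CrewAI API; the `recorded` decorator, the in-memory `LEDGER` list, the entry fields, and the "failed" verdict are hypothetical stand-ins.

```python
import functools
import hashlib
import json

LEDGER = []  # stand-in for a real, anchored ledger


def _fingerprint(value) -> str:
    # Hash a JSON rendering of the value; only this digest is kept.
    data = json.dumps(value, sort_keys=True, default=repr).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def recorded(tool):
    """Wrap a tool function so every call leaves one ledger entry."""
    @functools.wraps(tool)
    def wrapper(*args, **kwargs):
        entry = {
            "action": tool.__name__,
            "input_hash": _fingerprint({"args": args, "kwargs": kwargs}),
        }
        try:
            result = tool(*args, **kwargs)
        except Exception as exc:
            # Record the failed attempt too, then let the error propagate.
            entry["verdict"] = "failed"
            entry["error"] = type(exc).__name__
            LEDGER.append(entry)
            raise
        entry["verdict"] = "executed"
        entry["output_hash"] = _fingerprint(result)
        LEDGER.append(entry)
        return result
    return wrapper


@recorded
def issue_refund(order_id: str, amount: float) -> dict:
    # Hypothetical tool; a real one would call a payment system here.
    return {"order_id": order_id, "refunded": amount}


issue_refund("A-1001", 25.0)
```

The tool's own code is untouched; the wrapper records hashes of the inputs and outputs rather than the values themselves, in keeping with the idea that only a fingerprint needs to leave your machine.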

Agents are going to keep moving closer to the effect boundary. The ledger of what they did there should be something anyone can trust — including the people who don't trust you.
