Field Notes

Agent Actions Need Repair Paths

2026-06-176 min readAIDesignWork

As AI agents move from suggestions into work systems, the humane interface problem is no longer only approval. It is how people recover when a plausible action has already changed the world.

The most underrated button in software may be undo. It is not glamorous. Nobody puts it in the keynote as evidence that the future has arrived. It sits there quietly, a small mercy for the moment when your hand moves faster than your judgment, or when the system does exactly what you asked and you discover that your asking was the problem.

AI agents make that little mercy harder to preserve.

The product direction is clear. GitHub's Agentic Workflows preview lets teams describe development tasks in natural language and run them as governed workflows. GitHub's Copilot cloud agent can research a repository, make changes, run tests, and open pull requests. OpenAI's Frontier frames enterprise agents as coworkers with permissions, observability, and auditable actions. Google Workspace's AI control center gives administrators visibility into AI and agent access across everyday office data.

This is not just chat getting more useful. It is software learning to reach into other software.

That shift changes the shape of a mistake. A bad answer can be ignored, argued with, or pasted into a document where it begins the slow, familiar process of becoming someone else's problem. A bad action has already crossed a boundary. It has edited a file, changed a setting, sent a message, updated a record, touched a spreadsheet, opened a pull request, moved a task, summarized a customer, or nudged a workflow downstream.

Some of those actions are small. Some are reversible. Some are technically reversible in the same way a spilled coffee is reversible if you have a towel and a generous definition of success.

The AI interface has spent a lot of energy on approval. That makes sense. Before an agent buys, deploys, emails, deletes, merges, schedules, or escalates, a person should often be asked to approve the thing. But approval is a thin moral technology. It captures one moment, usually under partial attention, and then pretends the matter is settled. The person looked at the summary. The person clicked the button. The log contains a human-shaped blessing.

Anyone who has worked inside a real organization knows this is not how accountability feels. The dangerous decisions are often the ones that looked reasonable in the tiny approval window. The wrong customer tier was selected because two accounts had similar names. The agent moved the issue because the label matched, though the social context did not. The generated fix passed the narrow test but changed a behavior a support team quietly depended on. The spreadsheet update was syntactically correct and institutionally bizarre. The calendar invite went to the right people and somehow still made the day worse.

The problem is not that agents will make mistakes. That is too easy a complaint, and also a little vain, given the human history of confidently ruining perfectly good systems by hand. The harder problem is that agents can make plausible, well-documented, interface-approved mistakes at machine speed, across systems whose repair rituals were designed for slower, more local trouble.

Repair is different from review. Review asks, "Should this happen?" Repair asks, "What happened, what changed because of it, who felt the consequence, and how do we put the work back into a trustworthy state?" That second question is messier. It does not fit neatly inside a thumbs-up flow. It requires the system to remember intermediate states, not only final outputs. It requires actions to be grouped into meaningful units, not scattered across logs like confetti after a meeting nobody enjoyed. It requires people to understand which changes were agent-made, which were human-made, and which were human-approved agent changes that now belong to a more complicated category.

Most workplace systems are less kind than code. Documents, CRM records, analytics dashboards, HR workflows, procurement forms, support queues, project boards, and inboxes all have histories of some kind, but they rarely behave like careful collaboration surfaces. They behave like places where work accumulates. When an agent begins acting across them, the audit trail may prove that something happened without helping anyone recover from it.

Observable agents need to become more than a security phrase. Observability should not only serve the administrator hunting for suspicious access. It should serve the person trying to repair ordinary damage. Show the action plan the agent believed it was following. Show every system touched, every object changed, every message sent, every skipped alternative, and where human approval changed the scope. Then provide a repair path that is as concrete as the action path: revert this file, restore this field, reopen this task, notify these people, mark this generated summary as withdrawn.

That sounds heavy because real work is heavy. The dream of agentic software is that it makes effort disappear. The reality is that some effort should not disappear; it should move into better structures. A manager should not have to become a detective because a workflow assistant updated the wrong account. A nurse should not have to infer whether an AI-generated handoff note was later corrected. A junior analyst should not have to explain why a dashboard changed when the agent's reasoning lives in three product logs and a conversational transcript named after Tuesday.

The design challenge is to make repair ordinary enough that using agents does not require theatrical trust. Every meaningful agent action should have a visible boundary, a named owner, a state before and after, and a way to unwind or compensate when unwinding is no longer possible. Some actions need dry runs. Some need staged commits. Some need quarantine spaces where the agent can prepare work without touching production reality. Some need receipts, yes, but receipts are not enough if they read like archaeology. The receipt should point toward repair.

There is also a cultural trap here. Organizations may be tempted to treat repair paths as evidence that agents are unsafe. Mature systems treat them as evidence that the organization is serious. Hospitals have incident review. Airplanes have checklists. Accounting systems have reversals and adjustments. Software has rollbacks. These are not admissions that the work should never happen. They are acknowledgments that consequential work deserves a way back from overconfidence.

No interface can perfectly control every action in advance. The larger question is whether our institutions can remain repairable as more work is delegated to systems that act with borrowed authority. A world full of agents will need more than permission prompts and audit logs. It will need ways for people to notice harm early, understand it without forensic heroics, and mend the work without becoming the landfill for machine confidence. Progress is not only making software act. It is making sure the world it changes can still be put right.