Field Notes

Agent Work Needs Handles

2026-06-085 min readAIInterfacesWork

As AI work moves out of chat and into canvases, sites, agents windows, and annotated artifacts, the humane design question is whether people can still grab the work where judgment actually happens.

For a long time, the default AI interface has asked people to supervise work by reading a transcript. The machine writes. The person scrolls. Somewhere between the ninth confident paragraph and the little spinner nearby, the person is supposed to determine whether the work is useful, wrong, risky, incomplete, or merely dressed for the right meeting.

This was tolerable when AI mostly answered questions. A transcript is a reasonable place for a conversation. It is a worse place to manage work.

The newer tools seem to know this, even when they do not quite say it that way. OpenAI is pushing Codex beyond software development with role-specific plugins, shared Sites, and annotations that let someone point at a chart, a claim, a navigation bar, a slide, or a bit of generated code and say: change this part, not the whole thing. GitHub's Copilot app now talks about canvases where agent work can become visible beside the chat. VS Code has an Agents window for managing work across projects, remote sessions, synchronized session history, and Chronicle commands for asking what happened in past agent sessions. Microsoft is telling leaders that "tokenomics" is becoming a management question, which is a grimly wonderful way of saying someone has to decide what work belongs to people, what work belongs to agents, and what it costs when that decision is lazy.

The thread is not magic. Chat is being demoted.

That is healthy. Chat is good for intention, clarification, and the occasional moment when you need to ask a machine why your build has chosen this afternoon to become a civic nuisance. But real work needs handles. It needs places where a person can grab the object, inspect a piece of it, leave a mark, ask for evidence, and change direction without negotiating with the entire conversation history.

Anyone who has reviewed AI work knows the small fatigue of the transcript-shaped day. The analyst asks for a dashboard and receives a long explanation of what it contains, which is nice, except the problem is the second chart. The marketer asks for a launch hub, but the customer quote is too smooth and the headline has the faint odor of a person who says "value unlock" in daylight. The engineer asks an agent to fix a bug and then has to skim through every thought, command, diff, and apology to understand which assumption mattered.

The issue is not that people are too impatient to read. Reading is good. The issue is that transcripts flatten work into a narrative of production, while judgment usually happens at the object level: this label, that migration, this permissions boundary, that customer name, the sentence that is technically accurate but will make the room go quiet for the wrong reason.

When a tool only gives you the conversation, it makes correction feel like debate. You explain the problem back through language, hope the model attends to the relevant part, and then watch it regenerate more than you wanted touched. It is like asking a contractor to repaint one wall and returning to find the kitchen philosophically reconsidered.

Handles change the social shape of the work. An annotation says: here is the place where my judgment lives. A canvas says: the work has state, and that state is not buried in the assistant's memory. A shared site says: this output is not just a reply, it is a room where people can inspect assumptions and make decisions. An agents window says: there are multiple pieces of delegated work in motion, and each needs status, evidence, scope, and an ending.

This is not just interface decoration. It is a theory of responsibility.

If a person is going to supervise agentic work, they need more than a cheerful summary and a proceed button. They need locality. They need to know what was changed, what was not touched, what the agent believes is finished, and where a human correction will land.

The best current product moves are interesting because they make the work more tangible. Codex annotations let feedback attach to a selected part of an artifact. GitHub canvases try to make agent progress visible as changes to the work object itself. VS Code's session history and Chronicle point toward an archive that can be queried when the question is not "what did you say?" but "what did we try, and why did we stop?"

There is a risk, of course, that every handle becomes another dashboard handle, another place where management can pull without feeling the weight of what is attached. A bad canvas can become a prettier transcript. A bad annotation system can become a stream of drive-by edits from people who never understood the work. The presence of handles does not guarantee better judgment. It only makes better judgment possible.

That distinction matters. Humane AI work will not come from interfaces that merely make delegation faster. It will come from interfaces that let people stay in meaningful contact with the work after delegation begins. The person should not have to choose between micromanaging every token and trusting an invisible process with a professional-sounding summary.

This is where design becomes a labor question. When work has no handles, the human role collapses into prompting, waiting, and auditing. The person becomes a narrator at the beginning and a liability sponge at the end. But when work has good handles, the person can intervene while the work is still forming, bringing taste, memory, politics, ethics, and practical irritation to the exact place where those things matter.

That is a quieter future than the one usually sold in agent demos. It has fewer rockets and more margins. It cares about selections, trails, diffs, labels, states, and not having to reconstruct an afternoon from a chat log.

But the shape of the interface tells us what kind of work we think people are doing. If the interface is only a command line for desire, then people are requesters. If it is only an approval queue, people are risk containers. If it is a transcript, people are readers of machine autobiography. The better possibility is an interface where people can still touch the work, not in the nostalgic sense of doing every task by hand, but in the practical sense of keeping judgment attached to reality.

As agents spread into more roles, the question will not be whether they can produce more artifacts. They can. The more important question is whether those artifacts arrive with enough handles for human beings to hold them, question them, repair them, and remain responsible for more than clicking yes at the end.