Field Notes

Workspaces Need Housekeeping

2026-06-206 min readAIWorkDesign

As agents move into real files, tickets, calendars, and repos, the hidden work is not only giving them better tools. It is making the work itself legible enough to share.

The fantasy version of workplace AI imagines a clean handoff. You ask the agent to prepare the quarterly review, fix the bug, compare the vendor contracts, or draft the client follow-up. The agent quietly enters the work system, understands the state of things, finds the right files, notices the stale spreadsheet, remembers that one exception from the planning meeting, and returns with something sensible enough to edit over coffee.

This is a beautiful fantasy, partly because it has never opened a real shared drive.

Real work systems have a smell. The deck is called final_v7_real_final_use_this_one.pptx, except the comments are in the previous copy because legal reviewed the wrong attachment. The customer exception is buried in a thread that began as a joke about lunch. The dashboard is accurate if you know which two segments not to trust. The README explains how the service worked before the migration, which is not quite the same as lying, but does require literary patience.

Agents are now being pointed at this mess with increasing confidence. GitHub's Agentic Workflows public preview turns natural-language automations into repository-aware work that can run with policy and validation around it. OpenAI's Frontier frames enterprise agents as coworkers that operate across business systems with permissions, observability, and auditable actions. Research benchmarks are catching up too. Workspace-Bench treats office work less like a single prompt and more like a bundle of dependent files, calendars, mail, slides, sheets, and app state. JobBench asks whether agent work aligns with the tasks human experts would actually delegate, which is a useful improvement over pretending that every automatable task is a task anyone wanted handled that way.

The interesting signal across these examples is not that agents can do more. They can. The more revealing signal is how quickly "doing more" becomes a demand for housekeeping.

Housekeeping is an unfashionable word in software culture because it sounds like tidying, and tidying sounds small. But in real organizations, housekeeping is often the difference between a system that can be trusted and one that works only because three tired people know where the truth is hidden. It is naming things clearly, deleting dead paths, writing down the exception instead of leaving it in the memory of whoever was on the call, and marking a document as obsolete before someone builds next quarter on top of last quarter's misunderstanding.

For human teams, messy workspaces have always had a social workaround. You ask Sarah, because Sarah knows. You DM the person who made the sheet. You remember that the "new" process is actually older than the reorg and should be ignored after step four. You can hear the slight hesitation in a colleague's voice when they say a number is "directionally right," which is office dialect for "please do not put this in front of the board without adult supervision."

Agents do not have that social fabric by default. They have access, context windows, tool calls, retrieval, logs, and increasingly elaborate ways of saying they are not sure. Those are useful. They are not the same as being embedded in the small human system that keeps a messy workplace from collapsing into nonsense. When an agent finds three contradictory artifacts, it may rank, summarize, ask, or choose. A good one may pause. A bad one may produce a gorgeous answer with the wrong fossil embedded inside it.

The conversation about agent capability can become too narrow here. We keep asking whether models are smart enough to work inside our systems. The companion question is whether our systems are legible enough to share intelligence with anything that was not already socialized into their weirdness. A person can survive a bad folder structure because they learn the folklore. An agent needs the folklore made operational.

That does not mean every organization needs a grand knowledge-management program with a logo, a steering committee, and a tragic Confluence migration. Please no. Most teams have already endured enough portals named after virtues. The better starting point is smaller and more physical: what would someone need to know before taking action here, and where would they find it without interrupting the busiest person in the room?

This turns interface design into workplace maintenance. An agent-ready workspace is not only a place with APIs and permissions. It is a place where status is visible, sources have lifetimes, exceptions are named, ownership is findable, and uncertainty has somewhere respectable to stand. If those signals exist only as vibes, the agent is not really collaborating with the organization. It is excavating it.

There is a human cost to pretending otherwise. When agents work in messy systems, people inherit the cleanup in the least humane form: after the wrong draft has circulated, after the automation used the stale policy, after the support response sounded authoritative and missed the actual history. The work then arrives as blame-shaped archaeology. Someone has to explain why the agent did a reasonable thing with unreasonable context and make the mess legible under pressure, which is the worst time to do housekeeping unless you enjoy labeling boxes during a small fire.

The better version is less glamorous than the demos. It asks teams to treat information hygiene as part of the product surface for human-agent work. It asks leaders to notice that making context clear is real labor, not clerical residue. It asks vendors to build interfaces that show source age, confidence, conflicting evidence, ownership, and action history without turning every task into an audit seminar. It asks workers to be allowed time to maintain the room before being judged by what the room allows an agent to do.

Some of this will sound boring, which is usually where the important design work hides. A clean handoff is not created by a clever prompt alone. It is created by a workplace that has enough respect for its own memory to make decisions, exceptions, and sources findable before the machine arrives.

Agents may make messy systems more productive for a while. They may also make the mess move faster. The humane question is whether we are building tools that help people understand and improve the shared room, or tools that simply become better at rummaging through it. Work is not only a pile of tasks waiting to be delegated. It is an environment we inhabit together. If agents are going to live there with us, somebody has to keep the lights labeled, the exits visible, and the old maps out of the emergency drawer.