Field Notes

Instructions Became Infrastructure

2026-05-265 min readAIWorkInfrastructure

As agents begin to run from Markdown files, skills, goals, and workflow rules, the quality of ordinary instructions starts to matter like software architecture.

The strangest new programming language may be office prose.

Not code exactly. Not the old fantasy that everyone becomes a software engineer by speaking naturally into a machine. Something quieter and more consequential is happening: instructions are becoming operational.

A Markdown file can define an agentic workflow. A skill can package steps, scripts, and resources so an agent knows how to perform a specialized task. An SDK can expect explicit instructions, tool boundaries, files, manifests, and sandboxes as part of the normal agent loop. A research paper from Microsoft can argue that the ceiling on human-AI collaboration depends less on raw model capability than on the clarity of the goals we can encode and carry across tools.

This is not prompt engineering with a better title.

Prompt engineering treated language as a way to coax a response from a model. Instructions as infrastructure treats language as part of the system that decides what work happens, what tools are reachable, what counts as completion, and which human must remain close.

That shift deserves more attention than it is getting.

GitHub's Agentic Workflows are a useful signal because they look almost too ordinary. You define behaviors in Markdown, compile them, and run them as GitHub Actions. The examples are familiar: triage issues, investigate failed CI, maintain documentation, improve test coverage, monitor compliance. None of those tasks are science fiction. They are the damp hallways of organizational life.

But the medium matters.

When a workflow is written in plain language and frontmatter, it becomes easier for more people to author automation. That is good. It also means the quality of procedural writing starts carrying more operational risk. Ambiguous instructions are no longer merely confusing. They may become repeatable confusion. A missing constraint is no longer a gap someone notices in a meeting. It may become an automated action pattern.

The old software boundary was rough but visible. Code was code. Documentation was documentation. A runbook might be wrong, but a human still had to read it, interpret it, and perform the next step badly enough to expose the mismatch.

Agents blur that boundary.

Now a runbook can become an action surface. A checklist can become a delegation contract. A style guide can become a behavior policy. A paragraph in a repository can influence how an autonomous tool interprets a pull request, files an issue, posts a comment, or decides that a task is done.

This is a little uncanny because most organizations are not very good at writing instructions.

They are good at accumulating them.

There is a difference.

Most internal process language survives because humans are flexible. People learn which parts are outdated, which steps are ceremonial, which approvals are real, which links are dead, and which sentence means "ask Dana before you touch production." The document looks official, but the actual system lives in the correction layer around it.

Agents do not inherit that correction layer unless we build it.

This is why "safe outputs" in GitHub's documentation feel more important than the feature name suggests. The idea is not just that an agent can produce text and then a workflow can create issues, comments, pull requests, or labels. The important part is the separation: the agentic portion does not need direct write permission for the final operation. The generated output passes through validated shapes.

That is a design lesson hiding inside a security feature.

If instructions are becoming infrastructure, then prose needs typed edges. It needs permission boundaries, output schemas, review points, source constraints, and undo paths. It needs to say not only what should happen, but what must never happen automatically. It needs to make uncertainty visible before uncertainty becomes a side effect in the repository.

This is also where recent security research on agentic workflow injection stops being a niche concern. If untrusted issue text, pull request descriptions, or comments can cross into an agent prompt and then into downstream workflow logic, the problem is not just malicious input. It is the system failing to understand where language changes category.

Some language is evidence.

Some language is instruction.

Some language is an artifact to be transformed.

Some language is a request from an untrusted stranger wearing the costume of work.

Humans make these distinctions socially and imperfectly all day. Agentic systems need the distinctions represented explicitly. Otherwise every text field becomes a potential hallway between intention and execution.

The management consequence is just as important as the technical one. As more work becomes delegable through written workflows, the people who can write clearly will become infrastructure workers whether or not their titles admit it. The person who names the goal well, describes the boundary, includes the exception, and defines the review surface is no longer doing "soft" work around the real system. They are shaping the system.

That should change how teams value writing.

Not branding writing. Not executive performance writing. Operational writing.

The kind that says what good looks like without pretending the world is cleaner than it is. The kind that distinguishes a task from an outcome. The kind that knows when the agent should act, when it should ask, and when the correct answer is to stop because the situation has become too human, too expensive, or too ambiguous for a silent workflow.

There is a temptation to treat natural-language automation as democratization by default. More people can describe what they want, so more people can build. That may be partly true. But access without discipline can also spread institutional vagueness faster than code ever did. A bad process translated into Markdown is still a bad process. It is just easier to schedule.

The better version is more demanding.

It asks organizations to take language seriously as a material. To maintain instructions the way they maintain dependencies. To review workflows not only for whether they run, but for whether they encode judgment worth repeating. To treat goals, skills, prompts, policies, and runbooks as living interfaces between human intention and machine action.

That is less glamorous than autonomous agents solving work end to end.

It is also closer to where the responsibility actually lives.

The future of AI work may not be decided only by the biggest model or the most cinematic demo. It may be decided inside thousands of small instruction files, where a team writes down what it thinks it means and discovers whether the system believes it.