Field Notes

AI Metrics Are Management Design

2026-05-156 min readAIWorkQuality

As AI work becomes metered by team, model, feature, and cost, the question is no longer whether people use the tools. It is what the organization teaches them to value.

AI adoption is getting dashboards.

That sounds minor until you notice what dashboards usually become inside organizations. They begin as visibility. They become language. Eventually, if nobody is careful, they become the weather.

GitHub is making Copilot usage measurable at the team level. Administrators can connect daily user-team reports with per-user usage reports and see active users, completions, chats, languages, IDEs, features, and models. Copilot itself is moving toward usage-based billing, where input tokens, output tokens, and cached tokens turn into AI Credits. Code review will also consume GitHub Actions minutes when it runs on private repositories.

This is not just a pricing update.

It is the arrival of an accounting layer for AI work.

For a while, most workplace AI adoption was measured through softer signals. Who seemed excited? Who was trying prompts? Which team had a demo? Which manager had a slide about transformation? The data was often too vague to be useful and too optimistic to be trusted.

Now the tools are becoming legible in a different way. Usage can be counted. Cost can be modeled. Agent activity can be attached to teams. Model choice can be treated as a budget event. The invisible helper in the corner of the workflow is becoming a line item, a report, a policy surface.

Some of this is necessary.

Organizations cannot responsibly run expensive, agentic systems without knowing where the work is happening. If a model is burning through tokens on low-value tasks, someone should see that. If a team is getting real leverage from code review agents, research assistants, or internal workflow automation, that should be visible too. If a tool is quietly shifting labor from one role to another, leaders need more than anecdotes.

But measurement is never neutral once it enters management.

The danger is that AI metrics will repeat the old workplace mistake: treating activity as a proxy for value because activity is easier to count. Active users. Chats. Completions. Agent sessions. Credits consumed. Reviews triggered. These are useful operational signals. They are not the same thing as better work.

A team can use AI constantly and become more confused.

A team can use it lightly and produce excellent judgment.

A developer can consume fewer credits because they framed the task well, constrained the agent, reviewed carefully, and stopped when the change was good enough. Another can consume more because the work was messy, the context was poor, and the agent kept wandering through the repository like a very expensive flashlight.

The first person may look less engaged. The second may look like a power user.

That is the problem with visibility when it arrives before taste.

Microsoft's latest Work Trend Index lands near the same tension from another direction. The report argues that AI impact depends heavily on organizational readiness: culture, manager support, clear rules, documented handoffs, and quality standards for AI-assisted work. That is a more mature frame than simply asking whether individuals are using the tools.

It suggests the real unit of progress is not adoption.

It is design.

How does the organization decide which work deserves AI assistance? What should be delegated, what should be collaborated on, and what should remain slow because slowness is part of the quality? Who owns the review surface? What happens when an agent saves two hours but creates thirty minutes of invisible verification work for someone else? Which costs belong to a team budget, and which costs are really signs of product strategy, engineering debt, or managerial impatience?

These are not questions a usage graph can answer by itself.

They are questions a graph can provoke if the organization is honest enough.

The humane version of AI measurement would treat metrics as instruments, not incentives. It would ask where people are getting leverage, where they are getting flooded, where costs are rising because the work is poorly shaped, and where the tool is hiding labor rather than reducing it. It would pair usage with quality review. It would notice whether junior people are learning or merely becoming prompt operators inside someone else's process. It would care whether the work feels clearer on the other side.

The less humane version is easy to imagine because we have already built it many times.

Teams are compared by adoption rate. Managers ask why one group uses fewer credits. People learn to perform AI enthusiasm because the organization has made enthusiasm measurable. Quiet judgment becomes suspicious. Restraint looks like resistance. A tool meant to reduce busywork becomes another way to audit busyness.

That would be a depressing but very familiar outcome.

The deeper issue is that AI makes workplace theory harder to fake. If an organization already believes value is visible motion, it will use AI metrics to intensify motion. If it believes people are costs to be optimized, it will read every dashboard as a labor compression opportunity. If it believes quality depends on conditions, it may use the same data to redesign work more carefully.

The metric does not decide. The culture does.

This is why the next phase of AI leadership will need more moral patience than dashboard appetite. The tools are becoming measurable at exactly the moment they are becoming harder to evaluate. A completion is easy to count. A better decision is not. A token has a price. Trust has a cost too, but it rarely arrives as a neat column in the export.

So the question is not whether we should measure AI work.

We should.

The question is what kind of behavior the measurement invites. Whether it helps teams see the system more clearly, or merely gives management a sharper instrument for the same old reflexes. Whether it protects judgment, or converts judgment into utilization theater. Whether it reveals quality, or buries quality under proof that everyone is very busy with the future.

Every new metric is also a small theory of what matters.

AI is making that theory visible.

Now organizations have to decide whether they can survive being seen by their own dashboards.