Field Notes
AI Needs A Nutrition Label
As AI-generated work moves from answers into actions, people need a small, visible way to inspect what the work depends on before they trust it.
I keep thinking AI-generated work needs something closer to a nutrition label.
Not a giant audit report.
Not a single trust score.
Something lightweight enough to appear where the work happens, but structured enough to tell us what we are actually holding.
What was the AI asked to do? What sources did it use? Were those sources current? Did a human review it? Did it only suggest something, or did it take action? What assumptions are baked in? When does the answer become stale?
The point is not to make every AI interaction feel like compliance paperwork.
The point is to make AI work inspectable.
Because as these systems move from answering questions to drafting customer comms, editing code, changing workflows, and shaping decisions, the transcript is not enough.
A transcript tells us what happened.
It does not always tell us what the work depends on.
This is not a new instinct. Model Cards were proposed as short documents that disclose intended uses, evaluation details, and performance characteristics for trained models. The Dataset Nutrition Label made the food-label metaphor even more explicit: a distilled overview of a dataset's ingredients, quality signals, and warnings before that data becomes model behavior. IBM's AI FactSheets treated AI services more like products with declarations of purpose, performance, safety, security, and provenance.
Those are all useful ancestors.
But they mostly live upstream.
They explain the model, the dataset, the service, or the release. They tell us something about the machinery. What we increasingly need is a label attached to the actual piece of work sitting in front of a person at the moment of use.
That distinction matters.
If a model card says the system performs well on a benchmark, that does not tell me whether this particular answer relied on stale policy text. If a dataset label explains the training corpus, that does not tell me whether the agent used an internal document, a web search result, a user memory, or a guess. If a product disclosure says an AI service has safety processes, that does not tell me whether this draft was reviewed by a human before it went to a customer.
The output has its own supply chain.
Content provenance work is moving in this direction for media. The Coalition for Content Provenance and Authenticity (C2PA) describes Content Credentials as an open standard for establishing the origin and edits of digital content, and even says they can function like a nutrition label for digital content. That is an important pattern: make the history of an artifact visible enough that people can evaluate it.
But AI work products are broader than media files.
A support response, code patch, compensation recommendation, contract summary, slide deck, search brief, policy draft, and automated workflow can all be AI-shaped without being a synthetic image or video. The trust question is less "was this generated?" and more "what kind of dependency am I being asked to inherit?"
That is where the label becomes interesting.
Not as a confession that AI was involved. That would be too blunt. A spellcheck pass, a generated first draft, a retrieved legal summary, and an agent that changed a customer's account are not the same kind of thing. Labeling them all "AI-generated" is like labeling every food "processed" and calling the job done.
A useful label would separate the dimensions people actually need for judgment.
Task: what was the system asked to do?
Inputs: what sources, memories, files, tools, or databases shaped the work?
Currency: when were the important sources last checked?
Mode: did the AI suggest, draft, decide, or act?
Review: who looked at it, and what were they responsible for checking?
Assumptions: what did the system treat as true without proof?
Omissions: what relevant context was missing?
Staleness: when should this answer stop being trusted without another pass?
Cost of error: what happens if this is wrong?
That last field may be the most humane one.
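To make the list above concrete, here is one way those fields might travel with a piece of work as a small record. This is only a sketch: the class name WorkLabel, the Mode values, the field names, and the example contents are all assumptions made for illustration, not an existing standard or schema.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List, Optional


class Mode(Enum):
    """What the AI did: suggest, draft, decide, or act."""
    SUGGEST = "suggest"
    DRAFT = "draft"
    DECIDE = "decide"
    ACT = "act"


@dataclass
class WorkLabel:
    """Hypothetical label attached to one piece of AI-generated work."""
    task: str                                      # Task
    inputs: List[str]                              # Inputs
    mode: Mode                                     # Mode
    cost_of_error: str                             # Cost of error
    sources_last_checked: Optional[date] = None    # Currency
    reviewed_by: Optional[str] = None              # Review
    assumptions: List[str] = field(default_factory=list)
    omissions: List[str] = field(default_factory=list)
    stale_after: Optional[date] = None             # Staleness

    def is_stale(self, today: date) -> bool:
        """True once the label's own expiry date has passed."""
        return self.stale_after is not None and today > self.stale_after


# Example values are invented for illustration.
label = WorkLabel(
    task="Draft a reply to a customer asking about refund eligibility",
    inputs=["public refund policy page", "customer ticket text"],
    mode=Mode.DRAFT,
    cost_of_error="customer is told the wrong refund terms",
    sources_last_checked=date(2024, 1, 15),
    assumptions=["the public policy applies to this customer's plan"],
    omissions=["customer-specific contract was not available"],
    stale_after=date(2024, 4, 15),
)
```

Nothing here is heavy. The record is nine fields and one question it can answer about itself: has my expiry date passed?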
People do not need the same label for every output. A brainstorm can carry a light label. A customer email needs more. A code change that touches authentication needs more still. A medical, financial, legal, or employment decision should not get to hide behind the calming texture of fluent prose.
The label should become heavier as the consequence becomes heavier.
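One way to picture a label that scales with consequence is a lookup from cost of error to the fields that must be filled in. The sketch below is an assumption throughout: the three tier names, the fields chosen for each tier, and the function name missing_fields are hypothetical, and a real team would draw these lines for itself.

```python
from typing import Dict, List

# Hypothetical tiers: the heavier the consequence, the more fields are required.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "low": ["task", "mode"],
    "medium": ["task", "mode", "inputs", "review"],
    "high": [
        "task", "mode", "inputs", "review",
        "currency", "assumptions", "omissions", "staleness",
    ],
}


def missing_fields(label: dict, cost_of_error: str) -> List[str]:
    """Return the required fields that the label leaves unset for this tier."""
    required = REQUIRED_FIELDS[cost_of_error]
    return [name for name in required if label.get(name) is None]


# Invented example: a brainstorm carries only a light label.
brainstorm = {"task": "List ten possible names for the newsletter", "mode": "suggest"}

light_gaps = missing_fields(brainstorm, "low")    # nothing missing at this tier
heavy_gaps = missing_fields(brainstorm, "high")   # six fields would still be owed
```

The same label that is complete for a brainstorm is visibly incomplete for a high-consequence decision, which is the point.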
This is where the interface has to be careful. Too much metadata will make people ignore it. Too little will make it ornamental. The right pattern probably has layers: a small visible summary in ordinary use, an expandable version during review, and a durable record when the work becomes part of a system of record.
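The three layers could be three views over the same label rather than three different documents. The following is a minimal sketch under that assumption; the function names, the choice of which fields count as the visible summary, and the example label are all hypothetical.

```python
import json

# Assumption: these three fields are what a busy person sees first.
SUMMARY_FIELDS = ("mode", "review", "staleness")


def summary(label: dict) -> str:
    """Small visible line for ordinary use."""
    return " | ".join(f"{k}: {label.get(k, 'unknown')}" for k in SUMMARY_FIELDS)


def expanded(label: dict) -> str:
    """Every field, one per line, for someone actively reviewing."""
    return "\n".join(f"{k}: {v}" for k, v in label.items())


def record(label: dict) -> str:
    """Stable serialized form for storage in a system of record."""
    return json.dumps(label, sort_keys=True)


# Invented example label.
label = {
    "task": "Summarize the vendor contract renewal terms",
    "inputs": ["contract PDF", "prior renewal email thread"],
    "mode": "draft",
    "review": "none yet",
    "assumptions": ["the attached PDF is the latest signed version"],
    "staleness": "recheck after 2024-04-15",
}

line = summary(label)
```

All three views read from one label, so the short version cannot drift away from the durable one.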
Think of it less as an audit report and more as a handle.
A handle is not the whole object. It is the part that lets a person pick the object up safely.
That is what AI-generated work is missing. We have answers, drafts, summaries, diffs, recommendations, plans, and actions. We have transcripts. We have logs. We have dashboards. But we do not yet have a widely understood surface that says, in human terms, here is what this work is made of and here is how carefully you should hold it.
Partnership on AI's deployment guidance uses the phrase "key ingredient list" for public reporting about foundation models: compute, parameters, architecture, training approach, documentation, capabilities, limitations, testing, and risks. The phrase is right. The next step is making ingredient lists local and situational.
Not only what went into the model.
What went into this answer.
That shift would also improve human review. Right now, reviewing AI work often means rereading the polished output and wondering where to apply pressure. A label could tell the reviewer where the weak joints are: the source is old, the system inferred the policy, the customer-specific contract was not available, the action was drafted but not sent, the answer depends on a regulation that changes often.
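As a rough illustration, a label could be turned into a short list of places for a reviewer to apply pressure. Everything specific in this sketch is an assumption: the function name weak_joints, the dictionary keys, the 180-day threshold for an old source, and the example label are invented for illustration.

```python
from datetime import date
from typing import List


def weak_joints(label: dict, today: date, max_source_age_days: int = 180) -> List[str]:
    """List the parts of a label that a reviewer should inspect first."""
    flags = []

    checked = label.get("sources_last_checked")
    if checked is None:
        flags.append("source currency unknown")
    else:
        age_days = (today - checked).days
        if age_days > max_source_age_days:
            flags.append(f"sources last checked {age_days} days ago")

    for assumption in label.get("assumptions", []):
        flags.append(f"assumed without proof: {assumption}")

    for omission in label.get("omissions", []):
        flags.append(f"missing context: {omission}")

    if label.get("reviewed_by") is None:
        flags.append("no human review recorded")

    return flags


# Invented example label.
label = {
    "sources_last_checked": date(2024, 1, 1),
    "assumptions": ["the policy was inferred from a public FAQ"],
    "omissions": ["customer-specific contract was not available"],
    "reviewed_by": None,
}

flags = weak_joints(label, today=date(2024, 10, 1))
```

For this example the reviewer gets four flags: an old source, an inferred policy, a missing contract, and no recorded review.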
The review becomes less like proofreading and more like inspection.
It also makes disagreement easier. If a person can see that an answer used public docs but missed internal exceptions, the conversation becomes concrete. If they can see that the system acted rather than suggested, responsibility becomes concrete. If they can see that a source was last checked nine months ago, doubt becomes concrete.
Concrete doubt is useful.
Vague doubt just poisons the room.
The nutrition label metaphor works because it does not ask people to become food scientists before lunch. It gives them enough structure to make a better decision than packaging alone would allow. Calories are not the whole meal. Ingredients are not the whole diet. Serving size is not morality. But the label creates a common place to look.
AI needs that common place.
Especially because the work is getting smoother. The better the output sounds, the easier it is to forget that it may be carrying stale sources, hidden assumptions, missing context, tool effects, policy gaps, or low-confidence inferences. Fluency is packaging. A label is the beginning of inspection.
So maybe the next interface pattern is not more chat.
Maybe it is a label.
Small enough to travel with the work.
Structured enough to make trust discussable.
Plain enough that a busy person can understand what they are holding before they pass it along.