Runtime flight recorder for agent work

Git is a strong record of source changes. It is a weak record of what happened inside an agent run.

That gap becomes visible once agents stop being occasional assistants and start becoming part of the delivery loop. An agent can inspect files, run commands, reject a path, hit an environment issue, change runtime state, update local instructions, or learn that the proposed mechanism was wrong. Not all of that belongs in a commit. But if none of it is recoverable, the next operator has to reconstruct the work from memory.

The missing layer is a runtime flight recorder.

I do not mean recording every token or surveilling every message. For most small agent systems, that would be noisy and unnecessary. The useful cadence is periodic or on demand: once a day, a few times a week, or when the system starts behaving strangely.

The question is simple: what did the agent accumulate in its runtime, and what changed since the last known-good checkpoint?

Not Memory As Git

Git should remain the source of truth for code. It should not become a dumping ground for every private transcript, database row, cache file, or generated timestamp.

The pattern I like is closer to a mirror of the black box.

The runtime keeps doing its work. A small checkpoint layer captures the meaningful state changes that are safe and useful to inspect. Anti-noise logic skips timestamp-only churn and generated heartbeat changes that would make history unreadable. A fanout or mirror gives the operator a convenient read surface. Sparse tags or checkpoints mark useful recovery points without turning every runtime twitch into ceremony.
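
The checkpoint layer described above can be sketched as a content-hash manifest that is compared against the previous one. This is a minimal illustration, not a prescribed design: the function names `snapshot` and `diff_manifests` are hypothetical, and it assumes the runtime state lives in ordinary files under a single directory.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def diff_manifests(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Group paths by what happened to them between two checkpoints."""
    shared = before.keys() & after.keys()
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(p for p in shared if before[p] != after[p]),
    }
```

Storing the manifest from the last known-good checkpoint and diffing it against a fresh snapshot gives a path-level answer to "what changed during this period?" without copying the state itself anywhere.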

The goal is not to build an observability platform. The goal is to answer practical questions:

what changed during this period?
which changes look meaningful?
which changes are only timestamp or heartbeat noise?
what was the last normal checkpoint?
what can be reverted, excluded, or handed to the next agent as context?
did the runtime state safely reach the read surface?

That is enough to have a better conversation with the system.

Without this layer, you are often talking only to the current model response. That response may not explain what state the runtime accumulated earlier, which files changed, or why a rule now behaves differently.

What The Recorder Should Capture

A useful runtime record should keep enough evidence to answer:

what task, prompt, or schedule started the run
what files, commands, or runtime surfaces mattered
what changed in source, config, state, or instructions
what verification passed or failed
what was intentionally ignored
where the next operator or agent should re-enter

The record should be compact, path-oriented, and safe to reference from a handoff. It should avoid secrets, raw private transcripts, and broad dumps of runtime databases unless there is a deliberate storage policy for them.
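
One way to keep the record compact and path-oriented is a small structured object with one field per question in the list above. The sketch below is an assumption about shape, not a fixed schema: the class name `RunRecord` and every field name are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RunRecord:
    # Task, prompt, or schedule that started the run.
    trigger: str
    # Files, commands, or runtime surfaces that mattered.
    surfaces: list[str] = field(default_factory=list)
    # Paths changed in source, config, state, or instructions.
    changes: list[str] = field(default_factory=list)
    # Verification name mapped to whether it passed.
    verification: dict[str, bool] = field(default_factory=dict)
    # Things intentionally ignored during the run.
    ignored: list[str] = field(default_factory=list)
    # Where the next operator or agent should re-enter.
    reentry: str = ""

    def to_json(self) -> str:
        """Serialize to stable, diff-friendly JSON for a handoff."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

The record holds paths and short labels rather than file contents or transcripts, which keeps it safe to reference from a handoff.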

Some runtime files may belong in Git LFS. Some should be ignored. Some should be summarized. Some should never leave the machine. The important thing is making that decision explicit instead of letting a broad git add decide for you.
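
Making the decision explicit could look like a small policy table that every path has to pass through. This is a sketch under stated assumptions: the patterns, the decision labels, and the `classify` helper are all hypothetical, and a real system would define its own.

```python
from fnmatch import fnmatch

# Hypothetical policy table. The first matching pattern wins, so the most
# restrictive rules go first.
POLICY = [
    ("secrets/*", "never_leave_machine"),
    ("*.sqlite", "summarize"),
    ("cache/*", "ignore"),
    ("artifacts/*.bin", "lfs"),
    ("instructions/*.md", "commit"),
]


def classify(path: str) -> str:
    """Return the storage decision for a runtime path."""
    for pattern, decision in POLICY:
        if fnmatch(path, pattern):
            return decision
    # Unmatched paths are surfaced for a human decision, not silently added.
    return "undecided"
```

The default matters more than the table: a path nobody has thought about comes back as "undecided" and is not swept into a commit.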

Anti-Noise Matters

Noisy commits are not observability. They are a new failure mode.

If a checkpoint creates a commit every hour because updated_at changed, the history stops being a signal. Operators stop reading it. Agents start treating noise as state. Real changes hide between mechanical churn.

Anti-noise should be value-aware. It should skip timestamp-only or generated-heartbeat-only diffs, but it should also leave a receipt that says what was skipped and why. A blind ignore rule can hide meaningful state changes. A receipt-producing filter keeps the system honest.
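
A receipt-producing filter can be sketched as follows. It assumes the state is JSON-like (nested dicts and lists) and that the volatile keys are known in advance. The key names in `VOLATILE_KEYS` and the function names are illustrative assumptions, not part of any existing tool.

```python
# Assumed set of keys that change on every tick without carrying meaning.
VOLATILE_KEYS = {"updated_at", "last_heartbeat"}


def strip_volatile(value):
    """Recursively drop volatile keys so only meaningful values are compared."""
    if isinstance(value, dict):
        return {k: strip_volatile(v) for k, v in value.items() if k not in VOLATILE_KEYS}
    if isinstance(value, list):
        return [strip_volatile(v) for v in value]
    return value


def review_change(path: str, before: dict, after: dict) -> dict:
    """Return a receipt saying whether to checkpoint this change, and why."""
    if before == after:
        return {"path": path, "action": "skip", "reason": "identical"}
    if strip_volatile(before) == strip_volatile(after):
        return {
            "path": path,
            "action": "skip",
            "reason": "only volatile keys differ",
            "ignored_keys": sorted(VOLATILE_KEYS),
        }
    return {"path": path, "action": "checkpoint", "reason": "meaningful values differ"}
```

Because the comparison works on values and not on file names, a change to a real field is still checkpointed even when a timestamp moved in the same write. The skip receipts can be appended to a log so that a skipped diff is still visible later.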

This is a small detail, but it decides whether the flight recorder becomes useful or becomes another stream nobody trusts.

Runtime Observability Is Work Observability

Traditional observability tells us what servers and services are doing. Agentic workflows need a neighboring question: what did the work process do?

Did the agent change the rules it follows? Did it update a local state database? Did it preserve a failed hypothesis somewhere? Did it fast-forward a mirror or leave it blocked? Did it avoid a destructive sync because a peer was dirty? Did it run a real verification loop or only write a final message?

Those facts may not belong in product code, but they matter for trust.

Git remains the source of truth for the code. The runtime flight recorder explains the work around the code: the attempts, checks, state changes, skip decisions, and boundaries that made a commit trustworthy or showed why no commit was made.

That distinction is practical. When agents become part of delivery, the operator needs more than the final answer. They need an inspectable trail of what happened between the task and the claim.

For small agent systems, a boring checkpoint mirror may be enough. It means less magic, fewer false “it probably worked” moments, and a clearer path back into the black box when something starts to drift.
