Observe the Run

Reproduce an untraceable coding-agent patch, then add Anvia observers and pipeline events.

Failure pattern

The patch reaches review, but nobody can reconstruct the run. Which files did the agent inspect? Which command failed before the patch? Which checks were skipped? Which tool output became evidence?

Reproduce the failure

// Runs the whole pipeline but keeps only the final result; no step is recorded.
const result = await codingPipeline.run(task);
console.log(result.summary);

The summary exists, but the path that produced it is gone: there is no record of the files read, the commands run, or the checks skipped.

Successful Anvia pattern

Attach an Anvia observer to the agent and a pipeline observer to the workflow.

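import { randomUUID } from "node:crypto";
import { AgentBuilder, createObserver } from "@anvia/core";

// runStore and model are assumed to be defined elsewhere in the application.
const observer = createObserver({
  startRun(args) {
    // One ID per run ties every tool call back to the run that made it.
    const runId = randomUUID();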
    runStore.start({ runId, prompt: args.prompt, history: args.history });

    return {
      startTool(toolArgs) {
        runStore.toolStarted({
          runId,
          toolName: toolArgs.toolName,
          args: toolArgs.args,
        });

        return {
          end(endArgs) {
            runStore.toolEnded({
              runId,
              toolName: endArgs.toolName,
              result: endArgs.result,
              skipped: endArgs.skipped,
            });
          },
        };
      },
      end(endArgs) {
        runStore.end({ runId, output: endArgs.output, usage: endArgs.usage });
      },
    };
  },
});

const observedAgent = new AgentBuilder("repo-coding-agent", model)
  .observe(observer)
  .build();

Pipeline stages should also emit events:

await codingPipeline.run(task, {
  observer: {
    onEvent(event) {
      runStore.pipelineEvent(event);
    },
  },
});

Why it succeeds

The run record now captures process evidence: prompt, stages, tool calls, command results, skipped checks, and final output. Review can start from facts instead of memory.
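The examples above call a runStore without defining it. As a minimal sketch, assuming an append-only in-memory log is enough for a first pass, a store with the same method names could look like the following. InMemoryRunStore, RunEvent, and eventsFor are hypothetical names introduced for illustration; they are not part of Anvia, and the payload shapes only mirror the fields passed by the observer above.

```typescript
// Hypothetical event shapes; the fields mirror what the observer passes to runStore.
type RunEvent =
  | { kind: "run-start"; runId: string; prompt: unknown; history: unknown }
  | { kind: "tool-start"; runId: string; toolName: string; args: unknown }
  | { kind: "tool-end"; runId: string; toolName: string; result: unknown; skipped?: boolean }
  | { kind: "run-end"; runId: string; output: unknown; usage: unknown }
  | { kind: "pipeline"; event: unknown };

// Payload of one event kind, without the discriminant.
type Payload<K extends RunEvent["kind"]> = Omit<Extract<RunEvent, { kind: K }>, "kind">;

class InMemoryRunStore {
  // Append-only log: entries are never mutated, so the order of events is preserved.
  private readonly log: { at: string; event: RunEvent }[] = [];

  private record(event: RunEvent): void {
    this.log.push({ at: new Date().toISOString(), event });
  }

  start(e: Payload<"run-start">): void {
    this.record({ kind: "run-start", ...e });
  }

  toolStarted(e: Payload<"tool-start">): void {
    this.record({ kind: "tool-start", ...e });
  }

  toolEnded(e: Payload<"tool-end">): void {
    this.record({ kind: "tool-end", ...e });
  }

  end(e: Payload<"run-end">): void {
    this.record({ kind: "run-end", ...e });
  }

  // Pipeline events carry no runId in the example above, so they are stored as-is.
  pipelineEvent(event: unknown): void {
    this.record({ kind: "pipeline", event });
  }

  // Returns the agent-level events for one run, in the order they were recorded.
  eventsFor(runId: string): RunEvent[] {
    return this.log
      .map((entry) => entry.event)
      .filter((event) => event.kind !== "pipeline" && event.runId === runId);
  }
}

// Sample usage with made-up values.
const runStore = new InMemoryRunStore();
runStore.start({ runId: "run-1", prompt: "fix the failing parser test", history: [] });
runStore.toolStarted({ runId: "run-1", toolName: "read_file", args: { path: "src/parser.ts" } });
runStore.toolEnded({ runId: "run-1", toolName: "read_file", result: "file contents", skipped: false });
runStore.pipelineEvent({ stage: "verify", status: "started" });
runStore.end({ runId: "run-1", output: "patch summary", usage: undefined });

console.log(runStore.eventsFor("run-1").map((event) => event.kind));
```

An append-only log keeps the sketch honest about ordering, which is what a reviewer needs to tell "failed before the patch" from "passed after the patch". A real deployment would likely persist the log rather than hold it in memory.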

Success check

The run record should answer:

  • what behavior was active?
  • which files were read?
  • which checks failed before the patch?
  • which checks passed after the patch?
  • which checks were skipped?
  • what risk remains?
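Several of these questions can be answered mechanically once tool calls are recorded in order. The sketch below shows one way to do that, under stated assumptions: the tool names read_file, apply_patch, and run_check, the ToolCallRecord shape, and the summarizeRun helper are all hypothetical and are not Anvia's API. Questions about active behavior and remaining risk still need a human reading of the record.

```typescript
// Hypothetical record of one finished tool call, in the order the calls happened.
type ToolCallRecord = {
  toolName: "read_file" | "apply_patch" | "run_check";
  target: string; // file path for read_file/apply_patch, check name for run_check
  ok?: boolean; // outcome of a run_check call
  skipped?: boolean; // true when a check was not executed
};

type ReviewAnswers = {
  filesRead: string[];
  failedBeforePatch: string[];
  passedAfterPatch: string[];
  skippedChecks: string[];
};

function summarizeRun(calls: ToolCallRecord[]): ReviewAnswers {
  const answers: ReviewAnswers = {
    filesRead: [],
    failedBeforePatch: [],
    passedAfterPatch: [],
    skippedChecks: [],
  };
  let patched = false; // flips once the first patch is applied

  for (const call of calls) {
    if (call.toolName === "read_file") {
      answers.filesRead.push(call.target);
    } else if (call.toolName === "apply_patch") {
      patched = true;
    } else if (call.skipped) {
      answers.skippedChecks.push(call.target);
    } else if (!patched && call.ok === false) {
      answers.failedBeforePatch.push(call.target);
    } else if (patched && call.ok === true) {
      answers.passedAfterPatch.push(call.target);
    }
  }
  return answers;
}

// Sample run with made-up values.
const answers = summarizeRun([
  { toolName: "read_file", target: "src/parser.ts" },
  { toolName: "run_check", target: "unit", ok: false },
  { toolName: "apply_patch", target: "src/parser.ts" },
  { toolName: "run_check", target: "unit", ok: true },
  { toolName: "run_check", target: "e2e", skipped: true },
]);

console.log(answers);
```

If any of the four lists cannot be filled from the record, that gap is itself a finding: the run was not observed closely enough to review.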

Next move

Observation tells you what happened. Verification decides whether that is enough.