Close the Feedback Loop
Use recurring coding-agent failures to strengthen the harness.
Failure pattern
The same review comments recur across agent patches: wrong test command, missed edge case, overly broad refactor, unsafe migration, stale docs. Humans fix the PR, but the harness does not improve.
A repeated review comment is harness data. It says the environment made the wrong behavior easy.
Incident: repeated migration review failure
Agent task
Across several issues, the coding agent adds database columns for workspace features.
Review repeatedly says:
Migration is unsafe for existing tenants. Add a backfill plan and avoid locking writes.
Available surface
The repo has:
| Surface | Contents |
|---|---|
| Migration examples | Mixed safe and unsafe patterns |
| DB docs | Backfill guidance, partially outdated |
| Review comments | Several corrected PRs |
| CI | Schema checks but no migration safety check |
| Runbook | Production deploy rules |
The agent keeps copying a simple migration pattern.
Bad run
It creates:
ALTER TABLE workspace ADD COLUMN require_sso BOOLEAN NOT NULL DEFAULT false;
Review flags table-lock risk and missing backfill plan again.
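What the review keeps asking for is the expand/backfill/contract sequence instead of the single-step migration. A minimal sketch of that sequence, with the SQL held as strings: the table and column names mirror the example above, while the batch size and PostgreSQL-style semantics are assumptions.

```python
# Safer three-phase alternative to the single NOT NULL DEFAULT migration.
# Phase 1 (expand): add the column as nullable, which avoids a table
# rewrite and long write lock.
EXPAND = "ALTER TABLE workspace ADD COLUMN require_sso BOOLEAN;"

# Phase 2 (backfill): fill existing rows in small batches; repeat until
# no rows remain. The batch size of 1000 is an assumption.
BACKFILL = """
UPDATE workspace SET require_sso = false
WHERE id IN (
    SELECT id FROM workspace WHERE require_sso IS NULL LIMIT 1000
);
"""

# Phase 3 (contract): only after the backfill completes, add the default
# and the constraint.
CONTRACT = [
    "ALTER TABLE workspace ALTER COLUMN require_sso SET DEFAULT false;",
    "ALTER TABLE workspace ALTER COLUMN require_sso SET NOT NULL;",
]
```

The ordering is the point: the constraint arrives last, when every row already satisfies it.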
Why the harness failed
The failure was corrected but not converted into a harness fix.
| Repeated failure | Harness layer |
|---|---|
| Unsafe migration copied | Context examples weak |
| Backfill missing | Completion gate incomplete |
| Review comments ignored | Feedback not persisted |
| CI passed | Automated checks missing |
| Same issue repeated | No regression case |
The team patched the output, not the system.
Why it happens
Coding agents learn from the context and feedback they can see. If prior review comments live only in PR threads, the next run may not see them. If unsafe examples remain near safe examples, the agent may copy the shorter pattern.
Closing the loop means turning human correction into future harness behavior.
Harness principle
Every repeated failure should produce a small harness change and a comparable rerun.
flowchart LR
A["Failed PR"] --> B["Attribute layer"]
B --> C["Small harness fix"]
C --> D["Comparable coding task"]
D --> E{"Failure prevented?"}
E -->|"Yes"| F["Keep regression"]
E -->|"No"| B
The goal is not to write a longer prompt. The goal is to make the same mistake harder to repeat.
Operating practice
Use a failure log:
| Failure | Layer | Harness fix | Evidence |
|---|---|---|---|
| Unsafe NOT NULL DEFAULT migration | Context | Mark unsafe examples as legacy; add safe migration guide | Next migration uses expand/backfill/contract |
| Backfill plan missing | Completion gate | Add migration checklist to PR readiness | Agent outputs backfill step |
| CI misses migration risk | Verification | Add lint/check or reviewer checklist item | PR blocked before review |
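The "CI misses migration risk" row can become an automated check rather than a reviewer's memory. A minimal sketch of such a lint, assuming a hypothetical CI hook; the pattern list covers only the failure seen in this incident, where real tools would cover far more.

```python
import re

# Hypothetical migration-safety lint. Each pattern names one way a
# migration can lock or rewrite a large table; the list here is an
# assumption covering only this chapter's incident.
UNSAFE_PATTERNS = [
    # Adding a NOT NULL column with a DEFAULT can rewrite and lock the
    # table on some database versions.
    re.compile(r"ADD\s+COLUMN\s+\w+\s+\w+\s+NOT\s+NULL\s+DEFAULT", re.IGNORECASE),
    # Changing a column type forces a full table rewrite.
    re.compile(r"ALTER\s+COLUMN\s+\w+\s+TYPE\s", re.IGNORECASE),
]

def lint_migration(sql: str) -> list[str]:
    """Return the unsafe patterns found in a migration script."""
    return [p.pattern for p in UNSAFE_PATTERNS if p.search(sql)]

# The bad run from the incident is flagged; the nullable variant passes.
bad = "ALTER TABLE workspace ADD COLUMN require_sso BOOLEAN NOT NULL DEFAULT false;"
safe = "ALTER TABLE workspace ADD COLUMN require_sso BOOLEAN;"
```

Blocking the PR before human review is what moves this failure from the review thread into the harness.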
Then add a regression case:
Task: add workspace boolean setting.
Expected:
- nullable column or expand/backfill/contract plan
- no long write lock
- rollback notes
- verification command
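The expected items above can double as a completion gate: the agent's PR text must show each one before the task counts as done. A minimal sketch, where the marker strings are assumptions standing in for a real migration checklist.

```python
# Hypothetical completion gate for the regression case above. Each
# expected element maps to marker strings that must appear in the PR
# text; the markers themselves are assumptions.
REQUIRED = {
    "safe schema change": ("nullable", "expand/backfill/contract"),
    "lock risk addressed": ("lock",),
    "rollback notes": ("rollback",),
    "verification command": ("verify",),
}

def missing_evidence(pr_text: str) -> list[str]:
    """Return the names of expected elements absent from the PR text."""
    text = pr_text.lower()
    return [name for name, markers in REQUIRED.items()
            if not any(marker in text for marker in markers)]
```

A gate like this is crude, but it converts the reviewer's recurring comment into a check the agent sees on every run.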
Coding-agent example
Failure attribution rubric:
| Question | Likely layer |
|---|---|
| Was task vague? | Work surface |
| Did agent copy stale pattern? | Context |
| Did command/tool invite risk? | Interface |
| Was baseline unknown? | Runway |
| Did patch sprawl? | Active work |
| Did state vanish? | Progress |
| Did weak evidence pass? | Judging |
| Could path not be reconstructed? | Instrumentation |
| Was final state unclear? | Handoff |
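Kept as a lookup, the rubric gives failure-log entries a consistent layer label. The question-to-layer mapping below is taken directly from the table; only the helper function is an added convenience.

```python
# The attribution rubric as data, so log entries use the same layer
# names every time.
RUBRIC = {
    "Was task vague?": "Work surface",
    "Did agent copy stale pattern?": "Context",
    "Did command/tool invite risk?": "Interface",
    "Was baseline unknown?": "Runway",
    "Did patch sprawl?": "Active work",
    "Did state vanish?": "Progress",
    "Did weak evidence pass?": "Judging",
    "Could path not be reconstructed?": "Instrumentation",
    "Was final state unclear?": "Handoff",
}

def attribute(answers: dict[str, bool]) -> list[str]:
    """Map yes-answers to the harness layers they implicate."""
    return [RUBRIC[q] for q, yes in answers.items() if yes and q in RUBRIC]
```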
Review artifact
Feedback should become a harness change, not a repeated comment.
| Review comment | Attribution | Harness change |
|---|---|---|
| “Migration locks the table.” | Interface | Add migration planner output before apply |
| “Backfill has no rollback.” | Work surface | Require rollback note for data changes |
| “Test only covers happy path.” | Verification | Add negative-path regression template |
| “Agent changed unrelated cleanup.” | Active work | Require queued/rejected state table |
| “Reviewer cannot reproduce result.” | Instrumentation | Store run command and fixture version |
This table turns review pain into system improvement. If the same comment appears three times, the harness is failing. The answer is rarely “try harder.” The answer is usually a sharper task brief, safer tool surface, better context routing, stronger evaluation gate, or better run record.
The loop should be small:
flowchart LR
A["Observed failure"] --> B["Attribute to harness layer"]
B --> C["Change one harness rule"]
C --> D["Rerun comparable case"]
D --> E["Keep, adjust, or remove rule"]
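The loop in the diagram is small enough to express as a function. A minimal sketch, where all three callables are placeholders for real harness hooks, not an existing API.

```python
from typing import Callable

def close_loop(apply_rule: Callable[[], None],
               revert_rule: Callable[[], None],
               run_case: Callable[[], bool]) -> bool:
    """Apply one harness rule, rerun the comparable case, keep or revert."""
    apply_rule()       # change one harness rule
    if run_case():     # rerun comparable case: failure prevented?
        return True    # keep the rule and the regression case
    revert_rule()      # otherwise revert and re-attribute the failure
    return False
```

One rule per pass keeps the result attributable: if the rerun improves, you know which change did it.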
The comparable case is important. If the agent failed on a migration because it ignored lock risk, rerun another migration-like task after adding the planner requirement. Do not wait for a future production incident to learn whether the harness improved.
For coding agents, feedback loops should live in durable assets: task templates, command wrappers, review checklists, eval cases, and examples. A reviewer comment inside one PR helps that PR. A changed harness helps the next ten PRs.
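One way to make the failure log a durable asset rather than a PR comment is a small record type checked into the repo. A sketch: the field names mirror the log's columns, and the example entry comes from the incident above; keeping entries in code is an assumption about storage.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FailureLogEntry:
    failure: str      # what review kept flagging
    layer: str        # harness layer from the attribution rubric
    harness_fix: str  # the one change made
    evidence: str     # how the rerun showed improvement

ENTRY = FailureLogEntry(
    failure="Unsafe NOT NULL DEFAULT migration",
    layer="Context",
    harness_fix="Mark unsafe examples as legacy; add safe migration guide",
    evidence="Next migration uses expand/backfill/contract",
)
```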
Harnessed version
The harnessed run treats the repeated migration issue as a system signal. The team adds a migration-planning requirement, creates two regression prompts that resemble previous bad migrations, and reruns them after the rule changes. If the agent now produces rollback notes and lock-risk evidence, the harness improved. If it still misses the risk, the fix was in the wrong layer.
This is the core of feedback-loop work: separate the agent’s one-time mistake from the harness weakness that allowed it. A stronger prompt might help for one task. A better command contract, review gate, or eval case changes future behavior.
The loop should stay intentionally small. Do not rewrite the entire harness because one patch had one review comment. Attribute the failure, change one thing, rerun a comparable case, and keep the change only if it improves the result.
That discipline keeps the harness learnable. Engineers can understand why a new rule exists because it points back to a real failure.
Common mistakes
The first mistake is repeating review comments without changing the harness.
The second mistake is fixing everything at once. Small fixes are easier to evaluate.
The third mistake is adding only negative instructions. Better to add safe examples and gates.
The fourth mistake is letting corrected PRs disappear instead of becoming examples.
Practical exercise
Review five agent PR comments. Group repeated comments by harness layer. Pick one repeated failure and design the smallest harness change that would have prevented it.
Then test the change on a comparable task.
Key takeaways
- Repeated review comments are harness data.
- Fixing a PR is not the same as fixing the harness.
- Safe examples beat vague warnings.
- Regression cases keep old failures from returning.
- Feedback loops should be small and testable.
Further reading / source notes
- Google SRE, “Postmortem Culture” for learning from repeated failures.
- Anthropic, “Effective harnesses for long-running agents” for improving harnesses from observed failures.