Leave a Clean Handoff
End each coding-agent session in a state another agent or engineer can verify, understand, and continue.
Failure pattern
The agent ends a session with a confident summary, but the repo is not restartable. Tests are failing, debug logs remain, generated files are dirty, branch state is unclear, or the next action is vague.
The next engineer starts by cleaning up instead of continuing.
Incident: messy PR handoff
Agent task
The agent is asked:
Prepare the SSO invite fix for review, or leave a clean handoff if it is not ready.
The session ends before e2e verification is complete.
Available surface
The repo state includes:
| Surface | State |
|---|---|
| Code diff | Auth guard and invite route changed |
| Tests | Unit and API pass, e2e not run |
| Debug artifacts | Console logs added during investigation |
| Generated files | API types regenerated |
| Branch | Uncommitted migration scratch file |
| Progress note | Missing |
Bad run
The agent says:
Mostly ready. Just run the final tests.
The next engineer finds debug logs, a scratch migration, no list of the exact commands that were run, and no explanation of why e2e did not run. They cannot tell whether the regenerated API types are intentional.
Why the harness failed
The session ended with unclear state.
| Handoff gap | Consequence |
|---|---|
| Test status vague | Next engineer repeats all checks |
| Debug artifacts left in, unmentioned | Reviewers find a noisy diff |
| Generated files unexplained | Reviewer cannot tell intended changes from noise |
| Scratch file present | Branch is unsafe to commit as-is |
| Next action vague | Continuation starts with archaeology |
Clean handoff was not part of the definition of done.
Why it happens
Agents often optimize for the final answer, not the next session. But coding work happens in a shared repo. A session can be valuable and still incomplete. The harness should require an honest statement of that state at the end of every session.
Clean does not mean finished. It means understandable, verifiable, and restartable.
Harness principle
Clean handoff is part of the definition of done.
Flow: Session work → Verify status → Clean artifacts → Record blockers → Write next action → Restartable repo state
The handoff should say what is complete, what is not, and what to do first.
Operating practice
Use an exit checklist:
| Check | Handoff state |
|---|---|
| Verification | Unit/API passed; e2e not run |
| Debug artifacts | Removed |
| Generated files | API types updated intentionally |
| Dirty files | Scratch migration removed |
| Known blockers | Preview email service unavailable |
| Next action | Run e2e or CI equivalent |
| Approval boundary | Do not mark PR ready until e2e passes |
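The checklist is easier to enforce when it is data rather than prose. The sketch below is a minimal illustration. It assumes a hypothetical `ExitChecklist` record whose fields mirror the table rows, and the field names and the `readinessProblems` helper are not part of any existing harness. It treats the approval boundary as a list of checks that must pass before the PR can be marked ready.

```typescript
// Hypothetical record mirroring the exit checklist; field names are illustrative.
type CheckStatus = "passed" | "failed" | "not_run";

interface ExitChecklist {
  verification: Record<string, CheckStatus>; // e.g. { unit: "passed", e2e: "not_run" }
  debugArtifactsRemoved: boolean;
  generatedFilesExplained: boolean;
  unexplainedDirtyFiles: string[];
  knownBlockers: string[]; // explains why a check did not run; not evaluated here
  nextAction: string;
  requiredForReady: string[]; // approval boundary: checks that must pass first
}

// Lists every reason the PR cannot be marked ready. Empty means the boundary is met.
function readinessProblems(c: ExitChecklist): string[] {
  const problems: string[] = [];
  for (const check of c.requiredForReady) {
    // A required check with no recorded result counts as not run.
    const status: CheckStatus = c.verification[check] ?? "not_run";
    if (status !== "passed") problems.push(`${check}: ${status}`);
  }
  if (!c.debugArtifactsRemoved) problems.push("debug artifacts still present");
  if (!c.generatedFilesExplained) problems.push("generated files unexplained");
  for (const file of c.unexplainedDirtyFiles) problems.push(`unexplained dirty file: ${file}`);
  if (c.nextAction.trim() === "") problems.push("next action missing");
  return problems;
}

// The SSO invite session from the checklist table.
const ssoSession: ExitChecklist = {
  verification: { unit: "passed", api: "passed", e2e: "not_run" },
  debugArtifactsRemoved: true,
  generatedFilesExplained: true,
  unexplainedDirtyFiles: [],
  knownBlockers: ["Preview email service unavailable"],
  nextAction: "Run e2e or CI equivalent",
  requiredForReady: ["unit", "api", "e2e"],
};

console.log(readinessProblems(ssoSession)); // returns ["e2e: not_run"]
```

A missing result is treated as `not_run` rather than ignored. This keeps an unrecorded check from silently satisfying the boundary.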
Harnessed handoff
Status: patch_with_blocker, not review-ready.
Changed:
- auth callback waits for membership hydration
- invite route preserves onboarding redirect
- API types regenerated intentionally
Verified:
- pnpm test auth-guard.spec.ts: pass
- pnpm test invite-api.spec.ts: pass
Not verified:
- e2e invite flow; preview email service unavailable locally
Cleaned:
- removed console logs
- removed scratch migration
Next:
- run pnpm playwright test invite-flow.spec.ts or rely on the CI equivalent
The next session can start immediately.
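One way to keep this format consistent is to generate the handoff from a structured record instead of writing it freehand. This is a minimal sketch that assumes a hypothetical `Handoff` type whose sections follow the example above. Two design choices matter. First, empty sections render as `none`, so an omission is visible rather than silent. Second, the renderer refuses to call a handoff review-ready while anything is listed as not verified.

```typescript
// Hypothetical structured handoff; sections follow the example handoff.
interface Handoff {
  status: string; // e.g. "patch_with_blocker"
  reviewReady: boolean;
  changed: string[];
  verified: { command: string; result: "pass" | "fail" }[];
  notVerified: string[];
  cleaned: string[];
  next: string[];
}

function renderHandoff(h: Handoff): string {
  // Refuse an optimistic status that the evidence does not support.
  if (h.reviewReady && h.notVerified.length > 0) {
    throw new Error("cannot mark review-ready while items are not verified");
  }
  // Empty sections print "none" so a missing section is visible, not silent.
  const section = (title: string, items: string[]): string[] => [
    `${title}:`,
    ...(items.length > 0 ? items : ["none"]).map((item) => `- ${item}`),
  ];
  return [
    `Status: ${h.status}, ${h.reviewReady ? "review-ready" : "not review-ready"}.`,
    ...section("Changed", h.changed),
    ...section("Verified", h.verified.map((v) => `${v.command}: ${v.result}`)),
    ...section("Not verified", h.notVerified),
    ...section("Cleaned", h.cleaned),
    ...section("Next", h.next),
  ].join("\n");
}

const ssoHandoff: Handoff = {
  status: "patch_with_blocker",
  reviewReady: false,
  changed: ["auth callback waits for membership hydration"],
  verified: [{ command: "pnpm test auth-guard.spec.ts", result: "pass" }],
  notVerified: ["e2e invite flow; preview email service unavailable locally"],
  cleaned: ["removed console logs", "removed scratch migration"],
  next: ["run the e2e invite flow or rely on the CI equivalent"],
};

console.log(renderHandoff(ssoHandoff));
```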
Coding-agent example
State categories:
| Category | Examples |
|---|---|
| Complete | Unit/API tests passed |
| Pending | E2E verification |
| Temporary | Debug logs, scratch files |
| Broken | Known failing check |
| Unknown | Untested browser flow |
| Next | Exact command or reviewer action |
Consistent categories make handoff readable.
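Categories stay consistent when they are derived from evidence rather than chosen by the tone of the summary. The following is a sketch under stated assumptions: the `Evidence` shapes and the mapping rules are hypothetical, chosen to reproduce the table.

- A passing check is Complete.
- A check that has not run is Pending.
- A leftover artifact is Temporary.
- A failing check is Broken.
- An untested flow is Unknown.
- A command still to run is Next.

```typescript
type Category = "Complete" | "Pending" | "Temporary" | "Broken" | "Unknown" | "Next";

// Hypothetical evidence shapes; each maps to exactly one category.
type Evidence =
  | { kind: "check"; name: string; result: "pass" | "fail" | "not_run" }
  | { kind: "artifact"; path: string } // debug log or scratch file still present
  | { kind: "untested"; description: string }
  | { kind: "action"; command: string };

function categorize(e: Evidence): Category {
  switch (e.kind) {
    case "check":
      if (e.result === "pass") return "Complete";
      return e.result === "fail" ? "Broken" : "Pending";
    case "artifact":
      return "Temporary";
    case "untested":
      return "Unknown";
    case "action":
      return "Next";
    default: {
      // Compile-time guard: adding an Evidence kind without a rule is a type error.
      const unhandled: never = e;
      throw new Error(`unhandled evidence: ${JSON.stringify(unhandled)}`);
    }
  }
}

const ORDER: Category[] = ["Complete", "Pending", "Temporary", "Broken", "Unknown", "Next"];

// Every category appears, in a fixed order, so readers always scan the same layout.
function groupEvidence(items: Evidence[]): Record<Category, Evidence[]> {
  const groups = {} as Record<Category, Evidence[]>;
  for (const c of ORDER) groups[c] = [];
  for (const item of items) groups[categorize(item)].push(item);
  return groups;
}
```

Because `Category` is a closed union, a vague label such as "mostly done" cannot be assigned to an item at all.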
Review artifact
A clean coding-agent handoff is a restart document. It should be possible to open it tomorrow and know exactly what is true.
Handoff: invite acceptance redirect fix
Current state:
- Patch updates redirect after membership event creation.
- Focused invite acceptance test passes locally.
- Expired-token regression passes locally.
Not done:
- Full auth e2e suite not run.
- Analytics first-login event not verified.
- Generated client diff remains and appears unrelated; not yet confirmed (temporary debug log already removed).
Changed files:
- src/routes/invite/[token].ts
- tests/e2e/invite-acceptance.spec.ts
Known risks:
- Auth callback timing may differ in staging.
- Workspace setup redirect depends on feature flag `new_onboarding_flow`.
Next action:
- Run full auth e2e suite in staging-like env.
- Confirm generated client diff is pre-existing or regenerate cleanly.
This handoff is honest. It does not pretend the task is fully finished, and it does not bury risk in prose. It gives the next person a first command and a reason.
The handoff should also clean the workspace where possible. Remove debug logs, temporary scripts, unused fixtures, and dead experiment files. If something cannot be cleaned safely, name it. A dirty branch is acceptable only when the dirt is explained.
For coding agents, the final state is part of the product. A brilliant patch with a confusing handoff still costs the team time. The harness should define the exit criteria as clearly as it defines the start criteria: evidence attached, scope status truthful, next action named, and no unexplained residue.
Harnessed version
The harnessed run cannot close while the branch state is ambiguous. It must classify every residue: intentional change, generated artifact, temporary file removed, known unrelated diff, or unresolved blocker. If the agent cannot classify something, that uncertainty belongs in the handoff.
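The easy cases of residue classification can be made mechanical, leaving only genuine uncertainty for the handoff. This is a minimal sketch that assumes two things:

- The agent supplies the list of dirty paths, for example from `git status --porcelain`.
- The agent declares three hypothetical rule lists.

Temporary files that were already removed no longer appear in the working tree, so they are recorded in the handoff rather than classified here. Anything that matches no rule is reported as unresolved. The example paths are illustrative.

```typescript
type Residue = "intentional_change" | "generated_artifact" | "known_unrelated_diff" | "unresolved";

// Hypothetical rules the agent declares before closing the session.
interface ResidueRules {
  intentional: string[]; // paths the patch was meant to change
  generatedPrefixes: string[]; // directories holding generated code
  knownUnrelated: string[]; // diffs confirmed to predate the session
}

function classifyResidue(paths: string[], rules: ResidueRules): Record<string, Residue> {
  const result: Record<string, Residue> = {};
  for (const path of paths) {
    if (rules.intentional.includes(path)) result[path] = "intentional_change";
    else if (rules.generatedPrefixes.some((prefix) => path.startsWith(prefix))) result[path] = "generated_artifact";
    else if (rules.knownUnrelated.includes(path)) result[path] = "known_unrelated_diff";
    else result[path] = "unresolved"; // belongs in the handoff as an open question
  }
  return result;
}

// The session cannot close while this list is non-empty.
function unresolvedPaths(classified: Record<string, Residue>): string[] {
  return Object.keys(classified).filter((path) => classified[path] === "unresolved");
}

const classified = classifyResidue(
  ["src/routes/invite/[token].ts", "src/api/generated/types.ts", "migrations/scratch_invite.sql"],
  {
    intentional: ["src/routes/invite/[token].ts"],
    generatedPrefixes: ["src/api/generated/"], // hypothetical location
    knownUnrelated: [],
  },
);
console.log(unresolvedPaths(classified)); // returns ["migrations/scratch_invite.sql"]
```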
This is where coding harnesses differ from ordinary documentation. The handoff is tied to the actual workspace. It should match the diff, the test evidence, and the next command. If it says “debug logs removed,” the diff should support that. If it says “full auth suite not run,” the reviewer should not need to discover that by asking later.
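A claim such as “debug logs removed” can be checked against the diff instead of trusted. This is a minimal sketch that assumes the input is unified-diff text, such as the output of `git diff`. The default pattern only looks for `console.log` and `console.debug` calls. It is a hypothetical starting point, not a complete definition of debug residue. Only added lines are inspected, so pre-existing logging is not flagged.

```typescript
interface DebugFinding {
  file: string;
  line: string;
}

// Scans the added lines of a unified diff for debug calls.
// Pass a pattern without the g flag: RegExp.test with g keeps state between calls.
function findAddedDebugLines(
  diff: string,
  pattern: RegExp = /console\.(log|debug)\(/, // hypothetical default; extend per repo
): DebugFinding[] {
  const findings: DebugFinding[] = [];
  let file = "";
  for (const raw of diff.split("\n")) {
    if (raw.startsWith("+++ ")) {
      // File header, e.g. "+++ b/src/auth/guard.ts".
      file = raw.slice(4).replace(/^b\//, "");
    } else if (raw.startsWith("+")) {
      const added = raw.slice(1);
      if (pattern.test(added)) findings.push({ file, line: added.trim() });
    }
  }
  return findings;
}

const sampleDiff = [
  "diff --git a/src/auth/guard.ts b/src/auth/guard.ts",
  "--- a/src/auth/guard.ts",
  "+++ b/src/auth/guard.ts",
  "@@ -10,3 +10,5 @@",
  " const user = await loadUser();",
  "+console.log('membership', membership);",
  "+await waitForMembership(user);",
  "-console.log('old');",
].join("\n");

const findings = findAddedDebugLines(sampleDiff);
console.log(findings.length); // 1: the added console.log; the removed one is ignored
```

The handoff may say “debug logs removed” only when this returns an empty list for the branch diff.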
Clean handoff also protects future agents. The next session can begin from the named next action instead of reconstructing the previous session’s intent from partial edits. That makes long-running coding work practical without pretending that one agent session will finish everything.
The standard is simple: a fresh engineer or agent should be able to resume without asking what happened.
Common mistakes
The first mistake is writing “mostly done.” That is not a state.
The second mistake is leaving debug artifacts for reviewers to find.
The third mistake is failing to explain generated files.
The fourth mistake is hiding missing tests behind confident prose.
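The first mistake can be ruled out mechanically by accepting only named states. This is a minimal sketch: `patch_with_blocker` comes from the example handoff, while the other status names are hypothetical placeholders that a team would replace with its own.

```typescript
// Only "patch_with_blocker" appears in the example handoff; the other names are hypothetical.
const STATUSES = ["review_ready", "patch_with_blocker", "blocked", "investigation_only"] as const;
type SessionStatus = (typeof STATUSES)[number];

// Accepts a named state or fails loudly; "mostly done" is not a state.
function parseStatus(raw: string): SessionStatus {
  const normalized = raw.trim().toLowerCase();
  const match = STATUSES.find((s) => s === normalized);
  if (match === undefined) {
    throw new Error(`"${raw}" is not a handoff state; use one of: ${STATUSES.join(", ")}`);
  }
  return match;
}
```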
Practical exercise
Create a session-exit checklist for one repo. Include verification, debug artifacts, generated files, dirty state, blockers, next action, and approval boundary.
Use it after five agent sessions and track which item fails most often.
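Finding the most frequent failure needs nothing more than a count per checklist item. This is a minimal sketch that assumes each session's exit review is recorded as the list of items that failed. The item names mirror the exercise, and the five sessions below are illustrative input, not real results.

```typescript
type ChecklistItem =
  | "verification"
  | "debug_artifacts"
  | "generated_files"
  | "dirty_state"
  | "blockers"
  | "next_action"
  | "approval_boundary";

// One entry per session: the checklist items that failed at exit.
interface SessionExit {
  session: string;
  failed: ChecklistItem[];
}

// Counts each item at most once per session, then sorts by frequency, most frequent first.
function failureCounts(sessions: SessionExit[]): [ChecklistItem, number][] {
  const counts = new Map<ChecklistItem, number>();
  for (const s of sessions) {
    const unique = s.failed.filter((item, i) => s.failed.indexOf(item) === i);
    for (const item of unique) counts.set(item, (counts.get(item) ?? 0) + 1);
  }
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
}

// Illustrative data only.
const week: SessionExit[] = [
  { session: "s1", failed: ["verification", "next_action"] },
  { session: "s2", failed: ["generated_files"] },
  { session: "s3", failed: ["verification"] },
  { session: "s4", failed: [] },
  { session: "s5", failed: ["verification", "verification", "generated_files"] },
];
console.log(failureCounts(week)[0]); // [ 'verification', 3 ]
```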
Key takeaways
- A coding session is not done if the repo is not restartable.
- Handoff should be truthful, not optimistic.
- Temporary artifacts must be removed or named.
- Missing verification should block PR readiness.
- The next action should be exact.
Further reading / source notes
- Anthropic, “Effective harnesses for long-running agents” for session handoff practices.
- Google SRE, “Managing Incidents” for operational handoff principles.