Persist Progress

Make long-running coding work restartable without relying on chat history.

Failure pattern

A bug investigation spans multiple sessions, but the findings live only in the conversation. The next agent repeats work, trusts a stale summary, or contradicts the prior diagnosis.

Coding work often pauses at the worst time: after reproducing a bug, before the fix is known. Without durable progress, a continuation starts from memory instead of evidence.

Incident: intermittent invite failure

Agent task

The agent investigates:

Some invited users get a 403 on first login. Find the cause and prepare a fix.

The issue is intermittent and tied to workspace membership hydration.

Available surface

The agent can inspect:

| Surface | Contents |
| --- | --- |
| Issue thread | User reports and timestamps |
| Logs | auth callback, invite acceptance, membership creation |
| Tests | Playwright invite flow and API tests |
| Database | invitations, memberships, sessions |
| Recent PRs | auth callback refactor and queue change |
| Local notes | scratch findings from prior run |

The first session reproduces the failure but stops before fixing it.

Bad run

The first agent writes in chat:

Looks like a race between invite acceptance and session creation. Need to inspect membership hydration.

The next agent starts fresh, sees a 403, and blames the permission middleware. It edits the middleware, which makes the bug less frequent but does not fix it.

The prior reproduction details are gone.

Why the harness failed

Progress was not persisted as repo state.

| Missing record | Consequence |
| --- | --- |
| Reproduction steps | Second agent used a different scenario |
| Verified findings | Race condition evidence was lost |
| Rejected hypotheses | Middleware theory was reopened |
| Open blocker | Queue timing was not tracked |
| Next action | Continuation chose a different path |

The conversation had clues. The repo did not.

Why it happens

Long bug hunts create partial knowledge. Some hypotheses are verified, some rejected, and some open. A chat summary compresses that state into vague language. A fresh agent cannot tell what is trustworthy.

Coding harnesses need restart artifacts: short, current, and operational.

Harness principle

Persist progress outside chat.

flowchart LR
  A["Session starts"] --> B["Read progress note"]
  B --> C["Continue active hypothesis"]
  C --> D["Verify or reject"]
  D --> E["Update progress note"]
  E --> F["Next session resumes"]

Durable progress lets a new coding session continue from verified state.

Progress records should distinguish verified findings from guesses.
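The read, continue, update loop in the diagram can be sketched in a few lines. This is a minimal sketch, not a prescribed implementation: the file path, the JSON layout, and the `investigate` callback are all assumptions made for illustration.

```python
import json
from pathlib import Path

# Hypothetical location; any path the next session is told to read works.
NOTE_PATH = Path("progress/invite-403.json")


def load_note(path: Path) -> dict:
    """Return the saved progress note, or an empty skeleton on the first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"verified": [], "rejected": [], "open": [], "next_action": None}


def save_note(path: Path, note: dict) -> None:
    """Write the note back to disk so the next session can resume from it."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(note, indent=2), encoding="utf-8")


def run_session(path: Path, investigate) -> dict:
    """Read the note, let `investigate` update it, then persist the result."""
    note = load_note(path)
    note = investigate(note)  # assumed callback: takes a note, returns a note
    save_note(path, note)
    return note
```

The point of the shape is that a session never starts from an empty state unless the file is genuinely missing, and never ends without writing back.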

Operating practice

Write a restart note:

| Field | Example |
| --- | --- |
| Objective | Fix intermittent 403 after invite acceptance |
| Reproduction | Create invite, accept email, first auth callback returns 403 in 3/10 runs |
| Verified | Membership row is sometimes created after the session permission check |
| Rejected | Static role middleware is not the primary cause |
| Open | Queue job timing after auth callback needs inspection |
| Evidence | Log trace INV-403-2026-05-18-A, failing test branch |
| Next action | Add a focused test that checks whether session creation waits for membership hydration |
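If the note is produced by a harness instead of typed by hand, the fields above map naturally onto a small record type. The sketch below assumes a `RestartNote` dataclass and a plain-text rendering; both names and the output layout are illustrative, not part of any existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class RestartNote:
    """One restart note; field names mirror the table above."""

    objective: str
    reproduction: str
    verified: list[str] = field(default_factory=list)
    rejected: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)
    evidence: list[str] = field(default_factory=list)
    next_action: str = ""

    def render(self) -> str:
        """Render the note as plain text with one labelled section per field."""
        lines = [
            f"Objective: {self.objective}",
            f"Reproduction: {self.reproduction}",
        ]
        for label, items in (
            ("Verified", self.verified),
            ("Rejected", self.rejected),
            ("Open", self.open_questions),
            ("Evidence", self.evidence),
        ):
            lines.append(f"{label}:")
            lines.extend(f"- {item}" for item in items)
        lines.append(f"Next action: {self.next_action}")
        return "\n".join(lines)
```

Keeping verified, rejected, and open items in separate lists means a guess cannot be stored without someone deciding which list it belongs in.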

Harnessed run

The next agent reads the note and returns:

Continuation:
- Reused reproduction from INV-403-2026-05-18-A.
- Confirmed middleware hypothesis was already rejected.
- Added failing test for session before membership hydration.
Status:
- Root cause verified.
- Fix not implemented yet.

The second run advances instead of restarting.

Coding-agent example

Use state labels:

| State | Meaning |
| --- | --- |
| verified | Confirmed with evidence |
| rejected | Tested and ruled out |
| open | Known question |
| blocked | Waiting on access or decision |
| stale | Needs rerun after code changes |

These labels prevent “looked into” from meaning five different things.
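One way to keep the labels honest is to encode them as a closed set and refuse a "verified" or "rejected" finding that cites no evidence. The following is a sketch under that assumption; the `State` and `Finding` names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    VERIFIED = "verified"
    REJECTED = "rejected"
    OPEN = "open"
    BLOCKED = "blocked"
    STALE = "stale"


@dataclass(frozen=True)
class Finding:
    claim: str
    state: State
    evidence: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Verified and rejected both mean "tested", so they must cite evidence.
        if self.state in (State.VERIFIED, State.REJECTED) and not self.evidence:
            raise ValueError(
                f"{self.state.value} finding needs evidence: {self.claim!r}"
            )
```

An open or blocked finding may still be created without evidence, which matches the table: those states describe questions, not results.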

Review artifact

A durable progress record should survive a new session, a new agent, or a reviewer joining late.

Research note: intermittent 403 after invite acceptance

Verified:
- Token is valid when acceptance request begins.
- Membership row exists after acceptance mutation.
- 403 happens on first workspace route load, not during token exchange.

Rejected:
- Expired-token branch is not involved.
- Email link encoding is not the cause.
- Browser cache does not reproduce the issue.

Open:
- Session refresh may happen before membership event propagation completes.
- Workspace role cache may be stale after acceptance.

Evidence:
- run-2026-05-18-a: failing e2e trace
- run-2026-05-18-b: membership row inspection
- log-acceptance-17: session refresh timing

Next action:
- Add focused trace around role cache read after acceptance redirect.

This note is more valuable than a chat summary because it separates verified findings from guesses. The next agent can continue from the open question instead of re-running the first hour of investigation.

Persistence also changes how contradiction is handled. If the next agent believes the role cache is not stale, it should add evidence under Rejected, not overwrite the previous note with a new story. The harness should make knowledge additive and auditable.
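An append-only log is one simple way to get the additive, auditable behavior described above. The sketch below assumes a JSON Lines file with one finding per line; the file format and function names are illustrative choices, not a requirement.

```python
import json
from pathlib import Path


def append_entry(log_path: Path, claim: str, state: str, evidence: str) -> None:
    """Append one finding as a JSON line; earlier lines are never rewritten."""
    entry = {"claim": claim, "state": state, "evidence": evidence}
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def current_states(log_path: Path) -> dict[str, str]:
    """Return the latest state per claim; the full history stays in the file."""
    states: dict[str, str] = {}
    for line in log_path.read_text(encoding="utf-8").splitlines():
        entry = json.loads(line)
        states[entry["claim"]] = entry["state"]
    return states
```

When a later agent rejects the stale-cache hypothesis, it appends a second line for the same claim with its own evidence. The earlier entry stays in the file, so a reviewer can see both what was believed and what replaced it.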

For longer coding tasks, store this record close to the work: in a run artifact, issue comment, PR body, or structured task file. Chat history is not enough because it is hard to search, hard to diff, and easy to lose when the session changes. A coding harness should assume restarts are normal.

Harnessed version

In the harnessed run, the first agent does not end with “I think it is session timing.” It ends with a progress record that names the reproduction, verified facts, rejected hypotheses, and next action. The second agent starts by reading that record, then either continues the next action or challenges a finding with new evidence.

That changes the team dynamic. A restart no longer means the investigation returns to zero. A contradiction no longer becomes two competing stories in chat. The durable record becomes the shared memory of the task, and every new claim must attach itself to evidence.

For coding work, this is especially valuable when the bug is intermittent. The agent may need multiple test traces, logs, or fixture variations before the cause is clear. Persisting progress keeps those attempts useful even when the final fix has not been found.
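For an intermittent bug, each attempt is itself evidence, so it helps to record every run and not only the failing ones. A minimal sketch, assuming each attempt carries a trace ID and a pass/fail flag (the `Attempt` type and the trace IDs used with it are hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    trace_id: str
    failed: bool


def failure_rate(attempts: list[Attempt]) -> str:
    """Summarize attempts in the same form a restart note uses, e.g. '3/10 runs'."""
    failures = sum(a.failed for a in attempts)
    return f"{failures}/{len(attempts)} runs"


def failing_traces(attempts: list[Attempt]) -> list[str]:
    """Return trace IDs of failing runs so the next session can open them directly."""
    return [a.trace_id for a in attempts if a.failed]
```

Recording the denominator matters: a fix that moves the rate from 3/10 to 1/10 has reduced the bug, not removed it, and only the recorded attempts make that visible.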

Common mistakes

The first mistake is saving only a final summary. A bug hunt also needs the reproduction steps and the rejected hypotheses.

The second mistake is hiding the progress note somewhere the next agent will not read.

The third mistake is failing to mark stale findings after code changes.

The fourth mistake is omitting command output or trace IDs.
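The third mistake can be caught mechanically if each finding records which files it depends on. The sketch below assumes findings are plain dicts with a hypothetical `depends_on` list, and that the harness can supply the set of paths changed since the finding was recorded; the file paths in any usage are made up.

```python
def mark_stale(findings: list[dict], changed_paths: set[str]) -> list[dict]:
    """Return copies of findings, downgrading verified ones whose files changed."""
    updated = []
    for finding in findings:
        touched = changed_paths.intersection(finding.get("depends_on", []))
        if finding["state"] == "verified" and touched:
            # Keep the claim, but record why it can no longer be trusted as-is.
            finding = {**finding, "state": "stale", "stale_reason": sorted(touched)}
        updated.append(finding)
    return updated
```

This sketch only downgrades verified findings. A rejected hypothesis can also become worth retesting after a code change, so a stricter harness might apply the same rule to both states.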

Practical exercise

Take one unfinished bug investigation. Write a restart note with objective, reproduction, verified facts, rejected hypotheses, open questions, evidence, and next action.

Start a fresh session from only that note. If the fresh session repeats work you already did, the note is not operational enough.
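Before handing the note to a fresh session, a quick completeness check can flag empty sections. This is a sketch that assumes the note is a dict keyed by the seven fields from the exercise; the key names are an illustrative choice.

```python
REQUIRED_FIELDS = (
    "objective",
    "reproduction",
    "verified",
    "rejected",
    "open",
    "evidence",
    "next_action",
)


def missing_fields(note: dict) -> list[str]:
    """Return the required fields that are absent or empty, in a fixed order."""
    return [name for name in REQUIRED_FIELDS if not note.get(name)]
```

An empty result does not prove the note is operational, and an empty "rejected" list can be legitimate early in an investigation. Treat the output as a prompt to look again, not as a gate.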

Key takeaways

  • Chat history is not durable progress.
  • Bug investigations need restart notes.
  • Rejected hypotheses are as important as verified ones.
  • Evidence IDs make findings inspectable.
  • The next action should be explicit.

Further reading / source notes