Limit Active Work
Prevent scope spread by forcing the coding agent to finish one behavior before starting the next.
Failure pattern
A single issue causes the agent to open UI, API, database, test, and refactor work at the same time.
The patch looks active. Many files change. But no behavior is clearly complete, and every review comment exposes another half-finished thread.
Incident: workspace settings sprawl
Agent task
The issue says:
Add required-SSO setting to workspace invitations.
This is a product behavior, but it can touch many layers.
Available surface
The agent can change:
| Surface | Possible work |
|---|---|
| Settings UI | Checkbox and copy |
| API | Validation and persistence |
| Database | New workspace setting |
| Invite flow | Enforce SSO requirement |
| Tests | Unit, integration, e2e |
| Refactor | Shared auth helper cleanup |
No work-in-progress limit exists.
Bad run
The agent changes all layers plus a helper refactor:
- adds database column
- changes settings UI
- rewrites invite acceptance guard
- refactors auth helpers
- updates email copy
- adds one integration test
The e2e flow fails because the UI writes the setting but the invite guard reads a different field. The refactor also breaks password-login tests. Review cannot isolate the intended behavior.
Why the harness failed
The harness allowed too many active threads.
| Missing control | Consequence |
|---|---|
| One active behavior | UI, API, DB, auth, and copy changed together |
| Verification per behavior | No layer was proven independently |
| Deferred list | Refactor was folded into feature work |
| Blocked state | Migration decision was guessed |
| Patch boundary | Email copy changed without need |
Nothing in the harness stopped the agent from spreading scope, and the agent did not limit itself.
Why it happens
Coding agents follow dependency chains. A feature needs a field, then an API, then UI, then tests, then helpers. The chain is real, but that does not mean every possible improvement belongs in one active run.
Humans use small pull requests, feature flags, and review boundaries. A harness should create similar discipline.
Harness principle
One active behavior at a time.
stateDiagram-v2
    [*] --> Queued
    Queued --> Active: selected behavior
    Active --> Verified: evidence passes
    Active --> Blocked: decision needed
    Blocked --> Active: unblocked
    Verified --> [*]
Large features can still ship. They move through a queue.
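The state diagram can be sketched as a small state machine. This is a minimal illustration, not a prescribed implementation: the `WorkQueue` class, its `move` method, and the behavior names are hypothetical. Only the four states and their transitions come from the diagram.

```python
from enum import Enum


class State(Enum):
    QUEUED = "queued"
    ACTIVE = "active"
    BLOCKED = "blocked"
    VERIFIED = "verified"


# Transitions copied from the state diagram; anything else is rejected.
ALLOWED = {
    (State.QUEUED, State.ACTIVE),
    (State.ACTIVE, State.VERIFIED),
    (State.ACTIVE, State.BLOCKED),
    (State.BLOCKED, State.ACTIVE),
}


class WorkQueue:
    """Hypothetical tracker that refuses a second Active behavior."""

    def __init__(self, behaviors):
        # Every behavior starts in the queue.
        self.states = {name: State.QUEUED for name in behaviors}

    def move(self, name, target):
        current = self.states[name]
        if (current, target) not in ALLOWED:
            raise ValueError(f"{name}: {current.value} -> {target.value} is not allowed")
        if target is State.ACTIVE:
            # Work-in-progress limit: at most one Active behavior at a time.
            active = [n for n, s in self.states.items() if s is State.ACTIVE]
            if active:
                raise ValueError(f"{name}: {active[0]} is already active")
        self.states[name] = target


queue = WorkQueue(["persist setting", "enforce on invite", "settings UI"])
queue.move("persist setting", State.ACTIVE)
try:
    queue.move("enforce on invite", State.ACTIVE)  # rejected: limit of one
except ValueError as err:
    print(err)
queue.move("persist setting", State.VERIFIED)
queue.move("enforce on invite", State.ACTIVE)  # allowed once the first is verified
```

Because the limit is checked on every move into Active, a behavior that becomes unblocked also has to wait if something else is active in the meantime.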
Operating practice
Split the issue:
| Item | State | Evidence |
|---|---|---|
| Persist required-SSO setting | active | API test stores and reads setting |
| Enforce SSO during invite acceptance | queued | Invite flow e2e blocks password signup |
| Add settings UI | queued | UI toggles persisted field |
| Refactor auth helper | deferred | Behavior-preserving tests pass |
| Email copy | blocked | Product copy approval |
The active item permits changes only to the schema, the settings API, and focused API tests. The agent may note follow-ups, but it may not implement them.
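One way to make that boundary checkable is to compare the files a patch touches against path patterns allowed for the active item. The sketch below assumes a repository layout that the source does not describe: every path, pattern, and function name here is hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for the active item "persist required-SSO setting".
# Adjust these to the real repository layout.
ALLOWED_PATTERNS = [
    "db/migrations/*",
    "src/api/workspace-settings/*",
    "tests/api/workspace-settings*",
]


def out_of_scope(changed_files, allowed_patterns):
    """Return the changed files that match none of the allowed patterns."""
    return [
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in allowed_patterns)
    ]


# Hypothetical list of files touched by a patch.
changed = [
    "db/migrations/0042_add_required_sso.sql",
    "src/api/workspace-settings/update.ts",
    "src/invites/accept-guard.ts",
]
violations = out_of_scope(changed, ALLOWED_PATTERNS)
print(violations)  # ['src/invites/accept-guard.ts']
```

In practice the changed-file list could come from something like `git diff --name-only`, and a non-empty result would send the extra files back to the queue as follow-up notes instead of into the patch.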
Harnessed run
Active behavior: persist required-SSO setting.
Changed: schema, settings API, API tests.
Not changed: invite acceptance, UI, email copy, auth helper refactor.
Verification: pnpm test workspace-settings-api.
Next queued item: enforce setting during invite acceptance.
The patch is smaller and reviewable.
Coding-agent example
Behavior-sized coding items:
| Bad item | Better item |
|---|---|
| Add SSO setting | Persist setting through API |
| Fix invite flow | Block non-SSO acceptance when flag true |
| Update UI | Toggle persisted setting |
| Clean auth | Refactor helper without behavior change |
Each item should have one evidence condition.
Review artifact
Active work is easiest to control when every candidate change has a state.
| Behavior | State | Evidence |
|---|---|---|
| SSO setting persists after save | Active | Unit test for settings update |
| Invite copy mentions SSO | Blocked | Product copy approval needed |
| Trial state shown during onboarding | Queued | Separate billing fixture needed |
| Dashboard card reflects setup status | Blocked | Depends on checklist model decision |
| Settings page refactor | Rejected | No user-visible behavior in current task |
This table is not project management decoration. It is the harness that prevents an agent from opening five fronts at once. A coding agent can hold many files in context and make fast edits, so it often creates parallel half-fixes. The active-work table tells it which behavior is allowed to consume edits now.
The implementation prompt can enforce the same rule:
Only one behavior may be Active.
If you discover another needed behavior, add it to Queued or Blocked.
Do not edit files for Queued or Blocked behaviors.
Completion requires evidence for the Active behavior only.
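The same four rules can also be checked mechanically at the end of a run. The following is a sketch under assumptions: the `Item` record, the `check_run` function, and the idea that each edited file has already been mapped to a behavior name are all hypothetical, not part of any particular harness.

```python
from dataclasses import dataclass


@dataclass
class Item:
    behavior: str
    state: str  # "active", "queued", "blocked", "deferred", or "rejected"
    evidence_passed: bool = False


def check_run(items, edited_behaviors):
    """Return rule violations; an empty list means the run may complete."""
    problems = []
    active = [item for item in items if item.state == "active"]
    # Rule 1: only one behavior may be Active.
    if len(active) != 1:
        problems.append(f"expected exactly one active behavior, found {len(active)}")
    # Rule 3: no edits for behaviors that are not Active.
    active_names = {item.behavior for item in active}
    for name in edited_behaviors:
        if name not in active_names:
            problems.append(f"edited files for non-active behavior: {name}")
    # Rule 4: completion needs evidence for the Active behavior only.
    for item in active:
        if not item.evidence_passed:
            problems.append(f"no passing evidence for: {item.behavior}")
    return problems


items = [
    Item("SSO setting persists after save", "active", evidence_passed=True),
    Item("Invite copy mentions SSO", "queued"),
    Item("Settings page refactor", "rejected"),
]
print(check_run(items, ["SSO setting persists after save"]))
print(check_run(items, ["SSO setting persists after save", "Invite copy mentions SSO"]))
```

The second rule, recording newly discovered work as Queued or Blocked, is satisfied by appending an `Item` in that state; the check never asks non-active items for evidence.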
This protects review quality. A small patch that proves one behavior is easier to approve than a sweeping patch that might improve the product but cannot be reasoned about. It also protects the agent from its own momentum. When the model sees related code, it wants to clean it up. The harness should treat that impulse as a candidate item, not permission.
In the workspace settings incident, a harnessed agent would finish SSO setting persistence first. It might leave a note that the invite email copy is stale, but it would not combine that copy change with persistence logic. The team gets a verifiable change instead of a broad bundle that touches every layer.
Harnessed version
The harnessed run converts a broad issue such as “Add required-SSO setting to workspace invitations” into a queue. The first active item becomes: “SSO setting persists after save and reload.” That item owns the patch, tests, and evidence. Everything else is recorded but untouched.
This does not slow the team down. It reduces recovery cost. If the SSO persistence patch is wrong, the rollback is obvious. If it is right, the next queued item starts from a known state. The agent still sees the larger product opportunity, but the harness controls how much unfinished work can exist at once.
Common mistakes
The first mistake is splitting by layer only. “Do database work” is not a behavior unless it has observable evidence.
The second mistake is treating refactors as free. Refactors need their own verification.
The third mistake is letting blocked decisions become guesses.
The fourth mistake is keeping a patch large because “the files are related.” Related is not the same as active.
Practical exercise
Take one feature issue and split it into five behavior-sized items. For each item, write allowed files, excluded files, evidence, and current state.
Then choose one active item and forbid all other changes.
Key takeaways
- Coding agents need work-in-progress limits.
- One active behavior makes patches reviewable.
- Related cleanup should be queued or deferred.
- Blocked work should not become guessed work.
- Evidence should attach to each behavior.
Further reading / source notes
- Anthropic, “Effective harnesses for long-running agents” for task tracking and bounded work.
- OpenAI, “Harness engineering” for specifying intent and feedback.