Limit Active Work

Failure pattern

A single issue causes the agent to open UI, API, database, test, and refactor work at the same time.

The patch looks active. Many files change. But no behavior is clearly complete, and every review comment exposes another half-finished thread.

Incident: workspace settings sprawl

Agent task

The issue says:

Add required-SSO setting to workspace invitations.

This is a single product behavior, but implementing it can touch many layers.

Available surface

The agent can change:

Surface	Possible work
Settings UI	Checkbox and copy
API	Validation and persistence
Database	New workspace setting
Invite flow	Enforce SSO requirement
Tests	Unit, integration, e2e
Refactor	Shared auth helper cleanup

No work-in-progress limit exists.

Bad run

The agent changes all layers plus a helper refactor:

- adds database column
- changes settings UI
- rewrites invite acceptance guard
- refactors auth helpers
- updates email copy
- adds one integration test

The e2e flow fails because the UI writes the setting but the invite guard reads a different field. The refactor also breaks password-login tests. Review cannot isolate the intended behavior.

Why the harness failed

The harness allowed too many active threads.

Missing control	Consequence
One active behavior	UI, API, DB, auth, and copy changed together
Verification per behavior	No layer was proven independently
Deferred list	Refactor was folded into feature work
Blocked state	Migration decision was guessed
Patch boundary	Email copy changed without need

Nothing in the harness stopped the agent from spreading scope.

Why it happens

Coding agents follow dependency chains. A feature needs a field, then an API, then UI, then tests, then helpers. The chain is real, but that does not mean every possible improvement belongs in one active run.

Humans use small pull requests, feature flags, and review boundaries. A harness should create similar discipline.

Harness principle

One active behavior at a time.

stateDiagram-v2
  [*] --> Queued
  Queued --> Active: selected behavior
  Active --> Verified: evidence passes
  Active --> Blocked: decision needed
  Blocked --> Active: unblocked
  Verified --> [*]

A work-in-progress limit keeps code changes reviewable.

Large features can still ship. They move through a queue.
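The state diagram above can be sketched as a small transition table. This is a minimal illustration, not a prescribed implementation: the names `State`, `TRANSITIONS`, and `advance` are assumptions introduced here, and only the four transitions drawn in the diagram are allowed.

```python
from enum import Enum


class State(Enum):
    QUEUED = "queued"
    ACTIVE = "active"
    BLOCKED = "blocked"
    VERIFIED = "verified"


# Allowed moves and their triggers, copied from the state diagram.
TRANSITIONS = {
    (State.QUEUED, State.ACTIVE): "selected behavior",
    (State.ACTIVE, State.VERIFIED): "evidence passes",
    (State.ACTIVE, State.BLOCKED): "decision needed",
    (State.BLOCKED, State.ACTIVE): "unblocked",
}


def advance(current: State, target: State) -> State:
    """Return target if the diagram allows the move, otherwise raise ValueError."""
    if (current, target) not in TRANSITIONS:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


# A queued item cannot jump straight to verified; it must be active first.
state = advance(State.QUEUED, State.ACTIVE)
state = advance(state, State.VERIFIED)
```

Under this sketch there is no path from Queued to Verified, so an item cannot be marked done without first having been the active behavior.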

Operating practice

Split the issue:

Item	State	Evidence
Persist required-SSO setting	active	API test stores and reads setting
Enforce SSO during invite acceptance	queued	Invite flow e2e blocks password signup
Add settings UI	queued	UI toggles persisted field
Refactor auth helper	deferred	Behavior-preserving tests pass
Email copy	blocked	Product copy approval

The active item permits only API, schema, and focused tests. The agent may note follow-ups, but not implement them.
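One way to make that permission checkable is to compare the files in a patch against path patterns owned by the active item. The sketch below is an assumption-heavy illustration: the directory layout, the file names, and the helper `out_of_bounds` are hypothetical and would need to match the real repository.

```python
from fnmatch import fnmatch

# Hypothetical path patterns for the active item "persist required-SSO setting".
# fnmatch's "*" also matches "/", so each pattern covers nested directories.
ALLOWED = [
    "db/migrations/*",
    "src/api/workspace_settings/*",
    "tests/api/workspace_settings/*",
]


def out_of_bounds(changed_files, allowed_patterns):
    """Return the changed files that match none of the allowed patterns."""
    return [
        path
        for path in changed_files
        if not any(fnmatch(path, pattern) for pattern in allowed_patterns)
    ]


# Hypothetical patch: two files belong to the active item, one does not.
changed = [
    "db/migrations/add_require_sso.sql",
    "src/api/workspace_settings/update.py",
    "src/auth/helpers.py",
]
violations = out_of_bounds(changed, ALLOWED)
print(violations)  # ['src/auth/helpers.py']
```

A harness could run a check like this before accepting a patch and turn each violation into a queued or deferred follow-up instead of an edit.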

Harnessed run

Active behavior: persist required-SSO setting.
Changed: schema, settings API, API tests.
Not changed: invite acceptance, UI, email copy, auth helper refactor.
Verification: pnpm test workspace-settings-api.
Next queued item: enforce setting during invite acceptance.

The patch is smaller and reviewable.

Coding-agent example

Behavior-sized coding items:

Bad item	Better item
Add SSO setting	Persist setting through API
Fix invite flow	Block non-SSO acceptance when flag true
Update UI	Toggle persisted setting
Clean auth	Refactor helper without behavior change

Each item should have one evidence condition.

Review artifact

Active work is easiest to control when every candidate change has a state.

Behavior	State	Evidence
SSO setting persists after save	Active	Unit test for settings update
Invite copy mentions SSO	Blocked	Product copy approval needed
Trial state shown during onboarding	Queued	Separate billing fixture needed
Dashboard card reflects setup status	Blocked	Depends on checklist model decision
Settings page refactor	Rejected	No user-visible behavior in current task

This table is not project management decoration. It is the harness that prevents an agent from opening five fronts at once. A coding agent can hold many files in context and make fast edits, so it often creates parallel half-fixes. The active-work table tells it which behavior is allowed to consume edits now.

The implementation prompt can enforce the same rule:

Only one behavior may be Active.
If you discover another needed behavior, add it to Queued or Blocked.
Do not edit files for Queued or Blocked behaviors.
Completion requires evidence for the Active behavior only.
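The same four rules can also be enforced in code rather than by prompt alone. The following is a minimal sketch, assuming a hypothetical in-memory `Board` class; a real harness would persist this state and attach actual test results as evidence.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Board:
    active: Optional[str] = None
    queued: List[str] = field(default_factory=list)
    blocked: List[str] = field(default_factory=list)
    verified: List[str] = field(default_factory=list)

    def discover(self, behavior: str, needs_decision: bool = False) -> None:
        """Record newly found work without making it active."""
        (self.blocked if needs_decision else self.queued).append(behavior)

    def activate(self, behavior: str) -> None:
        """Move a queued behavior to active; refuse if one is already active."""
        if self.active is not None:
            raise RuntimeError(f"'{self.active}' is still active")
        self.queued.remove(behavior)  # raises ValueError if it was never queued
        self.active = behavior

    def complete(self, evidence_passed: bool) -> None:
        """Close the active behavior only when its evidence has passed."""
        if self.active is None:
            raise RuntimeError("no active behavior to complete")
        if not evidence_passed:
            raise RuntimeError(f"evidence missing for '{self.active}'")
        self.verified.append(self.active)
        self.active = None


board = Board()
board.discover("Persist required-SSO setting")
board.discover("Enforce SSO during invite acceptance")
board.discover("Email copy", needs_decision=True)
board.activate("Persist required-SSO setting")
board.complete(evidence_passed=True)
board.activate("Enforce SSO during invite acceptance")
```

Blocked items never pass through `activate` in this sketch because they are not in the queue; unblocking them would be a separate, explicit step.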

This protects review quality. A small patch that proves one behavior is easier to approve than a sweeping patch that might improve the product but cannot be reasoned about. It also protects the agent from its own momentum. When the model sees related code, it wants to clean it up. The harness should treat that impulse as a candidate item, not permission.

In the workspace settings incident, a harnessed agent would finish SSO setting persistence first. It might leave a note that the invite email copy needs updating, but it would not combine that copy change with persistence logic. The team gets a verifiable change instead of a broad bundle that touches every layer.

Harnessed version

The harnessed run converts a broad issue such as “Add required-SSO setting to workspace invitations” into a queue. The first active item becomes: “SSO setting persists after save and reload.” That item owns the patch, tests, and evidence. Everything else is recorded but untouched.

This does not slow the team down. It reduces recovery cost. If the SSO persistence patch is wrong, the rollback is obvious. If it is right, the next queued item starts from a known state. The agent still sees the larger product opportunity, but the harness controls how much unfinished work can exist at once.

Common mistakes

The first mistake is splitting by layer only. “Do database work” is not a behavior unless it has observable evidence.

The second mistake is treating refactors as free. Refactors need their own verification.

The third mistake is letting blocked decisions become guesses. In the bad run, the migration decision was guessed when it should have moved the item to Blocked.

The fourth mistake is keeping a patch large because “the files are related.” Related is not the same as active.

Practical exercise

Take one feature issue and split it into five behavior-sized items. For each item, write allowed files, excluded files, evidence, and current state.

Then choose one active item and forbid all other changes.
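The exercise can be checked mechanically. The sketch below assumes a hypothetical `WorkItem` record with the four fields from the exercise and a `problems` helper that reports what is missing; the state names follow the split in the operating practice table.

```python
from dataclasses import dataclass
from typing import List

# States used in the operating practice table.
STATES = {"active", "queued", "blocked", "deferred"}


@dataclass
class WorkItem:
    behavior: str
    allowed_files: List[str]
    excluded_files: List[str]
    evidence: str
    state: str


def problems(items: List[WorkItem]) -> List[str]:
    """Return readable problems with a split; an empty list means it passes."""
    found = []
    for item in items:
        if not item.evidence.strip():
            found.append(f"{item.behavior}: no evidence condition")
        if not item.allowed_files:
            found.append(f"{item.behavior}: no allowed files")
        if item.state not in STATES:
            found.append(f"{item.behavior}: unknown state '{item.state}'")
    active_count = sum(1 for item in items if item.state == "active")
    if active_count != 1:
        found.append(f"expected exactly one active item, found {active_count}")
    return found


# "Do database work" has no observable evidence, so the split is rejected.
split = [
    WorkItem("Persist required-SSO setting", ["src/api/*"], ["src/ui/*"],
             "API test stores and reads setting", "active"),
    WorkItem("Do database work", ["db/*"], [], "", "queued"),
]
print(problems(split))  # ['Do database work: no evidence condition']
```

The path patterns in the example are placeholders; the point is that an item without evidence, allowed files, or a known state is not yet behavior-sized.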

Key takeaways

Coding agents need work-in-progress limits.
One active behavior makes patches reviewable.
Related cleanup should be queued or deferred.
Blocked work should not become guessed work.
Evidence should attach to each behavior.