Execute 40 min

Limit Active Work

Prevent scope spread by forcing the coding agent to finish one behavior before starting the next.

Failure pattern

A single issue causes the agent to open UI, API, database, test, and refactor work at the same time.

The patch looks active. Many files change. But no behavior is clearly complete, and every review comment exposes another half-finished thread.

Incident: workspace settings sprawl

Agent task

The issue says:

Add required-SSO setting to workspace invitations.

This is a product behavior, but it can touch many layers.

Available surface

The agent can change:

| Surface | Possible work |
| --- | --- |
| Settings UI | Checkbox and copy |
| API | Validation and persistence |
| Database | New workspace setting |
| Invite flow | Enforce SSO requirement |
| Tests | Unit, integration, e2e |
| Refactor | Shared auth helper cleanup |

No work-in-progress limit exists.

Bad run

The agent changes all layers plus a helper refactor:

- adds database column
- changes settings UI
- rewrites invite acceptance guard
- refactors auth helpers
- updates email copy
- adds one integration test

The e2e flow fails because the UI writes the setting but the invite guard reads a different field. The refactor also breaks password-login tests. Review cannot isolate the intended behavior.

Why the harness failed

The harness allowed too many active threads.

| Missing control | Consequence |
| --- | --- |
| One active behavior | UI, API, DB, auth, and copy changed together |
| Verification per behavior | No layer was proven independently |
| Deferred list | Refactor was folded into feature work |
| Blocked state | Migration decision was guessed |
| Patch boundary | Email copy changed without need |

The agent did not protect itself from scope spread, and nothing in the harness did it on the agent's behalf.

Why it happens

Coding agents follow dependency chains. A feature needs a field, then an API, then UI, then tests, then helpers. The chain is real, but that does not mean every possible improvement belongs in one active run.

Humans use small pull requests, feature flags, and review boundaries. A harness should create similar discipline.

Harness principle

One active behavior at a time.

stateDiagram-v2
  [*] --> Queued
  Queued --> Active: selected behavior
  Active --> Verified: evidence passes
  Active --> Blocked: decision needed
  Blocked --> Active: unblocked
  Verified --> [*]

A work-in-progress limit keeps code changes reviewable.

Large features can still ship. They move through a queue.
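
The state diagram above can be enforced in code rather than by convention. The following is a minimal sketch, assuming the harness tracks each behavior's state itself; the names `State`, `ALLOWED`, and `transition` are illustrative and not part of any existing tool.

```python
from enum import Enum


class State(Enum):
    QUEUED = "queued"
    ACTIVE = "active"
    BLOCKED = "blocked"
    VERIFIED = "verified"


# Edges copied from the state diagram; any other move is rejected.
ALLOWED = {
    (State.QUEUED, State.ACTIVE),    # selected behavior
    (State.ACTIVE, State.VERIFIED),  # evidence passes
    (State.ACTIVE, State.BLOCKED),   # decision needed
    (State.BLOCKED, State.ACTIVE),   # unblocked
}


def transition(current: State, target: State) -> State:
    """Return the target state, or raise if the diagram has no such edge."""
    if (current, target) not in ALLOWED:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Because there is no edge from Queued to Verified, an item cannot be marked done without first being the active behavior and producing evidence.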

Operating practice

Split the issue:

| Item | State | Evidence |
| --- | --- | --- |
| Persist required-SSO setting | active | API test stores and reads setting |
| Enforce SSO during invite acceptance | queued | Invite flow e2e blocks password signup |
| Add settings UI | queued | UI toggles persisted field |
| Refactor auth helper | deferred | Behavior-preserving tests pass |
| Email copy | blocked | Product copy approval |

The active item permits only API, schema, and focused tests. The agent may note follow-ups, but not implement them.
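
One way to make "permits only API, schema, and focused tests" checkable is to compare the changed files against an allowlist before the patch is accepted. This is a sketch under stated assumptions: the path patterns below are hypothetical and would need to match the real repository layout, and `out_of_scope` is an illustrative helper, not an existing harness function.

```python
from fnmatch import fnmatchcase

# Hypothetical path patterns for the active item "persist required-SSO setting".
ALLOWED_PATTERNS = [
    "db/migrations/*",
    "src/api/workspace_settings/*",
    "tests/api/workspace_settings/*",
]


def out_of_scope(changed_files: list[str], allowed: list[str]) -> list[str]:
    """Return the changed files that match none of the allowed patterns."""
    return [
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in allowed)
    ]
```

A non-empty result means the agent edited something outside the active behavior. The harness can reject the patch and ask the agent to record the extra change as a queued or deferred item instead.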

Harnessed run

Active behavior: persist required-SSO setting.
Changed: schema, settings API, API tests.
Not changed: invite acceptance, UI, email copy, auth helper refactor.
Verification: pnpm test workspace-settings-api.
Next queued item: enforce setting during invite acceptance.

The patch is smaller and reviewable.

Coding-agent example

Behavior-sized coding items:

| Bad item | Better item |
| --- | --- |
| Add SSO setting | Persist setting through API |
| Fix invite flow | Block non-SSO acceptance when flag true |
| Update UI | Toggle persisted setting |
| Clean auth | Refactor helper without behavior change |

Each item should have one evidence condition.

Review artifact

Active work is easiest to control when every candidate change has a state.

| Behavior | State | Evidence |
| --- | --- | --- |
| SSO setting persists after save | Active | Unit test for settings update |
| Invite copy mentions SSO | Queued | Product copy approval needed |
| Trial state shown during onboarding | Queued | Separate billing fixture needed |
| Dashboard card reflects setup status | Blocked | Depends on checklist model decision |
| Settings page refactor | Rejected | No user-visible behavior in current task |

This table is not project management decoration. It is the harness that prevents an agent from opening five fronts at once. A coding agent can hold many files in context and make fast edits, so it often creates parallel half-fixes. The active-work table tells it which behavior is allowed to consume edits now.

The implementation prompt can enforce the same rule:

Only one behavior may be Active.
If you discover another needed behavior, add it to Queued or Blocked.
Do not edit files for Queued or Blocked behaviors.
Completion requires evidence for the Active behavior only.
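
The same rules can also be checked mechanically on the active-work table, so the limit does not depend on the prompt alone. The sketch below assumes the table is available as structured data; `Item` and `check_board` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Item:
    behavior: str
    state: str     # "active", "queued", "blocked", "deferred", or "rejected"
    evidence: str  # the single condition that proves the behavior


def check_board(items: list[Item]) -> list[str]:
    """Return rule violations; an empty list means the board is valid."""
    problems = []
    active = [item.behavior for item in items if item.state == "active"]
    if len(active) > 1:
        problems.append("more than one active behavior: " + ", ".join(active))
    for item in items:
        if not item.evidence.strip():
            problems.append(f"no evidence condition: {item.behavior}")
    return problems
```

Running this check before each agent step turns the work-in-progress limit into a hard gate: a board with two active rows, or a row with no evidence, stops the run before any file is edited.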

This protects review quality. A small patch that proves one behavior is easier to approve than a sweeping patch that might improve the product but cannot be reasoned about. It also protects the agent from its own momentum. When the model sees related code, it wants to clean it up. The harness should treat that impulse as a candidate item, not permission.

In the workspace settings incident, a harnessed agent would finish SSO setting persistence first. It might leave a note that the invite copy needs updating, but it would not combine that copy change with persistence logic. The team gets a verifiable change instead of a broad bundle that touches every layer at once.

Harnessed version

The harnessed run converts a broad request, such as “improve onboarding” or “add required-SSO to workspace invitations,” into a queue. The first active item becomes: “SSO setting persists after save and reload.” That item owns the patch, tests, and evidence. Everything else is recorded but untouched.

This does not slow the team down. It reduces recovery cost. If the SSO persistence patch is wrong, the rollback is obvious. If it is right, the next queued item starts from a known state. The agent still sees the larger product opportunity, but the harness controls how much unfinished work can exist at once.
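
The hand-off from one item to the next can be sketched as a small queue operation. This is an illustration only: items are plain dictionaries with assumed `behavior` and `state` keys, and `promote_next` is a hypothetical helper.

```python
from typing import Optional


def promote_next(queue: list[dict]) -> Optional[dict]:
    """Activate the first queued item, but only if nothing is active.

    Items are plain dicts with hypothetical "behavior" and "state" keys.
    """
    if any(item["state"] == "active" for item in queue):
        return None  # the current behavior must be verified or blocked first
    for item in queue:
        if item["state"] == "queued":
            item["state"] = "active"
            return item
    return None  # nothing left to start
```

The function refuses to start new work while an item is still active, which is how the harness controls how much unfinished work can exist at once.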

Common mistakes

The first mistake is splitting by layer only. “Do database work” is not a behavior unless it has observable evidence.

The second mistake is treating refactors as free. Refactors need their own verification.

The third mistake is letting blocked decisions become guesses.

The fourth mistake is keeping a patch large because “the files are related.” Related is not the same as active.

Practical exercise

Take one feature issue and split it into five behavior-sized items. For each item, write allowed files, excluded files, evidence, and current state.

Then choose one active item and forbid all other changes.

Key takeaways

  • Coding agents need work-in-progress limits.
  • One active behavior makes patches reviewable.
  • Related cleanup should be queued or deferred.
  • Blocked work should not become guessed work.
  • Evidence should attach to each behavior.

Further reading / source notes