Specify · 40 min

Define the Work Surface

Turn vague coding-agent requests into bounded code changes that can be implemented and reviewed.

Failure pattern

A developer asks the coding agent to “fix onboarding,” and the agent turns a vague product complaint into a broad code change across auth, UI, email, analytics, and tests.

The failure begins before the first file is edited. The agent is not given a precise behavior, affected surface, exclusion list, verification condition, or stop rule. It tries to be helpful by expanding the task into everything that seems related.

Incident: onboarding setup bug

Agent task

A product engineer writes:

New users get stuck during onboarding. Fix the onboarding bug before tomorrow’s demo.

The SaaS app has a team-setup flow: create workspace, invite teammate, connect billing trial, and land in the dashboard.

Available surface

The agent can read and edit:

| Surface | Examples |
| --- | --- |
| Frontend route | `app/onboarding/*`, setup checklist, dashboard empty state |
| API | workspace creation, invite endpoint, trial endpoint |
| Database | workspace, invitation, billing trial tables |
| Email templates | invite email and welcome email |
| Tests | unit tests, Playwright onboarding flow, API integration tests |
| Issue tracker | bug report and demo checklist |

The bug report says only that users “get stuck.” It does not define which step fails.

Bad run

The agent edits:

- onboarding checklist state
- invitation API response
- welcome email link
- dashboard empty-state copy
- trial-start side effect
- one Playwright test

It then reports:

Fixed onboarding end to end and improved the demo flow.

Review finds the original bug was only this: invited users who accepted from email landed on /dashboard before their workspace membership was hydrated. The agent changed unrelated billing behavior and introduced a trial-start regression.

Why the harness failed

The harness let a symptom become a work surface.

| Missing boundary | Consequence |
| --- | --- |
| User-visible behavior | Agent did not know which onboarding step was broken |
| Allowed files | Agent edited API, email, billing, and dashboard together |
| Excluded work | Demo polish became implementation scope |
| Verification | No single reproduction was defined before edits |
| Stop condition | Billing side effects were changed without approval |

The result was not simply too much code. It was unreviewable code.

Why it happens

Coding agents follow semantic proximity. If onboarding is broken, dashboard state is related. So are invites, emails, billing trials, analytics, and copy. A human engineer narrows by asking “which behavior fails?” A harness should force that narrowing before implementation starts.

The work surface protects both the agent and the reviewer. It tells the agent what to change and tells the reviewer what should not have changed.

Harness principle

A coding work surface is the bounded code behavior the agent may change in one run.

It defines:

  • Behavior: the exact user or system behavior to fix.
  • Allowed surface: files, modules, commands, and tests likely in scope.
  • Excluded surface: related areas that must not change.
  • Evidence: reproduction, test, or manual scenario proving the fix.
  • Stop rules: conditions requiring human decision.

```mermaid
flowchart LR
  A["Vague issue"] --> B["Reproduction"]
  B --> C["Allowed code surface"]
  C --> D["Excluded surface"]
  D --> E["Verification command"]
  E --> F["Reviewable patch"]
```

A coding work surface turns a broad bug report into a reviewable change.
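The five fields above can be sketched as a small typed brief. This is an illustrative shape, not a real agent API; all field names and the example values are assumptions drawn from the incident in this chapter.

```typescript
// Hypothetical shape for a work-surface brief; field names are illustrative.
interface WorkSurface {
  behavior: string;          // exact user or system behavior to fix
  allowedSurface: string[];  // files, modules, commands, and tests in scope
  excludedSurface: string[]; // related areas that must not change
  evidence: string[];        // reproduction, test, or manual scenario
  stopRules: string[];       // conditions requiring a human decision
}

// Example brief for the invite-acceptance bug; paths are invented for the sketch.
const inviteBug: WorkSurface = {
  behavior: "Invited user lands on /dashboard before workspace membership is hydrated",
  allowedSurface: ["app/invite/accept.ts", "app/onboarding/redirect.ts"],
  excludedSurface: ["billing/", "emails/", "app/dashboard/empty-state.ts"],
  evidence: ["Playwright onboarding-invite flow", "redirect unit test"],
  stopRules: ["ask before touching billing, auth schema, or migration files"],
};

console.log(inviteBug.behavior);
```

A reviewer can diff any patch against `allowedSurface` and `excludedSurface` without re-reading the original issue.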

Operating practice

Rewrite the task before coding:

| Field | Harnessed brief |
| --- | --- |
| Behavior | Invited user accepting from email should land in onboarding until membership is loaded |
| Reproduction | Create invite, accept from email link, observe dashboard redirect |
| Allowed surface | Invite acceptance route, membership hydration, onboarding redirect test |
| Excluded surface | Billing trial start, email template copy, dashboard empty state |
| Verification | Playwright invite-acceptance flow plus related unit test |
| Stop rules | Ask before touching billing, auth schema, or migration files |
| Completion evidence | Failing reproduction passes and unrelated onboarding tests still pass |

Harnessed run

The agent now returns:

Changed:
- invite acceptance route waits for membership hydration
- redirect guard sends incomplete users to /onboarding
- added Playwright coverage for invite acceptance

Not changed:
- billing trial logic
- email template copy
- dashboard empty state

Verification:
- pnpm test onboarding-redirect
- pnpm playwright onboarding-invite.spec.ts

The patch is smaller, easier to review, and directly tied to the bug.
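The core of that patch can be sketched as a redirect guard. This is a minimal sketch, assuming the session exposes a hydration flag; the type, function, and route strings are illustrative, not the real app's code.

```typescript
// Minimal sketch of the redirect guard; names and routes are assumptions.
interface Session {
  accepted: boolean;            // invite accepted from the email link
  membershipHydrated: boolean;  // workspace membership loaded for this user
}

// Before the fix, accepted users were sent straight to /dashboard even when
// membership had not hydrated yet. The guard keeps them in onboarding instead.
function redirectTarget(session: Session): string {
  if (session.accepted && !session.membershipHydrated) {
    return "/onboarding";
  }
  return session.accepted ? "/dashboard" : "/invite/expired";
}

console.log(redirectTarget({ accepted: true, membershipHydrated: false })); // "/onboarding"
console.log(redirectTarget({ accepted: true, membershipHydrated: true }));  // "/dashboard"
```

The Playwright test in the verification list asserts exactly this mapping at the route level.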

Coding-agent example

For coding agents, output modes matter:

| Mode | Agent may do | Agent may not do |
| --- | --- | --- |
| Bug fix | Patch one failing behavior | Refactor adjacent systems |
| Investigation | Inspect, reproduce, report cause | Change production code |
| Test addition | Add missing coverage | Change implementation unless asked |
| Refactor | Preserve behavior with evidence | Add new product behavior |

The harness should name the mode.
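Naming the mode can be made mechanical. A sketch, assuming a harness that tags each run with one mode; the mode names and the permission map are illustrative:

```typescript
// Illustrative mode-to-permission map; not a real agent framework API.
type Mode = "bug-fix" | "investigation" | "test-addition" | "refactor";

const canEditProductionCode: Record<Mode, boolean> = {
  "bug-fix": true,
  "investigation": false,  // inspect, reproduce, and report only
  "test-addition": false,  // coverage only, unless explicitly asked
  "refactor": true,        // behavior-preserving, with evidence
};

// A run that tries to edit production code outside its mode is rejected
// before review, not during it.
function allowEdit(mode: Mode, isProductionFile: boolean): boolean {
  return !isProductionFile || canEditProductionCode[mode];
}

console.log(allowEdit("investigation", true)); // false
```

The point is not the boolean table; it is that the mode is declared before the run, so "investigation" cannot quietly become "refactor."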

Review artifact

A work-surface brief should be short enough to fit at the top of the task, but precise enough that a reviewer can reject scope drift without debating intent.

| Field | Example |
| --- | --- |
| User-visible behavior | Invited users land in workspace setup after accepting a valid invite |
| Entry point | `/invite/:token` acceptance flow |
| In scope | Token validation, membership creation, redirect target, acceptance test |
| Out of scope | Email copy, billing trials, dashboard redesign, onboarding checklist logic |
| Constraints | No production data migration, no auth provider change, no copy rewrite |
| Evidence | Failing test reproduced, patch applied, acceptance path passes, no regression in expired-token path |

The brief also needs a refusal rule. If the agent discovers that the real issue is an upstream auth callback bug, it should stop and report the new work surface instead of silently widening the task. That one rule prevents a large class of agent failures: the agent tries to be helpful, finds a larger problem, and returns a much bigger patch than the team can safely review.
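The refusal rule can be expressed as a decision, not a judgment call. A sketch under stated assumptions: the in-scope predicate and the file paths are invented for illustration.

```typescript
// Sketch of the refusal rule: when the root cause falls outside the brief,
// stop and report a new work surface instead of widening the patch.
type Action =
  | { kind: "patch"; file: string }
  | { kind: "report-new-work-surface"; file: string };

function nextAction(rootCauseFile: string, inScope: (f: string) => boolean): Action {
  return inScope(rootCauseFile)
    ? { kind: "patch", file: rootCauseFile }
    : { kind: "report-new-work-surface", file: rootCauseFile };
}

// Hypothetical discovery: the real bug lives in an upstream auth callback.
const decision = nextAction("services/auth/callback.ts", (f) => f.startsWith("app/invite/"));
console.log(decision.kind); // "report-new-work-surface"
```

The agent still surfaces the larger problem; it just surfaces it as a proposal rather than a patch.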

For coding agents, the work surface is not only a prompt document. It is enforced by route-specific tests, allowed commands, branch policy, and review gates. A good brief therefore pairs intent with a narrow verification path:

Required evidence:
- show failing invite acceptance test before patch
- show passing invite acceptance test after patch
- run existing auth redirect regression tests
- list files changed and explain why each file belongs to invite acceptance

This is the difference between asking an agent to “fix onboarding” and giving it a bounded engineering assignment. The model may still reason broadly, but the harness makes completion narrow.
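One such enforcement gate can be sketched directly: reject any patch whose changed files fall outside the allowed surface. The patterns and file list here are illustrative, not a real CI configuration.

```typescript
// Sketch of a review gate: flag changed files outside the allowed surface.
// The allowed patterns and the example patch are assumptions for this sketch.
const allowed: RegExp[] = [/^app\/invite\//, /^app\/onboarding\//, /^tests\/invite/];

function scopeViolations(changedFiles: string[]): string[] {
  return changedFiles.filter((f) => !allowed.some((re) => re.test(f)));
}

const patch = ["app/invite/accept.ts", "tests/invite-acceptance.spec.ts", "billing/trial.ts"];
console.log(scopeViolations(patch)); // ["billing/trial.ts"]
```

Run against the bad run from earlier in the chapter, this check would have flagged the billing and email edits before a human ever opened the diff.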

Common mistakes

The first mistake is accepting product nouns as scope. “Onboarding” is not a task. “Invite acceptance redirects too early” is.

The second mistake is letting the agent improve nearby code. Reviewers need a clean reason for every changed file.

The third mistake is defining completion as “tests pass” without naming the failing scenario. Generic green tests may miss the bug.

The fourth mistake is failing to stop at dangerous boundaries. Billing, auth, migrations, and permissions usually deserve explicit approval.

Practical exercise

Take one vague issue from a repo and write a work-surface brief with behavior, reproduction, allowed files, excluded files, verification, and stop rules.

Then ask whether a reviewer could reject an unrelated file change from the brief alone. If not, the surface is still too broad.

Key takeaways

  • Coding agents need bounded work before code changes.
  • A bug report is not automatically a work surface.
  • Exclusions are as important as allowed files.
  • Verification should be defined before implementation.
  • Smaller patches are not just cleaner; they are more auditable.

Further reading / source notes