Define the Work Surface
Turn vague coding-agent requests into bounded code changes that can be implemented and reviewed.
Failure pattern
A developer asks the coding agent to “fix onboarding,” and the agent turns a vague product complaint into a broad code change across auth, UI, email, analytics, and tests.
The failure begins before the first file is edited. The agent is not given a precise behavior, affected surface, exclusion list, verification condition, or stop rule. It tries to be helpful by expanding the task into everything that seems related.
Incident: onboarding setup bug
Agent task
A product engineer writes:
New users get stuck during onboarding. Fix the onboarding bug before tomorrow’s demo.
The SaaS app has a team-setup flow: create workspace, invite teammate, connect billing trial, and land in the dashboard.
Available surface
The agent can read and edit:
| Surface | Examples |
|---|---|
| Frontend route | app/onboarding/*, setup checklist, dashboard empty state |
| API | workspace creation, invite endpoint, trial endpoint |
| Database | workspace, invitation, billing trial tables |
| Email templates | invite email and welcome email |
| Tests | unit tests, Playwright onboarding flow, API integration tests |
| Issue tracker | bug report and demo checklist |
The bug report says only that users “get stuck.” It does not define which step fails.
Bad run
The agent edits:
- onboarding checklist state
- invitation API response
- welcome email link
- dashboard empty-state copy
- trial-start side effect
- one Playwright test
It then reports:
Fixed onboarding end to end and improved the demo flow.
Review finds the original bug was only this: invited users who accepted from email landed on /dashboard before their workspace membership was hydrated. The agent changed unrelated billing behavior and introduced a trial-start regression.
Why the harness failed
The harness let a symptom become a work surface.
| Missing boundary | Consequence |
|---|---|
| User-visible behavior | Agent did not know which onboarding step was broken |
| Allowed files | Agent edited API, email, billing, and dashboard together |
| Excluded work | Demo polish became implementation scope |
| Verification | No single reproduction was defined before edits |
| Stop condition | Billing side effects were changed without approval |
The result was not simply too much code. It was unreviewable code.
Why it happens
Coding agents follow semantic proximity. If onboarding is broken, dashboard state is related. So are invites, emails, billing trials, analytics, and copy. A human engineer narrows by asking “which behavior fails?” A harness should force that narrowing before implementation starts.
The work surface protects both the agent and the reviewer. It tells the agent what to change and tells the reviewer what should not have changed.
Harness principle
A coding work surface is the bounded code behavior the agent may change in one run.
It defines:
- Behavior: the exact user or system behavior to fix.
- Allowed surface: files, modules, commands, and tests likely in scope.
- Excluded surface: related areas that must not change.
- Evidence: reproduction, test, or manual scenario proving the fix.
- Stop rules: conditions requiring human decision.
```mermaid
flowchart LR
  A["Vague issue"] --> B["Reproduction"]
  B --> C["Allowed code surface"]
  C --> D["Excluded surface"]
  D --> E["Verification command"]
  E --> F["Reviewable patch"]
```
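Some teams encode this brief as data so tooling can check it. A minimal sketch, assuming a hypothetical `WorkSurfaceBrief` shape; the field names mirror the list above and are not from any existing tool:

```ts
// Hypothetical shape for a work-surface brief; field names are illustrative.
interface WorkSurfaceBrief {
  behavior: string;          // the exact user or system behavior to fix
  allowedSurface: string[];  // files, modules, commands, and tests likely in scope
  excludedSurface: string[]; // related areas that must not change
  evidence: string[];        // reproduction, test, or manual scenario proving the fix
  stopRules: string[];       // conditions that require a human decision
}
```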
Operating practice
Rewrite the task before coding:
| Field | Harnessed brief |
|---|---|
| Behavior | An invited user accepting from the email link should land in onboarding until workspace membership is loaded |
| Reproduction | Create invite, accept from email link, observe dashboard redirect |
| Allowed surface | Invite acceptance route, membership hydration, onboarding redirect test |
| Excluded surface | Billing trial start, email template copy, dashboard empty state |
| Verification | Playwright invite-acceptance flow plus related unit test |
| Stop rules | Ask before touching billing, auth schema, or migration files |
| Completion evidence | Failing reproduction passes and unrelated onboarding tests still pass |
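The verification row can be pinned down as an executable reproduction. A minimal Playwright sketch, assuming a hypothetical invite fixture and route names; the real URLs will differ:

```ts
import { test, expect } from "@playwright/test";

test("invited user lands in onboarding until membership is hydrated", async ({ page }) => {
  // Assumption: a test fixture has already created a pending invite at this URL.
  await page.goto("/invite/test-token");

  // The bug sent the user to /dashboard before membership was hydrated;
  // the fixed flow should hold them in onboarding instead.
  await expect(page).toHaveURL(/\/onboarding/);
});
```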
Harnessed run
The agent now returns:
Changed:
- invite acceptance route waits for membership hydration
- redirect guard sends incomplete users to /onboarding
- added Playwright coverage for invite acceptance
Not changed:
- billing trial logic
- email template copy
- dashboard empty state
Verification:
- pnpm test onboarding-redirect
- pnpm playwright test onboarding-invite.spec.ts
The patch is smaller, easier to review, and directly tied to the bug.
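To make the fix concrete, here is a minimal sketch of the redirect guard, with `acceptInvite` and `getMembership` as hypothetical stand-ins for whatever the real codebase calls them:

```ts
type Invite = { userId: string; workspaceId: string };
type Membership = { role: string };

// Placeholder helpers; real code validates the token and reads the membership row.
async function acceptInvite(token: string): Promise<Invite | null> {
  return token ? { userId: "u_1", workspaceId: "w_1" } : null;
}
async function getMembership(userId: string, workspaceId: string): Promise<Membership | null> {
  return { role: "member" };
}

// The guard: resolve where an accepted invite should land.
export async function inviteRedirectTarget(token: string): Promise<string> {
  const invite = await acceptInvite(token);
  if (!invite) return "/invite/expired"; // expired-token path stays unchanged

  // The original bug redirected to /dashboard before this read completed.
  const membership = await getMembership(invite.userId, invite.workspaceId);
  return membership ? "/dashboard" : "/onboarding";
}
```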
Coding-agent example
For coding agents, output modes matter:
| Mode | Agent may do | Agent may not do |
|---|---|---|
| Bug fix | Patch one failing behavior | Refactor adjacent systems |
| Investigation | Inspect, reproduce, report cause | Change production code |
| Test addition | Add missing coverage | Change implementation unless asked |
| Refactor | Preserve behavior with evidence | Add new product behavior |
The harness should name the mode.
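Naming the mode can be as simple as a declared field the harness checks before accepting a patch. A sketch, assuming a hypothetical policy table rather than any existing tool:

```ts
type Mode = "bug-fix" | "investigation" | "test-addition" | "refactor";

// Hypothetical policy table mirroring the modes above.
const modePolicy: Record<Mode, { may: string; mayNot: string }> = {
  "bug-fix":       { may: "patch one failing behavior",       mayNot: "refactor adjacent systems" },
  "investigation": { may: "inspect, reproduce, report cause", mayNot: "change production code" },
  "test-addition": { may: "add missing coverage",             mayNot: "change implementation unless asked" },
  "refactor":      { may: "preserve behavior with evidence",  mayNot: "add new product behavior" },
};
```

With the mode declared up front, an investigation run that touches production files can be rejected mechanically instead of debated in review.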
Review artifact
A work-surface brief should be short enough to fit at the top of the task, but precise enough that a reviewer can reject scope drift without debating intent.
| Field | Example |
|---|---|
| User-visible behavior | Invited users land in workspace setup after accepting a valid invite |
| Entry point | /invite/:token acceptance flow |
| In scope | Token validation, membership creation, redirect target, acceptance test |
| Out of scope | Email copy, billing trials, dashboard redesign, onboarding checklist logic |
| Constraints | No production data migration, no auth provider change, no copy rewrite |
| Evidence | Failing test reproduced, patch applied, acceptance path passes, no regression in expired-token path |
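Using the hypothetical `WorkSurfaceBrief` shape sketched earlier, the same artifact can be written as data; paths and names here are illustrative:

```ts
const inviteBugBrief: WorkSurfaceBrief = {
  behavior: "Invited users land in workspace setup after accepting a valid invite",
  allowedSurface: ["invite token validation", "membership creation", "redirect target", "acceptance test"],
  excludedSurface: ["email copy", "billing trials", "dashboard redesign", "onboarding checklist logic"],
  evidence: ["failing test reproduced", "acceptance path passes", "no regression in expired-token path"],
  stopRules: ["no production data migration", "no auth provider change", "no copy rewrite"],
};
```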
The brief also needs a refusal rule. If the agent discovers that the real issue is an upstream auth callback bug, it should stop and report the new work surface instead of silently widening the task. That one rule prevents a large class of agent failures: the agent tries to be helpful, finds a larger problem, and returns a much bigger patch than the team can safely review.
For coding agents, the work surface is not only a prompt document. It is enforced by route-specific tests, allowed commands, branch policy, and review gates. A good brief therefore pairs intent with a narrow verification path:
Required evidence:
- show failing invite acceptance test before patch
- show passing invite acceptance test after patch
- run existing auth redirect regression tests
- list files changed and explain why each file belongs to invite acceptance
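One way to make that evidence mechanical is a small gate script that runs the named checks and prints the diff for review. A sketch, assuming the pnpm scripts named in the brief exist; the regression-suite name is an assumption:

```ts
// Hypothetical verification gate; command names are taken from the brief above.
import { execSync } from "node:child_process";

const requiredChecks = [
  "pnpm test onboarding-redirect",
  "pnpm playwright test onboarding-invite.spec.ts",
  "pnpm test auth-redirect-regressions", // assumed name for the existing regression suite
];

for (const cmd of requiredChecks) {
  execSync(cmd, { stdio: "inherit" }); // a non-zero exit throws, failing the gate
}

// List changed files so the reviewer can match each one to the allowed surface.
execSync("git diff --name-only main", { stdio: "inherit" });
```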
This is the difference between asking an agent to “fix onboarding” and giving it a bounded engineering assignment. The model may still reason broadly, but the harness makes completion narrow.
Common mistakes
The first mistake is accepting product nouns as scope. “Onboarding” is not a task. “Invite acceptance redirects too early” is.
The second mistake is letting the agent improve nearby code. Reviewers need a clean reason for every changed file.
The third mistake is defining completion as “tests pass” without naming the failing scenario. Generic green tests may miss the bug.
The fourth mistake is failing to stop at dangerous boundaries. Billing, auth, migrations, and permissions usually deserve explicit approval.
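The fourth mistake is also the easiest to automate away. A sketch of a path-based stop rule, with the protected patterns as assumptions about a typical repo layout:

```ts
// Hypothetical stop-rule check: flag changed files that cross dangerous boundaries.
const protectedPaths = [/(^|\/)billing\//, /(^|\/)auth\//, /(^|\/)migrations\//, /(^|\/)permissions\//];

export function stopRuleViolations(changedFiles: string[]): string[] {
  return changedFiles.filter((file) =>
    protectedPaths.some((pattern) => pattern.test(file))
  );
}

// If this returns anything, the agent pauses and asks for approval
// instead of shipping the patch.
```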
Practical exercise
Take one vague issue from a repo and write a work-surface brief with behavior, reproduction, allowed files, excluded files, verification, and stop rules.
Then ask whether a reviewer could reject an unrelated file change from the brief alone. If not, the surface is still too broad.
Key takeaways
- Coding agents need bounded work before code changes.
- A bug report is not automatically a work surface.
- Exclusions are as important as allowed files.
- Verification should be defined before implementation.
- Smaller patches are not just cleaner; they are more auditable.
Further reading / source notes
- OpenAI, “Harness engineering: leveraging Codex in an agent-first world” for the shift toward specifying intent and designing feedback loops.
- Anthropic, “Effective harnesses for long-running agents” for setup, task tracking, and verification practices.