
Design the Agent Interface

Treat shell commands, repo files, tools, permissions, and feedback as the coding agent's UI.

Failure pattern

The agent has powerful repo access, but the interface hides which commands are safe, which actions mutate state, which checks matter, and what failures mean.

The chat box is not the real interface. The real interface is the repo, scripts, package commands, test output, database tools, CI status, branch state, and permission boundaries. If those are unclear, the agent operates through an unlabeled control panel.

Incident: unsafe migration command

Agent task

The agent is asked:

Add a workspace setting for requiring SSO on invited users.

The change needs UI, API validation, and a database column.

Available surface

The repo exposes:

| Interface item | What the agent sees |
| --- | --- |
| pnpm test | Runs many tests, slow, flaky in watch mode |
| pnpm db:migrate | Applies migrations to configured database |
| pnpm db:reset | Resets local database |
| scripts/gen-types | Generates API and DB types |
| .env.local | Contains local database URL |
| CI page | Full typecheck, unit, integration, e2e |

The commands exist, but their safety and purpose are unclear.

Bad run

The agent adds a migration and runs:

pnpm db:migrate
pnpm test

The migration applies to a shared staging database because DATABASE_URL in the active environment points there, and nothing in the command checks the target. The test run then fails halfway because of unrelated flaky e2e tests. The agent responds by modifying test setup and migration code in the same patch.

Now reviewers cannot tell whether the patch is about SSO settings, database recovery, or test harness repair.

Why the harness failed

The coding interface exposed commands without guardrails.

| Interface gap | Consequence |
| --- | --- |
| Read surface | Agent did not know current database target |
| Action surface | Migration command had unsafe default |
| Feedback surface | Test failure did not distinguish flaky baseline from patch failure |
| Policy surface | No rule prevented staging mutation |
| Tool descriptions | Commands did not state safe usage |

The agent was not reckless. The interface invited the wrong action.

Why it happens

Developer tools are often optimized for humans with local knowledge. A human knows which database URL is safe, which script is dangerous, which tests are flaky, and which CI check is authoritative. The agent needs that knowledge in the interface itself.

Harness design turns implicit developer folklore into explicit surfaces.

Harness principle

The coding agent interface has four surfaces:

| Surface | Coding version |
| --- | --- |
| Read | repo files, docs, schemas, logs, CI status, branch state |
| Action | edit files, run tests, generate types, create migrations |
| Feedback | command output, test failures, type errors, CI reports |
| Policy | forbidden commands, approval gates, branch rules, data safety |

flowchart LR
  A["Read surface"] --> E["Agent decision"]
  B["Policy surface"] --> E
  E --> C["Action surface"]
  C --> D["Feedback surface"]
  D --> E
  B --> C

A coding agent interface is files and commands plus feedback and policy.

Operating practice

Replace raw commands with safe agent-facing commands:

| Raw command | Agent-facing command |
| --- | --- |
| pnpm db:migrate | pnpm db:migrate:local with local target check |
| pnpm db:reset | Approval-only command |
| pnpm test | pnpm test:changed and documented full checks |
| scripts/gen-types | pnpm generate:types with expected output note |
| CI page | pnpm verify matching required pre-PR checks |

The safe migration command should refuse dangerous targets:

Refused:
- DATABASE_URL points to staging.
- Use local database or request approval.
- Migration not applied.

This feedback helps the agent choose the safe path.
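
A refusal like the one above could come from a small guard that the wrapper runs before any migration. The following is a minimal sketch, not the repo's actual implementation: the names `checkMigrationTarget`, `extractHost`, `formatRefusal`, and the `LOCAL_HOSTS` allowlist are all hypothetical, and the host parsing is deliberately simple so that anything it cannot parse is treated as unsafe.

```typescript
// Hypothetical guard for a local-only migration wrapper (all names are illustrative).
type TargetCheck =
  | { ok: true; host: string }
  | { ok: false; reason: string; nextSteps: string[] };

// Assumption: only these hosts count as a local development database.
const LOCAL_HOSTS = new Set(["localhost", "127.0.0.1"]);

// Pull the host out of a connection string such as postgres://user:pass@host:5432/db.
// Returns null when the string does not look like scheme://[userinfo@]host...
function extractHost(databaseUrl: string): string | null {
  const match = /^[a-z][a-z0-9+.-]*:\/\/(?:[^@\/]*@)?([^:\/?#]+)/i.exec(databaseUrl);
  return match?.[1]?.toLowerCase() ?? null;
}

// Allow only allowlisted hosts; missing, unparseable, or unknown targets are refused.
function checkMigrationTarget(databaseUrl: string | undefined): TargetCheck {
  const host = databaseUrl ? extractHost(databaseUrl) : null;
  if (host !== null && LOCAL_HOSTS.has(host)) {
    return { ok: true, host };
  }
  return {
    ok: false,
    reason:
      host === null
        ? "DATABASE_URL is missing or could not be parsed."
        : `DATABASE_URL points to non-local host "${host}".`,
    nextSteps: ["Use local database or request approval.", "Migration not applied."],
  };
}

// Render the result as a short message the agent can act on.
function formatRefusal(check: TargetCheck): string {
  if (check.ok) return `Target OK: ${check.host}`;
  return ["Refused:", `- ${check.reason}`, ...check.nextSteps.map((s) => `- ${s}`)].join("\n");
}
```

The guard uses an allowlist rather than a blocklist, so a new or unrecognized host is refused by default instead of slipping through because nobody listed it as dangerous.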

Coding-agent example

Every dangerous repo command should say:

| Field | Example |
| --- | --- |
| Purpose | Apply local migrations for development |
| Mutates | Local database only |
| Refuses | Staging, production, unknown database host |
| Output | Applied migrations or refusal reason |
| Verification | Run type generation and migration test |

The agent should not infer these properties from script names.
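
One way to keep those fields next to the command is to store them as data and print them from the wrapper itself. The sketch below assumes a hypothetical `CommandDescriptor` shape and a `describeCommand` helper; neither is an existing pnpm feature, and the field names simply mirror the table above.

```typescript
// Hypothetical descriptor shape; fields mirror the table above.
interface CommandDescriptor {
  command: string;
  purpose: string;
  mutates: string;
  refuses: string[];
  output: string;
  verification: string;
}

// Example entry for the local migration wrapper named earlier in this lesson.
const migrateLocal: CommandDescriptor = {
  command: "pnpm db:migrate:local",
  purpose: "Apply local migrations for development",
  mutates: "Local database only",
  refuses: ["staging", "production", "unknown database host"],
  output: "Applied migrations or refusal reason",
  verification: "Run type generation and migration test",
};

// Render the descriptor as help text that a wrapper could print on request.
function describeCommand(d: CommandDescriptor): string {
  return [
    d.command,
    `  Purpose: ${d.purpose}`,
    `  Mutates: ${d.mutates}`,
    `  Refuses: ${d.refuses.join(", ")}`,
    `  Output: ${d.output}`,
    `  Verification: ${d.verification}`,
  ].join("\n");
}
```

Because the description lives in the same module as the command, it is harder for the two to drift apart than when the policy sits in a separate wiki page.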

Review artifact

The best interface design artifact is a tool contract table. It tells the agent which commands are safe to run, what they prove, and when they are forbidden.

| Tool | Mode | Allowed use | Required output |
| --- | --- | --- | --- |
| pnpm test auth | read/verify | Run focused auth tests | Pass/fail plus failing test names |
| pnpm db:migration:plan | dry run | Inspect migration impact | SQL plan and affected tables |
| pnpm db:migration:apply --env local | action | Local only after plan review | Migration id and rollback notes |
| pnpm lint | verify | Check changed files | Error list or clean result |
| pnpm seed reset | destructive local action | Only in disposable local database | Confirmation of database target |

This contract turns the repository into a safer UI. The agent no longer has to infer whether db:reset is acceptable from a package script name. The harness says what the command is for and what evidence it must return.
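
The contract table can also be held as data so a harness can check a command before running it. This is a sketch under two stated assumptions: commands are matched by exact string after whitespace normalization, and anything not listed is refused rather than allowed. The names `ToolContract`, `TOOL_CONTRACTS`, and `lookupTool` are hypothetical.

```typescript
// Modes taken from the contract table above.
type ToolMode = "read/verify" | "dry run" | "action" | "verify" | "destructive local action";

interface ToolContract {
  tool: string;
  mode: ToolMode;
  allowedUse: string;
  requiredOutput: string;
}

// The table rows, encoded as data.
const TOOL_CONTRACTS: ToolContract[] = [
  { tool: "pnpm test auth", mode: "read/verify", allowedUse: "Run focused auth tests", requiredOutput: "Pass/fail plus failing test names" },
  { tool: "pnpm db:migration:plan", mode: "dry run", allowedUse: "Inspect migration impact", requiredOutput: "SQL plan and affected tables" },
  { tool: "pnpm db:migration:apply --env local", mode: "action", allowedUse: "Local only after plan review", requiredOutput: "Migration id and rollback notes" },
  { tool: "pnpm lint", mode: "verify", allowedUse: "Check changed files", requiredOutput: "Error list or clean result" },
  { tool: "pnpm seed reset", mode: "destructive local action", allowedUse: "Only in disposable local database", requiredOutput: "Confirmation of database target" },
];

type Lookup =
  | { allowed: true; contract: ToolContract }
  | { allowed: false; reason: string };

// Assumption: exact match after collapsing whitespace; unlisted commands are refused.
function lookupTool(commandLine: string): Lookup {
  const normalized = commandLine.trim().replace(/\s+/g, " ");
  const contract = TOOL_CONTRACTS.find((c) => c.tool === normalized);
  if (contract) return { allowed: true, contract };
  return {
    allowed: false,
    reason: `"${normalized}" is not in the tool contract; request approval before running it.`,
  };
}
```

With this lookup, `lookupTool("pnpm db:reset")` is refused because the raw script is not in the table, while `lookupTool("pnpm lint")` returns the contract along with the output the agent is expected to report.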

The same idea applies to code-level helpers. Instead of exposing a generic runSql(query) tool, expose a bounded operation such as planMembershipBackfill({ workspaceId, dryRun: true }). Instead of a free-form “make a PR” tool, require a structured PR summary with changed behavior, evidence, risk, and rollback notes.

{
  "behavior": "Invite acceptance creates membership event",
  "changed_files": ["src/routes/invite/[token].ts", "tests/e2e/invite-acceptance.spec.ts"],
  "evidence": ["invite acceptance e2e passed", "expired token regression passed"],
  "risk": "Redirect behavior touches first-login path",
  "rollback": "Revert route patch and test addition"
}

That schema is part of the agent interface. It makes the agent speak in units the team can review.
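
A harness could reject incomplete summaries before a PR is opened. The validator below is a minimal sketch that assumes every field in the example above is required and non-empty; the function names `validatePrSummary` and `parsePrSummary` are hypothetical.

```typescript
// Shape of the structured PR summary shown above.
interface PrSummary {
  behavior: string;
  changed_files: string[];
  evidence: string[];
  risk: string;
  rollback: string;
}

function isNonEmptyString(v: unknown): v is string {
  return typeof v === "string" && v.trim().length > 0;
}

function isNonEmptyStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.length > 0 && v.every((item) => isNonEmptyString(item));
}

// Return a list of problems; an empty list means the summary is complete.
function validatePrSummary(input: unknown): string[] {
  if (typeof input !== "object" || input === null) return ["summary must be a JSON object"];
  const s = input as Record<string, unknown>;
  const problems: string[] = [];
  for (const key of ["behavior", "risk", "rollback"] as const) {
    if (!isNonEmptyString(s[key])) problems.push(`${key} must be a non-empty string`);
  }
  for (const key of ["changed_files", "evidence"] as const) {
    if (!isNonEmptyStringArray(s[key])) problems.push(`${key} must be a non-empty list of strings`);
  }
  return problems;
}

type ParseResult = { ok: true; summary: PrSummary } | { ok: false; problems: string[] };

// Parse raw JSON text and validate it in one step.
function parsePrSummary(json: string): ParseResult {
  let raw: unknown;
  try {
    raw = JSON.parse(json);
  } catch {
    return { ok: false, problems: ["summary is not valid JSON"] };
  }
  const problems = validatePrSummary(raw);
  return problems.length === 0 ? { ok: true, summary: raw as PrSummary } : { ok: false, problems };
}
```

Returning every problem at once, instead of failing on the first, gives the agent a complete list to fix in a single pass.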

The key design move is to make the safest path the easiest path. If the repo exposes only raw scripts, the agent must invent safety from memory. If the repo exposes planned commands, typed outputs, and explicit modes, the agent can move quickly without guessing where the danger begins.
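
To illustrate what an explicit mode and a typed output can look like, here is a sketch of the bounded `planMembershipBackfill` operation mentioned earlier in this section. Only the function name and its `{ workspaceId, dryRun: true }` argument come from the text; the return shape, the identifier check, and the table and column names in the SQL are invented for illustration.

```typescript
// Hypothetical request type: the only accepted value for dryRun is true,
// so this tool cannot be used to apply changes.
interface BackfillRequest {
  workspaceId: string;
  dryRun: true;
}

// Hypothetical typed output: a plan to review, not an executed change.
interface BackfillPlan {
  mode: "dry run";
  sql: string; // parameterized statement; the workspace id is never spliced into the text
  params: string[];
  affectedTables: string[];
}

function planMembershipBackfill(req: BackfillRequest): BackfillPlan {
  // Assumption: workspace ids are plain identifiers; anything else is refused.
  if (!/^[A-Za-z0-9_-]+$/.test(req.workspaceId)) {
    throw new Error("Refused: workspaceId must be a plain identifier.");
  }
  return {
    mode: "dry run",
    // Table and column names below are invented for this sketch.
    sql:
      "INSERT INTO membership_events (workspace_id, user_id) " +
      "SELECT workspace_id, user_id FROM memberships WHERE workspace_id = $1",
    params: [req.workspaceId],
    affectedTables: ["membership_events"],
  };
}
```

Compared with a generic `runSql(query)` tool, the agent can only ask for one kind of plan, and the type system rejects a call with `dryRun: false` before anything runs.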

Common mistakes

The first mistake is assuming shell access is enough. Shell access without safe commands creates ambiguity.

The second mistake is documenting policy far from the command. Put safety into the command or wrapper when possible.

The third mistake is returning noisy failures. Test output should identify likely patch failures versus known baseline failures.

The fourth mistake is exposing destructive commands as default options.
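
For the third mistake, a test wrapper could split failures into likely patch failures and known baseline failures before reporting them. The sketch below assumes the team maintains a list of known-flaky test names; `classifyFailures` and its report shape are hypothetical, and name matching is only a heuristic, since a listed test can still fail because of the patch.

```typescript
interface FailureReport {
  patchFailures: string[]; // failing tests that are not on the known-flaky list
  baselineFailures: string[]; // failing tests already known to be flaky
  summary: string; // one-line guidance for the agent
}

// Split failing test names by membership in the known-flaky list.
function classifyFailures(failingTests: string[], knownFlaky: ReadonlySet<string>): FailureReport {
  const patchFailures = failingTests.filter((t) => !knownFlaky.has(t));
  const baselineFailures = failingTests.filter((t) => knownFlaky.has(t));

  let summary: string;
  if (patchFailures.length > 0) {
    summary = `${patchFailures.length} likely patch failure(s); fix these before touching test setup.`;
  } else if (baselineFailures.length > 0) {
    summary = `Only known baseline failures (${baselineFailures.length}); do not modify the test harness in this patch.`;
  } else {
    summary = "All tests passed.";
  }
  return { patchFailures, baselineFailures, summary };
}
```

In the incident above, a report like this would have told the agent that the e2e failures were baseline noise, which removes the incentive to edit test setup inside the SSO patch.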

Practical exercise

List the commands a coding agent can run in your repo. Mark each as read-only, safe mutation, dangerous mutation, or approval-only. Then wrap one dangerous command with a refusal check and structured error.

Key takeaways

  • The repo and shell are the coding agent’s UI.
  • Safe defaults matter more than long warnings.
  • Feedback should tell the agent what happened and what to do next.
  • Destructive actions need approval or refusal gates.
  • Good interfaces make correct action obvious.
