Prepare the Runway

Separate setup, environment checks, and first verification from coding work.

Failure pattern

The agent starts editing code before proving the repo can install, start, and verify from a clean baseline.

Later, tests fail. The app does not boot. Generated types are stale. Dependencies are missing. Nobody knows whether the failures came from the task or the runway.

Incident: feature work on broken baseline

Agent task

The agent is asked:

Add required-SSO enforcement to workspace invitations.

The feature touches auth, workspace settings, invitation acceptance, API validation, and tests.

Available surface

Before implementation, the repo needs:

Surface | Required condition
Dependencies | pnpm install complete
Environment | required env vars present
Database | local DB reachable and migrated
Generated types | current with schema
Baseline tests | current relevant tests pass or known failures recorded
Dev server | onboarding/auth route starts locally

The agent skips baseline checks and starts editing.

Bad run

After changing code, it runs tests. Type generation fails because local schema types were stale before the task. Integration tests fail because the local database was not migrated. The agent then changes generated files, migration setup, and feature code together.

The final patch contains the feature plus runway repair. Review becomes difficult.

Why the harness failed

The harness did not separate initialization from implementation.

Missing check | Consequence
Install | Agent could not trust dependency state
Env check | Missing auth secret appeared during tests
DB migration baseline | Feature migration mixed with setup repair
Type generation baseline | Stale generated files were blamed on the feature
Known failures | Agent tried to fix unrelated broken tests

The first reliable signal should happen before code changes.

Why it happens

Coding agents are eager to work on the requested change. They may only discover setup issues when a command fails. By then, the patch has already changed state, and failures become ambiguous.

Human engineers often run a startup ritual automatically. Agents need that ritual encoded in the harness.

Harness principle

Initialization is its own phase.

flowchart LR
  A["Start session"] --> B["Install/check deps"]
  B --> C["Check env and DB"]
  C --> D["Run baseline verification"]
  D --> E{"Runway clear?"}
  E -->|"Yes"| F["Start coding task"]
  E -->|"No"| G["Record or fix runway issue first"]
Runway preparation makes later failures meaningful.

The runway does not need to be perfect. It needs to be known.

Operating practice

Use a preflight record:

Check | Pass condition | Result
Dependencies | lockfile install succeeds | Pass
Env | required local keys present | Fail: missing AUTH_SECRET
DB | local DB migrated | Pass
Generated types | no diff after generation | Pass
Baseline test | invite acceptance test passes | Not run because env failed

The harnessed output should be:

Runway blocked.
Missing AUTH_SECRET prevents auth tests from running.
Feature implementation not started.
Next action: configure local env or switch task to runway repair.

This is better than an agent editing code on a broken baseline.
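
A minimal sketch of how a harness could produce this record and the blocked message, assuming a pnpm/Node repository. The AUTH_SECRET key and the pnpm install and pnpm test invite commands come from the tables above; the file name, the db and generate script names, and the helper structure are hypothetical.

// preflight.ts - hypothetical runway check for the invite task.
// Each check records a result; implementation only starts if nothing is blocked.
import { execSync } from "node:child_process";

type CheckResult = { name: string; pass: boolean; detail: string };

function run(name: string, command: string): CheckResult {
  try {
    execSync(command, { stdio: "pipe" });
    return { name, pass: true, detail: "Pass" };
  } catch {
    return { name, pass: false, detail: `Fail: ${command} exited non-zero` };
  }
}

const record: CheckResult[] = [];
record.push(run("Dependencies", "pnpm install --frozen-lockfile"));

// Env check: AUTH_SECRET is the required key from the example above.
const envOk = Boolean(process.env.AUTH_SECRET);
record.push({ name: "Env", pass: envOk, detail: envOk ? "Pass" : "Fail: missing AUTH_SECRET" });

record.push(run("DB", "pnpm db:migrate:status"));            // hypothetical script name
record.push(run("Generated types", "pnpm generate:check"));  // hypothetical script name

// Only run the focused baseline test if the environment can support it.
record.push(
  envOk
    ? run("Baseline test", "pnpm test invite")
    : { name: "Baseline test", pass: false, detail: "Not run because env failed" }
);

for (const r of record) console.log(`${r.name}: ${r.detail}`);

if (record.some((r) => !r.pass)) {
  console.log("Runway blocked. Feature implementation not started.");
  process.exit(1);
}

A non-zero exit here is not a failed task; it is the blocked-runway outcome described above, and it arrives before any implementation edit.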

Coding-agent example

Define start rules:

Condition | Behavior
Missing env | Stop before implementation
Baseline relevant test fails | Record known failure or make baseline repair task
Generated files stale | Regenerate before feature work
Dev server fails | Stop or switch to setup task
Full CI flaky | Record known flaky test and run focused verification

The harness should make the start state explicit.
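
One way to make the start state explicit is to encode the rules above as data the harness consults before the first edit. The condition and behavior names mirror the table; the type and structure are illustrative.

// Hypothetical start-rule policy consulted before any implementation edit.
type StartBehavior =
  | "stop_before_implementation"
  | "record_known_failure_or_open_baseline_repair_task"
  | "regenerate_before_feature_work"
  | "stop_or_switch_to_setup_task"
  | "record_flaky_test_and_run_focused_verification";

const startRules: Record<string, StartBehavior> = {
  missing_env: "stop_before_implementation",
  baseline_relevant_test_fails: "record_known_failure_or_open_baseline_repair_task",
  generated_files_stale: "regenerate_before_feature_work",
  dev_server_fails: "stop_or_switch_to_setup_task",
  full_ci_flaky: "record_flaky_test_and_run_focused_verification",
};

// The harness maps each observed runway condition to a defined behavior
// instead of letting the agent improvise a fix inside the feature patch.
function behaviorFor(condition: string): StartBehavior | undefined {
  return startRules[condition];
}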

Review artifact

A runway record should be produced before the first implementation edit. It does not need to be ceremonial; it needs to make the starting state inspectable.

Check | Command or source | Expected | Result
Dependencies installed | pnpm install --frozen-lockfile | No lockfile drift | Pass
Type baseline | pnpm typecheck | Existing baseline known | Fails in unrelated generated client
Focused tests | pnpm test invite | Current failure reproduced | Invite redirect fails
Data state | local seed version | Workspace invite seed present | Pass
Feature flags | .env.local | SSO invite enforcement on | Pass

The important row is the type baseline failure. Without a runway record, the agent might edit code, run typecheck, see the unrelated generated-client failure, and keep changing files in the wrong area. With the record, the run starts from a known broken state. The agent can say: “This failure existed before my patch; it is not evidence against the invite fix.”
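
A small sketch of how the harness can keep that distinction mechanical: record the failure identifiers present before the patch, then classify post-patch failures against that baseline. The identifiers shown are hypothetical.

// Classify post-patch failures against the recorded baseline so pre-existing
// breakage is never treated as evidence against the new patch.
type FailureSet = Set<string>; // test or check identifiers

function classifyFailures(baseline: FailureSet, afterPatch: FailureSet) {
  return {
    preExisting: [...afterPatch].filter((f) => baseline.has(f)),
    introduced: [...afterPatch].filter((f) => !baseline.has(f)),
    fixed: [...baseline].filter((f) => !afterPatch.has(f)),
  };
}

// Hypothetical identifiers from the runway record above:
const baseline = new Set(["typecheck:generated-client", "test:invite-redirect"]);
const afterPatch = new Set(["typecheck:generated-client"]);
console.log(classifyFailures(baseline, afterPatch));
// preExisting: generated-client typecheck; fixed: invite redirect; introduced: none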

Good runway design separates setup from task work:

Runway phase:
1. install and environment check
2. identify existing failures
3. reproduce target failure
4. confirm allowed commands
5. begin implementation only after target failure is visible

This is especially important for coding agents because they are tempted to turn environment friction into code changes. If the database is not seeded, the correct next step is not usually to rewrite the invite route. If the test runner is misconfigured, the correct next step is not to change product logic. The runway gives those problems a place to live before implementation begins.
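
A minimal sketch of that phase gate, assuming the harness records each phase's outcome before implementation; the field and function names below are illustrative.

// Hypothetical gate: implementation cannot start until the earlier runway
// phases have produced their artifacts.
interface RunwayState {
  installed: boolean;                // phase 1: install and environment check passed
  envChecked: boolean;
  knownFailures: string[];           // phase 2: pre-existing failures on record
  targetFailureReproduced: boolean;  // phase 3: the failure the task should change is visible
  allowedCommands: string[];         // phase 4: verification commands the agent may run
}

function mayStartImplementation(state: RunwayState): boolean {
  return (
    state.installed &&
    state.envChecked &&
    state.targetFailureReproduced &&
    state.allowedCommands.length > 0
  );
}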

The harnessed version of the incident would have stopped the agent once the stale generated types and the unmigrated database were visible. It would have produced a short setup note, asked whether to refresh generated clients, and only then touched invite enforcement.

Harnessed version

The harnessed run begins by proving the environment can support the task. It does not treat setup as background noise. The agent first records the baseline, then reproduces the specific SSO invite failure, then names the known unrelated failures. Only after that does implementation begin.

The resulting review is cleaner. When the patch arrives, the reviewer can see which failures are new, which failures were pre-existing, and which command proves the behavior changed. This prevents a common coding-agent trap: a model changes code to satisfy a broken environment rather than fixing the requested behavior.

Common mistakes

The first mistake is treating preflight as wasted time. It saves review time later.

The second mistake is running only broad tests. A cheap focused baseline is often more useful before implementation.

The third mistake is mixing setup repair with feature work. If setup is broken, make setup the task.

The fourth mistake is ignoring generated files until the end. Generated drift can pollute a feature diff.
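
One way to keep generated drift out of the feature diff is to check it on the runway. In the sketch below, pnpm generate and the src/generated path are hypothetical stand-ins for the repository's real generation script and output directory.

// check-generated.ts - fail fast if regeneration produces a diff, so drift is
// handled as a runway issue instead of inside the feature patch.
import { execSync } from "node:child_process";

execSync("pnpm generate", { stdio: "inherit" });  // hypothetical script name
try {
  // git diff --exit-code returns non-zero when the working tree has changes.
  execSync("git diff --exit-code -- src/generated", { stdio: "inherit" }); // hypothetical path
  console.log("Generated files: no drift");
} catch {
  console.error("Generated drift detected before feature work");
  process.exit(1);
}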

Practical exercise

Write a five-step runway check for one repo. Include install, env, data store, generated artifacts, and one focused baseline verification.

Then define what happens when each step fails.

Key takeaways

  • Coding work on an unknown baseline produces muddy feedback.
  • Preflight checks should happen before implementation.
  • Known broken state is better than hidden broken state.
  • Setup repair and feature delivery should be separated.
  • A blocked runway is a useful task outcome.
