Prepare the Runway
Separate setup, environment checks, and first verification from coding work.
Failure pattern
The agent starts editing code before proving the repo can install, start, and verify from a clean baseline.
Later, tests fail. The app does not boot. Generated types are stale. Dependencies are missing. Nobody knows whether the failures came from the task or the runway.
Incident: feature work on a broken baseline
Agent task
The agent is asked:
Add required-SSO enforcement to workspace invitations.
The feature touches auth, workspace settings, invitation acceptance, API validation, and tests.
Available surface
Before implementation, the repo needs:
| Surface | Required condition |
|---|---|
| Dependencies | pnpm install complete |
| Environment | required env vars present |
| Database | local DB reachable and migrated |
| Generated types | current with schema |
| Baseline tests | current relevant tests pass or known failures recorded |
| Dev server | onboarding/auth route starts locally |
The agent skips baseline checks and starts editing.
Bad run
After changing code, the agent runs tests. Type generation fails because local schema types were stale before the task began. Integration tests fail because the local database was never migrated. The agent then changes generated files, migration setup, and feature code together.
The final patch contains the feature plus runway repair. Review becomes difficult.
Why the harness failed
The harness did not separate initialization from implementation.
| Missing check | Consequence |
|---|---|
| Install | Agent could not trust dependency state |
| Env check | Missing auth secret surfaced only during tests |
| DB migration baseline | Feature migration mixed with setup repair |
| Type generation baseline | Stale generated files were blamed on feature |
| Known failures | Agent tried to fix unrelated broken tests |
The first reliable signal should happen before code changes.
Why it happens
Coding agents are eager to work on the requested change. They may only discover setup issues when a command fails. By then, the working tree has already changed, and failures become ambiguous.
Human engineers often run a startup ritual automatically. Agents need that ritual encoded in the harness.
Harness principle
Initialization is its own phase.
```mermaid
flowchart LR
    A["Start session"] --> B["Install/check deps"]
    B --> C["Check env and DB"]
    C --> D["Run baseline verification"]
    D --> E{"Runway clear?"}
    E -->|"Yes"| F["Start coding task"]
    E -->|"No"| G["Record or fix runway issue first"]
```
The runway does not need to be perfect. It needs to be known.
Operating practice
Use a preflight record:
| Check | Pass condition | Result |
|---|---|---|
| Dependencies | lockfile install succeeds | Pass |
| Env | required local keys present | Fail: missing AUTH_SECRET |
| DB | local DB migrated | Pass |
| Generated types | no diff after generation | Pass |
| Baseline test | invite acceptance test passes | Not run because env failed |
The harnessed output should be:
Runway blocked.
Missing AUTH_SECRET prevents auth tests from running.
Feature implementation not started.
Next action: configure local env or switch task to runway repair.
This is better than an agent editing code on a broken baseline.
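One way to encode the record's skip behavior, as a sketch. The check names and the `AUTH_SECRET` probe are illustrative; the property that matters is that dependents of a failed check are marked skipped, not silently run:

```ts
type Status = "pass" | "fail" | "skipped";

interface Check {
  name: string;
  dependsOn?: string;   // a check that must pass before this one runs
  run: () => boolean;   // true means pass
}

// Illustrative checks; a real harness would shell out to repo commands.
const checks: Check[] = [
  { name: "env: AUTH_SECRET present", run: () => Boolean(process.env.AUTH_SECRET) },
  { name: "baseline: invite acceptance test", dependsOn: "env: AUTH_SECRET present", run: () => true },
];

const results = new Map<string, Status>();
for (const check of checks) {
  if (check.dependsOn && results.get(check.dependsOn) !== "pass") {
    // Not run because a prerequisite failed -- recorded, not silently dropped.
    results.set(check.name, "skipped");
    continue;
  }
  results.set(check.name, check.run() ? "pass" : "fail");
}

const failed = [...results.entries()].filter(([, status]) => status === "fail");
if (failed.length > 0) {
  console.log("Runway blocked.");
  for (const [name] of failed) console.log(`Failed check: ${name}.`);
  console.log("Feature implementation not started.");
  console.log("Next action: configure local env or switch task to runway repair.");
}
```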
Coding-agent example
Define start rules:
| Condition | Behavior |
|---|---|
| Missing env | Stop before implementation |
| Relevant baseline test fails | Record known failure or create baseline repair task |
| Generated files stale | Regenerate before feature work |
| Dev server fails | Stop or switch to setup task |
| Full CI flaky | Record known flaky test and run focused verification |
The harness should make the start state explicit.
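Those rules can live in the harness as a small policy function. A sketch, with hypothetical condition names and an assumed `pnpm run generate` script:

```ts
// Hypothetical start-rule policy: map observed start conditions to
// harness behavior instead of letting the agent improvise.
type StartBehavior =
  | { kind: "stop"; reason: string }
  | { kind: "record-known-failure"; test: string }
  | { kind: "regenerate"; command: string }
  | { kind: "switch-task"; to: string };

function startRule(condition: string): StartBehavior {
  switch (condition) {
    case "missing-env":
      return { kind: "stop", reason: "env incomplete; do not implement" };
    case "baseline-test-fails":
      return { kind: "record-known-failure", test: "<failing test id>" };
    case "generated-files-stale":
      return { kind: "regenerate", command: "pnpm run generate" }; // assumed script
    case "dev-server-fails":
      return { kind: "switch-task", to: "runway repair" };
    case "full-ci-flaky":
      // Record the flaky test, then fall back to focused verification.
      return { kind: "record-known-failure", test: "<flaky test id>" };
    default:
      return { kind: "stop", reason: `unknown start condition: ${condition}` };
  }
}
```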
Review artifact
A runway record should be produced before the first implementation edit. It does not need to be ceremonial; it needs to make the starting state inspectable.
| Check | Command or source | Expected | Result |
|---|---|---|---|
| Dependencies installed | pnpm install --frozen-lockfile | No lockfile drift | Pass |
| Type baseline | pnpm typecheck | Existing baseline known | Fails in unrelated generated client |
| Focused tests | pnpm test invite | Current failure reproduced | Invite redirect fails |
| Data state | local seed version | Workspace invite seed present | Pass |
| Feature flags | .env.local | SSO invite enforcement on | Pass |
The important row is the type baseline failure. Without a runway record, the agent might edit code, run typecheck, see the unrelated generated-client failure, and keep changing files in the wrong area. With the record, the run starts from a known broken state. The agent can say: “This failure existed before my patch; it is not evidence against the invite fix.”
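Producing the record can be automated before the first edit. A sketch that runs the commands from the table above and writes a markdown record; the `runway-record.md` path is an assumption about where the harness keeps artifacts:

```ts
import { execSync } from "node:child_process";
import { writeFileSync } from "node:fs";

// Commands from the runway table; "expected" describes a clean baseline.
const rows = [
  { check: "Dependencies installed", cmd: "pnpm install --frozen-lockfile", expected: "No lockfile drift" },
  { check: "Type baseline", cmd: "pnpm typecheck", expected: "Existing baseline known" },
  { check: "Focused tests", cmd: "pnpm test invite", expected: "Current failure reproduced" },
];

const lines = ["| Check | Command | Expected | Result |", "|---|---|---|---|"];
for (const row of rows) {
  let result = "Pass";
  try {
    execSync(row.cmd, { stdio: "pipe" });
  } catch (err) {
    // Failures are recorded, not fixed: the record exists so a reviewer
    // can tell pre-existing breakage from breakage caused by the patch.
    result = `Fail: ${(err as Error).message.split("\n")[0]}`;
  }
  lines.push(`| ${row.check} | ${row.cmd} | ${row.expected} | ${result} |`);
}
writeFileSync("runway-record.md", lines.join("\n"));
```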
Good runway design separates setup from task work:
Runway phase:
1. install and environment check
2. identify existing failures
3. reproduce target failure
4. confirm allowed commands
5. begin implementation only after target failure is visible
This is especially important for coding agents because they are tempted to turn environment friction into code changes. If the database is not seeded, the correct next step is not usually to rewrite the invite route. If the test runner is misconfigured, the correct next step is not to change product logic. The runway gives those problems a place to live before implementation begins.
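The same ordering as a sketch. The phase names are illustrative; the property that matters is that implementation is unreachable until the target failure has been reproduced:

```ts
// Illustrative phase state; each field is set by an earlier runway phase.
interface RunwayState {
  envReady: boolean;           // install and environment check passed
  knownFailures: string[];     // pre-existing failures, recorded not fixed
  targetFailureSeen: boolean;  // the bug this task is meant to fix
}

function beginImplementation(state: RunwayState): void {
  if (!state.envReady) {
    throw new Error("runway: install/env check has not passed");
  }
  if (!state.targetFailureSeen) {
    // If the target failure cannot be reproduced, the patch can never
    // prove it changed anything. Setup, not coding, is the next task.
    throw new Error("runway: target failure not reproduced");
  }
  console.log(`known pre-existing failures: ${state.knownFailures.join(", ") || "none"}`);
  console.log("runway clear; implementation may begin");
}
```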
The harnessed version of the incident would have stopped the agent once the unmigrated database and broken type baseline were visible. It would have produced a short setup note, asked whether to refresh generated clients, and only then touched invite enforcement.
Harnessed version
The harnessed run begins by proving the environment can support the task. It does not treat setup as background noise. The agent first records the baseline, then reproduces the specific SSO invite failure, then names the known unrelated failures. Only after that does implementation begin.
The resulting review is cleaner. When the patch arrives, the reviewer can see which failures are new, which failures were pre-existing, and which command proves the behavior changed. This prevents a common coding-agent trap: a model changes code to satisfy a broken environment rather than fixing the requested behavior.
Common mistakes
The first mistake is treating preflight as wasted time. It saves review time later.
The second mistake is running only broad tests. A cheap focused baseline is often more useful before implementation.
The third mistake is mixing setup repair with feature work. If setup is broken, make setup the task.
The fourth mistake is ignoring generated files until the end. Generated drift can pollute a feature diff.
Practical exercise
Write a five-step runway check for one repo. Include install, env, data store, generated artifacts, and one focused baseline verification.
Then define what happens when each step fails.
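As one possible starting point, here is a five-step definition for a hypothetical pnpm repo; every command and failure behavior is a placeholder to adapt:

```ts
// One possible five-step runway check; all values are repo-specific
// placeholders, including the check:env and generate scripts.
const runway = [
  { step: "install", cmd: "pnpm install --frozen-lockfile", onFail: "stop; fix lockfile drift first" },
  { step: "env", cmd: "pnpm run check:env", onFail: "stop; list missing vars, do not implement" },
  { step: "data store", cmd: "pnpm run db:migrate:status", onFail: "switch task to migration repair" },
  { step: "generated artifacts", cmd: "pnpm run generate --check", onFail: "regenerate before feature work" },
  { step: "focused baseline", cmd: "pnpm test invite", onFail: "record known failure; reproduce target bug" },
] as const;
```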
Key takeaways
- Coding work on an unknown baseline produces muddy feedback.
- Preflight checks should happen before implementation.
- Known broken state is better than hidden broken state.
- Setup repair and feature delivery should be separated.
- A blocked runway is a useful task outcome.
Further reading / source notes
- Anthropic, “Effective harnesses for long-running agents” for setup and baseline practices.
- OpenAI, “Harness engineering” for feedback-loop design around agents.