Prepare the Runway
Separate setup, environment checks, and first verification from coding work.
Failure pattern
The agent starts editing code before proving the repo can install, start, and verify from a clean baseline.
Later, tests fail. The app does not boot. Generated types are stale. Dependencies are missing. Nobody knows whether the failures came from the task or the runway.
Incident: feature work on a broken baseline
Agent task
The agent is asked:
Add required-SSO enforcement to workspace invitations.
The feature touches auth, workspace settings, invitation acceptance, API validation, and tests.
Available surface
Before implementation, the repo needs:
| Surface | Required condition |
|---|---|
| Dependencies | pnpm install complete |
| Environment | required env vars present |
| Database | local DB reachable and migrated |
| Generated types | current with schema |
| Baseline tests | current relevant tests pass or known failures recorded |
| Dev server | onboarding/auth route starts locally |
The agent skips baseline checks and starts editing.
Bad run
After changing code, the agent runs tests. Type generation fails because local schema types were stale before the task began. Integration tests fail because the local database was never migrated. The agent then changes generated files, migration setup, and feature code together.
The final patch contains the feature plus runway repair. Review becomes difficult.
Why the harness failed
The harness did not separate initialization from implementation.
| Missing check | Consequence |
|---|---|
| Install | Agent could not trust dependency state |
| Env check | Missing auth secret surfaced only during tests |
| DB migration baseline | Feature migration mixed with setup repair |
| Type generation baseline | Stale generated files were blamed on feature |
| Known failures | Agent tried to fix unrelated broken tests |
The first reliable signal should happen before code changes.
Why it happens
Coding agents are eager to work on the requested change. They may only discover setup issues when a command fails. By then, the working tree has already changed, and failures become ambiguous.
Human engineers often run a startup ritual automatically. Agents need that ritual encoded in the harness.
Harness principle
Initialization is its own phase.
```mermaid
flowchart LR
    A["Start session"] --> B["Install/check deps"]
    B --> C["Check env and DB"]
    C --> D["Run baseline verification"]
    D --> E{"Runway clear?"}
    E -->|"Yes"| F["Start coding task"]
    E -->|"No"| G["Record or fix runway issue first"]
```
The runway does not need to be perfect. It needs to be known.
Operating practice
Use a preflight record:
| Check | Pass condition | Result |
|---|---|---|
| Dependencies | lockfile install succeeds | Pass |
| Env | required local keys present | Fail: missing AUTH_SECRET |
| DB | local DB migrated | Pass |
| Generated types | no diff after generation | Pass |
| Baseline test | invite acceptance test passes | Not run because env failed |
The harnessed output should be:
Runway blocked.
Missing AUTH_SECRET prevents auth tests from running.
Feature implementation not started.
Next action: configure local env or switch task to runway repair.
This is better than an agent editing code on a broken baseline.
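One way to encode the record's skip behavior, as a sketch. The check names and the `AUTH_SECRET` probe are illustrative; the property that matters is that dependents of a failed check are marked skipped, not silently run:

```ts
type Status = "pass" | "fail" | "skipped";

interface Check {
  name: string;
  dependsOn?: string;   // a check that must pass before this one runs
  run: () => boolean;   // true means pass
}

// Illustrative checks; a real harness would shell out to repo commands.
const checks: Check[] = [
  { name: "env: AUTH_SECRET present", run: () => Boolean(process.env.AUTH_SECRET) },
  { name: "baseline: invite acceptance test", dependsOn: "env: AUTH_SECRET present", run: () => true },
];

const results = new Map<string, Status>();
for (const check of checks) {
  if (check.dependsOn && results.get(check.dependsOn) !== "pass") {
    // Not run because a prerequisite failed -- recorded, not silently dropped.
    results.set(check.name, "skipped");
    continue;
  }
  results.set(check.name, check.run() ? "pass" : "fail");
}

const failed = [...results.entries()].filter(([, status]) => status === "fail");
if (failed.length > 0) {
  console.log("Runway blocked.");
  for (const [name] of failed) console.log(`Failed check: ${name}.`);
  console.log("Feature implementation not started.");
  console.log("Next action: configure local env or switch task to runway repair.");
}
```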
Coding-agent example
Define start rules:
| Condition | Behavior |
|---|---|
| Missing env | Stop before implementation |
| Relevant baseline test fails | Record known failure or create baseline repair task |
| Generated files stale | Regenerate before feature work |
| Dev server fails | Stop or switch to setup task |
| Full CI flaky | Record known flaky test and run focused verification |
The harness should make the start state explicit.
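Those rules can live in the harness as a small policy function. A sketch, with hypothetical condition names and an assumed `pnpm run generate` script:

```ts
// Hypothetical start-rule policy: map observed start conditions to
// harness behavior instead of letting the agent improvise.
type StartBehavior =
  | { kind: "stop"; reason: string }
  | { kind: "record-known-failure"; test: string }
  | { kind: "regenerate"; command: string }
  | { kind: "switch-task"; to: string };

function startRule(condition: string): StartBehavior {
  switch (condition) {
    case "missing-env":
      return { kind: "stop", reason: "env incomplete; do not implement" };
    case "baseline-test-fails":
      return { kind: "record-known-failure", test: "<failing test id>" };
    case "generated-files-stale":
      return { kind: "regenerate", command: "pnpm run generate" }; // assumed script
    case "dev-server-fails":
      return { kind: "switch-task", to: "runway repair" };
    case "full-ci-flaky":
      // Record the flaky test, then fall back to focused verification.
      return { kind: "record-known-failure", test: "<flaky test id>" };
    default:
      return { kind: "stop", reason: `unknown start condition: ${condition}` };
  }
}
```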
Review artifact
A runway record should be produced before the first implementation edit. It does not need to be ceremonial; it needs to make the starting state inspectable.
| Check | Command or source | Expected | Result |
|---|---|---|---|
| Dependencies installed | pnpm install --frozen-lockfile | No lockfile drift | Pass |
| Type baseline | pnpm typecheck | Existing baseline known | Fails in unrelated generated client |
| Focused tests | pnpm test invite | Current failure reproduced | Invite redirect fails |
| Data state | local seed version | Workspace invite seed present | Pass |
| Feature flags | .env.local | SSO invite enforcement on | Pass |
The important row is the type baseline failure. Without a runway record, the agent might edit code, run typecheck, see the unrelated generated-client failure, and keep changing files in the wrong area. With the record, the run starts from a known broken state. The agent can say: “This failure existed before my patch; it is not evidence against the invite fix.”
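Producing the record can be automated before the first edit. A sketch that runs the commands from the table above and writes a markdown record; the `runway-record.md` path is an assumption about where the harness keeps artifacts:

```ts
import { execSync } from "node:child_process";
import { writeFileSync } from "node:fs";

// Commands from the runway table; "expected" describes a clean baseline.
const rows = [
  { check: "Dependencies installed", cmd: "pnpm install --frozen-lockfile", expected: "No lockfile drift" },
  { check: "Type baseline", cmd: "pnpm typecheck", expected: "Existing baseline known" },
  { check: "Focused tests", cmd: "pnpm test invite", expected: "Current failure reproduced" },
];

const lines = ["| Check | Command | Expected | Result |", "|---|---|---|---|"];
for (const row of rows) {
  let result = "Pass";
  try {
    execSync(row.cmd, { stdio: "pipe" });
  } catch (err) {
    // Failures are recorded, not fixed: the record exists so a reviewer
    // can tell pre-existing breakage from breakage caused by the patch.
    result = `Fail: ${(err as Error).message.split("\n")[0]}`;
  }
  lines.push(`| ${row.check} | ${row.cmd} | ${row.expected} | ${result} |`);
}
writeFileSync("runway-record.md", lines.join("\n"));
```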
Good runway design separates setup from task work:
Runway phase:
1. install and environment check
2. identify existing failures
3. reproduce target failure
4. confirm allowed commands
5. begin implementation only after target failure is visible
This is especially important for coding agents because they are tempted to turn environment friction into code changes. If the database is not seeded, the correct next step is not usually to rewrite the invite route. If the test runner is misconfigured, the correct next step is not to change product logic. The runway gives those problems a place to live before implementation begins.
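The same ordering as a sketch. The phase names are illustrative; the property that matters is that implementation is unreachable until the target failure has been reproduced:

```ts
// Illustrative phase state; each field is set by an earlier runway phase.
interface RunwayState {
  envReady: boolean;           // install and environment check passed
  knownFailures: string[];     // pre-existing failures, recorded not fixed
  targetFailureSeen: boolean;  // the bug this task is meant to fix
}

function beginImplementation(state: RunwayState): void {
  if (!state.envReady) {
    throw new Error("runway: install/env check has not passed");
  }
  if (!state.targetFailureSeen) {
    // If the target failure cannot be reproduced, the patch can never
    // prove it changed anything. Setup, not coding, is the next task.
    throw new Error("runway: target failure not reproduced");
  }
  console.log(`known pre-existing failures: ${state.knownFailures.join(", ") || "none"}`);
  console.log("runway clear; implementation may begin");
}
```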
The harnessed version of the incident would have stopped the agent once the unmigrated database and broken type baseline were visible. It would have produced a short setup note, asked whether to refresh generated clients, and only then touched invite enforcement.
Harnessed version
The harnessed run begins by proving the environment can support the task. It does not treat setup as background noise. The agent first records the baseline, then reproduces the specific SSO invite failure, then names the known unrelated failures. Only after that does implementation begin.
The resulting review is cleaner. When the patch arrives, the reviewer can see which failures are new, which failures were pre-existing, and which command proves the behavior changed. This prevents a common coding-agent trap: a model changes code to satisfy a broken environment rather than fixing the requested behavior.
Common mistakes
The first mistake is treating preflight as wasted time. It saves review time later.
The second mistake is running only broad tests. A cheap focused baseline is often more useful before implementation.
The third mistake is mixing setup repair with feature work. If setup is broken, make setup the task.
The fourth mistake is ignoring generated files until the end. Generated drift can pollute a feature diff.
Practical exercise
Write a five-step runway check for one repo. Include install, env, data store, generated artifacts, and one focused baseline verification.
Then define what happens when each step fails.
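As one possible starting point, here is a five-step definition for a hypothetical pnpm repo; every command and failure behavior is a placeholder to adapt:

```ts
// One possible five-step runway check; all values are repo-specific
// placeholders, including the check:env and generate scripts.
const runway = [
  { step: "install", cmd: "pnpm install --frozen-lockfile", onFail: "stop; fix lockfile drift first" },
  { step: "env", cmd: "pnpm run check:env", onFail: "stop; list missing vars, do not implement" },
  { step: "data store", cmd: "pnpm run db:migrate:status", onFail: "switch task to migration repair" },
  { step: "generated artifacts", cmd: "pnpm run generate --check", onFail: "regenerate before feature work" },
  { step: "focused baseline", cmd: "pnpm test invite", onFail: "record known failure; reproduce target bug" },
] as const;
```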
Key takeaways
- Coding work on an unknown baseline produces muddy feedback.
- Preflight checks should happen before implementation.
- Known broken state is better than hidden broken state.
- Setup repair and feature delivery should be separated.
- A blocked runway is a useful task outcome.
Further reading / source notes
- Anthropic, “Effective harnesses for long-running agents” for setup and baseline practices.
- OpenAI, “Harness engineering” for feedback-loop design around agents.