Prepare the Runway
Separate setup, data checks, and first verification from quant research work.
Failure pattern
The agent starts research before the data and tools are trustworthy, so every later result is ambiguous.
In quant work, this is especially costly. A factor screen can look precise while using stale prices, missing corporate actions, broken benchmark membership, or a backtest engine that silently changed defaults. Once analysis starts on a bad runway, no one knows whether the result is insight or setup noise.
Incident: factor screen before preflight
Agent task
A researcher asks:
Run a semiconductor factor screen for revision momentum and quality. Bring me the top long and short candidates before the morning meeting.
The agent moves immediately into screening.
Available surface
The workflow depends on:
| Surface | Required condition |
|---|---|
| Market data snapshot | Prices and estimates updated after prior close |
| Corporate actions | Splits, dividends, and restatements applied |
| Universe membership | Point-in-time semiconductor universe available |
| Benchmark data | Current sector benchmark and weights loaded |
| Backtest engine | Baseline strategy test passes |
| Risk model | Latest factor exposures available |
The harness has no mandatory preflight. The agent can call the screen tool directly.
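One way to make these required conditions checkable is to encode each surface as an explicit start condition. The sketch below uses hypothetical names and stubbed data sources, not any real harness API:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable

# Hypothetical sketch: each surface in the table above becomes an explicit,
# checkable start condition instead of an implicit assumption.

@dataclass(frozen=True)
class SurfaceCheck:
    name: str
    required_condition: str
    check: Callable[[], bool]  # returns True when the surface is trustworthy

def prices_after_prior_close() -> bool:
    # Stub: compare the market-data snapshot timestamp against the prior close.
    snapshot_ts = datetime(2024, 5, 2, 6, 15, tzinfo=timezone.utc)    # illustrative values
    prior_close_ts = datetime(2024, 5, 1, 20, 0, tzinfo=timezone.utc)
    return snapshot_ts > prior_close_ts

def corporate_actions_applied() -> bool:
    # Stub: count unresolved splits, dividends, and restatements for the universe.
    unresolved = 0  # would come from the corporate-actions pipeline
    return unresolved == 0

REQUIRED_SURFACES = [
    SurfaceCheck("market_data", "Prices and estimates updated after prior close",
                 prices_after_prior_close),
    SurfaceCheck("corporate_actions", "Splits, dividends, and restatements applied",
                 corporate_actions_applied),
    # universe, benchmark, backtest baseline, and risk model follow the same pattern
]
```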
Bad run
The agent returns:
Top long candidates:
- Name A: high revisions, strong quality
- Name B: improving margins
Top short candidates:
- Name C: negative revisions
- Name D: weak quality
Later, the analyst discovers three runway problems:
- Estimates data was stale by one trading day.
- A split adjustment was missing for one candidate.
- The benchmark constituent file failed to load, so relative rankings used a fallback sector list.
The screen is not reliable. The agent did not fail at ranking; it ranked from an unverified environment.
Why the harness failed
The harness let product work start before runway checks.
| Missing check | Consequence |
|---|---|
| Data freshness | Agent used stale estimates |
| Corporate-action status | Split-adjusted returns were wrong |
| Universe membership | Ranking used fallback constituents |
| Baseline backtest | Tool health was not proven |
| Known broken state | Fallback behavior was invisible |
The output looked like research, but it was setup debt.
Why it happens
Agents are task-directed. If asked to run a screen, they run a screen. They may not stop to ask whether the market-data snapshot is current unless the harness makes that a start condition.
Humans often know the daily ritual: check data loads, read pipeline alerts, confirm benchmark files, scan corporate-action warnings, and run a small baseline. A quant agent needs that ritual encoded. Otherwise, stale inputs produce confident artifacts.
Harness principle
Initialization is its own phase.
Before research execution, the harness should prove:
- Data is fresh enough.
- Universe and benchmark files are available.
- Corporate-action adjustments are applied.
- Backtest and screen tools pass a baseline run.
- Known broken state is recorded.
- Degraded modes are explicit.
flowchart LR
A["Start research run"] --> B["Check data freshness"]
B --> C["Check universe and benchmark"]
C --> D["Check corporate actions"]
D --> E["Run baseline tool test"]
E --> F{"Runway clear?"}
F -->|"Yes"| G["Run factor screen"]
F -->|"No"| H["Stop or use declared degraded mode"] A runway check is not bureaucracy. It protects the meaning of results.
Operating practice
Use a preflight record:
| Check | Pass condition | Result |
|---|---|---|
| Prices | Snapshot after prior close | Pass |
| Estimates | Vendor load timestamp after 06:00 | Fail |
| Corporate actions | No unresolved adjustments for universe | Pass |
| Universe | Point-in-time semiconductor file loaded | Pass |
| Benchmark | Sector benchmark weights loaded | Fail |
| Baseline screen | Known sample returns the expected top names | Not run |
With this record, the agent should not produce final rankings. It should return:
Runway blocked:
- Estimates snapshot is stale.
- Benchmark weights failed to load.
- Factor screen not executed as reviewable output.
Possible degraded mode:
- Run exploratory absolute ranking only, clearly marked not committee-ready.
That is a successful harness response.
The preflight record should be saved with the run, even when all checks pass. Later, when a researcher questions a surprising signal, the team can see whether the run started from a clean environment. This matters because many quant errors are discovered after the fact. A clean preflight does not prove the thesis, but it narrows the search space when debugging.
For advisory workflows, the preflight can also control output level. If every check passes, the agent may create a review packet. If a non-critical benchmark check fails, it may create an exploratory note. If risk model or point-in-time data fails, it should stop before producing advisory language.
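Both ideas can be sketched together: persist the preflight record next to the run, and derive the allowed output level from it. The names, check labels, and file layout below are illustrative only:

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class PreflightRecord:
    run_id: str
    checked_at: str
    results: dict[str, str]  # check name -> "pass" / "fail" / "not_run"

    def allowed_output(self) -> str:
        hard_stops = {"risk_model", "point_in_time_universe"}
        if any(self.results.get(check) != "pass" for check in hard_stops):
            return "none"            # stop before producing any advisory language
        if all(r == "pass" for r in self.results.values()):
            return "review_packet"   # every check passed
        return "exploratory_note"    # non-critical failure, clearly labeled as such

record = PreflightRecord(
    run_id="screen-2024-05-02-a",
    checked_at=datetime.now(timezone.utc).isoformat(),
    results={"prices": "pass", "estimates": "fail", "benchmark": "fail",
             "risk_model": "pass", "point_in_time_universe": "pass"},
)

# Save the record with the run even when all checks pass, so a surprising signal
# can later be traced back to a clean (or dirty) starting environment.
with open(f"{record.run_id}_preflight.json", "w") as fh:
    json.dump(asdict(record), fh, indent=2)
```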
Product-agent example
A quant preflight contract should define hard stops and degraded modes:
| Condition | Behavior |
|---|---|
| Risk model missing | Stop advisory output |
| Estimates stale | Stop revision analysis |
| Benchmark missing | Allow absolute screen only, not relative ranking |
| Corporate actions unresolved | Exclude affected names or stop |
| Baseline tool test fails | Stop and report runway failure |
The contract prevents the agent from treating partial data as complete research.
This is also where the harness can encode desk-specific tolerance. Some teams may allow exploratory screens with stale benchmark weights if the memo is clearly marked exploratory. They should not allow the same output to enter committee review. The runway check should therefore produce both a technical status and an allowed output mode.
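A hedged sketch of such a contract as data the harness can enforce, including the desk-specific tolerance just described. Condition names and behaviors are illustrative, not a prescribed schema:

```python
# Illustrative condition names and behaviors only; the real contract belongs to the desk.
PREFLIGHT_CONTRACT = {
    "risk_model_missing":           "stop_advisory_output",
    "estimates_stale":              "stop_revision_analysis",
    "benchmark_missing":            "allow_absolute_screen_only",
    "corporate_actions_unresolved": "exclude_affected_names_or_stop",
    "baseline_tool_test_failed":    "stop_and_report_runway_failure",
}

def allowed_output_mode(failed_conditions: list[str], desk_allows_exploratory: bool) -> str:
    """Return the output mode implied by the contract plus desk-specific tolerance."""
    behaviors = {PREFLIGHT_CONTRACT[c] for c in failed_conditions if c in PREFLIGHT_CONTRACT}
    if any(b.startswith("stop") for b in behaviors):
        return "blocked"             # hard stop: no advisory output at all
    if behaviors:
        # e.g. stale benchmark weights: an exploratory memo may be allowed,
        # but the same output never enters committee review
        return "exploratory_only" if desk_allows_exploratory else "blocked"
    return "committee_ready"

# allowed_output_mode(["benchmark_missing"], desk_allows_exploratory=True)
# -> "exploratory_only"
```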
Common mistakes
The first mistake is checking only tool availability. A data tool can respond and still return stale data.
The second mistake is allowing silent fallback. Fallback universe or benchmark behavior must be visible.
The third mistake is mixing runway repair with research output. If the agent fixes data loads and produces a thesis in one run, evidence becomes muddy.
The fourth mistake is treating exploratory output as committee-ready. Degraded mode should be labeled.
Practical exercise
Write a preflight checklist for one quant-agent workflow. Include data freshness, methodology version, universe, benchmark, tool baseline, and risk model checks.
Then define what the agent should do when each check fails: stop, degrade, exclude, or escalate.
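One possible starting template for the exercise, with purely illustrative entries:

```python
# Replace every entry with the pass conditions and failure actions of your own workflow.
PREFLIGHT_CHECKLIST = {
    # check                  (pass condition,                                  on failure)
    "data_freshness":        ("snapshot newer than prior close",               "stop"),
    "methodology_version":   ("factor definitions match the approved version", "escalate"),
    "universe":              ("point-in-time membership file loaded",          "stop"),
    "benchmark":             ("sector weights loaded",                         "degrade"),
    "tool_baseline":         ("known sample returns the expected top names",   "stop"),
    "risk_model":            ("latest factor exposures available",             "stop"),
}
```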
Key takeaways
- Quant research started on an unverified runway is already suspect.
- Freshness and methodology checks must precede analysis.
- Fallback behavior should never be silent.
- Degraded output must be labeled as degraded.
- A blocked runway is useful information, not a failed agent.
Further reading / source notes
- Anthropic, “Effective harnesses for long-running agents” for setup and baseline checks in long-running harnesses.
- OpenAI, “Harness engineering: leveraging Codex in an agent-first world” for environment design and feedback loops around agents.
- NIST AI Risk Management Framework for risk-aware AI system operation and monitoring.