
Prepare the Runway

Separate setup, data checks, and first verification from quant research work.

Failure pattern

The agent starts research before the data and tools are trustworthy, so every later result is ambiguous.

In quant work, this is especially costly. A factor screen can look precise while using stale prices, missing corporate actions, broken benchmark membership, or a backtest engine that silently changed defaults. Once analysis starts on a bad runway, no one knows whether the result is insight or setup noise.

Incident: factor screen before preflight

Agent task

A researcher asks:

Run a semiconductor factor screen for revision momentum and quality. Bring me the top long and short candidates before the morning meeting.

The agent moves immediately into screening.

Available surface

The workflow depends on:

| Surface | Required condition |
| --- | --- |
| Market data snapshot | Prices and estimates updated after prior close |
| Corporate actions | Splits, dividends, and restatements applied |
| Universe membership | Point-in-time semiconductor universe available |
| Benchmark data | Current sector benchmark and weights loaded |
| Backtest engine | Baseline strategy test passes |
| Risk model | Latest factor exposures available |

The harness has no mandatory preflight. The agent can call the screen tool directly.

Bad run

The agent returns:

Top long candidates:
- Name A: high revisions, strong quality
- Name B: improving margins

Top short candidates:
- Name C: negative revisions
- Name D: weak quality

Later, the analyst discovers three runway problems:

  • Estimates data was stale by one trading day.
  • A split adjustment was missing for one candidate.
  • The benchmark constituent file failed to load, so relative rankings used a fallback sector list.

The screen is not reliable. The agent did not fail at ranking; it ranked from an unverified environment.

Why the harness failed

The harness let product work start before runway checks.

| Missing check | Consequence |
| --- | --- |
| Data freshness | Agent used stale estimates |
| Corporate-action status | Split-adjusted returns were wrong |
| Universe membership | Ranking used fallback constituents |
| Baseline backtest | Tool health was not proven |
| Known broken state | Fallback behavior was invisible |

The output looked like research, but it was setup debt.

Why it happens

Agents are task-directed. If asked to run a screen, they run a screen. They may not stop to ask whether the market-data snapshot is current unless the harness makes that a start condition.

Humans often know the daily ritual: check data loads, read pipeline alerts, confirm benchmark files, scan corporate-action warnings, and run a small baseline. A quant agent needs that ritual encoded. Otherwise, stale inputs produce confident artifacts.

Harness principle

Initialization is its own phase.

Before research execution, the harness should prove:

  • Data is fresh enough.
  • Universe and benchmark files are available.
  • Corporate-action adjustments are applied.
  • Backtest and screen tools pass a baseline run.
  • Known broken state is recorded.
  • Degraded modes are explicit.

```mermaid
flowchart LR
  A["Start research run"] --> B["Check data freshness"]
  B --> C["Check universe and benchmark"]
  C --> D["Check corporate actions"]
  D --> E["Run baseline tool test"]
  E --> F{"Runway clear?"}
  F -->|"Yes"| G["Run factor screen"]
  F -->|"No"| H["Stop or use declared degraded mode"]
```

Runway preparation turns data and tool health into a start condition for quant research.

A runway check is not bureaucracy. It protects the meaning of results.
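The gate in the flowchart can be made concrete as a small preflight runner. The sketch below is a minimal illustration, not a real harness API; the check names, the `CheckResult` shape, and the stand-in lambdas are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""

def run_preflight(checks: list[Callable[[], CheckResult]]) -> tuple[bool, list[CheckResult]]:
    # Run every check up front; research may start only if all pass.
    results = [check() for check in checks]
    return all(r.passed for r in results), results

# Stand-in checks for the conditions named above (illustrative only).
clear, results = run_preflight([
    lambda: CheckResult("data_freshness", True),
    lambda: CheckResult("universe_and_benchmark", True),
    lambda: CheckResult("corporate_actions", False, "unresolved split for one name"),
    lambda: CheckResult("baseline_tool_test", True),
])

if clear:
    print("Runway clear: run factor screen")
else:
    print("Runway blocked:", [r.name for r in results if not r.passed])
```

The point of the shape is that the product tool is only reachable through the gate: a failed check yields a blocked-runway report, never a ranking.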

Operating practice

Use a preflight record:

| Check | Pass condition | Result |
| --- | --- | --- |
| Prices | Snapshot after prior close | Pass |
| Estimates | Vendor load timestamp after 06:00 | Fail |
| Corporate actions | No unresolved adjustments for universe | Pass |
| Universe | Point-in-time semiconductor file loaded | Pass |
| Benchmark | Sector benchmark weights loaded | Fail |
| Baseline screen | Known sample returns expected top names | Not run |

With this record, the agent should not produce final rankings. It should return:

Runway blocked:
- Estimates snapshot is stale.
- Benchmark weights failed to load.
- Factor screen not executed as reviewable output.
Possible degraded mode:
- Run exploratory absolute ranking only, clearly marked not committee-ready.

That is a successful harness response.

The preflight record should be saved with the run, even when all checks pass. Later, when a researcher questions a surprising signal, the team can see whether the run started from a clean environment. This matters because many quant errors are discovered after the fact. A clean preflight does not prove the thesis, but it narrows the search space when debugging.
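One way to persist that record is to write it next to the run output. The file name, layout, and field names below are assumptions for illustration, not a prescribed schema.

```python
import json
import os
import tempfile
from datetime import datetime, timezone

def save_preflight_record(run_dir: str, checks: dict[str, str]) -> str:
    # Persist the preflight record next to the run output so a surprising
    # signal can later be traced back to the environment it started from.
    record = {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "checks": checks,
    }
    path = os.path.join(run_dir, "preflight.json")
    with open(path, "w") as f:
        json.dump(record, f, indent=2)
    return path

run_dir = tempfile.mkdtemp()
path = save_preflight_record(
    run_dir,
    {"prices": "pass", "estimates": "fail", "baseline_screen": "not_run"},
)
with open(path) as f:
    saved = json.load(f)
print(saved["checks"]["estimates"])  # fail
```

Writing the record even on clean runs is what makes the later audit cheap: the debugging question becomes "what did preflight say?" rather than a reconstruction.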

For advisory workflows, the preflight can also control output level. If every check passes, the agent may create a review packet. If a non-critical benchmark check fails, it may create an exploratory note. If risk model or point-in-time data fails, it should stop before producing advisory language.
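That policy can be expressed as a small decision function. The split between critical and non-critical checks mirrors the paragraph above; the specific labels and output-mode names are assumptions.

```python
# Critical failures stop advisory output entirely; the lone non-critical
# benchmark failure downgrades to an exploratory note. Labels are assumed.
CRITICAL = {"risk_model", "point_in_time_data"}
NON_CRITICAL = {"benchmark"}

def allowed_output(failed_checks: set[str]) -> str:
    if failed_checks & CRITICAL:
        return "stop"              # no advisory language at all
    if failed_checks - NON_CRITICAL:
        return "stop"              # unknown failures default to the safe side
    if failed_checks:
        return "exploratory_note"  # clearly marked not committee-ready
    return "review_packet"

print(allowed_output(set()))           # review_packet
print(allowed_output({"benchmark"}))   # exploratory_note
print(allowed_output({"risk_model"}))  # stop
```

Note the default: a failure the policy has never seen stops the run, which keeps the safe path the one that requires no configuration.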

Product-agent example

A quant preflight contract should define hard stops and degraded modes:

| Condition | Behavior |
| --- | --- |
| Risk model missing | Stop advisory output |
| Estimates stale | Stop revision analysis |
| Benchmark missing | Allow absolute screen only, not relative ranking |
| Corporate actions unresolved | Exclude affected names or stop |
| Baseline tool test fails | Stop and report runway failure |

The contract prevents the agent from treating partial data as complete research.
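Encoding the contract as data rather than prose makes it enforceable and reviewable. The condition keys and behavior strings below are illustrative assumptions mapping the table above.

```python
# Contract table as data the harness can enforce. Keys and values are
# illustrative names, not a standard vocabulary.
PREFLIGHT_CONTRACT = {
    "risk_model_missing": "stop_advisory_output",
    "estimates_stale": "stop_revision_analysis",
    "benchmark_missing": "absolute_screen_only",
    "corporate_actions_unresolved": "exclude_affected_or_stop",
    "baseline_tool_test_fails": "stop_and_report",
}

def contract_behavior(condition: str) -> str:
    # An unlisted condition stops the run rather than defaulting to
    # full research output.
    return PREFLIGHT_CONTRACT.get(condition, "stop_and_report")

print(contract_behavior("benchmark_missing"))  # absolute_screen_only
print(contract_behavior("vendor_outage"))      # stop_and_report
```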

This is also where the harness can encode desk-specific tolerance. Some teams may allow exploratory screens with stale benchmark weights if the memo is clearly marked exploratory. They should not allow the same output to enter committee review. The runway check should therefore produce both a technical status and an allowed output mode.

Common mistakes

The first mistake is checking only tool availability. A data tool can respond and still return stale data.
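A freshness check therefore has to look at the data's timestamp, not the tool's liveness. A minimal sketch, assuming ISO-formatted vendor load timestamps and a 06:00 cutoff:

```python
from datetime import datetime

def is_fresh(load_timestamp: str, cutoff: str) -> bool:
    # A feed that responds can still be stale; compare the vendor load
    # timestamp against the required cutoff instead of just pinging it.
    return datetime.fromisoformat(load_timestamp) >= datetime.fromisoformat(cutoff)

# Stale by one trading day versus loaded after the morning cutoff:
print(is_fresh("2024-03-14T05:30:00", "2024-03-15T06:00:00"))  # False
print(is_fresh("2024-03-15T06:12:00", "2024-03-15T06:00:00"))  # True
```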

The second mistake is allowing silent fallback. Fallback universe or benchmark behavior must be visible.
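Visibility can be as simple as recording which source a loader actually used. The loader names and return shape below are hypothetical:

```python
def load_universe(primary_loader, fallback_loader):
    # Record which source was actually used so a fallback is never silent.
    try:
        return {"source": "primary", "names": primary_loader()}
    except IOError as exc:
        return {"source": "fallback", "names": fallback_loader(), "reason": str(exc)}

def failing_primary():
    raise IOError("benchmark constituent file failed to load")

universe = load_universe(failing_primary, lambda: ["Name A", "Name B"])
print(universe["source"], "-", universe.get("reason", ""))
```

Downstream steps can then refuse to produce relative rankings whenever `source` is not `primary`, instead of discovering the fallback after the meeting.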

The third mistake is mixing runway repair with research output. If the agent fixes data loads and produces a thesis in one run, evidence becomes muddy.

The fourth mistake is treating exploratory output as committee-ready. Degraded mode should be labeled.

Practical exercise

Write a preflight checklist for one quant-agent workflow. Include data freshness, methodology version, universe, benchmark, tool baseline, and risk model checks.

Then define what the agent should do when each check fails: stop, degrade, exclude, or escalate.

Key takeaways

  • Quant research started on an unverified runway is already suspect.
  • Freshness and methodology checks must precede analysis.
  • Fallback behavior should never be silent.
  • Degraded output must be labeled as degraded.
  • A blocked runway is useful information, not a failed agent.

Further reading / source notes