
Persist Progress

Make long-running quant research restartable without relying on chat history.

Failure pattern

A quant investigation spans multiple sessions, but progress lives only in the conversation. The next run repeats work, trusts stale memory, or contradicts earlier findings.

Drawdown analysis, data-quality investigations, and strategy reviews rarely finish cleanly in one session. If the harness does not persist state, every continuation starts with uncertainty.

Incident: momentum strategy drawdown

Agent task

A researcher asks:

Investigate why the semiconductor revision-momentum strategy underperformed over the last six weeks. Identify whether the issue is signal decay, risk exposure, data quality, or market regime.

This is a multi-step investigation. It cannot be solved by one chart.

Available surface

The agent can inspect:

| Surface | Contents |
| --- | --- |
| Strategy run history | Daily portfolio, weights, returns, turnover |
| Factor attribution | Revision momentum, quality, beta, size, value |
| Risk model | Sector, style, country, and single-name exposure |
| Data-quality logs | Estimate revisions, vendor corrections, missing values |
| Market regime dashboard | Rates, volatility, semiconductor index, crowding proxy |
| Research notes | Prior hypotheses and committee questions |

The first run ends after 45 minutes because the analyst has to leave for a meeting.

Bad run

The first agent writes only a chat summary:

Underperformance may be related to beta exposure and possible estimate-data noise. Need to check whether the drawdown was concentrated in high-beta shorts.

The next agent starts later and reads only the strategy returns. It concludes:

The drawdown is mostly signal decay. Revision momentum stopped working after the latest earnings cycle.

That contradicts the first run, which had already found that two high-beta shorts drove most losses and that estimate-data noise was still unverified.

Why the harness failed

Progress was not durable.

| Missing record | Consequence |
| --- | --- |
| Verified findings | Concentration in two shorts was rediscovered late |
| Rejected hypotheses | Broad signal-decay claim was reopened too soon |
| Open questions | Estimate-data noise remained vague |
| Evidence IDs | No links to attribution run or risk snapshot |
| Next action | Second run chose a different investigation path |

The chat contained useful hints. It did not contain restartable state.

Why it happens

Long-running analysis creates partial truth. Some hypotheses are verified, some rejected, some blocked, and some still open. A prose summary often blends these states.

For quant work, that blending is dangerous. “Beta exposure may be involved” is not enough. Did the agent measure beta? Over which date range? With which risk model? Did the exposure explain 20 percent or 80 percent of the drawdown? A continuation run needs those details.

Harness principle

Persist progress as a research artifact.

The record should distinguish:

  • Verified findings.
  • Rejected hypotheses.
  • Open questions.
  • Blockers.
  • Evidence IDs.
  • Next action.
```mermaid
flowchart LR
  A["Session starts"] --> B["Read research state"]
  B --> C["Continue active hypothesis"]
  C --> D["Verify or reject"]
  D --> E["Update research state"]
  E --> F["Next session resumes"]
```
Persistent research state lets the next run continue the investigation instead of restarting it.

The record is not a diary. It is a restart instruction.
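As a minimal sketch, the record above can be persisted as a small structured file. The schema and file path here are illustrative assumptions, not a standard:

```python
# Illustrative sketch of a restartable research-state record.
# Field names mirror the list above; the JSON path is an assumption.
import json
from dataclasses import dataclass, field, asdict
from pathlib import Path

@dataclass
class ResearchState:
    objective: str
    verified: list = field(default_factory=list)      # measured, safe to rely on
    rejected: list = field(default_factory=list)      # tested, not supported
    open: list = field(default_factory=list)          # known questions, unresolved
    blocked: list = field(default_factory=list)       # waiting on data or owner
    evidence_ids: list = field(default_factory=list)  # run IDs the claims rest on
    next_action: str = ""                             # what the next session does first

def save_state(state: ResearchState, path: Path) -> None:
    # Write the record where the next session is guaranteed to look.
    path.write_text(json.dumps(asdict(state), indent=2))

def load_state(path: Path) -> ResearchState:
    # First step of every session: read the state before doing any work.
    return ResearchState(**json.loads(path.read_text()))
```

Because the fields are explicit, a continuation run can load the file and branch on its contents instead of re-deriving the investigation from prose.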

Operating practice

Write a research continuation note:

| Field | Example |
| --- | --- |
| Objective | Explain six-week drawdown in semiconductor revision-momentum strategy |
| Verified | 62% of loss came from two high-beta short positions |
| Rejected | Broad long-book weakness does not explain drawdown |
| Open | Estimate-data revision lag for three names not yet checked |
| Blocked | Need vendor correction log for May 13 load |
| Evidence | Attribution run ATTR-2026-05-17-A, risk snapshot RISK-9912 |
| Next action | Check vendor correction log, then rerun attribution excluding affected names |
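The same note can be written as a machine-readable file in a predictable location so the next session finds it without searching. A minimal sketch; the directory and filename are assumptions:

```python
# Illustrative: persist the continuation note to a predictable path.
import json
from pathlib import Path

note = {
    "objective": "Explain six-week drawdown in semiconductor revision-momentum strategy",
    "verified": ["62% of loss came from two high-beta short positions"],
    "rejected": ["Broad long-book weakness does not explain drawdown"],
    "open": ["Estimate-data revision lag for three names not yet checked"],
    "blocked": ["Need vendor correction log for May 13 load"],
    "evidence_ids": ["ATTR-2026-05-17-A", "RISK-9912"],
    "next_action": "Check vendor correction log, then rerun attribution excluding affected names",
}

# Hypothetical convention: one note per investigation, in a shared folder.
note_path = Path("research_state") / "semis_revision_momentum.json"
note_path.parent.mkdir(exist_ok=True)
note_path.write_text(json.dumps(note, indent=2))
```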

Harnessed run

The next agent reads the note and returns:

```
Continuation from ATTR-2026-05-17-A.
Confirmed prior finding: two high-beta shorts explain majority of drawdown.
New work:
- Checked vendor correction log for May 13.
- Found delayed estimate revision for one short name.
- Reran attribution excluding delayed record.
Status:
- Data issue explains 11% of drawdown, not primary driver.
Next:
- Test beta-neutral constraint breach during rebalance.
```

The investigation advances instead of restarting.

The continuation note should be written for a skeptical analyst, not for the model that wrote it. That means every important claim needs enough detail to be checked. “Two shorts explain most losses” is weaker than “two shorts explain 62% of six-week drawdown in attribution run ATTR-2026-05-17-A.” The second statement can be inspected, rerun, or challenged.

The note should also say what not to do. If a hypothesis has been rejected, the next agent should not spend the first hour retesting it unless new data invalidates the rejection. This is how persistent progress reduces both time cost and contradiction risk.

Product-agent example

Use clear state labels:

| State | Meaning |
| --- | --- |
| verified | Measured and safe to rely on |
| rejected | Tested and not supported |
| open | Known question, not resolved |
| blocked | Waiting on data, approval, or owner |
| stale | Needs rerun due to new data or methodology |

Avoid vague words like “maybe checked” or “seems related.”
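The vocabulary can be enforced mechanically so vague labels never enter the record. A sketch assuming a Python harness; the validator is illustrative:

```python
# Illustrative: a closed vocabulary for finding states.
from enum import Enum

class FindingState(Enum):
    VERIFIED = "verified"  # measured and safe to rely on
    REJECTED = "rejected"  # tested and not supported
    OPEN = "open"          # known question, not resolved
    BLOCKED = "blocked"    # waiting on data, approval, or owner
    STALE = "stale"        # needs rerun due to new data or methodology

def parse_state(label: str) -> FindingState:
    # Reject anything outside the vocabulary, e.g. "maybe checked".
    try:
        return FindingState(label.strip().lower())
    except ValueError:
        raise ValueError(
            f"Not a recognized state: {label!r}. "
            "Use verified/rejected/open/blocked/stale."
        )
```

A note that fails this check at write time never becomes ambiguous state for the next session.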

For quant teams, stale is especially important. A finding based on yesterday’s estimates may become stale after a vendor correction. A result based on quality_v2 may become stale after the factor registry moves to quality_v3. Persisted progress should include the version boundary that makes the finding valid.
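That version boundary can be recorded on the finding itself and checked before the finding is reused. A sketch; the fields `factor_model` and `data_snapshot` are hypothetical names for illustration:

```python
# Illustrative staleness check: a finding records the versions it was
# computed under, and goes stale when any of them changes.
def is_stale(finding: dict, current_env: dict) -> bool:
    return any(
        finding.get(key) != current_env.get(key)
        for key in ("factor_model", "data_snapshot")
    )

finding = {
    "claim": "62% of loss from two high-beta shorts",
    "factor_model": "quality_v2",
    "data_snapshot": "2026-05-17",
}
```

When the factor registry moves to quality_v3, the check flags the finding for a rerun instead of letting the next session rely on it silently.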

The progress record should also separate research status from memo status. A drawdown explanation can be partially verified while the memo remains unready. If those states collapse into “almost done,” the next session may present work that still needs investigation.

Common mistakes

The first mistake is saving only final conclusions. Investigations need rejected hypotheses too.

The second mistake is omitting evidence IDs. Without run IDs, the next agent cannot inspect the basis.

The third mistake is leaving next action implicit. A continuation should know what to do first.

The fourth mistake is failing to mark stale findings. A new data snapshot can invalidate prior analysis.

The fifth mistake is storing progress where the next run will not look. A note in chat, a local scratchpad, and a hidden document are weaker than a predictable research-state location.

Practical exercise

Take one unfinished research investigation and write a continuation note. Include objective, verified findings, rejected hypotheses, open questions, blockers, evidence IDs, and next action.

Then hand it to a fresh session. If the fresh session repeats work, the note is not operational enough.

Key takeaways

  • Chat memory is not durable research state.
  • Continuation notes should separate verified, rejected, open, blocked, and stale findings.
  • Evidence IDs matter in quant work.
  • Rejected hypotheses prevent wasted reruns.
  • The next action should be concrete.

Further reading / source notes