Persist Progress
Make long-running quant research restartable without relying on chat history.
Failure pattern
A quant investigation spans multiple sessions, but progress lives only in the conversation. The next run repeats work, trusts stale memory, or contradicts earlier findings.
Drawdown analysis, data-quality investigations, and strategy reviews rarely finish cleanly in one session. If the harness does not persist state, every continuation starts with uncertainty.
Incident: momentum strategy drawdown
Agent task
A researcher asks:
Investigate why the semiconductor revision-momentum strategy underperformed over the last six weeks. Identify whether the issue is signal decay, risk exposure, data quality, or market regime.
This is a multi-step investigation. It cannot be solved by one chart.
Available surface
The agent can inspect:
| Surface | Contents |
|---|---|
| Strategy run history | Daily portfolio, weights, returns, turnover |
| Factor attribution | Revision momentum, quality, beta, size, value |
| Risk model | Sector, style, country, and single-name exposure |
| Data-quality logs | Estimate revisions, vendor corrections, missing values |
| Market regime dashboard | Rates, volatility, semiconductor index, crowding proxy |
| Research notes | Prior hypotheses and committee questions |
The first run ends after 45 minutes because the analyst has to leave for a meeting.
Bad run
The first agent writes only a chat summary:
Underperformance may be related to beta exposure and possible estimate-data noise. Need to check whether the drawdown was concentrated in high-beta shorts.
The next agent starts later and reads only the strategy returns. It concludes:
The drawdown is mostly signal decay. Revision momentum stopped working after the latest earnings cycle.
That contradicts the first run, which had already found that two high-beta shorts drove most losses and that estimate-data noise was still unverified.
Why the harness failed
Progress was not durable.
| Missing record | Consequence |
|---|---|
| Verified findings | Concentration in two shorts was rediscovered late |
| Rejected hypotheses | Broad signal-decay claim was reopened too soon |
| Open questions | Estimate-data noise remained vague |
| Evidence IDs | No links to attribution run or risk snapshot |
| Next action | Second run chose a different investigation path |
The chat contained useful hints. It did not contain restartable state.
Why it happens
Long-running analysis creates partial truth. Some hypotheses are verified, some rejected, some blocked, and some still open. A prose summary often blends these states.
For quant work, that blending is dangerous. “Beta exposure may be involved” is not enough. Did the agent measure beta? Over which date range? With which risk model? Did beta explain 20 percent or 80 percent of the drawdown? A continuation run needs those details.
Harness principle
Persist progress as a research artifact.
The record should distinguish:
- Verified findings.
- Rejected hypotheses.
- Open questions.
- Blockers.
- Evidence IDs.
- Next action.
```mermaid
flowchart LR
    A["Session starts"] --> B["Read research state"]
    B --> C["Continue active hypothesis"]
    C --> D["Verify or reject"]
    D --> E["Update research state"]
    E --> F["Next session resumes"]
```
The record is not a diary. It is a restart instruction.
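One way to make the restart instruction concrete is to give the record an explicit schema instead of prose. The sketch below is a minimal illustration; the class and field names are assumptions, not a fixed format.

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class ResearchState:
    """Restartable record for one investigation. Field names are illustrative."""
    objective: str
    verified: list[str] = field(default_factory=list)       # measured, safe to rely on
    rejected: list[str] = field(default_factory=list)       # tested, not supported
    open_questions: list[str] = field(default_factory=list) # known, unresolved
    blockers: list[str] = field(default_factory=list)       # waiting on data or owner
    evidence_ids: list[str] = field(default_factory=list)   # run IDs backing the claims
    next_action: str = ""                                   # what a continuation does first

state = ResearchState(
    objective="Explain six-week drawdown in semiconductor revision-momentum strategy",
    verified=["62% of loss came from two high-beta short positions"],
    rejected=["Broad long-book weakness does not explain drawdown"],
    open_questions=["Estimate-data revision lag for three names not yet checked"],
    blockers=["Need vendor correction log for May 13 load"],
    evidence_ids=["ATTR-2026-05-17-A", "RISK-9912"],
    next_action="Check vendor correction log, then rerun attribution excluding affected names",
)

# Serialize so a fresh session reloads exact state, not a prose summary.
print(json.dumps(asdict(state), indent=2))
```

Separating the buckets at the schema level is what prevents a prose summary from blending verified, rejected, and open states.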
Operating practice
Write a research continuation note:
| Field | Example |
|---|---|
| Objective | Explain six-week drawdown in semiconductor revision-momentum strategy |
| Verified | 62% of loss came from two high-beta short positions |
| Rejected | Broad long-book weakness does not explain drawdown |
| Open | Estimate-data revision lag for three names not yet checked |
| Blocked | Need vendor correction log for May 13 load |
| Evidence | Attribution run ATTR-2026-05-17-A, risk snapshot RISK-9912 |
| Next action | Check vendor correction log, then rerun attribution excluding affected names |
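The note in the table above can be round-tripped through a file so the next session starts from it rather than from chat memory. A minimal sketch, assuming a hypothetical `research_state/` directory convention:

```python
import json
from pathlib import Path

# Hypothetical predictable location; the path convention is an assumption.
NOTE_PATH = Path("research_state") / "semis_revision_momentum.json"

note = {
    "objective": "Explain six-week drawdown in semiconductor revision-momentum strategy",
    "verified": ["62% of loss came from two high-beta short positions"],
    "rejected": ["Broad long-book weakness does not explain drawdown"],
    "open": ["Estimate-data revision lag for three names not yet checked"],
    "blocked": ["Need vendor correction log for May 13 load"],
    "evidence": ["ATTR-2026-05-17-A", "RISK-9912"],
    "next_action": "Check vendor correction log, then rerun attribution excluding affected names",
}

# Session ends: persist the note where the next run will look.
NOTE_PATH.parent.mkdir(parents=True, exist_ok=True)
NOTE_PATH.write_text(json.dumps(note, indent=2))

# Next session starts: read the note and act on next_action first.
resumed = json.loads(NOTE_PATH.read_text())
print(resumed["next_action"])
```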
Harnessed run
The next agent reads the note and returns:
Continuation from ATTR-2026-05-17-A.
Confirmed prior finding: two high-beta shorts explain majority of drawdown.
New work:
- Checked vendor correction log for May 13.
- Found delayed estimate revision for one short name.
- Reran attribution excluding delayed record.
Status:
- Data issue explains 11% of drawdown, not primary driver.
Next:
- Test beta-neutral constraint breach during rebalance.
The investigation advances instead of restarting.
The continuation note should be written for a skeptical analyst, not for the model that wrote it. That means every important claim needs enough detail to be checked. “Two shorts explain most losses” is weaker than “two shorts explain 62% of six-week drawdown in attribution run ATTR-2026-05-17-A.” The second statement can be inspected, rerun, or challenged.
The note should also say what not to do. If a hypothesis has been rejected, the next agent should not spend the first hour retesting it unless new data invalidates the rejection. This is how persistent progress reduces both time cost and contradiction risk.
State-label practice
Use clear state labels:
| State | Meaning |
|---|---|
| verified | Measured and safe to rely on |
| rejected | Tested and not supported |
| open | Known question, not resolved |
| blocked | Waiting on data, approval, or owner |
| stale | Needs rerun due to new data or methodology |
Avoid vague words like “maybe checked” or “seems related.”
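The vocabulary can be enforced at write time so vague labels never enter the record. A small sketch; the enum and function names are assumptions:

```python
from enum import Enum

class FindingState(str, Enum):
    """Closed vocabulary for finding states; hypothetical names."""
    VERIFIED = "verified"
    REJECTED = "rejected"
    OPEN = "open"
    BLOCKED = "blocked"
    STALE = "stale"

def validate_state(label: str) -> FindingState:
    """Reject vague labels such as 'maybe checked' before they are persisted."""
    try:
        return FindingState(label)
    except ValueError:
        allowed = [s.value for s in FindingState]
        raise ValueError(f"Unknown finding state {label!r}; use one of {allowed}") from None

print(validate_state("verified").value)
try:
    validate_state("maybe checked")
except ValueError as err:
    print(err)
```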
For quant teams, stale is especially important. A finding based on yesterday’s estimates may become stale after a vendor correction. A result based on quality_v2 may become stale after the factor registry moves to quality_v3. Persisted progress should include the version boundary that makes the finding valid.
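A staleness check then reduces to comparing the version boundary recorded with the finding against the current registry. A minimal sketch, with illustrative key names:

```python
def is_stale(finding_versions: dict, current_versions: dict) -> bool:
    """A finding is stale once any input it depends on has moved past the
    version boundary recorded when the finding was verified."""
    return any(current_versions.get(key) != ver
               for key, ver in finding_versions.items())

# Version boundary recorded with the finding (illustrative names and dates).
finding = {"estimates_snapshot": "2026-05-17", "quality_factor": "quality_v2"}

# Current registry after a vendor correction landed.
registry = {"estimates_snapshot": "2026-05-18", "quality_factor": "quality_v2"}
print(is_stale(finding, registry))  # True: the estimates snapshot moved
```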
The progress record should also separate research status from memo status. A drawdown explanation can be partially verified while the memo remains unready. If those states collapse into “almost done,” the next session may present unverified work as finished.
Common mistakes
The first mistake is saving only final conclusions. Investigations need rejected hypotheses too.
The second mistake is omitting evidence IDs. Without run IDs, the next agent cannot inspect the basis.
The third mistake is leaving next action implicit. A continuation should know what to do first.
The fourth mistake is failing to mark stale findings. A new data snapshot can invalidate prior analysis.
The fifth mistake is storing progress where the next run will not look. A note in chat, a local scratchpad, and a hidden document are weaker than a predictable research-state location.
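The last mistake can be guarded against in code: a continuation should resume from the predictable location or fail loudly, never silently start over. A sketch under the same hypothetical `research_state/` convention:

```python
import json
from pathlib import Path

# Single predictable location; the directory name is an assumption.
STATE_DIR = Path("research_state")

def load_state(investigation_id: str) -> dict:
    """Resume from persisted state, or fail loudly instead of restarting blind."""
    path = STATE_DIR / f"{investigation_id}.json"
    if not path.exists():
        raise FileNotFoundError(f"No research state at {path}; refusing to restart blind")
    return json.loads(path.read_text())

try:
    state = load_state("semis_revision_momentum")
except FileNotFoundError as err:
    print(err)
```

Failing loudly turns a missing note into a visible harness bug rather than a quiet repeat of the bad run above.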
Practical exercise
Take one unfinished research investigation and write a continuation note. Include objective, verified findings, rejected hypotheses, open questions, blockers, evidence IDs, and next action.
Then hand it to a fresh session. If the fresh session repeats work, the note is not operational enough.
Key takeaways
- Chat memory is not durable research state.
- Continuation notes should separate verified, rejected, open, blocked, and stale findings.
- Evidence IDs matter in quant work.
- Rejected hypotheses prevent wasted reruns.
- The next action should be concrete.
Further reading / source notes
- Anthropic, “Effective harnesses for long-running agents” for restartable progress practices.
- OpenAI, “Harness engineering: leveraging Codex in an agent-first world” for agent environments and feedback loops.