Limit Active Work
Prevent strategy drift by forcing the agent to finish one research behavior before starting the next.
Failure pattern
A request to “improve the strategy” causes the agent to change universe filters, factor weights, risk constraints, rebalance cadence, and memo framing at the same time.
The run looks productive because many things changed. The research is worse because no one can tell which change helped, which change hurt, and which interaction created the result.
In quant research, uncontrolled active work turns analysis into accidental curve fitting.
Incident: strategy improvement spiral
Agent task
A portfolio manager says:
Improve the semiconductor long/short screen. The latest backtest is weak after 2023.
The agent sees this as a broad optimization problem.
Available surface
The agent can change:
| Surface | Possible changes |
|---|---|
| Universe filter | Market cap, liquidity, region, subsector |
| Factor weights | Revision momentum, quality, valuation, short interest |
| Risk constraints | Beta, sector, country, factor exposure |
| Rebalance cadence | Weekly, monthly, event-driven |
| Cost model | Spread, borrow, slippage assumptions |
| Memo framing | Thesis, caveats, charts, recommendation language |
No state model defines what is active, queued, blocked, or verified.
Bad run
The agent changes five things:
- Removes small-cap names under $10B.
- Increases revision momentum weight from 30% to 50%.
- Adds a quality floor.
- Changes rebalance from monthly to weekly.
- Rewrites memo to emphasize near-term estimate revisions.
The new backtest improves after 2023. But review finds that turnover doubled, borrow constraints worsened, and the improvement mostly came from excluding weak small-cap shorts. The team cannot attribute the performance change because too many variables moved.
Why the harness failed
The harness allowed many active behaviors.
| Missing control | Consequence |
|---|---|
| One active item | Agent changed filters, factors, cadence, and memo together |
| Verification per change | No evidence linked to each modification |
| Blocked state | Borrow-cost question was bypassed |
| Deferred list | Memo improvements mixed with strategy changes |
| Attribution rule | Performance improvement could not be explained |
The agent did not optimize a strategy. It opened too many fronts.
Why it happens
Agents are good at association. If the backtest is weak, universe, weights, costs, cadence, and narrative all feel related. A human researcher may explore several hypotheses privately, but a reviewable research process needs controlled changes.
Quant work especially needs isolation. A strategy result is only meaningful if the team knows what changed. Without active-work limits, the agent can accidentally produce a better-looking backtest that cannot survive review.
Harness principle
Limit active work to one research behavior at a time.
```mermaid
stateDiagram-v2
    [*] --> Queued
    Queued --> Active: selected hypothesis
    Active --> Verified: evidence passes
    Active --> Blocked: missing data or decision
    Blocked --> Active: unblocked
    Verified --> [*]
```
A large research goal can still proceed through many steps. The rule is that each run has one active behavior and one evidence standard.
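As a concrete illustration, here is a minimal Python sketch of that state model. The `Hypothesis` class and `TRANSITIONS` table are hypothetical names, not part of any specific harness; the point is that illegal transitions fail loudly instead of drifting silently.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    QUEUED = auto()
    ACTIVE = auto()
    VERIFIED = auto()
    BLOCKED = auto()
    DEFERRED = auto()


# Legal transitions, mirroring the state diagram above.
TRANSITIONS = {
    State.QUEUED: {State.ACTIVE},
    State.ACTIVE: {State.VERIFIED, State.BLOCKED},
    State.BLOCKED: {State.ACTIVE},
    State.VERIFIED: set(),
    State.DEFERRED: set(),
}


@dataclass
class Hypothesis:
    name: str
    state: State = State.QUEUED

    def move_to(self, new_state: State) -> None:
        # Fail loudly on any transition the diagram does not allow.
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        self.state = new_state
```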
Operating practice
Turn “improve the strategy” into a queue:
| Item | State | Evidence |
|---|---|---|
| Test whether small-cap exclusion explains post-2023 weakness | active | Backtest with only universe floor changed |
| Test revision momentum weight from 30% to 50% | queued | Same universe, same cadence, same costs |
| Evaluate weekly rebalance | queued | Turnover and cost-adjusted return comparison |
| Add borrow-cost constraint | blocked | Borrow dataset coverage check |
| Rewrite memo | deferred | Only after strategy evidence is stable |
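A harness can enforce the single-active rule over such a queue mechanically. This is a sketch under assumed names (a hypothetical `Board`, with plain string states for brevity); the invariant check in `activate` is the part that matters.

```python
class Board:
    """Tracks hypothesis states ("queued", "active", "blocked", ...) by name."""

    def __init__(self) -> None:
        self.states: dict[str, str] = {}

    def add(self, name: str, state: str = "queued") -> None:
        self.states[name] = state

    def activate(self, name: str) -> None:
        # Core invariant: at most one hypothesis is active at a time.
        if "active" in self.states.values():
            raise RuntimeError("another hypothesis is already active")
        self.states[name] = "active"


board = Board()
board.add("Test small-cap exclusion")
board.add("Test revision momentum weight 30% -> 50%")
board.activate("Test small-cap exclusion")
# board.activate("Test revision momentum weight 30% -> 50%")  # would raise
```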
The active item has a tight contract:
| Field | Example |
|---|---|
| Behavior | Determine whether small-cap exclusion improves post-2023 performance |
| Allowed changes | Market-cap floor only |
| Excluded changes | Factor weights, rebalance cadence, memo language, risk constraints |
| Evidence | Before/after metrics, turnover, exposure changes, affected names |
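The contract can also be machine-checked rather than left as documentation. A sketch, assuming the harness can report which surfaces a run touched; all field and surface names here are illustrative:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActiveItemContract:
    behavior: str
    allowed_changes: frozenset[str]   # surfaces the run may touch
    excluded_changes: frozenset[str]  # surfaces that must stay fixed
    evidence: tuple[str, ...]         # artifacts required to verify

    def check_run(self, changed_surfaces: set[str]) -> None:
        """Reject the run if it touched anything outside the allowed set."""
        violations = changed_surfaces - self.allowed_changes
        if violations:
            raise ValueError(f"run changed excluded surfaces: {sorted(violations)}")


contract = ActiveItemContract(
    behavior="Determine whether small-cap exclusion improves post-2023 performance",
    allowed_changes=frozenset({"universe.market_cap_floor"}),
    excluded_changes=frozenset(
        {"factor_weights", "rebalance_cadence", "memo", "risk_constraints"}
    ),
    evidence=("before_after_metrics", "turnover", "exposure_changes", "affected_names"),
)
contract.check_run({"universe.market_cap_floor"})  # passes; anything broader raises
```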
Harnessed run
The agent returns:
- Active hypothesis: small-cap exclusion.
- Only change: market-cap floor from $5B to $10B.
- Result: post-2023 Sharpe improves from 0.42 to 0.71.
- Cost impact: turnover unchanged.
- Exposure change: short book loses 6 high-borrow names.
- Caveat: improvement may reflect borrow constraint, not size alone.
- Next item: evaluate explicit borrow-cost constraint.
This output is slower than broad optimization. It is also useful because the team can attribute the result.
The harness should preserve the baseline for every active item. In quant research, “improved” only has meaning relative to a named baseline. The active item should identify the previous strategy version, dataset snapshot, factor versions, and cost model. If those change during the run, the run is no longer testing one behavior.
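One way to make that concrete is to fingerprint the baseline before the run and re-check it after. A minimal sketch with hypothetical identifiers for the strategy version, dataset snapshot, factor versions, and cost model:

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Baseline:
    strategy_version: str             # e.g. tag of the previous strategy
    dataset_snapshot: str             # immutable snapshot id, not "latest"
    factor_versions: tuple[str, ...]
    cost_model: str

    def fingerprint(self) -> str:
        # A stable hash so any drift in the baseline is detectable.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()[:12]


before = Baseline(
    "screen-v7", "prices-2024-06-30", ("rev_mom_v2", "quality_v1"), "spread+borrow_v3"
)
after_run = Baseline(
    "screen-v7", "prices-2024-06-30", ("rev_mom_v2", "quality_v1"), "spread+borrow_v3"
)
assert before.fingerprint() == after_run.fingerprint(), "baseline drifted during the run"
```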
This does not prevent exploration. The agent can still discover that borrow cost, liquidity, and rebalance cadence deserve attention. It simply records those as queued or blocked hypotheses instead of folding them into the current result.
Product-agent example
A quant active-work board should track research hypotheses, not files:
| Bad item | Better item |
|---|---|
| Update strategy | Test small-cap exclusion |
| Improve factors | Compare revision weight change only |
| Fix risk | Evaluate beta-neutral constraint breach |
| Clean memo | Draft caveats after evidence passes |
Behavior-sized work is easier to verify and easier to reject.
A good active-work board also protects the memo. Narrative should follow evidence. If the agent changes the story while the research is still moving, it may hide unresolved uncertainty. Treat memo framing as its own work item unless wording is required to describe the active evidence.
Reviewers should be able to ask one simple question: “What changed in this run?” If the answer contains more than one research behavior, the harness should reject the run as too broad or split it into separate artifacts.
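That reviewer question can itself be automated as a run gate. A sketch, assuming the harness records the set of research behaviors a run touched:

```python
def review_gate(changed_behaviors: set[str]) -> str:
    """Answer "what changed in this run?" and reject runs that are too broad."""
    if not changed_behaviors:
        return "nothing changed; run is a no-op"
    if len(changed_behaviors) > 1:
        raise ValueError(
            f"run too broad ({len(changed_behaviors)} behaviors); "
            f"split into separate artifacts: {sorted(changed_behaviors)}"
        )
    return f"changed exactly one behavior: {next(iter(changed_behaviors))}"
```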
Common mistakes
The first mistake is optimizing multiple knobs in one run. That produces performance, not evidence.
The second mistake is treating memo edits as harmless. Narrative changes can hide uncertainty and should wait until evidence is stable.
The third mistake is allowing blocked data to become guessed assumptions. If borrow data is missing, the item is blocked.
The fourth mistake is marking a strategy "better" without attribution. Better compared to which exact baseline?
Practical exercise
Take one strategy-improvement request and split it into five hypotheses. For each, write the one allowed change and the evidence required.
Then pick exactly one active hypothesis. Anything else the agent discovers should become queued, blocked, or deferred.
Key takeaways
- Active-work limits protect research attribution.
- Strategy changes should move one behavior at a time.
- Better backtest results are not enough if the cause is unclear.
- Blocked data should not become guessed assumptions.
- Deferred findings are useful, but they are not current scope.
Further reading / source notes
- Anthropic, “Effective harnesses for long-running agents” for tracking active work and progress in agent tasks.
- OpenAI, “Harness engineering: leveraging Codex in an agent-first world” for specifying intent and constraining agent execution.