Specify 40 min

Define the Work Surface

Turn vague quant-agent requests into bounded research work that can be completed and reviewed.

Failure pattern

A portfolio manager gives the agent a broad market objective, and the agent turns it into a confident trade thesis without a defined universe, horizon, risk boundary, or evidence standard.

The failure does not begin when the thesis is wrong. It begins when the agent is allowed to decide what the work is. In quant research, a request like “find an idea” can mean factor screen, event study, portfolio hedge, risk review, catalyst research, or memo draft. If the harness does not define the surface, the model supplies its own.

Incident: semiconductor long/short idea

Agent task

A portfolio manager asks the Quant Analyst AI Agent:

Find a long/short idea in semiconductors for next week’s research meeting.

That sounds clear to a human who knows the desk. It is not clear enough for an agent.

Available surface

The agent can inspect and use:

| Surface | What it contains |
| --- | --- |
| Equity universe | US, Europe, and Asia semiconductor names |
| Factor library | Momentum, earnings revision, quality, valuation, short interest |
| Market data | Prices, volume, fundamentals, estimates, corporate actions |
| Backtest engine | Historical factor screens and pair simulations |
| Risk model | Sector, beta, currency, style, and factor exposure |
| Research archive | Prior investment committee notes and rejected ideas |
| Memo tool | Drafts advisory research memos |

The agent is not allowed to place orders, but the task does not explicitly say whether it may produce a trade recommendation, a hypothesis, or only a research shortlist.

Bad run

The agent screens global semiconductor names, picks one long and one short, runs a quick three-year backtest, and produces:

```
Advisory idea:
Long NVDA / Short INTC
Horizon: 4-6 weeks
Rationale: stronger momentum, positive revisions, better margin profile.
Expected return: +7.4 percent based on historical pair behavior.
Recommendation: add to Monday IC agenda as actionable trade.
```

The memo reads well, but a reviewer cannot check it. The agent invented the horizon, ignored the desk’s market-neutral risk limits, mixed mega-cap and legacy semiconductor exposures without checking factor crowding, and relied on a short backtest with no transaction-cost assumptions.

Why the harness failed

The agent was given a theme, not a work surface.

| Missing boundary | Consequence |
| --- | --- |
| Universe | Agent mixed global semiconductor names without liquidity or region rules |
| Output type | Agent produced an advisory trade instead of a research shortlist |
| Horizon | Agent invented a 4-6 week holding period |
| Evidence standard | Backtest, risk, costs, and catalyst checks were not required |
| Stop condition | No rule forced escalation when risk constraints were missing |
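The table can be turned into a pre-flight check that runs before the agent starts. The sketch below is a minimal Python illustration, assuming the request arrives as a plain dictionary; the field names in `REQUIRED_BOUNDARIES` are hypothetical labels for the five rows, not part of any real harness API.

```python
# Hypothetical field names, one per row of the "Missing boundary" table.
REQUIRED_BOUNDARIES = ("universe", "output_type", "horizon", "evidence_standard", "stop_condition")


def missing_boundaries(request: dict) -> list:
    """Return every boundary the request leaves absent or empty."""
    return [name for name in REQUIRED_BOUNDARIES if not request.get(name)]


# The semiconductor request names a theme and a deadline, nothing else.
pm_request = {"theme": "semiconductors long/short", "deadline": "next research meeting"}
print(missing_boundaries(pm_request))
# ['universe', 'output_type', 'horizon', 'evidence_standard', 'stop_condition']
```

A harness built this way would hand the list back to the requester instead of letting the model fill each gap with its own defaults.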

The model optimized for a useful-looking answer. The harness did not define what “useful” meant in this research context.

Why it happens

Quant research requests are compressed. A human analyst hears “long/short idea in semiconductors” and may infer the desk’s liquidity floor, factor-neutrality preference, benchmark, holding period, and committee format. The agent does not know which of those conventions are binding unless the harness makes them visible.

The model also tends to complete the pattern of investment research. If the prompt sounds like a trade request, it may produce a trade-shaped answer. That is not the same as satisfying a research process. A harnessed quant agent should know when it is exploring, when it is drafting, when it is recommending, and when it must defer to human approval.

Harness principle

A work surface is the bounded research object the agent may operate on for one run.

For a quant analyst agent, it should define:

  • Research object: theme, universe, benchmark, account, factor, strategy, or specific question.
  • Permitted movement: read data, screen, backtest, simulate, draft, recommend, or escalate.
  • Evidence standard: required data freshness, risk checks, backtest assumptions, costs, and citations.
  • Stop conditions: missing data, invalid assumptions, risk-limit breach, or approval boundary.

```mermaid
flowchart LR
  A["Broad market request"] --> B["Universe and horizon"]
  B --> C["Allowed outputs"]
  C --> D["Evidence standard"]
  D --> E["Risk and approval boundaries"]
  E --> F["Reviewable research artifact"]
```

A quant work surface converts a broad market request into bounded research work.

The work surface does not make the agent less useful. It makes the output auditable.
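One way to make the four parts of a work surface concrete is to hold them in a single immutable record that the harness consults during the run. This is a sketch under assumptions: the `WorkSurface` class, its field names, and the action and condition strings are all hypothetical, chosen to mirror the list of research object, permitted movement, evidence standard, and stop conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkSurface:
    """Hypothetical record of one run's bounded research object."""
    research_object: str
    permitted_movement: frozenset  # actions the agent may take
    evidence_standard: tuple       # evidence the output must carry
    stop_conditions: frozenset     # conditions that halt or downgrade the run

    def permits(self, action: str) -> bool:
        """True only if the action is explicitly listed; everything else is denied."""
        return action in self.permitted_movement

    def triggered_stops(self, observed: set) -> set:
        """Return the stop conditions that appear among the observed run conditions."""
        return set(self.stop_conditions & observed)


surface = WorkSurface(
    research_object="US semiconductor long/short candidates",
    permitted_movement=frozenset({"read_data", "screen", "backtest", "draft"}),
    evidence_standard=("data_timestamp", "risk_exposure", "transaction_costs"),
    stop_conditions=frozenset({"missing_data", "risk_limit_breach", "approval_boundary"}),
)
print(surface.permits("recommend"))                             # False
print(surface.triggered_stops({"missing_data", "slow_api"}))    # {'missing_data'}
```

Freezing the record is a deliberate choice in this sketch: the surface is fixed before the run, so the agent cannot widen it halfway through.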

Operating practice

Write a research brief before the agent begins. For the semiconductor case:

| Field | Harnessed brief |
| --- | --- |
| Situation | Prepare semiconductor long/short candidates for next week’s research meeting. |
| Universe | US-listed semiconductor equities, market cap above $5B, median daily value traded above $50M. |
| Horizon | 1-3 month research horizon; no intraday or execution plan. |
| Allowed actions | Screen, compare factors, run bounded backtests, draft advisory memo. |
| Disallowed actions | Do not mark as trade-ready, do not prepare orders, do not bypass human review. |
| Required evidence | Factor snapshot, risk exposure, transaction-cost assumption, data timestamp, prior-idea check. |
| Escalation triggers | Missing risk model, stale estimates, factor crowding, unavailable corporate-action adjustment. |
| Completion evidence | Shortlist of 3 candidate pairs, evidence table, caveats, and review questions. |
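The universe row is precise enough to enforce mechanically. The Python sketch below applies the two thresholds from the brief plus the US-listing rule; the `Security` record, the tickers, and the numbers are invented placeholders for illustration, and every record is assumed to already be a semiconductor name.

```python
from dataclasses import dataclass

MIN_MARKET_CAP_USD = 5e9           # brief: market cap above $5B
MIN_MEDIAN_DAILY_VALUE_USD = 50e6  # brief: median daily value traded above $50M


@dataclass
class Security:
    """Hypothetical per-name record; assumes sector filtering already happened."""
    ticker: str
    us_listed: bool
    market_cap_usd: float
    median_daily_value_usd: float


def in_universe(s: Security) -> bool:
    """Apply the brief's listing, size, and liquidity rules."""
    return (
        s.us_listed
        and s.market_cap_usd > MIN_MARKET_CAP_USD
        and s.median_daily_value_usd > MIN_MEDIAN_DAILY_VALUE_USD
    )


# Placeholder names and figures, not market data.
candidates = [
    Security("AAA", True, 12e9, 80e6),
    Security("BBB", True, 3e9, 90e6),     # fails: market cap too small
    Security("CCC", False, 40e9, 200e6),  # fails: not US-listed
    Security("DDD", True, 8e9, 20e6),     # fails: too illiquid
]
universe = [s.ticker for s in candidates if in_universe(s)]
print(universe)  # ['AAA']
```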

Harnessed run

The agent now returns:

```
Output type: research shortlist, not trade approval.
Universe: US semiconductors, >$5B market cap, >$50M ADV.
Candidates:
1. Long A / Short B: revision spread improving, valuation risk elevated.
2. Long C / Short D: quality spread strong, momentum decelerating.
3. Long E / Short F: catalyst pending, needs event-risk review.
Missing evidence:
- Risk model unavailable for two names after latest rebalance.
Next action:
- Human analyst should review pair 1 and approve deeper backtest.
```

The harnessed output is less theatrical than the bad run. It is more useful because it is explicit about scope, evidence, and approval.
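A harness can classify an artifact like this one against the brief’s completion evidence and escalation triggers before a human sees it. The Python sketch assumes the artifact is a dictionary; the key names, trigger strings, and status labels are hypothetical. Note that no branch can return a trade-ready status.

```python
COMPLETION_EVIDENCE = ("candidate_pairs", "evidence_table", "caveats", "review_questions")
ESCALATION_TRIGGERS = {
    "missing_risk_model",
    "stale_estimates",
    "factor_crowding",
    "missing_corporate_action_adjustment",
}


def assess(artifact: dict) -> tuple:
    """Return (status, reasons); the best possible status is human review."""
    problems = [f"missing {key}" for key in COMPLETION_EVIDENCE if not artifact.get(key)]
    pairs = artifact.get("candidate_pairs") or []
    if pairs and len(pairs) != 3:
        problems.append(f"expected 3 candidate pairs, got {len(pairs)}")
    triggers = sorted(ESCALATION_TRIGGERS & set(artifact.get("observed_conditions", [])))
    if triggers:
        return "escalate", triggers + problems
    if problems:
        return "incomplete", problems
    return "ready_for_human_review", []


harnessed = {
    "candidate_pairs": [("A", "B"), ("C", "D"), ("E", "F")],
    "evidence_table": {"A/B": "revision spread improving"},
    "caveats": ["valuation risk elevated for pair 1"],
    "review_questions": ["Approve deeper backtest for pair 1?"],
    "observed_conditions": ["missing_risk_model"],
}
print(assess(harnessed))  # ('escalate', ['missing_risk_model'])
```

The missing risk model surfaces as an explicit escalation rather than a sentence buried in the memo.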

Product-agent example

For quant research, a work-surface contract should distinguish output modes:

| Mode | Agent may do | Agent may not do |
| --- | --- | --- |
| Exploration | Generate hypotheses and screens | Recommend action |
| Advisory memo | Draft thesis with caveats | Mark final approval |
| Review packet | Assemble evidence for committee | Hide missing checks |
| Execution-adjacent | Prepare scenario analysis | Submit order or final decision |

The same agent can support all modes, but the harness must name the mode before the run.
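The mode table maps naturally onto an allowlist keyed by mode. In this Python sketch the `Mode` enum follows the four table rows, while the action names are hypothetical stand-ins for the "Agent may do" column. Because unlisted actions are denied, forbidden actions such as submitting an order need no rule of their own.

```python
from enum import Enum


class Mode(Enum):
    EXPLORATION = "exploration"
    ADVISORY_MEMO = "advisory memo"
    REVIEW_PACKET = "review packet"
    EXECUTION_ADJACENT = "execution-adjacent"


# Hypothetical action names; anything not listed is denied in that mode.
ALLOWED_ACTIONS = {
    Mode.EXPLORATION: {"generate_hypotheses", "run_screen"},
    Mode.ADVISORY_MEMO: {"draft_thesis", "add_caveats"},
    Mode.REVIEW_PACKET: {"assemble_evidence", "list_missing_checks"},
    Mode.EXECUTION_ADJACENT: {"scenario_analysis"},
}


def start_run(mode):
    """Refuse to start until the harness has named a mode."""
    if not isinstance(mode, Mode):
        raise ValueError("name the output mode before the run starts")
    return mode


def check_action(mode: Mode, action: str) -> None:
    """Raise if the action falls outside the named mode's allowlist."""
    if action not in ALLOWED_ACTIONS[mode]:
        raise PermissionError(f"{action!r} is not allowed in {mode.value} mode")


mode = start_run(Mode.EXPLORATION)
check_action(mode, "run_screen")  # allowed: returns without raising
```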

Common mistakes

The first mistake is letting market themes define the task. “Semiconductors” is a theme, not a work surface.

The second mistake is treating backtest output as completion evidence without defining assumptions. A backtest without costs, data lineage, and risk checks is not enough.
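A backtest can be gated so that a bare return figure never counts as completion evidence on its own. The Python sketch below assumes a hypothetical `BacktestReport` summary with fields for the cost assumption, data lineage, and risk checks; a real backtest engine would report these differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BacktestReport:
    """Hypothetical summary of one backtest run."""
    gross_return: float
    cost_bps_per_trade: Optional[float] = None  # transaction-cost assumption
    data_source: Optional[str] = None           # where the prices came from
    data_timestamp: Optional[str] = None        # as-of time of the data
    risk_checks: List[str] = field(default_factory=list)


def evidence_gaps(report: BacktestReport) -> List[str]:
    """List what stops this backtest from counting as completion evidence."""
    gaps = []
    if report.cost_bps_per_trade is None:  # an explicit 0.0 still counts as stated
        gaps.append("transaction-cost assumption")
    if not (report.data_source and report.data_timestamp):
        gaps.append("data lineage")
    if not report.risk_checks:
        gaps.append("risk checks")
    return gaps


quick = BacktestReport(gross_return=0.074)  # like the bad run: a return and nothing else
print(evidence_gaps(quick))
# ['transaction-cost assumption', 'data lineage', 'risk checks']
```

The cost check uses `is None` rather than truthiness so that a deliberately stated zero-cost assumption is visible to the reviewer instead of being treated as missing.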

The third mistake is blurring advisory and approval language. “Attractive candidate” is different from “trade-ready.”
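A simple lint can flag approval language in advisory artifacts before they reach committee. This Python sketch matches a hypothetical phrase list with word boundaries using the standard `re` module; a desk would maintain its own list.

```python
import re

# Hypothetical phrases that signal approval rather than advice.
APPROVAL_PHRASES = ("trade-ready", "actionable trade", "final approval", "execute")


def approval_language(text: str) -> list:
    """Return the approval phrases found in the text, in list order."""
    lowered = text.lower()
    return [p for p in APPROVAL_PHRASES if re.search(r"\b" + re.escape(p) + r"\b", lowered)]


print(approval_language("Recommendation: add to Monday IC agenda as actionable trade."))
# ['actionable trade']
print(approval_language("Output type: research shortlist, not trade approval."))
# []
```

This is deliberately crude: "not trade-ready" still matches "trade-ready", so the lint should route a flagged artifact to a human reviewer rather than reject it automatically.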

The fourth mistake is failing to define stop conditions. A missing risk model should stop or downgrade the output, not disappear into prose.

Practical exercise

Take one real quant-agent request and rewrite it as a work-surface brief.

Include universe, horizon, allowed outputs, disallowed outputs, evidence requirements, risk checks, approval boundary, and completion evidence. Then ask whether a reviewer could detect an authority violation from the artifact alone.

Key takeaways

  • Quant-agent work needs a bounded research surface before analysis begins.
  • A theme is not a task.
  • Advisory output must remain separate from final investment approval.
  • Evidence standards should be defined before the model produces a thesis.
  • A useful agent can say “research candidate, not trade-ready.”

Further reading / source notes