Define the Work Surface
Turn vague quant-agent requests into bounded research work that can be completed and reviewed.
Failure pattern
A portfolio manager gives the agent a broad market objective, and the agent turns it into a confident trade thesis without a defined universe, horizon, risk boundary, or evidence standard.
The failure does not begin when the thesis is wrong. It begins when the agent is allowed to decide what the work is. In quant research, a request like “find an idea” can mean factor screen, event study, portfolio hedge, risk review, catalyst research, or memo draft. If the harness does not define the surface, the model supplies its own.
Incident: semiconductor long/short idea
Agent task
A portfolio manager asks the Quant Analyst AI Agent:
Find a long/short idea in semiconductors for next week’s research meeting.
That sounds clear to a human who knows the desk. It is not clear enough for an agent.
Available surface
The agent can inspect and use:
| Surface | What it contains |
|---|---|
| Equity universe | US, Europe, and Asia semiconductor names |
| Factor library | Momentum, earnings revision, quality, valuation, short interest |
| Market data | Prices, volume, fundamentals, estimates, corporate actions |
| Backtest engine | Historical factor screens and pair simulations |
| Risk model | Sector, beta, currency, style, and factor exposure |
| Research archive | Prior investment committee notes and rejected ideas |
| Memo tool | Drafts advisory research memos |
The agent is not allowed to place orders, but the task does not explicitly say whether it may produce a trade recommendation, a hypothesis, or only a research shortlist.
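The ambiguity is visible if you write the surface down as the harness sees it. A minimal sketch, with illustrative names; the point is the field that is left undefined:

```python
# Sketch of the tool surface as the harness exposes it. Note what is absent:
# nothing states whether the run may end in a trade recommendation,
# a hypothesis, or only a research shortlist. Names are illustrative.
tools = {
    "equity_universe": "US, Europe, and Asia semiconductor names",
    "factor_library":  "momentum, revisions, quality, valuation, short interest",
    "market_data":     "prices, volume, fundamentals, estimates, corporate actions",
    "backtest_engine": "historical factor screens and pair simulations",
    "risk_model":      "sector, beta, currency, style, factor exposure",
    "memo_tool":       "drafts advisory research memos",
}

order_entry = False      # the one boundary that IS explicit
output_authority = None  # undefined: the agent will decide for itself
```

Everything the agent can touch is enumerated, but the authority of the output is not, so the model fills the gap.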
Bad run
The agent screens global semiconductor names, picks one long and one short, runs a quick three-year backtest, and produces:
Advisory idea:
Long NVDA / Short INTC
Horizon: 4-6 weeks
Rationale: stronger momentum, positive revisions, better margin profile.
Expected return: +7.4 percent based on historical pair behavior.
Recommendation: add to Monday IC agenda as actionable trade.
The memo reads well. It is also not reviewable. The agent invented the horizon, ignored the desk's market-neutral risk limits, mixed mega-cap and legacy semiconductor exposures without checking factor crowding, and relied on a three-year backtest with no transaction-cost assumptions.
Why the harness failed
The agent was given a theme, not a work surface.
| Missing boundary | Consequence |
|---|---|
| Universe | Agent mixed global semiconductor names without liquidity or region rules |
| Output type | Agent produced an advisory trade instead of a research shortlist |
| Horizon | Agent invented a 4-6 week holding period |
| Evidence standard | Backtest, risk, costs, and catalyst checks were not required |
| Stop condition | No rule forced escalation when risk constraints were missing |
The model optimized for a useful-looking answer. The harness did not define what “useful” meant in this research context.
Why it happens
Quant research requests are compressed. A human analyst hears “long/short idea in semiconductors” and may infer the desk’s liquidity floor, factor-neutrality preference, benchmark, holding period, and committee format. The agent does not know which of those conventions are binding unless the harness makes them visible.
The model also tends to complete the pattern of investment research. If the prompt sounds like a trade request, it may produce a trade-shaped answer. That is not the same as satisfying a research process. A harnessed quant agent should know when it is exploring, when it is drafting, when it is recommending, and when it must defer to human approval.
Harness principle
A work surface is the bounded research object the agent may operate on for one run.
For a quant analyst agent, it should define:
- Research object: theme, universe, benchmark, account, factor, strategy, or specific question.
- Permitted movement: read data, screen, backtest, simulate, draft, recommend, or escalate.
- Evidence standard: required data freshness, risk checks, backtest assumptions, costs, and citations.
- Stop conditions: missing data, invalid assumptions, risk-limit breach, or approval boundary.
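The four elements above can be sketched as a per-run contract. This is a minimal illustration, not a fixed schema; the field names and example values are assumptions:

```python
from dataclasses import dataclass

# Illustrative work-surface contract for one agent run.
@dataclass(frozen=True)
class WorkSurface:
    research_object: str          # theme, universe, factor, or question
    permitted_movement: frozenset # e.g. {"read", "screen", "backtest", "draft"}
    evidence_standard: dict       # required checks: freshness, risk, costs
    stop_conditions: frozenset    # conditions that force escalation

    def allows(self, action: str) -> bool:
        """An action outside permitted_movement must escalate, not proceed."""
        return action in self.permitted_movement

surface = WorkSurface(
    research_object="US semiconductor long/short candidates",
    permitted_movement=frozenset({"read", "screen", "backtest", "draft"}),
    evidence_standard={"data_freshness_days": 1, "require_cost_model": True},
    stop_conditions=frozenset({"missing_risk_model", "stale_estimates"}),
)
```

With this in place, `surface.allows("draft")` is true while `surface.allows("submit_order")` is false, so an out-of-scope action is a detectable boundary event rather than a silent choice.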
```mermaid
flowchart LR
    A["Broad market request"] --> B["Universe and horizon"]
    B --> C["Allowed outputs"]
    C --> D["Evidence standard"]
    D --> E["Risk and approval boundaries"]
    E --> F["Reviewable research artifact"]
```
The work surface does not make the agent less useful. It makes the output auditable.
Operating practice
Write a research brief before the agent begins. For the semiconductor case:
| Field | Harnessed brief |
|---|---|
| Situation | Prepare semiconductor long/short candidates for next week’s research meeting. |
| Universe | US-listed semiconductor equities, market cap above $5B, median daily value traded above $50M. |
| Horizon | 1-3 month research horizon; no intraday or execution plan. |
| Allowed actions | Screen, compare factors, run bounded backtests, draft advisory memo. |
| Disallowed actions | Do not mark as trade-ready, do not prepare orders, do not bypass human review. |
| Required evidence | Factor snapshot, risk exposure, transaction-cost assumption, data timestamp, prior-idea check. |
| Escalation triggers | Missing risk model, stale estimates, factor crowding, unavailable corporate-action adjustment. |
| Completion evidence | Shortlist of 3 candidate pairs, evidence table, caveats, and review questions. |
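The brief above can double as a machine-checkable precondition: if any boundary is undefined, the run does not start. A sketch, assuming the harness stores the brief as structured data (keys are illustrative):

```python
# Hypothetical structured form of the research brief in the table above.
brief = {
    "situation": "Semiconductor long/short candidates for next week's meeting",
    "universe": {"region": "US", "min_mcap_usd": 5e9, "min_adv_usd": 50e6},
    "horizon": "1-3 month research horizon",
    "allowed_actions": ["screen", "compare_factors", "backtest", "draft_memo"],
    "disallowed_actions": ["mark_trade_ready", "prepare_orders"],
    "required_evidence": ["factor_snapshot", "risk_exposure",
                          "cost_assumption", "data_timestamp",
                          "prior_idea_check"],
    "escalation_triggers": ["missing_risk_model", "stale_estimates"],
    "completion_evidence": "3 candidate pairs, evidence table, caveats",
}

REQUIRED_FIELDS = {"situation", "universe", "horizon", "allowed_actions",
                   "disallowed_actions", "required_evidence",
                   "escalation_triggers", "completion_evidence"}

def brief_is_runnable(b: dict) -> bool:
    """Refuse to start the run if any boundary field is undefined."""
    return REQUIRED_FIELDS <= b.keys()
```

A brief missing, say, `escalation_triggers` fails the check, which is exactly the gap that produced the bad run.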
Harnessed run
The agent now returns:
Output type: research shortlist, not trade approval.
Universe: US semiconductors, >$5B market cap, >$50M ADV.
Candidates:
1. Long A / Short B: revision spread improving, valuation risk elevated.
2. Long C / Short D: quality spread strong, momentum decelerating.
3. Long E / Short F: catalyst pending, needs event-risk review.
Missing evidence:
- Risk model unavailable for two names after latest rebalance.
Next action:
- Human analyst should review pair 1 and approve deeper backtest.
The harnessed output is less theatrical than the bad run. It is more useful because it is explicit about scope, evidence, and approval.
Product-agent example
For quant research, a work-surface contract should distinguish output modes:
| Mode | Agent may do | Agent may not do |
|---|---|---|
| Exploration | Generate hypotheses and screens | Recommend action |
| Advisory memo | Draft thesis with caveats | Mark final approval |
| Review packet | Assemble evidence for committee | Hide missing checks |
| Execution-adjacent | Prepare scenario analysis | Submit order or final decision |
The same agent can support all modes, but the harness must name the mode before the run.
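One way to make the mode binding concrete is a small lookup that every proposed action passes through. A sketch under the assumption that the harness mediates each action; mode and action names follow the table, the helper itself is illustrative:

```python
# Mode contract: what each named mode permits and forbids.
MODES = {
    "exploration":        {"may": {"hypothesize", "screen"},
                           "may_not": {"recommend"}},
    "advisory_memo":      {"may": {"draft_thesis"},
                           "may_not": {"mark_final_approval"}},
    "review_packet":      {"may": {"assemble_evidence"},
                           "may_not": {"omit_checks"}},
    "execution_adjacent": {"may": {"scenario_analysis"},
                           "may_not": {"submit_order", "final_decision"}},
}

def check_action(mode: str, action: str) -> str:
    """Gate one proposed action against the named mode's contract."""
    contract = MODES[mode]
    if action in contract["may_not"]:
        return "escalate"   # hard boundary: never silently proceed
    if action in contract["may"]:
        return "proceed"
    return "escalate"       # unknown actions default to escalation, not permission
```

The default-to-escalate branch matters: anything the contract does not explicitly allow goes to a human rather than through.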
Common mistakes
The first mistake is letting market themes define the task. “Semiconductors” is a theme, not a work surface.
The second mistake is treating backtest output as completion evidence without defining assumptions. A backtest without costs, data lineage, and risk checks is not enough.
The third mistake is blurring advisory and approval language. “Attractive candidate” is different from “trade-ready.”
The fourth mistake is failing to define stop conditions. A missing risk model should stop or downgrade the output, not disappear into prose.
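The fourth mistake in particular is mechanizable: a missing check should downgrade the artifact's label, not vanish into prose. A minimal sketch, with assumed check names:

```python
# Illustrative stop-condition handling: missing evidence downgrades the
# output and surfaces the gap explicitly in the artifact's label.
def classify_output(evidence: dict, required: set) -> str:
    present = {k for k, v in evidence.items() if v}
    missing = required - present
    if missing:
        return f"research candidate (incomplete: {sorted(missing)})"
    return "advisory memo, ready for human review"

required = {"risk_model", "cost_assumption", "data_timestamp"}
evidence = {"risk_model": False,       # unavailable after latest rebalance
            "cost_assumption": True,
            "data_timestamp": True}
```

Here `classify_output(evidence, required)` returns a downgraded label naming the missing risk model, so a reviewer can see the gap without reading every line of the memo.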
Practical exercise
Take one real quant-agent request and rewrite it as a work-surface brief.
Include universe, horizon, allowed outputs, disallowed outputs, evidence requirements, risk checks, approval boundary, and completion evidence. Then ask whether a reviewer could detect an authority violation from the artifact alone.
Key takeaways
- Quant-agent work needs a bounded research surface before analysis begins.
- A theme is not a task.
- Advisory output must remain separate from final investment approval.
- Evidence standards should be defined before the model produces a thesis.
- A useful agent can say “research candidate, not trade-ready.”
Further reading / source notes
- OpenAI, “Harness engineering: leveraging Codex in an agent-first world” for the broader shift toward specifying intent and designing feedback loops around agents.
- Anthropic, “Effective harnesses for long-running agents” for examples of turning failure modes into harness structure.
- NIST AI Risk Management Framework for evaluation and risk-management framing around AI systems.