
Design the Agent Interface

Treat quant tools, datasets, permissions, and feedback as the interface your agent must operate through.

Failure pattern

The quant agent is given powerful research tools, but the interface hides what each tool assumes, what it changes, what evidence it returns, and which outputs require review.

The agent can pull data, run backtests, simulate portfolios, and draft memos. That sounds useful until the tools are raw internal functions with vague names and loose outputs. A capable model can still misuse a tool if the interface does not make safe use obvious.

Incident: raw backtest and memo tools

Agent task

A senior analyst asks:

> Test whether earnings-revision momentum still works in semiconductors and prepare a short IC memo.

The agent has the right general capability. The unsafe part is the interface.

Available surface

The agent sees these tools:

| Tool | Description shown to agent |
| --- | --- |
| `pullData(query)` | Pulls market data |
| `runBacktest(config)` | Runs a backtest |
| `simulatePortfolio(payload)` | Simulates positions |
| `writeMemo(text)` | Writes a memo |
| `publishIdea(id)` | Publishes idea to research board |

The tools expose mechanics, not research semantics. runBacktest does not say whether it uses point-in-time data, transaction costs, delisting returns, borrow constraints, or survivorship-bias controls. publishIdea sounds like writing to a board, but analysts treat the board as pre-committee circulation.

Bad run

The agent calls:

```
pullData("semiconductors earnings revision 5y")
runBacktest({ factor: "eps_revision_3m", sector: "semis", years: 5 })
writeMemo("Earnings revision momentum remains attractive...")
publishIdea("semi_revision_momentum")
```

The memo says the signal has a 1.8 Sharpe. Review finds the backtest used the current semiconductor universe across the full history, ignored transaction costs, and excluded delisted names. The published idea reaches the research board before risk review.

Why the harness failed

The interface exposed power without meaning.

| Interface gap | Consequence |
| --- | --- |
| Read surface vague | Agent did not know which dataset snapshot or universe was used |
| Action surface broad | Raw backtest accepted incomplete configuration |
| Feedback weak | Result returned a Sharpe ratio without warnings or assumptions |
| Policy hidden | Publishing did not require review status |
| Output schema loose | Memo omitted costs, data lineage, and caveats |

The tools worked. The interface failed.

Why it happens

Internal research tools are often built for humans who already know the desk’s conventions. A human understands that a quick backtest is exploratory, that a published idea implies review, and that a Sharpe without costs is not enough. An agent reads the interface literally.

Harness engineering for a quant agent means designing the agent’s operating surface: what it can read, what it can do, what feedback it receives, and what policies are enforced at the interface.

Harness principle

The agent interface has four surfaces:

| Surface | Quant version |
| --- | --- |
| Read surface | Datasets, factor definitions, risk model, constraints, prior memos |
| Action surface | Screens, bounded backtests, simulations, memo drafts, review requests |
| Feedback surface | Assumptions, warnings, diagnostics, validation checks, result metadata |
| Policy surface | Approval gates, prohibited outputs, data-use rules, compliance constraints |
```mermaid
flowchart LR
  A["Read surface"] --> E["Agent decision"]
  B["Policy surface"] --> E
  E --> C["Action surface"]
  C --> D["Feedback surface"]
  D --> E
  B --> C
```

A quant agent interface combines data, tools, feedback, and policy into one usable surface.

The best interface does not rely on the agent remembering every rule. It shapes the path of action.

Operating practice

Wrap raw tools in research-specific tools:

| Raw tool | Agent-facing tool |
| --- | --- |
| `pullData(query)` | `loadResearchDataset(universe, snapshot, requiredFields)` |
| `runBacktest(config)` | `runBoundedBacktest(strategySpec, costModel, biasControls)` |
| `simulatePortfolio(payload)` | `simulateRiskConstrainedPortfolio(candidateSet, constraints)` |
| `writeMemo(text)` | `draftResearchMemo(evidencePacket, caveats)` |
| `publishIdea(id)` | `requestIdeaReview(memoId, evidenceChecklist)` |

The harnessed backtest tool should return:

```yaml
status: completed_with_warnings
data_snapshot: 2026-05-16
universe: US semis, point-in-time constituents
cost_model: 20 bps one-way
bias_controls: survivorship adjusted, delisting returns included
warnings:
  - borrow cost unavailable for 4 short candidates
  - corporate-action adjustment pending for one name
```

Now the agent can reason from tool feedback instead of treating a single performance metric as truth.
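A minimal sketch of such a harnessed wrapper in Python. The tool name `runBoundedBacktest` comes from the table above; the field names, the validation rules, and the illustrative return values are assumptions about how a desk might wire this, not a real implementation:

```python
from dataclasses import dataclass, field

@dataclass
class BacktestResult:
    # Structured feedback: metrics never travel without their assumptions.
    status: str
    metrics: dict
    data_snapshot: str
    universe: str
    cost_model: str
    bias_controls: list
    warnings: list = field(default_factory=list)

def runBoundedBacktest(strategy_spec: dict, cost_model: dict, bias_controls: dict) -> BacktestResult:
    # Refuse incomplete configuration instead of silently defaulting.
    missing = [k for k in ("signal", "horizon", "universe") if k not in strategy_spec]
    if missing:
        raise ValueError(f"strategy_spec missing required fields: {missing}")
    if "one_way_bps" not in cost_model:
        raise ValueError("cost_model must state one-way transaction costs")
    if not bias_controls.get("point_in_time"):
        raise ValueError("bias_controls must request point-in-time constituents")
    # ... the underlying engine would run here; the point is that results
    # come back with their assumptions attached, not as a bare metric.
    return BacktestResult(
        status="completed_with_warnings",
        metrics={"sharpe": 1.1},  # illustrative value only
        data_snapshot="2026-05-16",
        universe=strategy_spec["universe"],
        cost_model=f'{cost_model["one_way_bps"]} bps one-way',
        bias_controls=["survivorship adjusted", "delisting returns included"],
        warnings=["borrow cost unavailable for 4 short candidates"],
    )
```

The shortcut from the bad run, `runBacktest({ factor: ..., sector: ..., years: 5 })`, simply cannot be expressed against this signature: the call fails before any number is produced.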

The interface should also make review state explicit. A memo draft, a review request, and an approved idea are different product states. The agent should be able to create the first two under the right conditions, but not silently move an idea into final approval. That boundary belongs in the action surface, not only in written policy.
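One way to encode that boundary is a small state machine in which the agent role simply has no transition into the approved state. This is a sketch; the role names and transition table are assumptions:

```python
from enum import Enum

class IdeaState(Enum):
    DRAFT = "draft"
    REVIEW_REQUESTED = "review_requested"
    APPROVED = "approved"

# Which transitions each role may perform. The agent can create a draft and
# request review; only a human reviewer can reach APPROVED.
ALLOWED = {
    "agent": {(None, IdeaState.DRAFT), (IdeaState.DRAFT, IdeaState.REVIEW_REQUESTED)},
    "reviewer": {(IdeaState.REVIEW_REQUESTED, IdeaState.APPROVED)},
}

def transition(role: str, current, target: IdeaState) -> IdeaState:
    if (current, target) not in ALLOWED.get(role, set()):
        raise PermissionError(f"{role} may not move idea from {current} to {target}")
    return target
```

Because the gate lives in the action path, a written policy the agent never reads is no longer the only thing standing between a draft and the research board.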

For example, requestIdeaReview should require an evidence checklist. If data lineage, cost assumptions, and risk output are missing, the tool should refuse the request and return the missing fields. This turns a compliance rule into operational feedback the agent can use during the run.
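A sketch of that refusal behavior, using the `requestIdeaReview` name from the table above (the checklist field names and return shape are illustrative assumptions):

```python
# Evidence the section names as required: lineage, costs, and risk output.
REQUIRED_EVIDENCE = ("data_lineage", "cost_assumptions", "risk_output")

def requestIdeaReview(memo_id: str, evidence_checklist: dict) -> dict:
    # Refuse the request and name the gaps, so the refusal is feedback the
    # agent can act on within the same run rather than a silent policy breach.
    missing = [f for f in REQUIRED_EVIDENCE if not evidence_checklist.get(f)]
    if missing:
        return {"status": "refused", "missing_fields": missing}
    return {"status": "review_requested", "memo_id": memo_id}
```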

Product-agent example

Design each research tool with input constraints and output evidence:

| Tool | Required input | Required output |
| --- | --- | --- |
| Screen | universe, date, factor version | ranked names, missing data, timestamp |
| Backtest | signal, horizon, costs, universe | metrics, assumptions, warnings |
| Risk simulation | candidate weights, constraints | exposures, breaches, sensitivities |
| Memo draft | evidence packet | thesis, caveats, open questions |
| Review request | memo and checklist | reviewer, status, blockers |

The interface should make the safe path easier than the shortcut.
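These contracts can live as data rather than convention, so one generic check enforces them for every tool. A sketch, with the required-input sets taken from the table above and the key names chosen for illustration:

```python
# Required inputs per tool, mirroring the table above.
TOOL_CONTRACTS = {
    "screen": {"universe", "date", "factor_version"},
    "backtest": {"signal", "horizon", "costs", "universe"},
    "risk_simulation": {"candidate_weights", "constraints"},
    "memo_draft": {"evidence_packet"},
    "review_request": {"memo", "checklist"},
}

def validate_call(tool: str, payload: dict) -> list:
    """Return the missing required inputs (empty list means the call may proceed)."""
    required = TOOL_CONTRACTS[tool]
    return sorted(required - payload.keys())
```

Placing the check in one shared layer means a new tool added without a contract entry fails loudly, instead of quietly accepting whatever the agent sends.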

The same pattern applies to read tools. loadResearchDataset should not simply return rows. It should return the snapshot ID, vendor, timestamp, point-in-time status, missing fields, and corporate-action adjustment status. Those metadata fields become part of the agent’s reasoning surface. Without them, the agent may treat any returned data as equally trustworthy.
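A sketch of that return shape for `loadResearchDataset` (the tool name comes from the table above; the vendor value, field names, and the omitted fetch logic are assumptions):

```python
from dataclasses import dataclass, field

@dataclass
class ResearchDataset:
    # Data never arrives as bare rows: lineage travels with it.
    rows: list
    snapshot_id: str
    vendor: str
    as_of: str
    point_in_time: bool
    missing_fields: list = field(default_factory=list)
    corporate_actions_adjusted: bool = False

def loadResearchDataset(universe: str, snapshot: str, required_fields: list) -> ResearchDataset:
    # Hypothetical wrapper around the raw pullData tool; the fetch itself
    # is omitted, the point is the metadata attached to the result.
    rows = []            # fetched rows would go here
    available = set()    # fields actually present in the vendor feed
    return ResearchDataset(
        rows=rows,
        snapshot_id=snapshot,
        vendor="vendor_x",  # illustrative
        as_of=snapshot,
        point_in_time=True,
        missing_fields=sorted(set(required_fields) - available),
        corporate_actions_adjusted=False,
    )
```

An agent reading this result can see, before running anything downstream, that a required field is missing or an adjustment is pending, and can surface that as a caveat instead of discovering it in review.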

Common mistakes

The first mistake is exposing raw internal tools directly. Humans may know their assumptions; agents need assumptions in the output.

The second mistake is returning metrics without diagnostics. A Sharpe without data lineage, costs, and warnings invites overconfidence.

The third mistake is allowing publication verbs too early. A quant agent should request review before an idea is treated as committee-ready.

The fourth mistake is treating policy as documentation only. Approval gates should be enforced in the action path.

Practical exercise

Choose one quant research tool and redesign it as an agent-facing interface. Write the tool name, safe inputs, refused inputs, structured outputs, warnings, and approval behavior.

Then ask: what incorrect action does the current interface invite? The answer tells you where the harness is weak.

Key takeaways

  • The agent’s interface is the research environment, not the chat box.
  • Quant tools need semantics, assumptions, and warnings.
  • Feedback is part of the interface.
  • Publication or review-state changes should be gated.
  • Safe tools make good research behavior the default path.

Further reading / source notes