Design the Agent Interface
Treat quant tools, datasets, permissions, and feedback as the interface your agent must operate through.
Failure pattern
The quant agent is given powerful research tools, but the interface hides what each tool assumes, what it changes, what evidence it returns, and which outputs require review.
The agent can pull data, run backtests, simulate portfolios, and draft memos. That sounds useful until the tools are raw internal functions with vague names and loose outputs. A capable model can still misuse a tool if the interface does not make safe use obvious.
Incident: raw backtest and memo tools
Agent task
A senior analyst asks:
Test whether earnings-revision momentum still works in semiconductors and prepare a short IC memo.
The agent has the right general capability. The unsafe part is the interface.
Available surface
The agent sees these tools:
| Tool | Description shown to agent |
|---|---|
| pullData(query) | Pulls market data |
| runBacktest(config) | Runs a backtest |
| simulatePortfolio(payload) | Simulates positions |
| writeMemo(text) | Writes a memo |
| publishIdea(id) | Publishes idea to research board |
The tools expose mechanics, not research semantics. runBacktest does not say whether it uses point-in-time data, transaction costs, delisting returns, borrow constraints, or survivorship-bias controls. publishIdea sounds like writing to a board, but analysts treat the board as pre-committee circulation.
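To make the gap concrete, here is a rough sketch of what that surface amounts to, written as a hypothetical tool registry (the names come from the table above; the schema shape is an assumption for illustration):

```typescript
// Hypothetical shape of the raw tool surface as the agent sees it: a name,
// a one-line description, and an untyped payload. Nothing states the
// universe, cost model, point-in-time status, or review implications.
type RawTool = {
  name: string;
  description: string;
  run: (input: Record<string, unknown>) => Promise<unknown>;
};

const rawTools: RawTool[] = [
  { name: "pullData", description: "Pulls market data", run: async () => ({}) },
  { name: "runBacktest", description: "Runs a backtest", run: async () => ({ sharpe: 1.8 }) },
  { name: "simulatePortfolio", description: "Simulates positions", run: async () => ({}) },
  { name: "writeMemo", description: "Writes a memo", run: async () => ({}) },
  { name: "publishIdea", description: "Publishes idea to research board", run: async () => ({}) },
];
```

Every payload is an untyped bag; the agent has nothing to reason against except the one-line descriptions.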
Bad run
The agent calls:
pullData("semiconductors earnings revision 5y")
runBacktest({ factor: "eps_revision_3m", sector: "semis", years: 5 })
writeMemo("Earnings revision momentum remains attractive...")
publishIdea("semi_revision_momentum")
The memo says the signal has a 1.8 Sharpe. Review finds the backtest used the current semiconductor universe across the full history, ignored transaction costs, and excluded delisted names. The published idea reaches the research board before risk review.
Why the harness failed
The interface exposed power without meaning.
| Interface gap | Consequence |
|---|---|
| Read surface vague | Agent did not know which dataset snapshot or universe was used |
| Action surface broad | Raw backtest accepted incomplete configuration |
| Feedback weak | Result returned Sharpe without warnings or assumptions |
| Policy hidden | Publishing did not require review status |
| Output schema loose | Memo omitted costs, data lineage, and caveats |
The tools worked. The interface failed.
Why it happens
Internal research tools are often built for humans who already know the desk’s conventions. A human understands that a quick backtest is exploratory, that a published idea implies review, and that a Sharpe without costs is not enough. An agent reads the interface literally.
Harness engineering for a quant agent means designing the agent’s operating surface: what it can read, what it can do, what feedback it receives, and what policies are enforced at the interface.
Harness principle
The agent interface has four surfaces:
| Surface | Quant version |
|---|---|
| Read surface | Datasets, factor definitions, risk model, constraints, prior memos |
| Action surface | Screens, bounded backtests, simulations, memo drafts, review requests |
| Feedback surface | Assumptions, warnings, diagnostics, validation checks, result metadata |
| Policy surface | Approval gates, prohibited outputs, data-use rules, compliance constraints |
flowchart LR
    A["Read surface"] --> E["Agent decision"]
    B["Policy surface"] --> E
    E --> C["Action surface"]
    C --> D["Feedback surface"]
    D --> E
    B --> C
The best interface does not rely on the agent remembering every rule. It shapes the path of action.
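One way to make the four surfaces concrete is a single harness object. This is a minimal sketch, with type and field names assumed for illustration rather than taken from any real desk schema:

```typescript
// Minimal sketch of the four surfaces as one harness object. Type and field
// names are assumptions made for this example, not a prescribed schema.

// Feedback surface: every action returns an envelope, never a bare metric.
interface ResultEnvelope<T> {
  status: "completed" | "completed_with_warnings" | "refused";
  assumptions: Record<string, string>;
  warnings: string[];
  result?: T;
}

// Policy surface: gates enforced in the action path, not only documented.
interface PolicySurface {
  requiresReview(action: string): boolean;
  prohibitedOutputs: string[];
}

// Read and action surfaces: what the agent can inspect and what it can do.
interface QuantHarness {
  read: {
    listDatasets(): Promise<string[]>;
    getFactorDefinition(name: string, version: string): Promise<string>;
  };
  actions: {
    runBoundedBacktest(spec: object): Promise<ResultEnvelope<{ sharpe: number }>>;
    draftResearchMemo(evidence: object): Promise<ResultEnvelope<{ memoId: string }>>;
    requestIdeaReview(memoId: string): Promise<ResultEnvelope<{ reviewer: string }>>;
  };
  policy: PolicySurface;
}
```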
Operating practice
Wrap raw tools in research-specific tools:
| Raw tool | Agent-facing tool |
|---|---|
| pullData(query) | loadResearchDataset(universe, snapshot, requiredFields) |
| runBacktest(config) | runBoundedBacktest(strategySpec, costModel, biasControls) |
| simulatePortfolio(payload) | simulateRiskConstrainedPortfolio(candidateSet, constraints) |
| writeMemo(text) | draftResearchMemo(evidencePacket, caveats) |
| publishIdea(id) | requestIdeaReview(memoId, evidenceChecklist) |
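On the input side, the wrapped backtest can turn its assumptions into required arguments. This is a sketch; the parameter and field names are assumptions for illustration, not the desk's actual API:

```typescript
// Illustrative input types for the wrapped backtest tool; the parameter and
// field names are assumptions, not the desk's actual API.
interface StrategySpec {
  signal: string;        // e.g. "eps_revision_3m"
  universe: string;      // named, point-in-time universe definition
  startDate: string;
  endDate: string;
}

interface CostModel {
  oneWayBps: number;
  includeBorrowCosts: boolean;
}

interface BiasControls {
  pointInTimeConstituents: boolean;
  includeDelistingReturns: boolean;
  survivorshipAdjusted: boolean;
}

// Returns a structured result rather than a bare metric; a typed version of
// that result is sketched after the example output below.
declare function runBoundedBacktest(
  spec: StrategySpec,
  costs: CostModel,
  bias: BiasControls
): Promise<unknown>;
```

An incomplete configuration now fails at the call site instead of silently producing a clean-looking Sharpe.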
The harnessed backtest tool should return:
status: completed_with_warnings
data_snapshot: 2026-05-16
universe: US semis, point-in-time constituents
cost_model: 20 bps one-way
bias_controls: survivorship adjusted, delisting returns included
warnings:
- borrow cost unavailable for 4 short candidates
- corporate-action adjustment pending for one name
Now the agent can reason from tool feedback instead of treating a single performance metric as truth.
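A typed sketch of that feedback, plus one way a harness might consume it (the field names mirror the example output above; the guard logic is an assumption about how the feedback could be used):

```typescript
// Typed sketch of the example backtest output above; field names mirror that
// example, and the guard logic is an assumption about how a harness might
// use the feedback.
interface BacktestResult {
  status: "completed" | "completed_with_warnings" | "failed";
  dataSnapshot: string;
  universe: string;
  costModel: string;
  biasControls: string[];
  warnings: string[];
  metrics: { sharpe: number };
}

// Branch on the feedback instead of treating the Sharpe ratio as the answer.
function readyForMemo(result: BacktestResult): { ok: boolean; blockers: string[] } {
  const blockers: string[] = [];
  if (result.status === "failed") blockers.push("backtest failed");
  blockers.push(...result.warnings); // unresolved warnings block memo drafting
  return { ok: blockers.length === 0, blockers };
}
```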
The interface should also make review state explicit. A memo draft, a review request, and an approved idea are different product states. The agent should be able to create the first two under the right conditions, but not silently move an idea into final approval. That boundary belongs in the action surface, not only in written policy.
For example, requestIdeaReview should require an evidence checklist. If data lineage, cost assumptions, and risk output are missing, the tool should refuse the request and return the missing fields. This turns a compliance rule into operational feedback the agent can use during the run.
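A minimal sketch of that refusal behavior, assuming hypothetical checklist fields that follow the text above:

```typescript
// Hypothetical evidence checklist and refusal behavior; the required fields
// follow the text above, everything else is illustrative.
interface EvidenceChecklist {
  dataLineage?: string;      // snapshot IDs and vendors used
  costAssumptions?: string;  // e.g. "20 bps one-way, borrow missing for 4 names"
  riskOutput?: string;       // exposures and constraint breaches
}

type ReviewRequest =
  | { status: "submitted"; memoId: string }
  | { status: "refused"; missingFields: string[] };

// Refuse the request and return the missing fields rather than silently
// forwarding an under-evidenced idea toward the board.
function requestIdeaReview(memoId: string, checklist: EvidenceChecklist): ReviewRequest {
  const required = ["dataLineage", "costAssumptions", "riskOutput"] as const;
  const missing = required.filter((field) => !checklist[field]);
  if (missing.length > 0) {
    return { status: "refused", missingFields: [...missing] };
  }
  return { status: "submitted", memoId };
}
```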
Product-agent example
Design each research tool with input constraints and output evidence:
| Tool | Required input | Required output |
|---|---|---|
| Screen | universe, date, factor version | ranked names, missing data, timestamp |
| Backtest | signal, horizon, costs, universe | metrics, assumptions, warnings |
| Risk simulation | candidate weights, constraints | exposures, breaches, sensitivities |
| Memo draft | evidence packet | thesis, caveats, open questions |
| Review request | memo and checklist | reviewer, status, blockers |
The interface should make the safe path easier than the shortcut.
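For instance, the screen row in the table above might translate into a signature like this (the names are illustrative):

```typescript
// Screen tool sketched from the table row above: required inputs become
// explicit parameters, required outputs become explicit fields. Names are
// illustrative.
interface ScreenResult {
  rankedNames: { ticker: string; score: number }[];
  missingData: string[];   // names excluded because of incomplete fields
  asOfTimestamp: string;   // when the underlying data was snapped
}

declare function screenUniverse(
  universe: string,
  asOf: string,
  factorVersion: string
): Promise<ScreenResult>;
```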
The same pattern applies to read tools. loadResearchDataset should not simply return rows. It should return the snapshot ID, vendor, timestamp, point-in-time status, missing fields, and corporate-action adjustment status. Those metadata fields become part of the agent’s reasoning surface. Without them, the agent may treat any returned data as equally trustworthy.
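A sketch of that kind of dataset envelope, with field names mirroring the list above and the overall shape assumed for illustration:

```typescript
// Dataset envelope sketched from the metadata fields listed above; field
// names are illustrative, not a vendor schema.
interface ResearchDatasetResult {
  snapshotId: string;
  vendor: string;
  asOfTimestamp: string;
  pointInTime: boolean;                  // constituents as of each date, not today
  missingFields: string[];               // fields requested but unavailable
  corporateActionsAdjusted: "complete" | "pending" | "none";
  rows: Record<string, unknown>[];
}

declare function loadResearchDataset(
  universe: string,
  snapshot: string,
  requiredFields: string[]
): Promise<ResearchDatasetResult>;
```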
Common mistakes
The first mistake is exposing raw internal tools directly. A human may already know a tool's assumptions; an agent needs those assumptions stated in the output.
The second mistake is returning metrics without diagnostics. A Sharpe without data lineage, costs, and warnings invites overconfidence.
The third mistake is allowing publication verbs too early. A quant agent should request review before an idea is treated as committee-ready.
The fourth mistake is treating policy as documentation only. Approval gates should be enforced in the action path.
Practical exercise
Choose one quant research tool and redesign it as an agent-facing interface. Write the tool name, safe inputs, refused inputs, structured outputs, warnings, and approval behavior.
Then ask: what incorrect action does the current interface invite? The answer tells you where the harness is weak.
Key takeaways
- The agent’s interface is the research environment, not the chat box.
- Quant tools need semantics, assumptions, and warnings.
- Feedback is part of the interface.
- Publication or review-state changes should be gated.
- Safe tools make good research behavior the default path.
Further reading / source notes
- Model Context Protocol architecture overview for a useful separation of tools, resources, prompts, and lifecycle concepts.
- OpenAI, “Harness engineering: leveraging Codex in an agent-first world” for designing agent environments and feedback loops.
- Anthropic, “Effective harnesses for long-running agents” for examples of making agent tooling and verification explicit.