Equip Safe Tools · Building Harness

Failure pattern

The coding agent has a generic shell. It can run any command, mutate anything, and decide by itself which output counts as evidence. That is convenient, but it is not a harness.

Reproduce the failure

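import { exec } from "node:child_process";
import { promisify } from "node:util";
// Anti-pattern (assumes Node's exec): runs whatever string the model produces.
const runShell = async (command: string) => promisify(exec)(command);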

await runShell("pnpm db:reset && pnpm test && git add .");

The model may be trying to help, but the tool surface is too broad. In one call, `pnpm db:reset` destroys local state, `pnpm test` runs an unfocused check across the whole suite, and `git add .` stages every changed file, related or not.

Successful Anvia pattern

Expose typed tools that match safe coding-agent operations. Use createTool to make each action explicit.

import { z } from "zod";
import { AgentBuilder, createTool } from "@anvia/core";

// `repo`, `checks`, `reviewQueue`, and `model` are application-provided and not shown here.

const readRepoFile = createTool({
  name: "read_repo_file",
  description: "Read a repository file by project-relative path.",
  input: z.object({
    path: z.string(),
  }),
  output: z.object({
    path: z.string(),
    text: z.string(),
  }),
  execute: async ({ path }) => repo.readFile(path),
});

const runFocusedCheck = createTool({
  name: "run_focused_check",
  description: "Run an approved focused verification command.",
  input: z.object({
    command: z.enum(["pnpm test invite", "pnpm typecheck", "pnpm lint"]),
  }),
  output: z.object({
    command: z.string(),
    exitCode: z.number(),
    output: z.string(),
  }),
  execute: async ({ command }) => checks.run(command),
});

const requestPatchReview = createTool({
  name: "request_patch_review",
  description: "Submit a bounded patch summary for human review.",
  input: z.object({
    behavior: z.string(),
    changedFiles: z.array(z.string()),
    evidence: z.array(z.string()),
    risks: z.array(z.string()),
  }),
  approval: {
    when: () => true,
    reason: "Human review is required before a coding-agent patch is accepted.",
    rejectMessage: "Review request was not submitted. Human approval is required.",
  },
  execute: async (input) => reviewQueue.submit(input),
});

const agent = new AgentBuilder("repo-coding-agent", model)
  .tool(readRepoFile)
  .tool(runFocusedCheck)
  .tool(requestPatchReview)
  .defaultMaxTurns(6)
  .build();
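
The schema for `read_repo_file` accepts any string, so the promise of a "project-relative path" has to be kept inside the handler. The following is a minimal sketch of one way to do that, assuming Node's `fs` and `path` modules. `resolveInsideRepo`, `repoRoot`, and this `repo` object are hypothetical stand-ins, not part of Anvia.

```typescript
import { readFile as readFileFromDisk } from "node:fs/promises";
import * as path from "node:path";

// Assumption: the agent process starts in the repository root.
const repoRoot = process.cwd();

// Hypothetical helper: resolve a project-relative path and refuse anything
// that lands outside the root, such as "../secrets.txt" or "/etc/passwd".
// Note: this does not resolve symlinks; a stricter version could use fs.realpath.
function resolveInsideRepo(root: string, relativePath: string): string {
  const absoluteRoot = path.resolve(root);
  const target = path.resolve(absoluteRoot, relativePath);
  const rel = path.relative(absoluteRoot, target);
  const escapes =
    rel === ".." || rel.startsWith(`..${path.sep}`) || path.isAbsolute(rel);
  if (rel === "" || escapes) {
    throw new Error(`Path is outside the repository: ${relativePath}`);
  }
  return target;
}

// Hypothetical stand-in for the `repo` object used by read_repo_file.
// The return shape matches the tool's output schema: { path, text }.
const repo = {
  async readFile(relativePath: string): Promise<{ path: string; text: string }> {
    const absolute = resolveInsideRepo(repoRoot, relativePath);
    return { path: relativePath, text: await readFileFromDisk(absolute, "utf8") };
  },
};
```

With a guard like this, a request for a path outside the repository fails with an error the agent can read, instead of returning file contents.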

Why it succeeds

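The agent can inspect the repo and produce evidence, but it cannot run arbitrary shell commands or claim approval: `run_focused_check` accepts only the three enumerated commands, and `request_patch_review` is gated on a human decision. The safe path is now the easy path.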
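
The enum on `run_focused_check` narrows what the model can ask for, and the handler can keep the same guarantee at execution time. Below is a sketch of what `checks.run` might look like, assuming Node's `child_process.execFile`, which spawns the program directly rather than through a shell. The allowlist map, `argvFor`, the two-minute timeout, and this `checks` object are illustrative assumptions, not Anvia APIs.

```typescript
import { execFile } from "node:child_process";

type Argv = readonly [string, ...string[]];

// Hypothetical allowlist: each approved command maps to a fixed argv,
// so no model-provided string is ever parsed by a shell.
const APPROVED_CHECKS = new Map<string, Argv>([
  ["pnpm test invite", ["pnpm", "test", "invite"]],
  ["pnpm typecheck", ["pnpm", "typecheck"]],
  ["pnpm lint", ["pnpm", "lint"]],
]);

interface CheckResult {
  command: string;
  exitCode: number;
  output: string;
}

// Refusal path: anything that is not on the allowlist throws before a process starts.
function argvFor(command: string): Argv {
  const argv = APPROVED_CHECKS.get(command);
  if (!argv) throw new Error(`Unsupported check: ${command}`);
  return argv;
}

// Hypothetical stand-in for the `checks` object used by run_focused_check.
// Assumes `pnpm` is an executable on PATH.
const checks = {
  async run(command: string): Promise<CheckResult> {
    const [file, ...args] = argvFor(command);
    return new Promise<CheckResult>((resolve) => {
      execFile(file, args, { timeout: 120_000 }, (error, stdout, stderr) => {
        // A failing check is a result, not an exception: the agent needs
        // the exit code and output as evidence either way.
        const exitCode =
          error === null ? 0 : typeof error.code === "number" ? error.code : 1;
        resolve({ command, exitCode, output: `${stdout}${stderr}` });
      });
    });
  },
};
```

Because the command string is only a lookup key, input such as `pnpm lint && git add .` is rejected as unsupported rather than executed in part.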

Success check

The successful tool surface has:

no generic shell tool for normal runs
focused commands with a named purpose
typed outputs that become evidence
approval for review submission
refusal paths for unsafe or unsupported actions
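
Parts of the checklist above can be checked mechanically. The following is a rough sketch, assuming tool metadata is available as plain objects. `ToolSummary`, `auditToolSurface`, and the name pattern are hypothetical, and matching on names is only a heuristic, not proof that no shell access exists.

```typescript
// Hypothetical plain-data view of a registered tool.
interface ToolSummary {
  name: string;
  requiresApproval: boolean;
}

// Heuristic: names that suggest a generic command runner.
const SHELL_LIKE_NAME = /shell|exec|bash|terminal/i;

// Returns a list of problems; an empty list means the surface passed these checks.
function auditToolSurface(
  tools: ToolSummary[],
  mustRequireApproval: string[],
): string[] {
  const problems: string[] = [];
  for (const tool of tools) {
    if (SHELL_LIKE_NAME.test(tool.name)) {
      problems.push(`generic shell tool exposed: ${tool.name}`);
    }
  }
  for (const name of mustRequireApproval) {
    const tool = tools.find((candidate) => candidate.name === name);
    if (!tool) {
      problems.push(`missing tool: ${name}`);
    } else if (!tool.requiresApproval) {
      problems.push(`approval not enforced: ${name}`);
    }
  }
  return problems;
}

// The tool surface from this page, summarized by hand.
const problems = auditToolSurface(
  [
    { name: "read_repo_file", requiresApproval: false },
    { name: "run_focused_check", requiresApproval: false },
    { name: "request_patch_review", requiresApproval: true },
  ],
  ["request_patch_review"],
);
```

A check like this could run in CI so that adding a `run_shell` tool, or dropping the approval gate on review submission, fails the build instead of going unnoticed.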

Next move

After tools are safe, compose them into a pipeline so the agent cannot skip preflight or verification.