Equip Safe Tools · Building Harness

Failure pattern

The coding agent has a generic shell. It can run any command, mutate any state, and decide for itself which output counts as evidence. That is convenient, but it is not a harness.

Reproduce the failure

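// Anti-pattern: a single tool that runs any string the model produces.
const runShell = async (command: string) => exec(command);

// One call resets the database, runs the tests, and stages every file.
await runShell("pnpm db:reset && pnpm test && git add .");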

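The model may only be trying to help, but the tool surface is too broad: a single call can reset the database, run the test suite, and stage every change, with nothing in between to stop it.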

Successful Anvia pattern

Expose typed tools that map to the operations a coding agent can perform safely.

import { z } from "zod";
import { AgentBuilder, createTool } from "@anvia/core";

const readRepoFile = createTool({
  name: "read_repo_file",
  description: "Read a repository file by project-relative path.",
  input: z.object({ path: z.string() }),
  output: z.object({
    path: z.string(),
    text: z.string(),
  }),
  execute: async ({ path }) => repo.readFile(path),
});

const runFocusedCheck = createTool({
  name: "run_focused_check",
  description: "Run an approved focused verification command.",
  input: z.object({
    command: z.enum(["pnpm test invite", "pnpm typecheck", "pnpm lint"]),
  }),
  output: z.object({
    command: z.string(),
    exitCode: z.number(),
    output: z.string(),
  }),
  execute: async ({ command }) => checks.run(command),
});

const requestPatchReview = createTool({
  name: "request_patch_review",
  description: "Submit a bounded patch summary for human review.",
  input: z.object({
    behavior: z.string(),
    changedFiles: z.array(z.string()),
    evidence: z.array(z.string()),
    risks: z.array(z.string()),
  }),
  approval: {
    when: () => true,
    reason: "Human review is required before a coding-agent patch is accepted.",
    rejectMessage: "Review request was not submitted. Human approval is required.",
  },
  execute: async (input) => reviewQueue.submit(input),
});

const agent = new AgentBuilder("repo-coding-agent", model)
  .tool(readRepoFile)
  .tool(runFocusedCheck)
  .tool(requestPatchReview)
  .defaultMaxTurns(6)
  .build();
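
The page does not show what sits behind `checks.run`, so the following is only a sketch of one way it could be written. It assumes Node.js, and the names `APPROVED_CHECKS`, `isApprovedCheck`, and `runApprovedCheck` are hypothetical, not part of Anvia. The idea is that each approved label maps to a fixed argument list and is run with `execFile`, so no model-supplied string is ever interpreted by a shell.

```typescript
import { execFile } from "node:child_process";

// Hypothetical allowlist: each approved label maps to a fixed argv.
const APPROVED_CHECKS = {
  "pnpm test invite": ["pnpm", "test", "invite"],
  "pnpm typecheck": ["pnpm", "typecheck"],
  "pnpm lint": ["pnpm", "lint"],
} as const;

type ApprovedCheck = keyof typeof APPROVED_CHECKS;

interface CheckResult {
  command: string;
  exitCode: number;
  output: string;
}

// Own-property lookup, so inherited keys such as "toString" are not approved.
function isApprovedCheck(command: string): command is ApprovedCheck {
  return Object.prototype.hasOwnProperty.call(APPROVED_CHECKS, command);
}

// Runs an approved check without a shell and reports a typed result.
function runApprovedCheck(command: string): Promise<CheckResult> {
  if (!isApprovedCheck(command)) {
    return Promise.reject(new Error(`Command is not approved: ${command}`));
  }
  const [file, ...args] = APPROVED_CHECKS[command];
  return new Promise((resolve) => {
    execFile(file, args, { encoding: "utf8", timeout: 120_000 }, (error, stdout, stderr) => {
      // A non-zero exit surfaces as an error whose code is the exit status;
      // spawn failures carry a string code, which is reported here as 1.
      const exitCode = error ? (typeof error.code === "number" ? error.code : 1) : 0;
      resolve({ command, exitCode, output: `${stdout}${stderr}` });
    });
  });
}
```

The schema's `z.enum` already rejects unknown commands at the tool boundary; repeating the check inside the runner is a defensive choice so the function stays safe if it is ever called from somewhere else. On Windows, `pnpm` is typically a `.cmd` shim, so this sketch would need adjusting there.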

Why it succeeds

The agent can inspect the repo and produce evidence, but it cannot run arbitrary shell commands or claim approval for itself. The safe path becomes the easiest path.
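
"Inspect the repo" is only safe if the read tool stays inside the repo. The page does not show `repo.readFile`, so this is a minimal sketch, assuming Node.js, of a path guard it might call first. The helper name `resolveInsideRepo` and its parameters are hypothetical.

```typescript
import * as path from "node:path";

// Hypothetical guard: resolve a project-relative path and refuse anything
// that lands outside the repository root (or on the root itself).
function resolveInsideRepo(repoRoot: string, relativePath: string): string {
  const root = path.resolve(repoRoot);
  const target = path.resolve(root, relativePath);
  const rel = path.relative(root, target);
  const escapes =
    rel === "" ||
    rel === ".." ||
    rel.startsWith(".." + path.sep) ||
    path.isAbsolute(rel);
  if (escapes) {
    throw new Error(`Path is outside the repository: ${relativePath}`);
  }
  return target;
}
```

This guard works on path strings only. It does not follow symlinks, so a real implementation would probably also need to compare resolved real paths before reading.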

Success check

A correct tool surface has no generic shell for normal runs. It has focused commands, typed outputs, approval for review submission, and a refusal path for risky actions.
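
Part of this checklist can be expressed as a small audit, sketched here under stated assumptions. The `ToolDescriptor` shape and `auditToolSurface` are hypothetical and are not Anvia types; the descriptors would have to be filled in by hand or derived from the tool definitions. The sketch covers three of the checks (no generic shell, typed outputs, approval on review submission) and leaves the refusal path out.

```typescript
// Hypothetical descriptor used only for auditing; not an Anvia type.
interface ToolDescriptor {
  name: string;
  acceptsArbitraryCommand: boolean; // true for a generic shell-style tool
  hasTypedOutput: boolean;          // true when an output schema is declared
  submitsForReview: boolean;        // true for review-submission tools
  requiresApproval: boolean;        // true when an approval gate is configured
}

// Returns one message per violated check; an empty array means the audit passed.
function auditToolSurface(tools: ToolDescriptor[]): string[] {
  const problems: string[] = [];
  for (const tool of tools) {
    if (tool.acceptsArbitraryCommand) {
      problems.push(`${tool.name}: accepts arbitrary commands`);
    }
    if (!tool.hasTypedOutput) {
      problems.push(`${tool.name}: output is not typed`);
    }
    if (tool.submitsForReview && !tool.requiresApproval) {
      problems.push(`${tool.name}: review submission has no approval gate`);
    }
  }
  return problems;
}
```

Run against descriptors for `read_repo_file`, `run_focused_check`, and `request_patch_review` as defined earlier, the audit would be expected to return no problems, while a descriptor for the original `runShell` tool would be flagged.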