6 Harness Engineering Patterns I Found in Claude Code's Source
There’s a term floating around AI engineering circles that doesn’t get enough attention: harness engineering.
A harness is the software layer between a foundation model and the real world. It’s everything except the model: the tool definitions, permission systems, system prompts, agent loops, and extension points that turn a general-purpose LLM into a reliable, scoped agent.
I recently spent time studying the Claude Code CLI source (leaked via sourcemaps in the npm package on March 31, 2026). It’s roughly 512,000 lines of TypeScript: Anthropic’s production harness for turning Claude into an autonomous coding agent, with 40+ tools, 50+ slash commands, IDE integration, and multi-agent coordination.
Here are the patterns I found most worth applying to my own work.
1. The Tool Factory: Every Capability Gets the Same Contract
In Claude Code, every single thing the agent can do (reading a file, running a shell command, searching code, spawning a sub-agent) is wrapped in a uniform factory function called buildTool(). Every tool must declare:
- An input schema, validated with Zod on every call
- A permission check that runs before execution
- A read-only flag indicating whether it can mutate state
- A concurrency flag for parallel safety
- A prompt contribution explaining how the LLM should use it
- UI renderers for displaying invocation and results
This is a contract, not a suggestion. The factory enforces it. You can’t add a tool without answering these questions.
Why it matters: The #1 failure mode in agent systems is a tool getting called with unexpected arguments, or doing something destructive without a gate. The factory pattern makes it structurally impossible to ship a tool without input validation and permission checks. You’d have to actively work around the system to ship an unsafe tool.
Takeaway: If you’re building a harness, start with a buildTool() function. Even if your version only enforces inputSchema and checkPermissions, you’ve eliminated an entire class of agent failures.
2. Layered Permissions: Defense in Depth
Claude Code doesn’t have one permission check. It has four independent layers, each operating at a different stage:
| Layer | When | What |
|---|---|---|
| Build-time | Compile | Feature flags eliminate code entirely. It doesn’t exist in the binary. |
| Registry-time | Startup | Environment checks prevent tools from entering the registry. |
| Invocation-time | Each call | Per-tool checkPermissions() inspects specific arguments. |
| Rule-matching | Each call | User-defined wildcard patterns like Bash(git *). |
That last one is particularly clever. Bash(git *) means “allow any bash command that starts with git.” FileEdit(/src/*) means “allow edits to anything under src/.” Readable, composable, and controlled by the user.
Why it matters: No single layer catches everything. Build-time gates prevent shipping features you didn’t mean to. Registry gates prevent tools from existing where they shouldn’t. Invocation gates check specific arguments. Rule-matching gives the human control. Each layer catches what the others miss.
Takeaway: Even a simple harness should have two layers: which tools exist at all, and per-call argument inspection. The wildcard rule pattern is worth stealing wholesale.
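The wildcard rule layer is small enough to sketch in full. The `Tool(pattern)` rule syntax follows the examples above; the parsing and glob-to-regex matching here is my own minimal take, not Claude Code’s implementation:

```typescript
// Parse a rule like "Bash(git *)" into its tool name and glob pattern.
function parseRule(rule: string): { tool: string; pattern: string } | null {
  const m = rule.match(/^(\w+)\((.*)\)$/);
  return m ? { tool: m[1], pattern: m[2] } : null;
}

// Convert the glob-style pattern to a regex: escape specials, then '*' -> '.*'.
function matchesPattern(pattern: string, value: string): boolean {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`).test(value);
}

// A call is allowed if any rule names this tool and its pattern matches the argument.
function isAllowed(rules: string[], tool: string, value: string): boolean {
  return rules.some((r) => {
    const parsed = parseRule(r);
    return parsed !== null && parsed.tool === tool && matchesPattern(parsed.pattern, value);
  });
}

const rules = ["Bash(git *)", "FileEdit(/src/*)"];
```

Twenty lines buys you user-configurable, composable access control over every tool call.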
3. Dynamic Prompt Assembly: The Prompt Is a Pipeline, Not a String
The system prompt in Claude Code isn’t a static block of text. It’s dynamically assembled from roughly 10 sources at runtime:
- Static core instructions (who the agent is, how to behave)
- Model-specific variations (different rules for Opus vs. Haiku)
- Tool descriptions (auto-generated from tool schemas)
- OS/shell/git context (what environment the agent is running in)
- Memory files (project and user preferences from CLAUDE.md)
- Permission mode (what the agent can and can’t do right now)
- Feature-specific instructions (voice mode, plan mode, etc.)
The most interesting part: tools contribute to their own system prompt. Each tool has a prompt() method that explains to the LLM how to use it correctly. The BashTool’s prompt tells Claude about quoting, timeouts, and when to use dedicated tools instead. The FileEditTool’s prompt explains search-and-replace semantics.
Why it matters: Static prompts drift from reality. When you change a tool’s behavior but forget to update the system prompt, the agent gets confused. Co-locating prompt fragments with the code they describe solves this. When the tool changes, its prompt description changes with it.
Takeaway: Build a prompt assembly pipeline. Let tools describe themselves. Inject runtime context (OS, git status, project type). Layer user preferences on top. The final prompt should be composed, not authored.
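A composed pipeline can be as simple as a list of render functions over a runtime context. The source names below follow the list above; the `PromptSource` framing and `assemblePrompt()` are my own sketch, not Claude Code’s internals:

```typescript
interface RuntimeContext {
  model: string;
  os: string;
  permissionMode: string;
  tools: { name: string; prompt: () => string }[];
}

interface PromptSource {
  name: string;
  render: (ctx: RuntimeContext) => string; // empty string = contributes nothing
}

const sources: PromptSource[] = [
  { name: "core", render: () => "You are a coding agent. Be precise and safe." },
  // Model-specific variations: only emitted for matching models.
  { name: "model", render: (ctx) => (ctx.model.includes("haiku") ? "Prefer short answers." : "") },
  // Tools describe themselves: each tool's prompt() method is injected here.
  { name: "tools", render: (ctx) => ctx.tools.map((t) => `## ${t.name}\n${t.prompt()}`).join("\n") },
  { name: "env", render: (ctx) => `OS: ${ctx.os}` },
  { name: "permissions", render: (ctx) => `Permission mode: ${ctx.permissionMode}` },
];

// The final prompt is composed at runtime, never hand-authored as one block.
function assemblePrompt(ctx: RuntimeContext): string {
  return sources
    .map((s) => s.render(ctx))
    .filter((text) => text.length > 0)
    .join("\n\n");
}
```

Because the tools section reads each tool’s own `prompt()`, changing a tool automatically changes what the model is told about it.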
4. The Self-Correcting Agent Loop
The core of Claude Code is a QueryEngine, a ~46,000-line state machine that implements the agent loop:
User message → API call → Parse response
→ tool_use? → validate → permissions → execute → inject result → loop
→ end_turn? → done
→ error? → classify → retry or teach
The critical design choice: when tool input validation fails, the error is sent back to the LLM instead of crashing. The model sees “Zod validation error: expected string, got number for parameter file_path” and corrects its call on the next attempt. Same for permission denials. The model gets “permission denied for rm -rf /” and tries a different approach.
This turns what would be a hard failure into a self-correction opportunity. In practice, models fix their own tool calls on the second attempt the vast majority of the time.
Why it matters: Without self-correction, every schema mismatch or permission denial crashes the agent session. With it, the agent recovers gracefully. This is the single highest-ROI pattern in the entire codebase.
Takeaway: For every failure in your agent loop, ask: crash, retry, or teach? Network errors: retry. Validation errors: teach (send the error back). Permission denials: teach. Catastrophic failures: crash with a clear message. Default to “teach” whenever possible.
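The crash/retry/teach triage can be made explicit as a classifier at the top of the loop. The error codes and messages here are illustrative, not Claude Code’s:

```typescript
type Disposition =
  | { kind: "retry" }                   // transient: try the same call again
  | { kind: "teach"; message: string }  // recoverable: feed the error back to the model
  | { kind: "crash"; message: string }; // unrecoverable: stop with a clear message

function classifyError(err: { code: string; detail: string }): Disposition {
  switch (err.code) {
    case "NETWORK":
    case "RATE_LIMIT":
      return { kind: "retry" };
    case "VALIDATION":
      // The model sees the schema error and corrects its next tool call.
      return { kind: "teach", message: `Tool input validation failed: ${err.detail}` };
    case "PERMISSION_DENIED":
      return { kind: "teach", message: `Permission denied: ${err.detail}. Try another approach.` };
    default:
      return { kind: "crash", message: `Unrecoverable error: ${err.detail}` };
  }
}
```

The agent loop then switches on `kind`: retries re-issue the API call, teach messages are injected as tool results, and crashes surface to the user.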
5. Skills: Prompt Recipes as First-Class Citizens
Claude Code separates tools (primitives like “execute shell command” or “edit file”) from skills/commands (curated workflows like /commit, /review, /debug).
A skill is a recipe that combines:
- A subset of tools (only the ones relevant to this task)
- A curated prompt with rich, task-specific context
- Constraints on what the agent can do
For example, the /commit command restricts available tools to Bash(git *) and FileRead(*). No file editing, no web access. It gathers the current diff, recent commit messages, and staged changes. Then it sends a prompt that says “create a commit message following the style of recent commits.”
The agent gets a narrowly scoped task with exactly the tools and context it needs. It can’t go off on tangents because the tools for tangents aren’t available.
Why it matters: Raw tools are capabilities. Skills are expertise. A general-purpose agent with 40 tools is powerful but unfocused. That same agent with a curated skill for “code review,” scoped to only reading tools, injected with the diff and project conventions, is an expert.
Takeaway: Separate tools from skills. Tools are primitives. Skills combine tools + prompts + constraints into reusable task recipes. Build your skills library over time as you discover repeating workflows.
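A skill can be modeled as data: a name, an allow-list of tool rules, and a prompt builder. The `/commit` shape follows the description above; the `Skill` interface and the registry-narrowing heuristic are my own sketch:

```typescript
interface Skill {
  name: string;
  allowedTools: string[]; // wildcard rules, as in pattern 2
  buildPrompt: (ctx: { diff: string; recentCommits: string[] }) => string;
}

const commitSkill: Skill = {
  name: "/commit",
  allowedTools: ["Bash(git *)", "FileRead(*)"], // no editing, no web access
  buildPrompt: (ctx) =>
    [
      "Create a commit message following the style of recent commits.",
      `Recent commits:\n${ctx.recentCommits.join("\n")}`,
      `Staged diff:\n${ctx.diff}`,
    ].join("\n\n"),
};

// A skill invocation narrows the tool registry before the agent loop starts:
// only tools named by some allow-list rule remain available.
function toolsForSkill(skill: Skill, registry: string[]): string[] {
  return registry.filter((tool) =>
    skill.allowedTools.some((rule) => rule.startsWith(tool + "("))
  );
}
```

The constraint does the work: the agent can’t wander into file edits or web fetches because those tools simply aren’t in its registry for this task.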
6. Incremental Build-Out Prompts
The repo includes 16 numbered prompts in a prompts/ directory, designed to be run in order in separate AI chat sessions to progressively build out the harness:
01 - Runtime + dependencies
02 - Compatibility shims
03 - Build system
...
07 - Tool system (audit all 40 tools)
09 - Agent loop
10 - Prompt assembly + memory
...
16 - Testing
Each prompt is self-contained (paste it into a fresh session), independently verifiable (tells you how to confirm it worked), and dependency-aware (declares what must come first).
Why it matters: Complex harnesses fail when you try to build everything at once. You can’t debug the agent loop if the tools don’t work. You can’t debug prompt assembly if the agent loop doesn’t work. The progressive approach makes sure each layer is solid before building on it.
Takeaway: When building a harness, write a sequence of build-out prompts. Start with the foundation (can the code run?), add tools, build the agent loop, then layer on prompt assembly. Each step gets a verification checkpoint. This is TDD for harness engineering.
The Meta-Pattern
Across all six patterns, one principle dominates: separation of concerns at every level.
- Tools don’t know about permissions. The factory enforces them.
- Prompts don’t know about tools. Tools describe themselves.
- Skills don’t know about the agent loop. The query engine orchestrates.
- The build system doesn’t know about features. Feature flags handle it.
This is what makes the system maintainable at 500K+ lines. For smaller harnesses, establishing these boundaries early prevents the “everything knows about everything” entropy that kills agent projects before they mature.
Where to Start
If you’re building an AI agent harness, implement in this order:
- Tool factory with inputSchema + checkPermissions + isReadOnly on every tool
- Agent loop as an explicit state machine with self-correction on errors
- Prompt pipeline where tools describe themselves and context is injected at runtime
- Permission rules with user-configurable wildcards for access control
- Skills layer with curated recipes for common workflows
You don’t need 512,000 lines. You need the right boundaries.
I’m an AI consultant who builds agent systems and harnesses for businesses. If you’re working on agent infrastructure and want help applying these patterns, get in touch.