Mastering Agent Engineering: A Comprehensive Workflow for Productivity
This guide details a robust agent engineering workflow designed for maximum productivity and enjoyment, moving beyond basic demos to real-world application. The author, Kun, a former principal engineer at Meta, Microsoft, and Atlassian, shares insights gained from building and shipping dozens of production-ready coding agents daily. The core philosophy emphasizes a terminal-centric approach, efficient use of memory and skills, and strategic delegation to a growing "crew" of AI agents.
Why I Work in the Terminal
A fundamental aspect of this workflow is the heavy reliance on the terminal. While graphical user interfaces (GUIs) offer richer visuals, the terminal provides two critical advantages for productivity:
- Uninterrupted Flow: Keeping hands on the keyboard minimizes context switching. Constantly moving to a mouse breaks concentration and disrupts the creative flow. Terminal applications are inherently keyboard-driven, ensuring hands remain in one place.
- Ubiquitous Access: The terminal allows for a consistent workflow across all devices, including mobile. This means the same powerful tools and environments are accessible anywhere.
For those who prefer GUIs, the concepts discussed remain applicable, focusing on the underlying principles of agent interaction rather than specific interface mechanics.
WezTerm and My Lua Config
The terminal emulator of choice is WezTerm. Its key strengths include:
- Cross-Platform Consistency: WezTerm functions identically on Windows, macOS, and Linux, a significant advantage for developers working across different operating systems.
- High Customizability: It can be configured extensively using Lua scripts, allowing for dynamic and personalized settings. Changes can be hot-reloaded instantly, making configuration a seamless process.
What is tmux?
Within WezTerm, tmux (terminal multiplexer) is essential for managing multiple agent sessions. tmux allows users to:
- Split Panes: Divide the terminal into multiple independent panes, enabling simultaneous viewing of an agent's output, an editor, and command execution.
- Create Tabs (Windows): Organize multiple agent sessions into separate tabs, facilitating parallel work.
- Persistent Sessions: tmux sessions persist even if the terminal is closed or the connection is lost. This means work can be resumed exactly where it left off, even from a different device.
While tmux offers immense power, default configurations can be basic. Customizing tmux with keybindings and styling is recommended for an optimal experience.
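As a rough illustration of the pane and tab layout described above, the following sketch builds the tmux commands for one window per agent, each split into two side-by-side panes. The function name, the session name, and the agent names are hypothetical; only the tmux subcommands and flags (`new-session -d -s -n`, `new-window -t -n`, `split-window -h -t`) are standard.

```python
import shlex


def tmux_layout_commands(session: str, agents: list[str]) -> list[list[str]]:
    """Build tmux commands: one window per agent, each split into two panes."""
    if not agents:
        raise ValueError("need at least one agent name")
    first, *rest = agents
    # Start a detached session; its first window is named after the first agent.
    cmds = [["tmux", "new-session", "-d", "-s", session, "-n", first]]
    for name in rest:
        # One extra window ("tab") per additional agent.
        cmds.append(["tmux", "new-window", "-t", f"{session}:", "-n", name])
    for name in agents:
        # Split side by side: agent on one side, shell or editor on the other.
        cmds.append(["tmux", "split-window", "-h", "-t", f"{session}:{name}"])
    return cmds


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to try.
    for cmd in tmux_layout_commands("crew", ["api", "web"]):
        print(shlex.join(cmd))
```

To apply the layout for real, each command could be passed to `subprocess.run(cmd, check=True)`, followed by `tmux attach -t crew` in the terminal.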
Neovim as Editor
The preferred text editor is Neovim, a modern evolution of Vim. Neovim, like Vim, is designed for keyboard-centric operation:
- Efficient Navigation: Movement, scrolling, and editing are performed using a vast array of keyboard shortcuts.
- Modal Editing: Neovim operates in different modes (e.g., insert mode for typing, normal mode for navigation and commands), allowing for precise control.
- Plugin Ecosystem: A rich plugin system extends Neovim's capabilities, enabling features like fuzzy file finding and code searching directly within the editor.
While Neovim has a learning curve, mastering it leads to significantly faster and more fluid code editing.
Agent Harnesses
To interact with AI models, an "agent harness" is required. The author regularly uses four:
- Claude Code (Anthropic): A practical choice for Anthropic's models, offering sensible defaults and a rich feature set, though it can be buggy and is less customizable.
- Codex CLI: Written in Rust, it is fast and open source, so its behavior can be inspected and worked around when needed. It is less feature-rich and less customizable.
- Pi Coding Agent: Emphasizes minimalism and extensibility, ideal for users who want to tinker and customize extensively.
- OpenCode: Features a smooth UI, broad model integration, and a more complete feature set than Pi, making it a good "grab-and-go" model-agnostic option.
For this walkthrough, Claude Code is used for familiarity, but the workflow is designed to be model-agnostic, acknowledging the rapidly evolving AI landscape.
My Global Memory File
Agents, as fresh recruits, need onboarding. This is achieved through memory files and skills.
- Global Memory File: This file lives at a harness-specific path (e.g., ~/.claude/CLAUDE.md for Claude Code) and is loaded into the system prompt of every agent session. It is kept minimal (around 27 lines) to avoid wasting tokens. It stores personal preferences, such as:
  - Avoiding em-dashes in favor of plain dashes.
  - A crucial instruction: "When making technical decisions, don't give too much weight to development cost." Models trained on human-written data tend to estimate development effort at human speed. Agents can code much faster, so this rule corrects the tendency to favor seemingly "cheaper" but potentially lower-quality solutions.
  - A directive for bug fixes: "Always start with reproducing the bug in an end-to-end setting as closely aligned with how an end-user would experience it as possible." This prioritizes end-to-end reproduction over unit tests, which are often insufficient on their own.
Project-Level Memory File
Each project can have its own memory file (e.g., highbit/agents/memory.md). This file is more verbose and serves as the collective learning for agents working on that specific project. It includes:
- Project context and repository layout.
- Key terminology and component explanations.
- End-to-end testing procedures and conventions.
This file is built incrementally. When an agent makes a mistake, the correction is added to the memory file, allowing agents to learn and improve over time.
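The incremental habit described above can be sketched as a small helper that appends a correction to the memory file and skips duplicates. This is only an illustration: the function name `record_lesson` is hypothetical, and the sketch assumes the memory file is Markdown organized under `## ` section headings, which the source does not specify.

```python
from pathlib import Path


def record_lesson(memory_file: Path, section: str, lesson: str) -> bool:
    """Append `lesson` as a bullet under `section`; return False if already present."""
    bullet = f"- {lesson.strip()}"
    text = memory_file.read_text(encoding="utf-8") if memory_file.exists() else ""
    lines = text.splitlines()
    if bullet in lines:
        return False  # the lesson is already recorded, keep the file small

    heading = f"## {section}"  # assumed heading convention
    if heading in lines:
        start = lines.index(heading) + 1
        # The section ends at the next "## " heading or at end of file.
        end = next(
            (i for i in range(start, len(lines)) if lines[i].startswith("## ")),
            len(lines),
        )
        # Step back over trailing blank lines so the bullet joins the list.
        while end > start and not lines[end - 1].strip():
            end -= 1
        lines.insert(end, bullet)
    else:
        if lines and lines[-1].strip():
            lines.append("")
        lines += [heading, bullet]

    memory_file.parent.mkdir(parents=True, exist_ok=True)
    memory_file.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return True
```

In practice the agent itself is usually asked to make this edit; the helper only makes the "one correction, one line" discipline concrete.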
Using Skills
To manage the growing size of memory files and improve efficiency, conditionally useful information can be moved into skills. For example, end-to-end testing instructions are only relevant when an agent is making changes, so they do not need to be loaded into every session.
- Extracting to Skills: Agents can be prompted to extract specific sections from memory files into skills.
- Skill Management: Tools like the skills CLI from Vercel (run as npx skills) help manage skills.
- Skill Creator: For harnesses that don't natively support skill creation, a "Skill Creator" skill can be installed to teach the agent how to generate new skills.
Skills use progressive disclosure: only a small description is loaded into the system prompt initially, and the full skill content is loaded only when the agent decides to use it. This prevents token bloat.
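A minimal sketch of progressive disclosure, assuming a skill is just a name, a one-line description, and a longer body. The `Skill` and `SkillRegistry` names are hypothetical and do not reflect any particular harness's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    description: str  # short summary, present in every session
    body: str         # full instructions, loaded only on demand


class SkillRegistry:
    def __init__(self, skills: list[Skill]) -> None:
        self._skills = {s.name: s for s in skills}

    def system_prompt_section(self) -> str:
        """Only names and one-line descriptions go into the system prompt."""
        return "\n".join(
            f"- {s.name}: {s.description}" for s in self._skills.values()
        )

    def load(self, name: str) -> str:
        """Return the full body once the agent decides the skill is relevant."""
        return self._skills[name].body
```

The token saving comes from the asymmetry: every session pays for the short index, while the long body is paid for only in sessions that actually call `load`.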
How Skills May Hurt Your Agent
Caution is advised when installing random skills from the internet. These can pose security risks (e.g., exposing API keys) and may degrade performance. A skill with 177,000 GitHub stars was evaluated and found to increase token usage by 5% while worsening results. Popularity does not equate to effectiveness; rigorous evaluation is crucial.
Voice Input
A significant productivity boost comes from adopting voice input. Talking is approximately three times faster than typing.
- Transcription: Voice input is transcribed locally using Open Super Whisper, a free and open-source application that runs the Whisper model on the user's machine, ensuring quality and privacy.
- Fallback to Typing: Typing is reserved for inputs like URLs or file paths, which are impractical to speak aloud.
The Importance of Agent Ergonomics
The design of external tools agents use significantly impacts their performance.
- GitHub CLI vs. API: Benchmarks show that using the GitHub CLI is three times cheaper in token cost and more than twice as fast as using the GitHub API for the same tasks.
- Axi Design Standards: The author developed Axi, a set of design standards for tools to optimize for agent ergonomics. Principles include using token-efficient output formats (saving ~40% tokens compared to JSON) and designing tools that treat agents as first-class citizens. Axi-based tools (like GitHub Axi and Chrome DevTools Axi) have demonstrated reduced turns and token usage for agents.
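To see why output format matters, the sketch below renders the same records as indented JSON and as a compact layout that states field names once. The compact layout is an illustrative assumption, not the actual Axi format, and character count is used only as a rough proxy for token count.

```python
import json


def to_json(rows: list[dict]) -> str:
    """Typical API-style output: every record repeats every key."""
    return json.dumps(rows, indent=2)


def to_compact(rows: list[dict]) -> str:
    """Emit the field names once, then one tab-separated line per record."""
    if not rows:
        return ""
    fields = list(rows[0])
    lines = ["\t".join(fields)]
    for row in rows:
        lines.append("\t".join(str(row[f]) for f in fields))
    return "\n".join(lines)


# Hypothetical sample data for the comparison.
issues = [
    {"number": 101, "state": "open", "title": "Crash on startup"},
    {"number": 102, "state": "closed", "title": "Typo in README"},
]

if __name__ == "__main__":
    print(len(to_json(issues)), "characters as JSON")
    print(len(to_compact(issues)), "characters in the compact layout")
```

The gap widens as the number of records grows, because JSON repeats keys and punctuation per record while the compact layout pays for the header only once.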
Planning with Interactive Artifacts
For complex planning, Lavish is a critical tool. It allows agents to generate interactive HTML artifacts that visualize options and prototypes, making discussions much more efficient than reading walls of text in the terminal.
- Lavish Editor: This tool instructs agents to create HTML visualizations consistent with the project's design system.
- Annotated Feedback: Users can annotate specific parts of the artifact to provide precise feedback to the agent.
- Decision Making: Interactive elements allow users to make decisions directly within Lavish, which are then sent back to the agent.
This interactive approach drastically improves the clarity and efficiency of the planning phase.
Validating Code Changes
The author's approach to code validation shifts from manual diff review to a more managerial role, leveraging automation.
- The Bottleneck of Manual Review: AI can generate code rapidly, making manual review a significant bottleneck.
- "No Mistakes" Pipeline: This free and open-source tool orchestrates a comprehensive validation pipeline:
  - Creates a branch and commits changes in an isolated worktree.
  - Analyzes the agent's intent and rebases changes onto the latest main branch.
  - Performs an adversarial review in a fresh context window.
  - Tests changes end-to-end, recording evidence (screenshots, videos, logs).
  - Updates documentation and checks for linting issues.
  - Pushes the branch and raises a Pull Request (PR).
  - Monitors the PR for merge conflicts or CI failures.
- Risk Assessment: The "No Mistakes" pipeline provides a risk assessment for each change. Low-risk changes are often merged without detailed diff review, as the pipeline is trusted to catch most issues. Higher-risk changes warrant more human scrutiny. This allows for scaling code changes significantly.
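The gating logic described above can be sketched as an ordered list of checks followed by a routing decision based on risk. This is a simplified model under stated assumptions: the `Risk`, `Change`, and `run_pipeline` names are hypothetical, the steps are stand-in callables, and how the real tool computes risk is not covered by the source.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Risk(Enum):
    LOW = "low"
    HIGH = "high"


@dataclass
class Change:
    branch: str
    risk: Risk  # assumed to be produced by an earlier assessment step
    log: list[str] = field(default_factory=list)


Step = Callable[[Change], bool]  # returns True when the step passes


def run_pipeline(change: Change, steps: list[tuple[str, Step]]) -> str:
    """Run steps in order; stop at the first failure, otherwise route by risk."""
    for name, step in steps:
        if not step(change):
            change.log.append(f"{name}: failed")
            return "blocked"
        change.log.append(f"{name}: ok")
    # Every step passed: low-risk changes skip the detailed diff review.
    if change.risk is Risk.LOW:
        return "merge-without-diff-review"
    return "human-review"
```

Keeping a per-step log mirrors the idea of recorded evidence: when a human does look at a higher-risk change, the trail of what passed is already attached.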
Long-Running Tasks
To keep agents busy during periods of inactivity (like sleep), the "Goodnight, Have Fun" tool is used.
- Objective-Based Execution: Users provide an objective, and the tool runs the agent in a loop until a stop condition is met.
- Example: An agent is tasked with acting as a seven-year-old child using an app, identifying and fixing usability problems.
- Monitoring: Progress can be monitored via token usage, iterations, and commits.
- Verifiable Objectives: This tool is well-suited for tasks with verifiable objectives, such as reducing page load time or improving test coverage.
While similar functionality exists in Claude Code and Codex, "Goodnight, Have Fun" offers more precise control over token and iteration caps.
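The loop with a stop condition and explicit caps can be sketched as follows. This is not the tool's implementation; `run_until`, `RunReport`, and the callable parameters are hypothetical names, and the agent iteration is represented by a function that returns the number of tokens it spent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RunReport:
    iterations: int
    tokens_used: int
    stopped_because: str


def run_until(
    step: Callable[[], int],            # runs one agent iteration, returns tokens spent
    objective_met: Callable[[], bool],  # verifiable stop condition
    max_iterations: int,
    max_tokens: int,
) -> RunReport:
    """Repeat `step` until the objective is met or a cap is reached."""
    iterations = 0
    tokens = 0
    while True:
        if objective_met():
            return RunReport(iterations, tokens, "objective met")
        if iterations >= max_iterations:
            return RunReport(iterations, tokens, "iteration cap")
        if tokens >= max_tokens:
            return RunReport(iterations, tokens, "token cap")
        tokens += step()
        iterations += 1
```

Because the caps are checked between iterations, the token cap can be exceeded by at most one iteration's spend. A verifiable objective, such as a coverage threshold, is what makes `objective_met` cheap and trustworthy to evaluate.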
Parallel Worktrees and Agents
To manage multiple agents working concurrently without conflicts, Git worktrees are employed.
- Worktree Isolation: Each worktree is a separate working directory with its own checked-out branch, linked to the same underlying repository rather than a full clone, allowing agents to operate independently.
- Treehouse Tool: To manage the overhead of creating and tracking worktrees, Treehouse was developed. It simplifies the creation of fresh worktrees, tracks their status, and reuses idle worktrees to avoid unnecessary creation.
This enables running multiple agent sessions in parallel, each in its own isolated environment, significantly increasing parallel work capacity.
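The create-or-reuse behavior described above can be sketched as a small pool. This is not Treehouse's implementation: `WorktreePool`, the `wt-N` directory naming, and the `agent/<task>` branch naming are assumptions for illustration. The only real CLI call used is `git worktree add -b <branch> <path>`.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


def worktree_add_command(repo: Path, path: Path, branch: str) -> list[str]:
    # `git worktree add -b <branch> <path>` creates <branch> and checks it out at <path>.
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path)]


@dataclass
class WorktreePool:
    repo: Path
    root: Path  # directory that holds all worktrees (assumed layout)
    busy: dict[str, Path] = field(default_factory=dict)  # task -> worktree
    idle: list[Path] = field(default_factory=list)
    created: int = 0

    def acquire(self, task: str) -> tuple[Path, Optional[list[str]]]:
        """Return a worktree for `task`, plus the git command if one must be created."""
        if self.idle:
            # Reuse an idle worktree; a real tool would also reset it
            # and switch it to a fresh branch before handing it out.
            path = self.idle.pop()
            self.busy[task] = path
            return path, None
        self.created += 1
        path = self.root / f"wt-{self.created}"
        self.busy[task] = path
        return path, worktree_add_command(self.repo, path, f"agent/{task}")

    def release(self, task: str) -> None:
        """Mark the task's worktree as idle so a later task can reuse it."""
        self.idle.append(self.busy.pop(task))
```

The pool returns the command rather than running it, which keeps the sketch free of side effects; a caller could pass it to `subprocess.run(cmd, check=True)`.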
First Mate
As the number of parallel agent sessions grows, managing them becomes exhausting. The First Mate is introduced as a higher-level agent that orchestrates and manages the other agents.
- Centralized Management: The First Mate acts as a captain's assistant, delegating tasks to the crew of agents and handling their coordination.
- Workflow Integration: It integrates with tools like Treehouse and "No Mistakes" to manage worktrees, run agents, and validate changes.
- Customizable Transcription: Open Super Whisper can be customized with a system prompt to improve transcription accuracy for project-specific vocabulary.
- Task Delegation: The First Mate can handle multiple complex tasks concurrently, such as updating CLIs across several projects or analyzing open issues.
This layer of abstraction allows the user to focus on the strategic direction ("where should we go next?") rather than the operational details of managing individual agents.
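The delegation pattern can be sketched as a fan-out over tasks with a collected status per task. This is a deliberately simplified model: in the workflow described here the First Mate is itself an agent, whereas `delegate` and `run_agent` below are hypothetical stand-ins that only show the coordination shape.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def delegate(
    tasks: dict[str, str],                # task name -> prompt
    run_agent: Callable[[str, str], str],  # hypothetical: runs one agent, returns a summary
    max_parallel: int = 4,
) -> dict[str, str]:
    """Run one agent per task in parallel and collect a short status for each."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {
            name: pool.submit(run_agent, name, prompt)
            for name, prompt in tasks.items()
        }
        results: dict[str, str] = {}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:
                # One failing agent should not hide the results of the others.
                results[name] = f"failed: {exc}"
        return results
```

The caller sees only a summary per task, which matches the goal of the layer: the operational detail stays with the crew, and the status report goes to the captain.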
The Captain's Mindset
With the First Mate handling operational overhead, the user's focus shifts to higher-level strategy. This requires a mindset shift from "sailor" to "captain":
- Understanding What Matters: Dedicate energy to understanding user needs, the competitive landscape, and defining a clear vision (the "treasure map").
- Strategic Direction: Crafting the overarching goals and direction for the crew of agents.
By mastering this workflow, users can transition from directly managing individual agents to becoming a strategic captain, leading a powerful crew to achieve ambitious goals. All tools mentioned are free and open-source, available on the author's GitHub.
Key Takeaways
- Terminal-Centric Workflow: Maximize productivity by keeping hands on the keyboard and maintaining a consistent environment.
- tmux for Parallelism: Effectively manage multiple agent sessions using tmux for splitting panes and creating tabs.
- Memory and Skills: Onboard agents with global and project-level memory files, and use skills to manage conditional knowledge efficiently.
- Voice Input: Leverage voice for faster and more fluid interaction with agents.
- Agent Ergonomics: Choose tools designed for efficiency with AI agents to save time and cost.
- Interactive Planning: Utilize tools like Lavish for clear and efficient planning with visual artifacts.
- Automated Validation: Employ pipelines like "No Mistakes" to automate code validation and scale the volume of changes that can be merged safely.
- Long-Running Tasks: Use tools like "Goodnight, Have Fun" to keep agents productive during idle periods.
- Parallel Worktrees: Manage multiple concurrent agent tasks without conflicts using Git worktrees and management tools like Treehouse.
- First Mate for Orchestration: Delegate agent management to a higher-level agent to focus on strategic direction.
- Captain's Mindset: Shift focus from operational details to strategic planning and vision.