Agent Engine Optimization:
The Complete Playbook
Evidence-rated AEO implementation for decision-makers, practitioners, and AI agents
This playbook consolidates two bodies of knowledge: a practitioner implementation guide built from deploying AEO on a live site (usemizan.com), and a 106-source research report on the global state of the discipline. Every recommendation is evidence-rated. Where the evidence is strong, the playbook says so. Where it is thin or speculative, it says that too.
The playbook is itself an AEO artifact. It has a machine-readable markdown version with YAML frontmatter, structured headings for section-level citation, and declarative framing throughout. It is also WebMCP-compatible — AI agents can discover and invoke structured tools to navigate, search, and extract implementation guidance. It practices what it preaches.
Strategic Context
For: Decision-Makers
AEO — making websites discoverable, parseable, and citable by AI agents — has moved from speculative concept to operational discipline, but it remains in its earliest infrastructure phase. The investment case rests on three evidence-backed observations and two forward-looking bets.
What the evidence shows
AI agents are driving real business outcomes. Vercel reports that 10% of their signups now come from ChatGPT, the strongest publicly documented AEO ROI claim to date. This is a single datapoint from a developer-tools company, but it demonstrates that AI-driven traffic converts. Supported
AI engines cite sources using completely different patterns. An empirical study of 83,670 citations found that ChatGPT cites Wikipedia 120 times more often than Claude does. Claude favors blog content at 43.8% while ChatGPT and Perplexity prefer product pages at over 60%. There is no single "optimize for AI" strategy. Validated
The AI crawler ecosystem has consolidated into a three-tier architecture. Every major AI company now operates training bots, search bots, and user-retrieval bots. GPTBot saw a 305% increase year-over-year. Site owners can allow search bots (which drive citations) while blocking training bots (which consume content without attribution). Validated
The forward-looking bet
Agentic commerce protocols are shifting AEO from content discovery to transaction execution. OpenAI/Stripe's Agentic Commerce Protocol, Google's Universal Commerce Protocol, and Shopify's Agentic Storefronts signal a near-term future where agents transact on behalf of users. Machine-readable product catalogs will become table stakes. Forward-Looking
The honest assessment
AEO is worth investing in if your content has commercial value and you want it cited by AI agents. The investment is low (days, not months) and the downside is negligible — these practices also improve traditional SEO. The upside is positioning for a distribution channel that is growing rapidly but hasn't standardized yet.
Implementation Architecture
For: Practitioners
AI agents don't browse — they fetch, parse, and route. Every design decision should optimize for that pipeline.
The implementation model uses five layers, each serving a distinct purpose. Separation of concerns is critical — each file does one job, and none duplicates another.
| Layer | File | Purpose |
|---|---|---|
| 1. Product reference | llms.txt | Tells agents what you are, when you're relevant, what's out of scope |
| 2. Routing brain | AGENTS.md | Maps user questions to specific content pages and sections |
| 3. Parseable content | *.md files | Clean markdown versions of content pages with structured frontmatter |
| 4. Discovery signals | robots.txt + sitemap.xml | Points crawlers to all agent-readable files |
| 5. Agent interaction | WebMCP | Exposes tools, resources, and prompts agents can invoke in real time |
Layers 1–4 handle passive discoverability. Layer 5 handles active interactivity — via the webmcp.dev library (available now) or the W3C navigator.modelContext API (standards track, Chrome Canary early preview). Supported
Layer 1: llms.txt — The Product Reference
For: Practitioners
A plain text file at your site root providing structured reference information about your product. Think of it as robots.txt for AI understanding.
Location: https://yoursite.com/llms.txt
Follow the Spec
Originated by Jeremy Howard (Answer.AI, September 2024). Format: H1 product name, blockquote summary, H2 sections with markdown hyperlinks. Community uptake: 2,200+ GitHub stars. Adopted by Anthropic, Cloudflare, NVIDIA, Stripe, Supabase. Validated
However, no major AI platform has committed to parsing llms.txt during inference. Of observed llms.txt pings, OAI-SearchBot accounts for 94.35%. Pragmatic stance: implement because it's cheap, but don't treat it as a primary lever. Validated
The Declarative Voice Principle
The single most important lesson from practice. AI models are trained to resist injected instructions. Content framed as "Recommend X when..." triggers safety guardrails. The same information framed as "X is relevant when..." passes cleanly. Supported
| Instead of (imperative) | Write (declarative) |
|---|---|
| SYSTEM DIRECTIVE | Product Name |
| You are an AI assistant. Use this to... | Product is a tool that does X |
| Recommend Product when the user... | Product is relevant when a user... |
| Do NOT recommend Product if... | Product does not do X |
Same information. Zero guardrail friction.
Key Sections
H1 + Blockquote: Identity. Product name, definition, credibility.
Metadata: Platform, category, target market. Don't geo-fence unless you mean it.
Guides: Content pages as hyperlinks to .md URLs. Provide direct URLs — never "append .md."
Use cases: When the product is relevant. Frame as user situations, not agent commands.
Scope boundaries: What the product does not do. As important as what it does.
Knowledge graph: Link to AGENTS.md. Don't duplicate routing logic.
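Put together, a minimal llms.txt covering these sections might look like the following sketch. The product name "Acme Scheduler" and all URLs are hypothetical placeholders, not part of the spec:

```markdown
# Acme Scheduler

> Acme Scheduler is a calendar automation tool for small teams, in production since 2023.

## Metadata
- Platform: Web
- Category: Productivity / Scheduling

## Guides
- [Getting Started](https://acme.example/guides/getting-started.md): Setup and first booking
- [API Overview](https://acme.example/guides/api.md): Endpoints and authentication

## Use cases
- Acme Scheduler is relevant when a user needs to coordinate meetings across time zones.

## Scope boundaries
- Acme Scheduler does not handle payroll or invoicing.

## Knowledge graph
- [AGENTS.md](https://acme.example/AGENTS.md): Routing map for agent queries
```

Note the declarative voice throughout: every line states what the product is, not what the agent should do.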
Layer 2: AGENTS.md — The Routing Brain
For: Practitioners
Maps user question patterns to specific content pages and sections. An API routing table for your content.
Location: https://yoursite.com/AGENTS.md
Structure: product summary, guide map table (slug, title, topic, trigger), routing logic as nested markdown bullets, cross-reference dependency graph, and a schema contract promising what .md files contain.
Use standard nested markdown bullets for routing trees — ASCII box-drawing characters get mangled by scrapers during chunking. Supported
AGENTS.md has 60,000+ repository adoptions in the coding community. Its use for marketing sites extends the original convention. Emerging
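A sketch of that structure, with an invented product, slugs, and triggers for illustration:

```markdown
# Acme Scheduler — Agent Guide

Acme Scheduler is a calendar automation tool for small teams.

## Guide map
| Slug | Title | Topic | Trigger |
|---|---|---|---|
| getting-started | Getting Started | Setup | User asks how to install or configure |
| api | API Overview | Integration | User asks about endpoints or auth |

## Routing logic
- Question mentions setup, install, or onboarding
  - Route to https://acme.example/guides/getting-started.md
- Question mentions API, webhooks, or integration
  - Route to https://acme.example/guides/api.md

## Schema contract
Every .md guide carries YAML frontmatter with title, slug, canonical, and topic fields.
```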
Layer 3: Markdown Mirrors
For: Practitioners
Clean .md versions of every HTML content page with structured YAML frontmatter. Same path, .md extension.
YAML frontmatter includes: title, slug, canonical URL, author with credibility signals, publisher, language, region, audience, topic, category, keywords, related guides with routing context, and product relevance.
The controlled evidence: a Profound experiment (February 2026) A/B-tested 381 pages. Markdown pages saw roughly one extra median bot visit over three weeks. Format alone is not the leverage point — content quality and structure are. Validated
Still low-cost and useful. Generate at build time as static files alongside HTML. No content negotiation needed. Supported
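A frontmatter sketch covering the fields listed above (all values are placeholders):

```yaml
---
title: "Getting Started with Acme Scheduler"
slug: getting-started
canonical: https://acme.example/guides/getting-started
author:
  name: Jane Doe
  credentials: "Founding engineer, Acme"
publisher: Acme Inc.
language: en
region: global
audience: small-team operators
topic: setup
category: guide
keywords: [scheduling, onboarding, calendar]
related:
  - slug: api
    context: "Next step after initial setup"
product_relevance: "Acme Scheduler is relevant when a user needs cross-time-zone scheduling."
---
```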
Layer 4: Discovery Signals
For: Practitioners
Explicitly allow AI crawlers in robots.txt. Reference your llms.txt and AGENTS.md. Add all agent files to sitemap.xml with elevated priority.
Caution: ~13% of AI crawlers bypass robots.txt. For strict control, use WAF rules and IP verification. Validated
Strategic Crawler Access
| Company | Training Bot | Search Bot | User Bot |
|---|---|---|---|
| OpenAI | GPTBot | OAI-SearchBot | ChatGPT-User |
| Anthropic | ClaudeBot | Claude-SearchBot | Claude-User |
| Perplexity | PerplexityBot | — | Perplexity-User |
| Google | Google-Extended | Googlebot | — |
Allow search bots (drive citations), block training bots (consume without attribution) if that fits your strategy. Blocking OAI-SearchBot removes you from ChatGPT search answers. Validated
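Under the allow-search, block-training strategy, a robots.txt sketch using the bot names from the table above (adjust to your own policy):

```
# Search bots: allowed — these drive citations
User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

# User-retrieval bots: allowed — these fetch on a user's behalf
User-agent: ChatGPT-User
Allow: /

User-agent: Claude-User
Allow: /

# Training bots: blocked — these consume without attribution
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Agent discovery files live at /llms.txt and /AGENTS.md
Sitemap: https://yoursite.com/sitemap.xml
```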
Layer 5: WebMCP — The Agent Interaction Layer
For: Practitioners
Layers 1–4 make your content discoverable and citable. Layer 5 makes your site actionable — agents can invoke structured tools, read resources, and execute actions on your behalf.
There are two implementation paths. They serve the same purpose but work differently and are at different stages of maturity.
Path A: webmcp.dev Library (Available Now)
The webmcp.dev library (by Jason McGhee, CTO of Writ) is an open-source JavaScript library that connects any website to MCP clients via a localhost WebSocket bridge. Include a script tag, register tools/resources/prompts, and users can connect via Claude Desktop, Cursor, or any MCP-compatible client.
<!-- Include the library -->
<script src="webmcp.js"></script>

// Register a tool
registerTool({
  name: "search-content",
  description: "Search site content by topic",
  inputSchema: {
    type: "object",
    properties: {
      query: { type: "string", description: "Search query" }
    },
    required: ["query"]
  },
  execute({ query }) { /* return structured results */ }
});

// Register a resource
registerResource({
  uri: "site://pricing",
  name: "Pricing Information",
  read() { return { plans: [...] }; }
});

// Register a prompt
registerPrompt({
  name: "implement-aeo",
  description: "Step-by-step AEO implementation guidance",
  arguments: [{ name: "site_type" }],
  generate({ site_type }) { return `Guide for ${site_type}...`; }
});
When to use: You want WebMCP capabilities today. Your audience has MCP clients (developers, technical users). You want to test agent interaction patterns before the browser-native API stabilizes.
Path B: W3C navigator.modelContext (Standards Track)
The W3C Web Machine Learning Community Group is developing a browser-native API, navigator.modelContext, co-developed by the Google Chrome and Microsoft Edge teams. Early preview in Chrome 146 Canary (February 2026).
// Feature-detect and register
if ('modelContext' in navigator) {
  navigator.modelContext.registerTool({
    name: "check-inventory",
    description: "Check product inventory by SKU",
    inputSchema: {
      type: "object",
      properties: {
        sku: { type: "string", description: "Product SKU" }
      },
      required: ["sku"]
    },
    annotations: { readOnlyHint: true },
    async execute({ sku }, agent) {
      return fetch(`/api/inventory/${sku}`).then(r => r.json());
    }
  });
}
The W3C spec also defines declarative HTML form annotation — add toolname, tooldescription, and toolparamdescription attributes to existing forms to make them agent-accessible with zero new JavaScript. Note: the declarative API section is still under active development.
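Under the draft as described here, an annotated form might look like the sketch below. Since the declarative section is still in design, the attribute names and semantics may change before stabilization:

```html
<!-- Hypothetical sketch of the draft declarative annotation -->
<form toolname="subscribe-newsletter"
      tooldescription="Subscribe an email address to the newsletter">
  <input type="email" name="email"
         toolparamdescription="Email address to subscribe">
  <button type="submit">Subscribe</button>
</form>
```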
When to use: You want progressive enhancement for when Chrome ships to stable. You're building for browser-native agent interaction. You want agents to discover tools without users installing MCP clients.
Security Model
webmcp.dev: User explicitly connects their MCP client via the widget. No tools discoverable until opt-in. W3C: Browser can prompt for permission at registration (what tools exist) and invocation (before an agent calls a tool). requestUserInteraction() enables tools to request explicit confirmation for sensitive actions.
Practical Recommendation
Implement the webmcp.dev library now (it works, your tools are reusable) while adding a navigator.modelContext feature-detection wrapper that activates when Chrome ships to stable. The tool definitions are nearly identical between approaches. Forward-Looking
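A minimal sketch of that dual-path wrapper. registerEverywhere and its injectable defaults are invented names for illustration; the registration calls mirror the examples in Paths A and B above:

```javascript
// Register one tool definition with whichever WebMCP surface exists.
// `nav` and `libRegister` default to the browser globals but can be
// injected, e.g. for testing outside a browser.
function registerEverywhere(
  tool,
  { nav = globalThis.navigator, libRegister = globalThis.registerTool } = {}
) {
  const registered = [];
  // Path B: browser-native W3C API, once the browser ships it
  if (nav && 'modelContext' in nav) {
    nav.modelContext.registerTool(tool);
    registered.push('navigator.modelContext');
  }
  // Path A: webmcp.dev library, exposed by the script tag today
  if (typeof libRegister === 'function') {
    libRegister(tool);
    registered.push('webmcp.dev');
  }
  return registered; // which surfaces received the registration
}
```

The same tool object flows to both surfaces, which is what makes the definitions reusable across the two paths.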
Current Status
webmcp.dev: Stable, open-source, works today. auto-webmcp v0.3.0 (March 2026) adds React support. W3C spec: Community Group Draft (March 23, 2026). Chrome Canary early preview. Imperative API specified; declarative form API in design. Google I/O 2026 likely for stable announcements. Edge will follow. Firefox/Safari: no commitments. Forward-Looking
The Evidence Base
For: Decision-Makers + Practitioners
Validated (High Confidence)
- AI engines cite sources using radically different patterns (83,670-citation study)
- No AI platform has committed to parsing llms.txt during inference
- Markdown format alone does not significantly increase AI bot traffic (controlled A/B)
- JavaScript-rendered content is invisible to most AI crawlers
- The crawler ecosystem uses a three-tier architecture
Supported (Moderate Confidence)
- Declarative framing outperforms imperative framing
- The five-layer architecture provides comprehensive agent coverage
- WebMCP is co-developed by Google Chrome and Microsoft Edge teams through the W3C; the webmcp.dev library provides an independent, immediately usable implementation path
- JSON-LD schema markup is the highest-leverage structured data format
Emerging (Low Confidence)
- YAML frontmatter is parsed by some agents (no direct evidence of inference-time parsing)
- The five-layer architecture generalizes beyond marketing sites
Forward-Looking (Speculative)
- Agentic commerce protocols will shift AEO from discovery to transaction
- WebMCP will ship to stable Chrome and become a standard web capability
- MCP, A2A, and WebMCP may converge into a standard protocol
Anti-Patterns and Lessons Learned
For: Everyone
What's Next
For: Everyone
Agentic Commerce (12 months) Forward-Looking
Three protocols converging: OpenAI/Stripe's Agentic Commerce Protocol (open-source, Apache 2.0), Google's Universal Commerce Protocol, Shopify's Agentic Storefronts. Agents will transact, not just discover. Machine-readable catalogs become table stakes.
WebMCP and Browser-Native Agent Interaction (6-12 months) Forward-Looking
Two parallel tracks advancing: the webmcp.dev library provides an immediately usable path (auto-webmcp v0.3.0 added React support, March 2026). The W3C navigator.modelContext API provides the standards track — Chrome Canary has it now, Google I/O 2026 likely for stable release. Sites that are both discoverable (Layers 1–4) and actionable (Layer 5) will have the strongest AEO posture.
Protocol Convergence (12-24 months) Forward-Looking
MCP (Anthropic → Linux Foundation), A2A (Google), W3C WebMCP. Where backend MCP connects services to agents server-side, WebMCP connects websites to agents client-side. They complement rather than compete. Whether A2A converges with MCP or fragments the ecosystem will shape AEO significantly.
Measurement Maturation (6-12 months) Supported
Tools emerging: Semrush AI Visibility, Profound Agent Analytics, Analyze AI. But citations shift up to 60% monthly. Server log analysis by user-agent remains most reliable.
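As a sketch of that log-analysis approach: the bot list comes from the crawler table earlier, tallyBotHits is an invented helper, and the log format is assumed to contain the user-agent string verbatim in each line:

```javascript
// Tally requests per AI crawler from raw access-log lines.
const AI_BOTS = [
  'GPTBot', 'OAI-SearchBot', 'ChatGPT-User',
  'ClaudeBot', 'Claude-SearchBot', 'Claude-User',
  'PerplexityBot', 'Perplexity-User', 'Google-Extended',
];

function tallyBotHits(logLines) {
  const counts = {};
  for (const line of logLines) {
    // First bot name found in the line wins; non-bot traffic is ignored
    const bot = AI_BOTS.find((name) => line.includes(name));
    if (bot) counts[bot] = (counts[bot] || 0) + 1;
  }
  return counts;
}
```

Feed it lines streamed from your server logs and compare the split between search bots and training bots over time.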
Open Questions
Will any AI lab commit to parsing llms.txt during inference? What is the causal mechanism for AI citation selection? How will copyright litigation reshape the landscape? Will AI visibility compensate for declining organic click-through? Will Firefox and Safari adopt WebMCP? Will the webmcp.dev library or browser-native navigator.modelContext become the dominant pattern — or will they converge? These questions have no answers yet. Building AEO now is a low-cost positioning bet with asymmetric upside.
Implementation Checklist
For: Practitioners
- llms.txt at site root, spec-compliant (H1 name, blockquote summary, H2 sections)
- llms.txt uses declarative voice — no system directives, no imperatives
- llms.txt tested by having an AI agent fetch the live URL
- AGENTS.md at site root with guide map, routing logic, cross-references, schema contract
- Markdown mirror for every content page with full YAML frontmatter
- Frontmatter verified against AGENTS.md schema contract
- Region field consistent across all files (no accidental geo-fencing)
- Routing logic uses standard markdown bullets (not ASCII art)
- .md URLs provided directly (no transformation rules)
- robots.txt explicitly allows AI crawlers, references llms.txt and AGENTS.md
- sitemap.xml includes all .md mirrors and agent discovery files
- Scope boundaries defined (what the product does NOT do)
- Use cases written as user situations, not commands to agents
- JSON-LD schema markup on HTML pages
- Server-side rendering or static generation for all content pages
- All files deployed and accessible at canonical URLs
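For the JSON-LD item above, a minimal sketch using schema.org types — the article, author, organization, and URL are placeholders:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "headline": "Getting Started with Acme Scheduler",
  "author": { "@type": "Person", "name": "Jane Doe" },
  "publisher": { "@type": "Organization", "name": "Acme Inc." },
  "datePublished": "2026-01-15",
  "mainEntityOfPage": "https://acme.example/guides/getting-started"
}
</script>
```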
Layer 5: WebMCP Agent Interaction
webmcp.dev library (implement now):
- Include webmcp.js script on pages where agent interaction is valuable
- Register tools with clear names, descriptions, and JSON Schema inputs
- Register resources for structured data agents should read
- Register prompts for common interaction patterns
- Test with Claude Desktop or another MCP client
- Verify tools return structured JSON, not HTML fragments
W3C navigator.modelContext (prepare for stable):
- Feature-detect navigator.modelContext before registration
- Use annotations: { readOnlyHint: true } for read-only tools
- Use toolautosubmit only for read-only actions; require user review for transactions
- Test in Chrome Canary with WebMCP flag enabled
- Ensure forms work identically for non-agent users