llms.txt Quality — does your site map itself for AI agents?

llms.txt is the emerging Markdown convention for telling AI agents what your site is and where the important content lives — like robots.txt plus sitemap.xml plus a short bio, readable by LLMs that would rather read structured prose than crawl everything. Adoption is small but concentrated among technical, content-rich sites (Anthropic, Stripe, Linear, Vercel). Writing one costs an hour; the upside is being legibly mapped for the agents that navigate by structured discovery rather than brute-force crawling.

By Chris Mühlnickel · 2026-05-16

What is llms.txt Quality?

llms.txt Quality is whether your site has an llms.txt file at the root with a clear site description, key URLs, and contextual notes that AI agents can use to navigate without exhaustive crawling.
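A minimal example makes the shape concrete. The site, section names, and URLs below are illustrative, not prescriptive — the convention is an H1 site name, a short summary, and H2 sections of annotated links:

```markdown
# Acme Widgets

> Acme Widgets is a hosted widget API for web apps. Docs, pricing, and changelog below.

## Docs

- [Quickstart](https://acme.example/docs/quickstart): install the SDK and make a first request
- [API reference](https://acme.example/docs/api): endpoints, authentication, and webhooks

## Pricing

- [Plans](https://acme.example/pricing): free tier plus two paid tiers

## Blog

- [Changelog](https://acme.example/changelog): release notes, updated weekly
```

Each link's one-line description is what gives an agent routing signal beyond what sitemap.xml already provides.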

Why it matters

AI agents have a routing problem. Given an ambiguous query, they need to decide which sites to fetch, which pages within those sites to read, and how to interpret what they find. Crawling everything is expensive; reading a 50-line description that maps the site is cheap. llms.txt is the canonical version of that 50-line description — a Markdown file at site root that tells an agent what this site is and where the important pages live, in a form designed for LLM context windows rather than human browsing. SE Ranking's 300K-domain sample puts current adoption at 10.13% — small but concentrated where it matters: API-first SaaS, developer tools, and content-rich publications.

The format is simple but the value is asymmetric. llms.txt is plain Markdown: title, summary paragraph, link list with descriptions. An hour to write. The cost to your site is near-zero; the value to an agent trying to summarize you is significant. Cloudflare ships a 3.7M-token llms-full.txt — the longest of the major adopters — because the doc surface is large enough to justify a full content dump for context-hungry agents. Most sites need nothing close to that scale; even a short llms.txt outperforms no llms.txt by a wide margin on the agents that consume it.

Adoption is concentrated, not universal — and that's the opportunity. Most sites still don't have llms.txt. The ones that do are API-first SaaS (Stripe, Anthropic, OpenAI, Linear, Notion), developer tools (Vercel, Cloudflare), and a handful of content-rich publications. Being in the early-adopter cohort matters more than where you rank within it. The 7× growth between February and May 2025 says the inflection point is close — sites that ship now get cohort-level visibility before the convention is universal.

The major LLM platforms read it. Anthropic's Claude, OpenAI's ChatGPT browsing, Perplexity, and several IDE-integrated agents (Cursor, Continue) all read llms.txt at site root when they fetch your site. The asymmetry is the point: zero downside to publishing one, measurable upside as more platforms wire in discovery against it. By 2027 the absence of llms.txt is likely to harm agent-citation rates meaningfully — at which point Spekto reclassifies it as a Power parameter.

Where it's heading

llms.txt becomes a standard convention. Today it is community-driven via llmstxt.org and individual vendor adoption. By 2026–2027, expect either W3C / IETF formalization or de facto standardization through wider adoption. The shape probably doesn't change much — the format hit its constraints early and the spec is stable.

Agent-specific extensions to llms.txt. Discussions are in flight about embedding capability descriptors — "this site offers an MCP server at X, has an OpenAPI spec at Y" — directly in llms.txt. That would make discovery faster for agents looking to take action, not just summarize. The crossover with Model Context Protocol and A2A is where the next layer of value lives.

AGENTS.md, llms-full.txt, and llms.txt converge. Three parallel conventions today — one for coding agents, one for long-form content extraction, one for site navigation. Expect convergence pressure as agent vendors prefer one well-known file over three, and as the conventions overlap in what they describe. Sites publishing all three today are over-investing in optionality; sites publishing none are under-investing.

Common mistakes

  • Putting llms.txt anywhere other than site root. It must be at /llms.txt, not /docs/llms.txt or /about/llms.txt. Agents fetch from root by convention; a file anywhere else is simply never looked for.
  • Writing it once and never updating. Stale llms.txt with dead links signals this site doesn't maintain its agent interface — treat it like the sitemap, not the about page.
  • Filling it with marketing copy. Agents want structural information, not value propositions. "We're the leading platform for X" is unhelpful; "our docs index is at /docs and covers SDK installation, authentication, and webhooks" is useful.
  • Skipping llms-full.txt for content-heavy sites. The short llms.txt is for navigation; the long llms-full.txt is for actual content extraction. Doc-heavy sites benefit from both.
  • Not validating that linked URLs resolve. Broken links in llms.txt undermine the whole point — the file is the agent's first impression, and it's the easiest one to leave silently broken.
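The last mistake is the easiest to catch mechanically. A rough sketch of a pre-deploy check — the link regex and the HEAD-request approach are one reasonable way to structure this, not a reference implementation:

```python
import re
import urllib.request

# Matches Markdown links with absolute http(s) URLs, e.g. [Docs](https://...)
LINK_RE = re.compile(r"\[[^\]]*\]\((https?://[^)\s]+)\)")

def extract_links(llms_txt: str) -> list[str]:
    """Pull absolute http(s) URLs out of Markdown link syntax."""
    return LINK_RE.findall(llms_txt)

def url_resolves(url: str, timeout: float = 10.0) -> bool:
    """True if the URL answers a HEAD request with a non-error status."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except OSError:  # covers URLError, HTTPError, and timeouts
        return False

sample = """# Acme Widgets
## Docs
- [API reference](https://acme.example/docs/api): endpoints and auth
- [Quickstart](https://acme.example/docs/quickstart): first request
"""
print(extract_links(sample))
# → ['https://acme.example/docs/api', 'https://acme.example/docs/quickstart']
```

Wire the two together in CI: fetch your live /llms.txt, run `extract_links` over it, and fail the build if `url_resolves` is false for any entry.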

Frequently asked

Is llms.txt actually used by anyone?

Yes, and adoption is growing. Anthropic's Claude, OpenAI's ChatGPT browsing, Perplexity, and several IDE-integrated agents (Cursor, Continue) read llms.txt at site root. The cost of writing one is low; the upside is being legibly mapped for agents that prefer structured discovery. Adoption grew 7× between February and May 2025 across the Majestic Million sample — the trend line points up.

What goes in llms.txt vs. robots.txt vs. sitemap.xml?

robots.txt is policy — which user agents may fetch what. sitemap.xml is the URL inventory — what pages exist. llms.txt is the map for LLMs — a prose-and-link Markdown file that says "here's what this site is and where the important stuff is." Complementary, not redundant.

Do I need [llms-full.txt](/learn/glossary#term-llms-full-txt) too?

Optional. If you have substantial documentation or articles agents would want to read in full rather than browse to, yes — Cloudflare's 3.7M-token llms-full.txt is the canonical example of how big this can get for a doc-heavy site. Otherwise the short llms.txt is enough.

What's the canonical structure?

H1 site name → summary paragraph → ## Docs section with a link list → ## Pricing section → ## Blog section → optional sections. See llmstxt.org for reference implementations. The shape should match how an agent would summarize your site in one paragraph plus three lists.

Do agents prioritize sites with llms.txt over sites without?

Some do (Perplexity, Anthropic) — the file gives them a fast structural fingerprint that's cheaper than crawling. Most don't yet at the citation-routing layer, but the trend is in that direction, and adopting now is cheap insurance against the convention crossing the expected threshold.

How often should I update llms.txt?

When site structure changes significantly — new product surface, new docs section, major rebrand. Daily blog posts don't require updates; the file is structural, not content-syncing. Treat the cadence like a sitemap refresh rather than a content publish.

Should I also ship [AGENTS.md](/learn/glossary#term-agents-md)?

AGENTS.md is the parallel convention coming out of the coding-agent ecosystem (Cursor, Sourcegraph) — same shape, different consumer profile. If you ship developer tooling or an open-source project, both are worth having. For content sites, llms.txt alone is enough today.