What Is llms.txt? AI Search Visibility for Your CMS (2026)

The proposed AI-crawler standard, the fair skepticism around it, and why your CMS should generate it for free

Hamed Pakdaman
June 18, 2026 · 11 min read

Recent SERP studies report Google's AI Overviews on more than 58% of tracked queries, and ChatGPT and Perplexity answer millions of questions a day by citing a handful of source pages. If an AI is going to summarize your site anyway, you want it reading your best pages, not whatever its crawler stumbled into.

That's the problem llms.txt tries to solve.

TL;DR: llms.txt is a proposed standard (from llmstxt.org) for a markdown file at your site root that tells AI systems what your site is about and which pages matter. It costs almost nothing to add, no major AI vendor has officially committed to reading it, and a good CMS can generate it automatically. We'll cover the spec, the format, the honest skepticism, and how we auto-generate it in UnfoldCMS — full disclosure: we build UnfoldCMS, and our own llms.txt is live at unfoldcms.com/llms.txt.


What Is llms.txt?

llms.txt is a plain markdown file served at /llms.txt on your domain. It gives large language models a curated, token-cheap map of your site: a title, a one-line description, and sections of links with short summaries. Think of it as a sitemap written for AI readers instead of search crawlers.

The spec was proposed in September 2024 by Jeremy Howard of Answer.AI. His pitch is simple: HTML pages are terrible input for LLMs. They're stuffed with nav bars, cookie banners, scripts, and footers. An LLM with a limited context window wastes most of it parsing junk. A markdown index at a known location lets an AI agent grab the signal and skip the noise.

The spec defines two files:

  1. /llms.txt — a short index. Title, description, and curated links with one-line summaries.
  2. /llms-full.txt — the heavyweight version. Full page content rendered as one big markdown document, so an AI can ingest everything in a single fetch.

Adoption on the publisher side has been real: Anthropic, Mintlify-hosted docs, Cursor, and thousands of documentation sites ship one. Adoption on the consumer side — whether the big AI crawlers actually fetch and use it — is much murkier. We'll get to that, because it's the part most posts on this topic dodge.


How Is llms.txt Different From robots.txt and sitemap.xml?

robots.txt tells crawlers which paths they may and may not fetch. sitemap.xml lists the URLs you want indexed. llms.txt is the only one of the three that says what your content means: it's curation plus context, written in markdown that an LLM can read directly.

Here's the side-by-side:

| | robots.txt | sitemap.xml | llms.txt |
|---|---|---|---|
| Audience | All crawlers (search and AI) | Search crawlers | LLMs and AI agents |
| Format | Plain text directives | XML | Markdown |
| Purpose | Block or allow paths | List URLs for indexing | Curate and describe key content |
| Carries meaning? | No | No (URLs + dates only) | Yes: titles, descriptions, structure |
| Official adoption | Universal, decades old | Universal (Google, Bing) | Proposed; no confirmed major-crawler commitment |
| Typical size | A few lines | Can be thousands of URLs | A focused, human-readable page |

One thing llms.txt is not: an access-control file. It can't block AI training crawlers — that's still robots.txt's job (User-agent: GPTBot and friends). llms.txt is an invitation, not a fence.
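To make the invitation-versus-fence distinction concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt rules and the example.com URL are illustrative assumptions, not rules from any real site. The point is only that blocking lives in robots.txt; llms.txt plays no part in it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block two AI crawlers, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "ClaudeBot", "Googlebot"):
        allowed = is_allowed(ROBOTS_TXT, agent, "https://example.com/blog/post")
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Compliance is still voluntary on the crawler's side, so this checks what your rules say, not what a given bot actually does.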

If you're already thinking about how crawlers and AI systems read structured content, this pairs well with our breakdown of how headless architecture changes SEO — the same "machine-readable beats pretty" logic applies.


The llms.txt Format, Explained

The format is deliberately minimal — it's just markdown with a few rules. One H1, one blockquote, then H2 sections full of described links. Any markdown parser can read it, and so can an LLM with zero special handling.

The spec's required and optional parts, in order:

  1. # Title — an H1 with your site or project name. The only required element.
  2. > Description — a blockquote with a one-or-two-sentence summary of the site.
  3. Free-form details — optional paragraphs or lists with context the AI should know.
  4. ## Section headings — groups of links, each formatted as - [Title](url): description.
  5. ## Optional — a magic section name. Links here are "skip these if context is tight." It's the spec's way of marking secondary content.

A real-world shape looks like this:

# UnfoldCMS

> A Laravel-based CMS for developers who want full code
> ownership — self-hosted, API-first, no vendor lock-in.

## Documentation

- [Installation](https://unfoldcms.com/docs/installation): Server
  requirements and a 10-minute setup guide
- [REST API v1](https://unfoldcms.com/docs/api): Public read
  endpoints plus Sanctum-authenticated write access

## Blog

- [Headless CMS and SEO](https://unfoldcms.com/blog/headless-cms-and-seo):
  How rendering strategy affects crawlability and rankings

## Optional

- [Changelog](https://unfoldcms.com/changelog): Version history

That's the whole spec. No XML schema, no validation service, no registration. You write markdown, you serve it at the root as text/plain or text/markdown, you're done.
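Because the format is this small, a reader for it fits in a few dozen lines. The sketch below is one possible parser, not a reference implementation: the `LlmsTxt` class, the `parse_llms_txt` function, and the shortened sample are all hypothetical names and data made up for illustration. It assumes wrapped link descriptions are indented by two spaces, as in the example above.

```python
import re
from dataclasses import dataclass, field

# Matches "- [Title](url): description"; the description part is optional.
LINK_RE = re.compile(r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$")


@dataclass
class LlmsTxt:
    title: str = ""
    description: str = ""
    sections: dict[str, list[dict[str, str]]] = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    doc = LlmsTxt()
    current = None    # name of the H2 section being read
    last_link = None  # most recent link, so wrapped descriptions can be joined
    quote = []        # blockquote lines that make up the description
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith(">") and current is None:
            quote.append(line[1:].strip())
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
            last_link = None
        elif current is not None and (m := LINK_RE.match(line)):
            last_link = {"title": m["title"], "url": m["url"],
                         "description": (m["desc"] or "").strip()}
            doc.sections[current].append(last_link)
        elif last_link is not None and raw.startswith("  ") and line.strip():
            # Indented continuation of a wrapped link description.
            last_link["description"] = f"{last_link['description']} {line.strip()}".strip()
        else:
            last_link = None  # blank line or free-form text ends a link entry
    doc.description = " ".join(quote)
    if not doc.title:
        raise ValueError("llms.txt requires an H1 title")
    return doc


SAMPLE = """\
# UnfoldCMS

> A Laravel-based CMS for developers who want full code ownership.

## Documentation

- [Installation](https://unfoldcms.com/docs/installation): Server
  requirements and a 10-minute setup guide

## Optional

- [Changelog](https://unfoldcms.com/changelog): Version history
"""

if __name__ == "__main__":
    parsed = parse_llms_txt(SAMPLE)
    print(parsed.title, list(parsed.sections))
```

The only hard failure is a missing H1, since that is the one element the spec requires.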


llms.txt vs llms-full.txt: Which Do You Need?

Start with llms.txt — it's the index and the part of the spec people actually mean. Add llms-full.txt when your content is documentation-like and you want AI tools to ingest entire pages in one request instead of crawling them one by one.

The split matters because they serve different consumption patterns:

  • llms.txt is for navigation. An agent fetches it, sees ten described links, and follows the two that match the user's question. Cheap on tokens, fresh on every follow-up fetch.
  • llms-full.txt is for bulk ingestion. It inlines the full markdown body of every included page. Developers paste it into a chat context, or RAG pipelines ingest it as a single document. Anthropic's own docs ship one, and it's huge.

The trade-off is real: llms-full.txt can blow past an LLM's context window on a large site, and it duplicates content you already serve as HTML. If your site is a blog with 200 posts, a full dump is probably noise. If it's product docs with 40 tight pages, it's genuinely useful.
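A back-of-the-envelope check makes that trade-off easier to judge. The sketch below rests on loud assumptions: roughly four characters per token (a common rule of thumb for English text, not an exact tokenizer), a hypothetical 128,000-token context window, and half of that window reserved for the conversation itself. The page counts and sizes are invented for illustration.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; real tokenizers will differ."""
    return math.ceil(len(text) / chars_per_token)


def full_dump_fits(page_bodies: list[str], context_window: int, reserve: float = 0.5) -> bool:
    """True if all pages together fit in the share of the window not held in reserve."""
    budget = int(context_window * (1.0 - reserve))
    total = sum(estimate_tokens(body) for body in page_bodies)
    return total <= budget


if __name__ == "__main__":
    docs_site = ["x" * 6_000] * 40   # 40 tight doc pages, about 1,500 tokens each
    blog_site = ["x" * 8_000] * 200  # 200 posts, about 2,000 tokens each
    print("docs fit:", full_dump_fits(docs_site, context_window=128_000))
    print("blog fits:", full_dump_fits(blog_site, context_window=128_000))
```

Swap in a real tokenizer and your target model's actual limit before relying on the result.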


Why Does llms.txt Matter for AI Search Visibility?

Because the way people find content is splitting in two. Classic SEO gets you ranked in a list of blue links. GEO (generative engine optimization) and AEO (answer engine optimization) get you cited inside the answer — and answers are where attention is moving fast.

The numbers back the shift:

  • AI Overviews show on 58%+ of Google queries in recent studies, up from roughly 25% a year earlier.
  • When an AI Overview appears, click-through to organic results drops sharply — users take the synthesized answer.
  • ChatGPT, Perplexity, and Claude all cite sources now. Being one of the handful of cited links in an answer is the new position one.

llms.txt is a bet on that second channel. The reasoning: AI systems work under hard token budgets, so anything that hands them clean, pre-structured, well-described content lowers the cost of reading you — and content that's cheaper to read is more likely to get read, retrieved, and cited.

Notice the hedge in that sentence: more likely, not guaranteed. Which brings us to the part of this topic most write-ups skip.


Does Anything Actually Read llms.txt? The Honest Answer

Right now, there's no public confirmation that Google, OpenAI, or Anthropic use llms.txt for search or answer generation. Google's John Mueller has compared it to the old keywords meta tag. Treat it as a low-cost bet on an emerging convention, not a ranking tactic with proven returns.

The skeptic's case is fair and worth stating plainly:

  • No official adoption. None of the major AI vendors document llms.txt support in their crawlers. Server-log studies in 2025 found GPTBot and ClaudeBot rarely requesting the file on most sites.
  • Mueller's jab stings because the keywords meta tag died for a real reason: self-declared importance invites spam, so engines learned to ignore it.
  • No feedback loop. You can't measure "citations gained from llms.txt" the way you can track indexed sitemap URLs or search clicks in Search Console. You're flying blind.

The optimist's case is also real:

  • The cost is nearly zero if your CMS generates the file automatically. A bet that costs nothing needs almost no payoff to be worth placing.
  • Developer-facing AI tools already use it. Coding assistants and doc-aware agents fetch llms.txt from documentation sites today — that's adoption, just not at the Google scale.
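  • robots.txt started this way: an informal 1994 convention that crawlers honored for decades before it was standardized as RFC 9309 in 2022. Sitemaps were vendor-led instead (Google proposed the format in 2005), but both show how a simple file at a known path can become universal.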

Our stance: generate it automatically, never maintain it by hand. Hand-curating a file with unproven ROI is a waste of your week. Auto-generating one is a rounding error — and if the convention wins, you were early.


How a CMS Can Auto-Generate llms.txt

The right place for llms.txt is your CMS, not a cron job or a manually edited file. Your CMS already knows every published URL, title, and description — generating the file is just rendering that knowledge in markdown and keeping it fresh on its own.

Here's how we built it in UnfoldCMS (disclosure again: this is our product, and the output is live at unfoldcms.com/llms.txt):

  1. Config-driven selection rules. A config/llms.php file declares which content types and categories are included, a minimum word count (thin posts don't earn a slot), and a cache TTL. Change the rules, not the file.
  2. Per-post flag overrides. Three flags adjust individual posts: llms-include forces a post in even when the rules wouldn't pick it, llms-exclude forces it out, and llms-optional moves it into the spec's Optional section. Flags are set programmatically — there's no admin toggle for this yet.
  3. Spec-compliant rendering. A builder service outputs the # title, > description, ## sections, and - [Title](url): description link lines exactly per the llmstxt.org spec, served as text/plain; charset=utf-8.
  4. llms-full.txt for free. The same builder renders the full markdown body of every selected post into /llms-full.txt — no second pipeline to maintain.
  5. Caching with a known TTL. Output is cached and rebuilt on a configurable interval, so a thousand AI fetches don't touch your database a thousand times.
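The selection and rendering steps above can be sketched in a few lines. This is a Python illustration of the idea, not the UnfoldCMS code (which is Laravel): the `Post` class, the `RULES` dictionary, the "Pages" section heading, and the sample posts are all hypothetical. It also assumes a precedence the post does not spell out: `llms-exclude` beats `llms-include`, and `llms-optional` only moves a post that was otherwise selected.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    title: str
    url: str
    summary: str
    content_type: str
    word_count: int
    flags: set[str] = field(default_factory=set)


# Hypothetical rules, standing in for the kind of settings a config file would hold.
RULES = {"content_types": {"doc", "post"}, "min_words": 300}


def select_posts(posts: list[Post], rules: dict) -> tuple[list[Post], list[Post]]:
    """Return (main, optional) after applying the rules and per-post flags."""
    main, optional = [], []
    for post in posts:
        if "llms-exclude" in post.flags:
            continue  # assumed precedence: exclude always wins
        passes = (post.content_type in rules["content_types"]
                  and post.word_count >= rules["min_words"])
        if not (passes or "llms-include" in post.flags):
            continue
        (optional if "llms-optional" in post.flags else main).append(post)
    return main, optional


def render_llms_txt(site: str, description: str, main: list[Post], optional: list[Post]) -> str:
    """Render the H1, blockquote, and link sections in llms.txt shape."""
    lines = [f"# {site}", "", f"> {description}", "", "## Pages", ""]
    lines += [f"- [{p.title}]({p.url}): {p.summary}" for p in main]
    if optional:
        lines += ["", "## Optional", ""]
        lines += [f"- [{p.title}]({p.url}): {p.summary}" for p in optional]
    return "\n".join(lines) + "\n"


POSTS = [
    Post("Setup guide", "https://example.com/docs/setup", "Install steps", "doc", 1200),
    Post("Quick note", "https://example.com/blog/note", "A short update", "post", 80),
    Post("Pinned note", "https://example.com/blog/pinned", "Short but key", "post", 90, {"llms-include"}),
    Post("Old draft", "https://example.com/blog/old", "Superseded", "post", 900, {"llms-exclude"}),
    Post("Changelog", "https://example.com/changelog", "Version history", "post", 500, {"llms-optional"}),
]

if __name__ == "__main__":
    main_posts, optional_posts = select_posts(POSTS, RULES)
    print(render_llms_txt("Example Site", "A demo site.", main_posts, optional_posts))
```

Keeping selection separate from rendering is what lets the same selected list feed a second renderer for the full-content file.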

The same content selection logic feeds our public REST API — one source of truth about what's published, three machine-readable surfaces (HTML, JSON, llms.txt). Setup details live in the docs.
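The caching step from the list above is the piece that keeps crawler traffic off the database, and it is simple enough to sketch. The `TtlCache` class below is a hypothetical, single-process illustration in Python. A real Laravel app would more likely lean on the framework's cache layer, and that is an assumption about typical practice, not a description of the shipped code.

```python
import time
from typing import Callable, Optional


class TtlCache:
    """Cache one rendered document and rebuild it after ttl_seconds."""

    def __init__(self, build: Callable[[], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._build = build
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._value: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            self._value = self._build()  # the only place the expensive work runs
            self._expires_at = now + self._ttl
        return self._value


if __name__ == "__main__":
    builds = 0

    def build_llms_txt() -> str:
        # Stand-in for querying published posts and rendering markdown.
        global builds
        builds += 1
        return "# Example Site\n"

    cache = TtlCache(build_llms_txt, ttl_seconds=3600)
    for _ in range(1000):
        cache.get()
    print("builds:", builds)
```

The cost of this design is staleness: a newly published post can take up to one TTL to appear, which is usually acceptable for a file of this kind.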

Want this without building it? See what ships in UnfoldCMS or poke at the live demo — llms.txt generation is on by default.


Should You Add llms.txt to Your Site?

Yes, if it costs you under an hour or your CMS does it for you. No, if you'd be hand-maintaining a markdown file forever on the hope that crawlers show up. The deciding question isn't "does llms.txt work?" — it's "what does it cost me?"

This is also a useful lens for picking a CMS. A platform that auto-generates llms.txt, sitemaps, and structured data is showing you how it handles every emerging standard: as configuration, not as a plugin you bolt on later. That's one of the signals we recommend checking in how to evaluate a CMS beyond the marketing page.

One warning either way: don't let llms.txt distract from fundamentals. Clean HTML, fast pages, real structured data, and content worth citing move the needle today. llms.txt is a cheap side bet on top of that stack — never a substitute for it.


FAQ

Does llms.txt help with Google rankings?

No. Google has not announced any use of llms.txt, and John Mueller has publicly downplayed it. It targets AI answer engines and agents, not the classic ranking pipeline. For classic search, sitemap.xml and robots.txt still handle discovery and crawl control.

Where does the file go, exactly?

At your domain root: https://yourdomain.com/llms.txt (and /llms-full.txt if you serve it). It should return markdown with a text/plain or text/markdown content type — not HTML.
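A quick way to confirm this is to check the response rather than trust the server config. The sketch below uses only Python's standard library. The function names are hypothetical, and the three checks (status 200, an accepted content type, a body that opens with an H1) are one reasonable reading of the requirements above, not an official validator.

```python
import urllib.error
import urllib.request

ACCEPTED_TYPES = {"text/plain", "text/markdown"}


def check_llms_response(status: int, content_type: str, body: str) -> list[str]:
    """Return a list of problems; an empty list means the response looks right."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in ACCEPTED_TYPES:
        problems.append(f"unexpected content type: {media_type or 'missing'}")
    first_line = next((ln for ln in body.splitlines() if ln.strip()), "")
    if not first_line.startswith("# "):
        problems.append("body does not start with a markdown H1")
    return problems


def fetch_and_check(url: str) -> list[str]:
    """Fetch url and run the checks; urlopen raises HTTPError on 4xx and 5xx."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return check_llms_response(resp.status, resp.headers.get("Content-Type", ""), body)
    except urllib.error.HTTPError as err:
        return [f"expected HTTP 200, got {err.code}"]


if __name__ == "__main__":
    print(check_llms_response(200, "text/plain; charset=utf-8", "# My Site\n"))
    print(check_llms_response(200, "text/html", "<!doctype html><html></html>"))
```

An HTML content type is the most common failure to look for, typically because a framework's catch-all route answered instead of the file.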

Can llms.txt stop AI companies from training on my content?

No. It's a discovery aid, not an access control. To block training crawlers, use robots.txt rules for agents like GPTBot, ClaudeBot, and Google-Extended.

Do I need llms-full.txt too?

Only if you want AI tools to ingest whole pages in one fetch — most useful for documentation sites. For a typical blog, a well-curated llms.txt index is enough, and the full dump may exceed model context windows anyway.


Sources and Methodology

  • Spec: llmstxt.org — the llms.txt proposal by Jeremy Howard (Answer.AI), September 2024. Format rules in this post follow it directly.
  • AI Overviews prevalence: 2025–2026 SERP studies (Semrush and similar trackers) reporting AI Overviews on 58%+ of US Google queries; the figure varies by query category and month.
  • Skepticism: John Mueller's public comments comparing llms.txt to the keywords meta tag, plus 2025 server-log analyses showing low crawler pickup of the file.
  • UnfoldCMS implementation details: taken from our shipped code — config rules, flag overrides, builder service, and Pest test suite — and verifiable live at unfoldcms.com/llms.txt.

We build UnfoldCMS, so weigh the product sections accordingly. The spec coverage and the skepticism stand on their own.
