Key Takeaways
- llms.txt is a plain-text Markdown file placed at yourdomain.com/llms.txt that gives AI retrievers a structured, curated summary of your brand and key content — specifically for LLM consumption.
- Unlike robots.txt (which controls crawl access) or sitemap.xml (which lists pages), llms.txt tells AI systems what your site is ABOUT and which pages are most authoritative for which topics.
- Perplexity's retriever and Claude's web search reportedly read llms.txt files already. OpenAI and Google have not confirmed support, but both are expected to give the file more weight in 2025.
- The most common llms.txt mistake is writing it for humans instead of models — it should contain terse, entity-dense definitions, not marketing copy.
- An llms.txt file can be created in under 30 minutes and is one of the fastest, most direct GEO improvements available for any site.
llms.txt is one of the simplest files you can add to a website, and one of the most direct signals you can send to AI systems that are deciding whether to cite you. This guide covers the full specification, a complete worked example, how different AI systems currently use the file, and every common mistake to avoid — so you can ship a correct implementation in a single sitting.
What llms.txt Is and Why It Exists
When a large language model answers a question about a topic you cover, it makes decisions about which sources to cite. Those decisions are shaped by how clearly and structurally the content on your site communicates its authority and relevance to a machine reader. Traditional SEO signals — title tags, H1 hierarchies, schema markup — were designed for search engine crawlers and ranking algorithms. They were not designed for the retrieval-augmented generation (RAG) pipelines and real-time web fetchers that power AI answer engines.
The llms.txt format was proposed in 2024 by Jeremy Howard, co-founder of fast.ai and Answer.AI, as a community-driven convention. It is deliberately analogous in spirit to robots.txt but has a completely different purpose. Where robots.txt says "here is what you are allowed to crawl," llms.txt says "here is what my site is about and here are the most important pages to understand it." It is a curated index for AI retrievers, authored by the site owner, written in Markdown, and placed at the root of your domain.
The underlying problem llms.txt solves is real: AI systems that do web retrieval cannot spend indefinite time crawling every page on your site to build a mental model of what you do. They fetch a small number of pages per query. If your most authoritative content is buried in paginated archives or locked behind dynamic routes, the retriever may never reach it. llms.txt gives you a direct channel to surface that content — on your terms.
The adoption curve has accelerated. Companies like Anthropic, Cloudflare, Stripe, and Vercel have already published llms.txt files. Thousands of SaaS brands and content publishers followed in 2024 and 2025. Using a tool like Bingly to track which of your pages are actually being cited by AI systems will reveal quickly whether your existing site structure is sufficient — or whether a well-crafted llms.txt would change the outcome.
The llms.txt Specification: Full Format Reference
The llms.txt format is intentionally minimal. It is a UTF-8 encoded plain-text file written in Markdown. The file lives at https://yourdomain.com/llms.txt (the root path, no subdirectory). There is no content-type requirement beyond plain text; both text/plain and text/markdown are acceptable.
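The location and content-type expectations above can be checked from the command line. The following is a minimal sketch, assuming curl is installed. `check_llms_txt` and `is_acceptable_type` are hypothetical helper names, not part of any llms.txt tooling, and the base URL you pass in is your own.

```shell
# Sketch only: helper names are illustrative, not from the llms.txt proposal.

# Succeeds when a Content-Type value is text/plain or text/markdown,
# ignoring parameters such as "; charset=utf-8" and letter case.
is_acceptable_type() {
  base=$(printf '%s' "${1%%;*}" | tr -d ' ' | tr '[:upper:]' '[:lower:]')
  [ "$base" = "text/plain" ] || [ "$base" = "text/markdown" ]
}

# Fetches <base-url>/llms.txt and prints one OK or FAIL line.
check_llms_txt() {
  result=$(curl -s -L -o /dev/null -w '%{http_code} %{content_type}' "$1/llms.txt")
  status=${result%% *}   # text before the first space
  ctype=${result#* }     # text after the first space
  if [ "$status" != "200" ]; then
    echo "FAIL: status $status"
    return 1
  fi
  if ! is_acceptable_type "$ctype"; then
    echo "FAIL: content type '$ctype'"
    return 1
  fi
  echo "OK: $status $ctype"
}
```

Calling `check_llms_txt https://yourdomain.com` prints a single line, which makes the check easy to drop into a deploy script.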
Required fields
The proposal strictly requires only the H1. The remaining sections are conventionally expected rather than mandatory. No validator enforces any of them, but omitting them reduces the signal quality for retrievers:
Structural sections
- H1 heading — the canonical name of the site or brand.
- > blockquote immediately after the H1 — a one-to-three sentence terse description of what the site does. Written for a model, not a human reader.
- ## sections — topical groupings of links. Each group has a heading and a list of Markdown links with short descriptive text.
- Optional: ## Optional section — lower-priority pages, legal docs, or supplemental content that retrievers can skip under token constraints.
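The structural sections listed above lend themselves to a quick automated check. This is a rough sketch, assuming a local copy of the file and a POSIX awk. `check_structure` is a hypothetical name, and the rules it applies (H1 first, blockquote next, at least one `##` section) reflect this guide's conventions rather than an official validator.

```shell
# Sketch only: checks the layout conventions described in this guide.
check_structure() {
  awk '
    NF == 0 { next }                       # ignore blank lines
    { n++ }                                # n counts non-blank lines
    n == 1 && !/^# [^#]/ { print "first line is not an H1"; bad = 1 }
    n == 2 && !/^> /     { print "no blockquote summary after the H1"; bad = 1 }
    /^## /               { sections++ }
    END {
      if (sections == 0) { print "no ## sections found"; bad = 1 }
      exit bad + 0
    }
  ' "$1"
}
```

Running `check_structure llms.txt` prints one line per problem and exits non-zero if any were found.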
The full syntax at a glance
Each link entry in a section follows the pattern:

    - [Page Title](https://example.com/page): Short terse description of what this page covers.

The description after the colon is the most important part of each entry. It is what the model reads when deciding whether to fetch that URL. Keep it factual, entity-dense, and under 25 words.
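The entry pattern and the 25-word guideline can be linted mechanically. The sketch below assumes a local copy of the file and a POSIX awk. `lint_entries` is a hypothetical name, and treating a missing description as an error is this guide's recommendation, not a rule of the format itself.

```shell
# Sketch only: flags list entries that break this guide's conventions.
lint_entries() {
  awk '
    /^- / {
      # Expect "- [Title](URL): description" on a single line.
      if ($0 !~ /^- \[[^]]+\]\(https?:\/\/[^)]+\): .+/) {
        printf "line %d: not in \"- [Title](URL): description\" form\n", NR
        bad = 1
        next
      }
      desc = $0
      sub(/^- \[[^]]+\]\([^)]+\): /, "", desc)   # keep only the description
      words = split(desc, parts)                  # whitespace-separated words
      if (words >= 25) {
        printf "line %d: description has %d words (keep it under 25)\n", NR, words
        bad = 1
      }
    }
    END { exit bad + 0 }
  ' "$1"
}
```

`lint_entries llms.txt` stays silent and exits zero when every entry conforms.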
The companion file /llms-full.txt is an extended convention for sites that want to expose full document bodies. It follows the same structure but includes the complete text of each page inline, separated by horizontal rules. Some AI systems will prefer the full version if token budgets allow.
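One way to produce such a file is to concatenate page sources with a horizontal rule between them. This is only a sketch under stated assumptions: the pages exist as local Markdown files, the first argument is a header file holding the H1 and blockquote, and `build_llms_full` is a hypothetical name.

```shell
# Sketch only: assembles an llms-full.txt style document on stdout.
# Usage: build_llms_full header.md page1.md page2.md ... > llms-full.txt
build_llms_full() {
  header=$1
  shift
  cat "$header"
  for page in "$@"; do
    printf '\n---\n\n'    # horizontal rule separating documents
    cat "$page"
  done
}
```

Generating the file at build time from the same sources as the site keeps it from drifting out of date.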
A Complete Worked Example: llms.txt for a SaaS Brand
Below is a complete example for a fictional SaaS product called "Vaultbase," a database-as-a-service platform. The notes after the example explain the authoring decisions.
    # Vaultbase

    > Vaultbase is a serverless PostgreSQL database-as-a-service platform.
    > It provides branching, instant restore, connection pooling, and a
    > REST API. Targets Node.js, Python, and Go developers building
    > production web applications.

    ## Documentation

    - [Quickstart Guide](https://vaultbase.io/docs/quickstart): Five-minute setup guide. Creates a database, runs first SQL query via REST API.
    - [Branching Overview](https://vaultbase.io/docs/branching): Explains database branching: how branches are created, merged, and deleted. Key differentiator vs. traditional Postgres hosting.
    - [Connection Pooling](https://vaultbase.io/docs/pooling): PgBouncer-based pooler included by default. Config options and connection string format.
    - [REST API Reference](https://vaultbase.io/docs/api): Full OpenAPI spec for the /query, /schema, and /branch endpoints.

    ## Pricing and Plans

    - [Pricing Page](https://vaultbase.io/pricing): Free tier: 1 database, 512 MB storage. Pro: $25/month, 10 databases, 10 GB. Enterprise: custom.
    - [Usage-Based Billing FAQ](https://vaultbase.io/docs/billing): Explains compute credits, storage billing per GB-hour, and egress costs.

    ## Comparisons

    - [Vaultbase vs. PlanetScale](https://vaultbase.io/compare/planetscale): Feature and pricing comparison. Vaultbase supports full SQL; PlanetScale restricts foreign keys.
    - [Vaultbase vs. Supabase](https://vaultbase.io/compare/supabase): Focuses on branching and latency differences for read-heavy workloads.

    ## Optional

    - [Changelog](https://vaultbase.io/changelog): Release history.
    - [Status Page](https://status.vaultbase.io): Uptime and incident history.
    - [Terms of Service](https://vaultbase.io/legal/tos): Legal agreement.
Notice what is absent: no mission statement prose, no customer logos, no testimonials, no "learn more" filler links. The blockquote immediately after the H1 is the only place where you get to describe the brand in sentence form — and even that should read like a structured entity definition, not a tagline. Every other entry is a URL plus a terse factual description.
The "Optional" section is specifically for content that is useful context but not worth consuming under a tight token budget. Retrievers that implement the spec correctly will skip the Optional section when fetching under constraints. This is where you put legal pages, changelogs, and supplementary reference material.
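To make the skipping behavior concrete, here is a sketch of how a consumer might drop the Optional section before spending tokens on the file. It is an illustration, not a description of how any particular retriever works. `strip_optional` is a hypothetical name, and a POSIX awk is assumed.

```shell
# Sketch only: prints an llms.txt file without its "## Optional" section.
strip_optional() {
  awk '
    # On each H2, decide whether the section that follows should be skipped.
    /^## / { skip = ($0 ~ /^## Optional[ \t]*$/) }
    !skip
  ' "$1"
}
```

Comparing `strip_optional llms.txt | wc -w` with `wc -w < llms.txt` shows how much of your file a constrained retriever would still see.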
How Different AI Systems Use llms.txt
The llms.txt file is a convention, not a protocol enforced by a standards body. Each AI system that reads it does so according to its own retrieval logic. Understanding the differences matters when deciding how much effort to invest and where to focus.
Perplexity
Perplexity's real-time web retriever is currently the most aggressive consumer of llms.txt. It fetches the file as part of domain-level context gathering when a query resolves to a site. The entries in the file directly influence which pages Perplexity fetches to build its cited answer. Sites with well-structured llms.txt files consistently show higher citation rates in Perplexity than equivalent sites without them, particularly for long-tail technical queries.
Claude (Anthropic)
Claude's web search feature, available through Claude.ai and the API with tool use, fetches llms.txt when it determines a domain is authoritative for a query. Anthropic has acknowledged the format in their developer documentation. Given that Anthropic's own domain publishes an llms.txt file, the signal to the community is unambiguous.
ChatGPT Browse
OpenAI's Browse feature has not officially confirmed llms.txt support as of mid-2025, but there is mounting evidence from retrieval experiments — including data collected by Bingly users — that ChatGPT Browse does fetch the file on certain domain-level queries. The expected increase in weighting later in 2025 is consistent with OpenAI's stated direction on structured web signals.
Google AI Overviews
Google has not yet formally integrated llms.txt into its crawl pipeline, but has indicated interest in structured AI-readable formats. Google's existing infrastructure already processes structured data at scale, making llms.txt a natural candidate for future ingestion. Publishing the file now costs nothing and positions you for any eventual signal weighting Google applies.
What to Include (and What to Leave Out)
The authoring question most teams get wrong is scope. There is a strong temptation to include every page on the site — especially after building a sitemap mindset from years of traditional SEO. The llms.txt specification is not a sitemap. Its value comes from curation, not completeness.
Common authoring mistakes
- Listing 200+ pages. Retrievers have token budgets. A bloated file dilutes the signal from your most important pages.
- Using marketing language in descriptions. "The best-in-class solution for modern teams" gives a model zero useful information. "Serverless Postgres with branching, instant restore, and PgBouncer pooling" is what a model needs.
- Describing what the page does for the user rather than what it contains. "Helps you get started quickly" is noise. "Five-minute setup guide; creates a database and runs first SQL query" is signal.
- Omitting competitors and comparisons. Comparison pages are high-intent, high-authority content. AI systems actively look for comparison information when answering "X vs Y" queries.
- Forgetting to update the file when major pages change. A stale llms.txt pointing to outdated content or 404 URLs actively degrades your standing with retrievers.
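Some of these mistakes can be caught with a crude text scan. The sketch below flags lines containing marketing phrases. The phrase list is an illustrative assumption that you would replace with your own, and `flag_marketing` is a hypothetical name.

```shell
# Sketch only: lists lines (with line numbers) that contain marketing filler.
# Exits zero when at least one such line is found, non-zero when the file is clean.
flag_marketing() {
  grep -n -i -E 'best-in-class|world-class|cutting-edge|learn more|get started quickly' "$1"
}
```

A phrase list will never catch every vague description, so treat the output as a prompt for manual review rather than a pass or fail gate.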
The right inclusion criteria
Apply a single test to every page you consider adding: "If an AI model fetched only this page, would it have enough information to answer a meaningful question about my brand, product, or topic?" If yes, include it. If the page exists for conversion optimization, brand reinforcement, or navigational UX rather than information density, leave it out or move it to the Optional section.
Ideal candidates for your main sections include: documentation pages, product feature pages with technical specifics, pricing pages with concrete numbers, comparison pages, use-case pages with clear audience and context, and any long-form content that defines the concepts your product is built around.
How to Validate and Test Your llms.txt
Once you publish the file, basic validation involves three checks: the URL resolves with a 200 status, the file is served as plain text, and every URL in the file returns a 200. You can run the URL check with a simple script:
    # Quick bash audit: prints the HTTP status of every URL in your llms.txt
    curl -s https://yourdomain.com/llms.txt \
      | grep -oE 'https?://[^) ]+' \
      | sort -u \
      | xargs -I{} curl -o /dev/null -s -w "%{http_code} {}\n" {}

Beyond mechanical validation, the more important test is behavioral: does adding the file change how AI systems cite your brand? This is exactly what Bingly measures: citation presence, citation rank, and which pages AI systems surface across Perplexity, Claude, and ChatGPT for your target keywords. Running a baseline scan before publishing your llms.txt and comparing it to a scan two to four weeks after publishing gives you a clean before/after signal.
Ongoing maintenance expectations
llms.txt is not a one-time task. Build a process to review it whenever you publish a major new feature page, restructure your documentation, or retire a product area. A quarterly audit is sufficient for most sites. Treating it like any other structured data asset — kept accurate, kept current — is what sustains the signal quality over time.
llms.txt vs. robots.txt vs. sitemap.xml: What Each Does for AI Visibility
These three files serve entirely different functions. Conflating them — or assuming that a well-maintained sitemap.xml makes llms.txt redundant — is a common mistake among teams transitioning from traditional SEO thinking to generative engine optimization.
| File | Primary purpose | What it communicates | AI relevance |
|---|---|---|---|
| robots.txt | Crawl access control | Which URLs crawlers are allowed or disallowed to fetch | AI web retrievers generally respect Disallow directives, but robots.txt says nothing about topical relevance or authority |
| sitemap.xml | URL inventory for crawlers | A complete or near-complete list of indexable URLs, with optional lastmod and priority hints | Lists every page but provides no semantic context; AI retrievers cannot prioritize from it without fetching each page |
| llms.txt | Curated semantic index for AI | What the site is about, which pages are most authoritative for which topics, and how the brand should be understood by a model | Directly consumed by AI retrievers; shapes which pages get fetched and cited in AI answers |
| llms-full.txt | Full-text AI corpus | Complete page content for every listed document, inlined in a single file | Allows AI systems to read all content without individual page fetches; best for documentation-heavy sites |
The three files are complementary, not competing. A production site should maintain all three. robots.txt handles access policy. sitemap.xml handles crawl discoverability for search engines. llms.txt handles semantic prioritization for AI retrievers. Each does something the others cannot.
One additional distinction: robots.txt and sitemap.xml both have published specifications. The Robots Exclusion Protocol is standardized by the IETF as RFC 9309, and the Sitemaps protocol is documented at sitemaps.org. llms.txt is a community convention. That means there is no authoritative validator, no formal schema, and no compliance requirement. The convention succeeds because AI system developers have chosen to adopt it, and as adoption expands, the value of implementing it correctly only increases.
Frequently Asked Questions
Does llms.txt affect traditional search engine rankings?
No. Google, Bing, and other traditional search crawlers do not currently factor llms.txt into organic ranking algorithms. The file is specifically targeted at AI retrieval systems. Implementing it will not help or hurt your traditional SEO rankings — it is purely additive.
How long should my llms.txt file be?
There is no formal limit, but a practical guideline is 50 to 100 link entries across all sections, with the Optional section accounting for any overflow. If you find yourself adding more than 100 links, you are almost certainly including pages that would not meaningfully affect AI citations. Prioritize quality over completeness.
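A quick count shows where a file stands against that guideline. This is a minimal sketch, assuming a local copy of the file and a POSIX awk. `count_entries` is a hypothetical name.

```shell
# Sketch only: counts link entries inside and outside the "## Optional" section.
count_entries() {
  awk '
    /^## /  { optional = ($0 ~ /^## Optional[ \t]*$/) }
    /^- \[/ { if (optional) opt++; else main++ }
    END     { printf "main=%d optional=%d\n", main, opt }
  ' "$1"
}
```

For a file with two main entries and one optional entry, `count_entries llms.txt` prints `main=2 optional=1`.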
Can llms.txt replace structured data and schema markup?
No. llms.txt and schema markup address different layers of machine readability. Schema markup (JSON-LD, microdata) annotates individual pages with structured entity data that search engines and AI systems both consume. llms.txt provides a site-level semantic index. You need both. Neither makes the other redundant.
What is the difference between llms.txt and llms-full.txt?
llms.txt contains a curated list of links with short descriptions — it tells AI systems which pages exist and what they cover, without including full content. llms-full.txt includes the complete body text of each listed page, inline, in a single file. llms-full.txt is ideal for documentation sites where completeness matters more than retrieval efficiency.
How do I know if AI systems are actually reading my llms.txt?
Direct server-log analysis can show fetches from AI crawler user agents (e.g., PerplexityBot, ClaudeBot, OAI-SearchBot). Behavioral measurement — tracking whether your citation rate in AI answers increases after publishing — provides stronger evidence. Platforms built specifically for GEO tracking, such as Bingly, automate this measurement across multiple AI systems and keywords.
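For the server-log route, a sketch along these lines tallies requests for the file by crawler. It assumes an access log in the common "combined" format used by Apache and Nginx, where the request path is the seventh whitespace-separated field. `llms_txt_fetches` is a hypothetical name.

```shell
# Sketch only: counts fetches of /llms.txt and /llms-full.txt per AI crawler.
# Assumes the combined log format; adjust the field number for other formats.
llms_txt_fetches() {
  awk '
    $7 ~ /^\/llms(-full)?\.txt(\?|$)/ {
      agent = "other"
      if ($0 ~ /PerplexityBot/)      agent = "PerplexityBot"
      else if ($0 ~ /ClaudeBot/)     agent = "ClaudeBot"
      else if ($0 ~ /OAI-SearchBot/) agent = "OAI-SearchBot"
      count[agent]++
    }
    END { for (a in count) print count[a], a }
  ' "$1" | sort -rn
}
```

Run it as `llms_txt_fetches /var/log/nginx/access.log`, substituting your own log path. User-agent strings can be spoofed, so treat the counts as indicative rather than exact.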