Controlling AI Understanding: Schema, llms.txt, and What Actually Works

In SEO circles, you keep hearing it: "Add schema markup to your site and ChatGPT will love you!" Sounds logical — schema helps Google, so it must help AI too. Right?

Then there's llms.txt — a plain text file that's supposed to tell AI models what matters on your site. Over 844,000 websites have implemented it. Google publicly dismissed it, then quietly published their own.

Both approaches promise you can control how AI understands your brand. Reality is more complicated.

Schema Markup: Why AI Ignores Your Code

Michael Curtis put it well: "The entire point of the LLM approach is to build knowledge from freeform text — not from structured data." LLMs don't need structure. They're built to understand chaos.

How does AI learn? It chops text into tiny fragments — tokens. When you write clean schema markup:

"@type": "Organization"

...the AI doesn't see structure. It sees fragments: "@", "type", "Organization". The semantic logic you carefully built evaporates during tokenization.
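You can watch this fragmentation happen with a toy splitter modeled loosely on GPT-style pre-tokenization rules. This is a sketch, not any vendor's actual tokenizer — real BPE vocabularies split even further — but it shows how punctuation-heavy markup shatters:

```python
import re

# Toy pre-tokenizer in the spirit of GPT-style splitting rules.
# Illustrative only -- NOT the actual tokenizer of any AI provider.
PATTERN = re.compile(r" ?\w+| ?[^\s\w]+|\s+")

def toy_tokenize(text: str) -> list[str]:
    """Split text into word-like and punctuation-like fragments."""
    return PATTERN.findall(text)

fragments = toy_tokenize('"@type": "Organization"')
print(fragments)  # → ['"@', 'type', '":', ' "', 'Organization', '"']
```

The key `@type`, which carries all the semantic weight in JSON-LD, ends up split across three fragments — none of which says "this is a type declaration."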

The Proof

Researchers built two otherwise identical product pages: one with normal text plus schema markup, and one containing only schema markup on an otherwise blank page. Across hundreds of queries, Gemini and ChatGPT could extract prices and SKUs only from the text-based page. The schema-only page was effectively invisible.

So Delete Schema?

Absolutely not. Schema is still essential for traditional Google search — rich snippets, price displays, Knowledge Graph. And ranking well on Google correlates with higher chances of being cited by AI. But that's correlation, not causation.
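For context, this is the kind of JSON-LD block Google's rich results rely on — a minimal Organization example with invented values, not a recommendation for any specific site:

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example GmbH",
  "url": "https://example.com",
  "logo": "https://example.com/logo.png"
}
```

Google's crawler parses this as structured data; an LLM training pipeline sees it as just another run of tokens.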

The belief that schema markup directly drives AI mentions is a persistent myth — kept alive by repetition in SEO culture, not by evidence.

llms.txt: The Signpost Nobody Reads

The idea is compelling. Imagine you have a janitor (the AI) who needs to organize your house (your website). Until now, you let them in and hoped they'd find the living room instead of getting stuck in the basement. robots.txt was just the "No Entry" sign on certain doors.

llms.txt is a friendly note at the entrance: "Hey, the important stuff is on the table in the living room. Ignore the clutter in the hallway."

Jeremy Howard from Answer.AI proposed this in September 2024. The file defines:

  • /llms.txt — curated overview in Markdown with key links
  • /llms-full.txt — complete content in a single file
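A minimal /llms.txt following Howard's proposal looks like this — an H1 title, a blockquote summary, then H2 sections with annotated links (all names and URLs here are invented for illustration):

```markdown
# Example GmbH

> Consulting for B2B software companies. The key pages are linked below.

## Services

- [GEO Audit](https://example.com/geo-audit.md): What we check and why
- [Pricing](https://example.com/pricing.md): Current packages and terms

## Optional

- [Blog archive](https://example.com/blog.md): Older posts, lower priority
```

The "Optional" section marks content a model may skip when context is tight.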

Sounds simple, and takes maybe ten minutes. Hence the adoption numbers: Anthropic, Cloudflare, Stripe, Vercel, Perplexity, Hugging Face — they all have one.

The Data Says: No Effect

Then came the reality check. SE Ranking analyzed 300,000 domains and found no statistically measurable effect of llms.txt on AI citations. Removing llms.txt as a feature actually improved their model's accuracy: the file added nothing but noise.[^1]

Google's John Mueller compared llms.txt to the keywords meta tag in June 2025 — a tag that was abandoned because it was so easy to manipulate. His argument: bots have to verify the original content anyway, so llms.txt is redundant and opens the door to cloaking.[^2]

And then the irony: in December 2025, Google published its own llms.txt on the Search Central developer site.[^3] Publicly against it, quietly adopting it.

Strategic Silence

Not a single major AI provider has officially confirmed using llms.txt. Anthropic publishes its own but has never said Claude references it during conversations. OpenAI is silent, and according to log analyses, GPTBot doesn't even fetch llms.txt files.

| Aspect | robots.txt | sitemap.xml | llms.txt |
|---|---|---|---|
| Purpose | Access control | URL discovery | Content curation for LLMs |
| Format | Plaintext | XML | Markdown |
| Authority | Standard (RFC 9309) | De facto standard | Proposal, no standard |
| Adoption | ~universal | ~universal | ~10% |
| Proven effect | Yes | Yes | No |

What Actually Works

If neither schema markup nor llms.txt has a direct effect on AI citations — what does?

The answer is frustratingly simple: good text.

A visible table on your website gives AI more than 50 lines of JSON-LD in your source code. A clear heading structure beats any llms.txt file. Because LLMs learn from what humans can read — and that's what needs to be good.

Three things matter:

  1. Clear, parseable content. Tables, lists, fact sections. AI can't extract facts from prose with 100% certainty — and leaves uncertain information out. (More on this in the GEO Playbook)
  2. Semantic HTML. Tags like <article>, <section>, <header> aren't decoration. They're the map for crawlers and AI. Div soup increases token costs and reduces crawl frequency.
  3. External validation. AI doesn't just read your site. It reads Reddit, podcasts, trade media. Ahrefs analyzed 75,000 brands and found that external mentions — especially on YouTube — correlate most strongly with AI visibility.[^4] Yext's study of 6.8 million AI citations confirms this: for local search intent, 86% of cited sources come from channels brands can directly control.[^5]
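Point 2 above, in concrete terms — the same content as div soup versus semantic markup (an illustrative snippet, not a template):

```html
<!-- Div soup: class names carry the meaning, which machines can't rely on -->
<div class="box">
  <div class="big">Pricing</div>
  <div>From $9/month</div>
</div>

<!-- Semantic HTML: the structure itself says what each part is -->
<article>
  <header><h2>Pricing</h2></header>
  <section><p>From $9/month</p></section>
</article>
```

Both render similarly in a browser; only the second tells a crawler where the content unit begins, what its heading is, and what belongs together.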

So Ignore llms.txt?

Not necessarily. The effort is minimal and it doesn't hurt. If the wind changes and AI providers start respecting the file, early adopters have an advantage. I've implemented it on martin-moeller.biz myself.

But it's a bet, not a tool. The energy you put into llms.txt is better invested in good content.


The short version

  1. Schema markup helps Google but LLMs can't read it — tokenization destroys the structure
  2. llms.txt has 844,000 implementations but no proven effect on AI citations
  3. Google publicly rejects llms.txt, uses it themselves — no AI provider confirms usage
  4. What works: Clear text, semantic HTML, external validation
  5. Implement llms.txt anyway? Costs nothing, doesn't hurt — but don't expect miracles

Sources & References

[^1]: SE Ranking — llms.txt Study, 300,000 Domains, 2025.

[^2]: Google: No AI System Uses llms.txt (SE Roundtable), June 2025.

[^3]: Google Adds llms.txt After Dismissing It (Omnius), December 2025.

[^4]: Ahrefs: Top Brand Visibility Factors in ChatGPT, AI Mode, and AI Overviews, 2026. Analysis of 75,000 brands: external mentions (especially YouTube) correlate most strongly with AI visibility.

[^5]: Yext: AI Citations, User Locations, and Query Context, 2026. Across 6.8 million AI citations, 86% of visible sources for local search intent come from channels brands can directly control.
