Controlling AI Understanding: Schema, llms.txt, and What Actually Works

In SEO circles, you keep hearing it: "Add schema markup to your site and ChatGPT will love you!" Sounds logical — schema helps Google, so it must help AI too. Right?

Then there's llms.txt — a plain text file that's supposed to tell AI models what matters on your site. Over 844,000 websites have implemented it. Google publicly dismissed it, then quietly published their own.

Both approaches promise you can control how AI understands your brand. Reality is more complicated.

Schema Markup: Why AI Ignores Your Code

Michael Curtis put it well: "The entire point of the LLM approach is to build knowledge from freeform text — not from structured data." LLMs don't need structure. They're built to understand chaos.

How does AI learn? It chops text into tiny fragments — tokens. When you write clean schema markup:

"@type": "Organization"

...the AI doesn't see structure. It sees fragments: "@", "type", "Organization". The semantic logic you carefully built evaporates during tokenization.
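You can watch this fragmentation happen with a toy splitter modeled loosely on GPT-style pre-tokenization rules. This is a sketch, not any vendor's actual tokenizer — real BPE vocabularies split even further — but it shows how punctuation-heavy markup shatters:

```python
import re

# Toy pre-tokenizer in the spirit of GPT-style splitting rules.
# Illustrative only -- NOT the actual tokenizer of any AI provider.
PATTERN = re.compile(r" ?\w+| ?[^\s\w]+|\s+")

def toy_tokenize(text: str) -> list[str]:
    """Split text into word-like and punctuation-like fragments."""
    return PATTERN.findall(text)

fragments = toy_tokenize('"@type": "Organization"')
print(fragments)  # → ['"@', 'type', '":', ' "', 'Organization', '"']
```

The key `@type`, which carries all the semantic weight in JSON-LD, ends up split across three fragments — none of which says "this is a type declaration."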

The Proof

Researchers built two otherwise identical product pages: one with normal text plus schema markup, and one containing only schema markup on an otherwise blank page. Across hundreds of queries, Gemini and ChatGPT could extract prices and SKUs only from the text-based page. The schema-only page was effectively invisible.

So Delete Schema?

Absolutely not. Schema is still essential for traditional Google search — rich snippets, price displays, Knowledge Graph. And ranking well on Google correlates with higher chances of being cited by AI. But that's correlation, not causation.
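For context, this is the kind of JSON-LD block Google's rich results rely on — a minimal Organization example with invented values, not a recommendation for any specific site:

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example GmbH",
  "url": "https://example.com",
  "logo": "https://example.com/logo.png"
}
```

Google's crawler parses this as structured data; an LLM training pipeline sees it as just another run of tokens.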

The belief that schema markup directly drives AI mentions is a persistent myth — kept alive by repetition in SEO culture, not by evidence.

llms.txt: The Signpost Nobody Reads

The idea is compelling. Imagine you have a janitor (the AI) who needs to organize your house (your website). Until now, you let them in and hoped they'd find the living room instead of getting stuck in the basement. robots.txt was just the "No Entry" sign on certain doors.

llms.txt is a friendly note at the entrance: "Hey, the important stuff is on the table in the living room. Ignore the clutter in the hallway."

Jeremy Howard from Answer.AI proposed this in September 2024. The file defines:

  • /llms.txt — curated overview in Markdown with key links
  • /llms-full.txt — complete content in a single file
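A minimal /llms.txt following Howard's proposal looks like this — an H1 title, a blockquote summary, then H2 sections with annotated links (all names and URLs here are invented for illustration):

```markdown
# Example GmbH

> Consulting for B2B software companies. The key pages are linked below.

## Services

- [GEO Audit](https://example.com/geo-audit.md): What we check and why
- [Pricing](https://example.com/pricing.md): Current packages and terms

## Optional

- [Blog archive](https://example.com/blog.md): Older posts, lower priority
```

The "Optional" section marks content a model may skip when context is tight.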

Sounds simple, and takes maybe ten minutes. Hence the adoption numbers: Anthropic, Cloudflare, Stripe, Vercel, Perplexity, Hugging Face — they all have one.

The Data Says: No Effect

Then came the reality check. SE Ranking analyzed 300,000 domains and found no statistically measurable effect of llms.txt on AI citations. Removing llms.txt as a feature actually improved their model's accuracy: the file added nothing but noise.[^1]

Google's John Mueller compared llms.txt to the keywords meta tag in June 2025 — a tag that was abandoned because it was so easy to manipulate. His argument: bots have to verify the original content anyway, so llms.txt is redundant and opens the door to cloaking.[^2]

And then the irony: in December 2025, Google published its own llms.txt on the Search Central developer site.[^3] Publicly against it, quietly adopting it.

Strategic Silence

Not a single major AI provider has officially confirmed using llms.txt. Anthropic publishes its own but has never said Claude references it during conversations. OpenAI is silent, and according to log analyses, GPTBot doesn't even fetch llms.txt files.

| Aspect | robots.txt | sitemap.xml | llms.txt |
|---|---|---|---|
| Purpose | Access control | URL discovery | Content curation for LLMs |
| Format | Plaintext | XML | Markdown |
| Authority | Standard (RFC 9309) | De facto standard | Proposal, no standard |
| Adoption | ~universal | ~universal | ~10% |
| Proven effect | Yes | Yes | No |

What Actually Works

If neither schema markup nor llms.txt has a direct effect on AI citations — what does?

The answer is frustratingly simple: good text.

A visible table on your website gives AI more than 50 lines of JSON-LD in your source code. A clear heading structure beats any llms.txt file. Because LLMs learn from what humans can read — and that's what needs to be good.

Three things matter:

  1. Clear, parseable content. Tables, lists, fact sections. AI can't extract facts from prose with 100% certainty — and leaves uncertain information out. (More on this in the GEO Playbook)
  2. Semantic HTML. Tags like <article>, <section>, <header> aren't decoration. They're the map for crawlers and AI. Div soup increases token costs and reduces crawl frequency.
  3. External validation. AI doesn't just read your site. It reads Reddit, podcasts, trade media. Ahrefs analyzed 75,000 brands and found that external mentions — especially on YouTube — correlate most strongly with AI visibility.[^4] Yext's study of 6.8 million AI citations confirms this: for local search intent, 86% of cited sources come from channels brands can directly control.[^5]
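Point 2 above, in concrete terms — the same content as div soup versus semantic markup (an illustrative snippet, not a template):

```html
<!-- Div soup: class names carry the meaning, which machines can't rely on -->
<div class="box">
  <div class="big">Pricing</div>
  <div>From $9/month</div>
</div>

<!-- Semantic HTML: the structure itself says what each part is -->
<article>
  <header><h2>Pricing</h2></header>
  <section><p>From $9/month</p></section>
</article>
```

Both render similarly in a browser; only the second tells a crawler where the content unit begins, what its heading is, and what belongs together.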

So Ignore llms.txt?

Not necessarily. The effort is minimal and it doesn't hurt. If the wind changes and AI providers start respecting the file, early adopters have an advantage. I've implemented it on martin-moeller.biz myself.

But it's a bet, not a tool. The energy you put into llms.txt is better invested in good content.


The short version

  1. Schema markup helps Google but LLMs can't read it — tokenization destroys the structure
  2. llms.txt has 844,000 implementations but no proven effect on AI citations
  3. Google publicly rejects llms.txt, uses it themselves — no AI provider confirms usage
  4. What works: Clear text, semantic HTML, external validation
  5. Implement llms.txt anyway? Costs nothing, doesn't hurt — but don't expect miracles

Sources & References

[^1]: SE Ranking — llms.txt Study, 300,000 Domains, 2025.

[^2]: Google: No AI System Uses llms.txt (SE Roundtable), June 2025.

[^3]: Google Adds llms.txt After Dismissing It (Omnius), December 2025.

[^4]: Ahrefs: Top Brand Visibility Factors in ChatGPT, AI Mode, and AI Overviews, 2026. Analysis of 75,000 brands: external mentions (especially YouTube) correlate most strongly with AI visibility.

[^5]: Yext: AI Citations, User Locations, and Query Context, 2026. Across 6.8 million AI citations, 86% of visible sources for local search intent come from channels brands can directly control.
