Schema vs. LLMs: Why Markup Alone Does Not Help

“The entire point of the LLM approach is to build knowledge from freeform text – not from structured data.” With this remark, Michael Curtis captured what many in SEO circles often overlook: schema markup may help search engines, but it doesn’t help language models.

Still, the opposite claim refuses to die. At conferences and in SEO groups, the idea is repeated: add schema, and you’ll appear more often in AI answers. It sounds plausible – until you examine how LLMs actually work.

Models like ChatGPT or Gemini operate on tokenized text. They break input into small units – tokens – and learn, from billions of sequences, the probability of what comes next.

Consider this example:
"@type": "Organization"
During tokenization it is split into separate tokens – roughly “@”, “type”, and “Organization” – and the explicit semantic relationship between them disappears. To the model, it is just text.

That is why schema cannot play a meaningful role in training. Its value lies in explicitness – and tokenization erases that explicitness.
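The contrast can be sketched in a few lines of Python. A structured-data parser keeps the `@type` relationship explicit; a tokenizer sees only a flat sequence of text pieces. (The split below is a toy word/punctuation split, not a real BPE tokenizer – production tokenizers differ in detail, but they are equally blind to JSON-LD structure.)

```python
import json
import re

snippet = '"@type": "Organization"'

# A structured-data parser preserves the semantics explicitly:
parsed = json.loads("{" + snippet + "}")
print(parsed["@type"])  # the key–value relationship survives parsing

# A tokenizer reduces the same snippet to a flat stream of pieces.
# (Toy split on word/punctuation boundaries for illustration.)
tokens = re.findall(r"\w+|[^\w\s]", snippet)
print(tokens)  # ['"', '@', 'type', '"', ':', '"', 'Organization', '"']
```

In the parsed form, “Organization” is the value of a typed property; in the token stream, it is just another string fragment next to quote marks and a colon.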

Independent experiments support this. Two product pages were created for a fictitious item: one with visible text plus schema markup, the other with schema markup only and an otherwise blank page. Queried hundreds of times, models like Gemini and ChatGPT could extract prices or SKUs only from the text-based page. The schema-only page remained invisible.
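A rough sketch of why the schema-only page stays invisible: pipelines that feed visible page text to a model typically skip `<script>` contents, which is exactly where JSON-LD lives. The product name, price, and SKU below are invented for illustration, and real crawl pipelines vary – this is only a minimal stdlib demonstration of the mechanism.

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Collects the text a reader would see, skipping <script> bodies
    (which is where JSON-LD schema markup is embedded)."""
    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if not self.in_script and data.strip():
            self.chunks.append(data.strip())

# Hypothetical pages mirroring the experiment's setup:
text_page = "<p>Acme Widget, price: $19.99, SKU: AW-100</p>"
schema_only_page = (
    '<script type="application/ld+json">'
    '{"@type": "Product", "sku": "AW-100", "price": "19.99"}'
    "</script>"
)

for page in (text_page, schema_only_page):
    parser = VisibleText()
    parser.feed(page)
    print(" ".join(parser.chunks))
# The first page yields the product facts; the second yields nothing.
```

Under this assumption, the schema-only page contributes no visible text at all – so there is nothing for a text-trained model to learn or retrieve.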

Structured data remains essential for traditional SEO. It helps rankings, which in turn correlate with higher chances of citation in AI answers. But that is correlation, not causation.

The belief that schema markup directly drives AI mentions is a persistent myth — kept alive by repetition in SEO culture, not by evidence.
