Why LLMs don’t use schema

“The entire point of the LLM approach is to build knowledge from freeform text – not from structured data.” With this remark, Michael Curtis captured what many in SEO circles overlook: schema markup may help search engines, but it doesn’t help language models.

Still, the opposite claim refuses to die. At conferences and in SEO groups, the idea is repeated: add schema, and you’ll appear more often in AI answers. It sounds plausible – until you examine how LLMs actually work.

Models like ChatGPT or Gemini process text through tokenization. They break it into small units – tokens – and, from billions of such sequences, learn the probability of which token comes next.

Consider this example:
"@type": "Organization"
During tokenization, it is split into separate tokens for “@”, “type”, and “Organization”. The explicit semantic meaning disappears. To the model, it is just text.
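
You can watch this happen with a tokenizer. Below is a minimal sketch using OpenAI’s open-source tiktoken library (chosen for illustration; any BPE tokenizer shows the same effect):

# Minimal sketch of the tokenization step. tiktoken is OpenAI's
# open-source tokenizer; any BPE tokenizer shows the same effect.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # GPT-4-era encoding

snippet = '"@type": "Organization"'

# Decode each token id back to its text fragment to see how the
# JSON-LD syntax dissolves into ordinary subword pieces.
for token_id in enc.encode(snippet):
    print(repr(enc.decode([token_id])))

The output is a short list of ordinary character fragments. Nothing in them signals “this was structured data” – the markup reaches the model as plain text, like any other string.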

That is why schema cannot play a meaningful role in training. Its value lies in explicitness – and tokenization erases that explicitness.

Independent experiments support this. Two product pages were created for a fictitious item: one with visible text plus schema markup, the other carrying only schema markup on an otherwise blank page. When queried hundreds of times, models like Gemini and ChatGPT could extract prices or SKUs only from the text-based page. The schema-only page remained invisible.
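
A simplified replication can be scripted. The sketch below uses the OpenAI Python client and assumes the test pages are already published and crawled; the product name, expected price, model name and trial count are placeholders, not the original experiment’s setup:

# Hedged sketch of the repeated-query test. All values are
# illustrative placeholders, not the original experiment's data.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

QUESTION = "What is the price of the Acme Widget 3000?"
EXPECTED = "49.99"
TRIALS = 100

hits = 0
for _ in range(TRIALS):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # stand-in; the test used ChatGPT and Gemini
        messages=[{"role": "user", "content": QUESTION}],
    )
    answer = response.choices[0].message.content or ""
    if EXPECTED in answer:
        hits += 1

print(f"Correct price in {hits} of {TRIALS} answers")

Counting hits across many runs matters because model answers vary; a single query proves little either way.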

The takeaway: structured data remains essential for traditional SEO. It helps rankings, and pages that rank well are in turn more likely to be cited in AI answers. But that is correlation, not causation.

The belief that schema markup directly drives AI mentions is less a technical truth than a persistent myth, carried forward in SEO culture more by repetition than by evidence.