Sub-grade

Clarity — will AI agents choose your site?

Clarity is the AAIO sub-grade covering whether agents can understand what you offer once they've fetched your pages. It spans Schema.org markup, content extractability, identity, pricing transparency, reviews, payment-trust signals, return policy, and llms.txt content quality, across 14 scored parameters. Agents pick the option whose answer is most legibly machine-readable.

By Chris Mühlnickel · 2026-05-04

What is Clarity?

Clarity (in AAIO) is the set of comprehension-layer signals — structured data, extractable text, identity verification, comparability — that determine whether AI agents can confidently choose your site as the answer.

By the numbers

38% — of AI Overview citations come from top-10-ranked pages — down from ~76% in July 2025. (Ahrefs Brand Radar (n=863K SERPs / 4M AI Overview URLs))
82% — higher CTR on Nestlé pages that show as rich results vs. non-rich pages — a Google-published case study. (Google Search Central — Structured data intro)
41% — of pages now use JSON-LD structured data, up from 34% in 2022 — the fastest-growing structured-data format. (HTTP Archive Web Almanac 2024 — Structured Data)

Why it matters

Agents resolve ambiguity by preferring structured data over free text. This is the 4-tier rubric the Spekto framework uses for every content-extraction parameter: structured data > visible text > gated content > absent. The gap between Tier 1 and Tier 2 is large. An agent comparing three sites for a product will pick the site with valid Schema.org/Product over the site with the same content in HTML, even if the HTML site's content is objectively better — because the structured-data site is unambiguous and the text site requires natural-language reasoning that can fail. Reliable retrieval beats high-quality retrieval. That's the entire ballgame at the Clarity layer.
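The gap between Tier 1 and Tier 2 can be illustrated with a minimal Python sketch. The function names, the sample markup, and the regular expression are hypothetical and are not part of the Spekto framework; the only point is that a structured lookup returns the declared value or nothing, while text extraction has to guess.

```python
import json
import re

def price_from_json_ld(json_ld_text):
    """Tier 1: read the price from a Schema.org Product block (plain lookup)."""
    data = json.loads(json_ld_text)
    # Assumption: "offers" is a single Offer object, not a list of offers.
    offer = data.get("offers", {})
    if data.get("@type") == "Product" and "price" in offer:
        return offer["price"], offer.get("priceCurrency")
    return None

def price_from_text(visible_text):
    """Tier 2: guess the price from prose by taking the first dollar amount."""
    match = re.search(r"\$(\d+(?:\.\d{2})?)", visible_text)
    return (match.group(1), "USD") if match else None

structured = json.dumps({
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Kettle",
    "offers": {"@type": "Offer", "price": "39.00", "priceCurrency": "USD"},
})
prose = "Was $49.00, now only $39.00 with free shipping over $25."

print(price_from_json_ld(structured))  # ('39.00', 'USD')
print(price_from_text(prose))          # ('49.00', 'USD'), the old price
```

The text path is not wrong because regular expressions are weak; a language model reading the same sentence faces the same ambiguity between the old price, the current price, and the shipping threshold.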

Many of the signals here matter for traditional SEO too. Schema.org markup powers Google's Rich Results, Knowledge Panels, FAQ blocks, and AI Overview citations. Author bylines + last-updated dates power E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness) signals. Pricing transparency and review schema improve product-search ranking. Clarity is mostly cumulative work — investing here serves multiple distribution channels at once.

Identity signals increasingly determine whether you're a primary source vs. a secondary mention. When ChatGPT, Claude, or Gemini cites a fact about your company, the citation goes to the canonical source — usually whoever has the strongest Organization schema with sameAs links to authoritative registries, the cleanest NAP consistency across the web, and the most stable identity-verification surface. Sites that publish their identity legibly get cited as the source; sites that don't get cited as a derivative mention or not at all.

Pricing transparency is now an agent-conversion signal — a hidden price = a skipped option. The 3% of e-commerce sites in our calibration corpus that expose machine-readable pricing are the only ones that are comparable to alternatives in agent-mediated shopping. The 97% that don't are not just less convenient; they're effectively invisible at the comparison step. The same applies to SaaS pricing: a "request a demo for pricing" page is a routing dead end for an agent that's trying to surface options for a user.

Reviews and ratings markup is the highest-leverage Clarity work for e-commerce; for SaaS it's pricing + trial clarity. The classic mistake is putting reviews in a JS-only widget that isn't server-side rendered: the user sees five-star ratings, agents see nothing. Stripe's published research, Google's review snippets documentation, and our own calibration data all converge on the same finding: structured reviews disproportionately improve citation rates.

Sub-topics

Foundational (apply to most sites)

C-SCH Schema Coverage + Validity — Valid `Organization`, page-type Schema.org markup, JSON-LD, schema-validator clean.
C-TXT Content Extractability — Body text server-rendered, no JS-only critical content, semantic HTML hierarchy.
C-LLMS llms.txt presence + quality — Markdown manifest at site root with site description and key URLs.
C-BIZ Business Identity — `Organization` schema with `sameAs` links, NAP consistency, verified-domain signals.
C-PRC Pricing + Trial Clarity — Machine-readable pricing or transparent trial / freemium signals.

E-commerce-specific

C-FEED Product Feed — Google Merchant Center / agent-callable product feed.
C-AVA Stock & Availability Signals — Structured availability per Schema.org.
C-RVW Review & Rating Markup — `AggregateRating` + `Review` schema.
C-SHIP Shipping — Structured shipping policy + lead times.
C-RRP Return & Refund Policy — Structured return-policy markup or extractable text.
C-PAY Trusted Payment Signals — Payment-method markup, SSL trust badges, buyer-protection cues.
C-OPS Operational Signals — Structured operating hours, status pages, service availability.
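Several of the e-commerce signals above (price, availability, ratings, return policy) can sit in one Product block. The following is a minimal sketch, assuming a hypothetical `product_json_ld` helper and made-up product values; the property names are Schema.org vocabulary, but which properties a given agent or search engine actually requires is not something this example establishes.

```python
import json

def product_json_ld(name, url, price, currency, in_stock,
                    rating_value, review_count, return_days):
    """Build a Schema.org Product dict from plain values (illustrative helper)."""
    availability = ("https://schema.org/InStock" if in_stock
                    else "https://schema.org/OutOfStock")
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "url": url,
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",          # machine-readable price
            "priceCurrency": currency,
            "availability": availability,      # stock signal
            "hasMerchantReturnPolicy": {       # return-policy signal
                "@type": "MerchantReturnPolicy",
                "applicableCountry": "US",     # assumption: US-only example
                "returnPolicyCategory":
                    "https://schema.org/MerchantReturnFiniteReturnWindow",
                "merchantReturnDays": return_days,
            },
        },
        "aggregateRating": {                   # review and rating signal
            "@type": "AggregateRating",
            "ratingValue": rating_value,
            "reviewCount": review_count,
        },
    }

block = product_json_ld(
    name="Example Kettle",
    url="https://www.example.com/kettle",
    price=39, currency="USD", in_stock=True,
    rating_value=4.6, review_count=128, return_days=30,
)
print(json.dumps(block, indent=2))
```

Generating the block from the same values that render the visible page keeps the structured price and the displayed price from drifting apart.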

Trust / vetting

C-REP Vendor Reputation — Third-party trust signals (BBB, Trustpilot, registered-entity verification).
C-GSB Google Safe Browsing — Site is not flagged as malicious.

Where it's heading

Agent-callable feeds beyond Product. Schema.org/Product is well-adopted for e-commerce; the next surfaces — Service, JobPosting, Course, Event, LocalBusiness — are where agents are extending their action repertoires. Sites in those verticals that ship structured data early get the same kind of routing premium e-commerce sites with Product schema get today.

"Agent-readable answer" sections becoming a declarative pattern. The pattern emerging in our corpus: a small Schema.org block near the top of the page summarizing the page's primary answer in structured form — not as decoration but as the canonical machine-parseable version of what the human-readable hero says. Stripe Atlas guides do this; MDN does this; the pattern compounds with NLWeb (which builds on Schema.org).

Identity verification getting capability-level. Today, business identity is binary: verified or not. The next layer asks what the entity is verified to do: a Stripe-verified merchant, a SOC 2-verified SaaS vendor, and an AppExchange-verified Salesforce ISV each hold different credentials. Agent-routing decisions will use these capability-level identity signals to decide which actions to delegate.

LLM-readable inline summaries replacing the meta-description for AI search. The meta description is shrinking in importance; what's growing is the triplet of first-paragraph TL;DR, Schema.org description, and llms.txt summary. Sites that keep all three consistent with one another get cleaner extraction.

Common mistakes

Marking up only the homepage. Schema.org should appear on every page that represents a distinct entity — every product, every article, every service. Homepage-only markup is the most common audit failure in our corpus.
Hand-rolling JSON-LD without validation. Errors are silent: Google's Rich Results Test will flag them, but many sites never run it. Invalid markup has the same effect as no markup at all, but feels worse because the team thought they'd shipped it.
"Pricing-on-request" pages. Each one is a Clarity dead end. If your sales motion requires it, fine — but accept that those pages won't be cited or routed to.
Reviews in a JS-only widget that isn't server-side rendered. Visible to humans, invisible to agents. Server-rendered reviews with JS enhancement layered on top are fine; reviews that exist only after client-side JavaScript runs are the failure mode.
Free-text trust signals without schema backup. "We have 50,000 happy customers" in a hero — agents discount un-structured claims. Pair with AggregateRating schema + verifiable review sources.
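Two of the mistakes above (unvalidated JSON-LD and JS-only widgets) can be caught with a small pre-deploy check on the server-rendered HTML. The sketch below uses only the Python standard library; `JsonLdExtractor`, `audit_html`, and the required-key table are hypothetical names and assumptions for illustration, not an official rule set, and the check does not replace Google's Rich Results Test. It also ignores `@graph` containers for brevity.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collects the text of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append("".join(self._buffer))
            self._in_json_ld = False

def audit_html(html, required_by_type):
    """Return a list of problems found in the HTML as delivered by the server."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    problems = []
    if not parser.blocks:
        problems.append("no JSON-LD found in server-rendered HTML")
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append(f"invalid JSON: {exc.msg}")
            continue
        for item in (data if isinstance(data, list) else [data]):
            item_type = item.get("@type") if isinstance(item, dict) else None
            if not isinstance(item_type, str):
                continue  # skip multi-typed or untyped items in this sketch
            for key in required_by_type.get(item_type, []):
                if key not in item:
                    problems.append(f"{item_type} is missing {key}")
    return problems

REQUIRED = {"Product": ["name", "offers"]}  # illustrative, not an official list
page = ('<html><head><script type="application/ld+json">'
        '{"@context": "https://schema.org", "@type": "Product", '
        '"name": "Example Kettle"}</script></head><body></body></html>')
js_only = "<html><body><div id='reviews'></div></body></html>"

print(audit_html(page, REQUIRED))     # ['Product is missing offers']
print(audit_html(js_only, REQUIRED))  # ['no JSON-LD found in server-rendered HTML']
```

Running the check against the raw HTTP response, rather than a browser-rendered DOM, is what surfaces the JS-only case.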

Frequently asked

What's the minimum Schema.org markup my site needs?

Three types cover most cases: `Organization` (your business: name, URL, logo, `sameAs` links to social profiles and registries, address) on every page or in your site-level config; `Article` or `BlogPosting` on content pages (with `author`, `datePublished`, and `dateModified`); and the type matching your specific page (`Product` for e-commerce, `Service` for local-service businesses, `JobPosting` for careers, `Course` for education, `FAQPage` for FAQ sections). For e-commerce, add `AggregateRating` and `Review` to `Product`. Use Google's Rich Results Test to validate.
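As a rough sketch of that minimum set, the Python below pairs one site-wide Organization block with one page-specific block. The company details, the `PAGE_TYPES` mapping, and the `blocks_for_page` helper are all hypothetical; only the `@type` names and property names come from Schema.org.

```python
import json

# Site-level block, emitted on every page (all values are placeholders).
ORGANIZATION = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Co",
    "url": "https://www.example.com",
    "logo": "https://www.example.com/logo.png",
    "sameAs": ["https://www.linkedin.com/company/example-co"],
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "1 Main St",
        "addressLocality": "Springfield",
        "addressCountry": "US",
    },
}

# Hypothetical mapping from a CMS page kind to the matching Schema.org type.
PAGE_TYPES = {
    "article": "Article",
    "blog": "BlogPosting",
    "product": "Product",
    "service": "Service",
    "job": "JobPosting",
    "course": "Course",
    "faq": "FAQPage",
}

def blocks_for_page(page_kind, page_fields):
    """Return the JSON-LD blocks a page should carry: Organization + page type."""
    page_block = {
        "@context": "https://schema.org",
        "@type": PAGE_TYPES[page_kind],
        **page_fields,
    }
    return [ORGANIZATION, page_block]

blocks = blocks_for_page("job", {"title": "Technical Editor"})
print(json.dumps(blocks, indent=2))
```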

Is JSON-LD better than microdata or RDFa for agents?

Yes. JSON-LD is what Google explicitly recommends, what most AI agents prefer to parse, and what's easiest to maintain (it lives in a <script> block, separate from your visible markup). Microdata and RDFa intermix structured data with HTML attributes, which means refactoring your visible markup risks breaking your structured data. JSON-LD is decoupled.
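The decoupling is easy to see in a template helper. This is a minimal sketch, assuming a hypothetical `json_ld_script_tag` function in a Python-rendered template; the structured data is serialized into its own `<script type="application/ld+json">` element and never touches the visible markup.

```python
import json

def json_ld_script_tag(data):
    """Serialize a dict into a standalone JSON-LD <script> element."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "</" so a string value can never close the <script> element early.
    # "\/" is a valid JSON escape for "/", so parsers still read the same value.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'

tag = json_ld_script_tag({
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Co",
    "url": "https://www.example.com",
})
print(tag)
```

Because the element is self-contained, the surrounding HTML can be restructured freely without changing what a parser reads from it.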

What if my CMS doesn't support Schema.org natively?

Two options. (1) Use a Schema.org plugin or extension — most major CMSes (WordPress, Webflow, Shopify, Drupal) have well-maintained ones. (2) Hand-roll JSON-LD blocks in your templates. Neither is hard. The harder problem is keeping the structured data in sync with your content as it changes — automated generation from your CMS data is far better than hand-edited blocks that drift.
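A minimal sketch of automated generation, assuming a hypothetical `CmsPost` record: the JSON-LD is derived from the same fields the CMS already stores, so `dateModified` changes whenever the record's timestamp does and nothing is hand-edited.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class CmsPost:
    """Hypothetical CMS record; field names are assumptions for illustration."""
    title: str
    url: str
    author_name: str
    published_at: datetime
    updated_at: datetime

def blog_posting_json_ld(post):
    """Derive a Schema.org BlogPosting dict directly from the CMS record."""
    return {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": post.title,
        "url": post.url,
        "author": {"@type": "Person", "name": post.author_name},
        "datePublished": post.published_at.isoformat(),
        "dateModified": post.updated_at.isoformat(),
    }

post = CmsPost(
    title="Choosing a kettle",
    url="https://www.example.com/blog/choosing-a-kettle",
    author_name="Example Author",
    published_at=datetime(2026, 1, 10, tzinfo=timezone.utc),
    updated_at=datetime(2026, 5, 4, tzinfo=timezone.utc),
)
print(blog_posting_json_ld(post)["dateModified"])  # 2026-05-04T00:00:00+00:00
```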

Do agents use review schema even when there's no rich snippet shown?

Yes. AI Overviews, ChatGPT, and Claude consume review/rating schema directly to compare options — even when the SERP doesn't render a rich-result star block. Sites with structured reviews get cited preferentially over sites with reviews-as-text. Spekto's calibration corpus is striking here: 90% of e-commerce sites lack review schema, which means review-based comparison routes around them entirely.

How do agents verify business identity? Is Google Business Profile enough?

Multiple signals compound. Google Business Profile is one input — useful for local search, weak for global agent routing. Stronger signals: Organization schema with sameAs links to LinkedIn, Crunchbase, Wikipedia, and your social profiles; consistent NAP (name/address/phone) across the web; HTTPS + canonical URL; verified domain ownership; legal-entity registration (DUNS, registered company number) where applicable. For B2B, marketplace verification (AppExchange, AppSource) compounds further. The agent-trust calculation is multi-input.
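NAP consistency is the most mechanical of these signals to check. The sketch below is an assumption about how such a check might look, not a description of how any agent actually verifies identity; the listing sources and values are invented, and the normalization is deliberately naive (it does not reconcile "St" with "Street", for example).

```python
import re

def normalize_nap(name, address, phone):
    """Reduce a name/address/phone triple to a comparable form."""
    def squash(text):
        # Lowercase and collapse punctuation and whitespace into single spaces.
        return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    digits = re.sub(r"\D", "", phone)  # keep only the digits of the phone number
    return squash(name), squash(address), digits

def inconsistent_listings(listings):
    """Return sources whose NAP differs from the first (reference) listing."""
    reference = normalize_nap(*listings[0][1])
    return [source for source, nap in listings[1:]
            if normalize_nap(*nap) != reference]

listings = [
    ("website",     ("Example Co.", "1 Main St, Springfield", "+1 (555) 010-0000")),
    ("directory-a", ("Example Co", "1 Main St., Springfield", "+1 555 010 0000")),
    ("directory-b", ("Example Company", "1 Main St, Springfield", "+1 555 010 0000")),
]
print(inconsistent_listings(listings))  # ['directory-b']
```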

What goes in llms.txt vs. robots.txt vs. sitemap.xml?

robots.txt is policy (which user-agents may fetch what). sitemap.xml is the URL inventory (what pages exist). llms.txt is the map for LLMs: a prose-and-link Markdown file that says 'here's what this site is and where the important stuff is.' robots.txt and sitemap.xml are well-established standards; llms.txt is an emerging convention. They're complementary, not redundant.
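A minimal sketch of generating such a file, assuming the commonly proposed llms.txt layout (a title heading, a one-line blockquote summary, then sections of annotated links). Because the convention is still emerging, treat the exact layout as an assumption; `build_llms_txt` and the sample site are hypothetical.

```python
def build_llms_txt(site_name, summary, sections):
    """Render a Markdown llms.txt body from a site description and key URLs.

    sections maps a heading to a list of (title, url, note) tuples.
    """
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"

text = build_llms_txt(
    "Example Co",
    "Example Co sells electric kettles and publishes brewing guides.",
    {
        "Products": [
            ("Pricing", "https://www.example.com/pricing", "Current plans and prices"),
        ],
        "Guides": [
            ("Choosing a kettle", "https://www.example.com/blog/choosing-a-kettle",
             "Buyer's guide"),
        ],
    },
)
print(text)
```

The output would be served at the site root as `/llms.txt`, alongside (not instead of) robots.txt and sitemap.xml.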

Why do agents prefer structured data over visible text?

Reliability. Visible text requires the LLM to extract facts via natural-language reasoning, which can fail or hallucinate; structured data is unambiguous lookup. The Spekto framework uses a 4-tier rubric for content-extraction params: structured data > visible text > gated > absent. The gap between Tier 1 (structured) and Tier 2 (visible text) is large — agents pick structured-data sources over text-only sources at high rates, even when the text-only source has objectively better content.