Bot Blocking Detection — does your CDN actually let AI agents reach your origin?

Your robots.txt can be permissive while your CDN silently rejects every AI agent at the edge. Cloudflare's Bot Fight Mode, AWS WAF Bot Control, and Akamai Bot Manager all ship with defaults that catch legitimate AI crawlers. The failure is invisible because your origin never sees the request, and it's the most common cause of 'we should be appearing in ChatGPT but we're not.'

By Chris Mühlnickel · 2026-05-16

What is Bot Blocking Detection?

Bot Blocking Detection measures whether your CDN, WAF, or edge-layer security stack actually lets AI agent requests through to your origin, regardless of what robots.txt declares as policy.

By the numbers

39% of the top 1M Cloudflare sites are accessed by AI bots, but only 2.98% block them. (Cloudflare blog, Declaring your AIndependence)
~20% of the public web flipped to default-block-all-AI-crawlers when Cloudflare changed its default in July 2025. (Cloudflare blog, Control content use for AI training)
416B AI scraping requests were denied by Cloudflare between July and December 2025. (Cloudflare Radar 2025 Year in Review)

Why it matters

The robots.txt → CDN gap is where most Visibility failures actually happen. A site can have a perfect robots.txt — every major AI user-agent allowed, paths scoped correctly — and still be invisible to ChatGPT, Claude, and Perplexity because the request never reaches the origin server. CDN-layer blocking is silent: no log entry at your origin, no obvious diagnostic in your analytics, just a slow drop in citations that takes months to attribute. Cloudflare's own data captures the shape of the problem: 39% of top-1M sites are accessed by AI bots, but only 2.98% deliberately block them — most of the rest of the gap is accidental, not policy.

[Bot-fight mode](/learn/glossary#term-bot-fight-mode) is on by default, and it catches modern AI agents. Cloudflare's bot-fight feature defaults to conservative behavior — anything bot-shaped gets challenged or blocked. It was designed for spam scrapers; it predates the agent traffic boom. The default config catches ChatGPT-User, Claude-User, and PerplexityBot indiscriminately. Sites that activated bot-fight before 2024 and never revisited it are blocking traffic they want, and Cloudflare's July 2025 policy change — flipping ~20% of the public web to default-block-all-AI-crawlers — silently widened the gap further for any account that didn't explicitly opt out.

The failure pattern is invisible from your own dashboards. Your origin server's logs don't show requests that never arrived. Your Google Analytics doesn't show agents that bounced at the edge. Your AI Overview citation rate just slowly... isn't what it should be. This is why Bot Blocking Detection is a separate parameter from Bot Access Policy — robots.txt you can audit by reading the file; CDN behavior you can only audit by probing. The 416B AI scraping requests Cloudflare blocked across H2 2025 give the scale; the harder number is how much of that volume was legitimate citation-time traffic the site owner wanted through.

[WAF](/learn/glossary#term-waf) rules accumulate; nobody removes them. Most teams add WAF rules during incidents — "Block this UA, we were getting spammed" — and never review them. Three years of accumulated rules plus an agent traffic surge equals a quiet decimation of your AI surface. The cleanup is usually one config change away, but only if someone runs the audit. Bot Blocking Detection is the Power parameter in Visibility because a single bad rule can cap the entire sub-grade — no amount of permissive robots.txt rescues a CDN that refuses the request.

Where it's heading

AI bot allowlists become the default policy. Cloudflare, AWS WAF Bot Control, and Akamai Bot Manager are all shipping Allow-verified-AI-bots as a one-click option rather than a custom config. By 2027 this is likely the shipped default for new accounts, reversing the current "block first, allow later" pattern that produced the gap in the first place.

Per-vendor agent identity verification. HTTP Message Signatures (RFC 9421), the mechanism underlying Visa's Trusted Agent Protocol and Cloudflare's Web Bot Auth, let CDNs verify an agent's claimed identity cryptographically. Once standardized, "allow verified Claude-User" becomes a hard verification rather than a UA-string check, which closes the spoofing failure mode that justified aggressive bot-fight defaults.

AI Overview citation rate becomes a CDN diagnostic. Once tools surface "your CDN policy is reducing your AI Overview citations by X%", the conversation flips from "should we allow AI agents?" to "how much revenue is the current policy costing us?" — and the procurement-team default flips with it.

Common mistakes

Trusting robots.txt to do the CDN's job. A permissive robots.txt is necessary but not sufficient. robots.txt is advisory, and the CDN sits in front of the origin that serves it; the CDN's policy wins, every time.
Activating bot-fight mode without scoping it. The blanket-default is too aggressive for content-driven sites. Scoping to admin and auth paths preserves the security value without nuking citations.
Adding WAF rules during incidents and never reviewing them. Each rule has a half-life of usefulness; the WAF doesn't tell you when a rule's value has expired, and orphaned rules quietly cost citations.
Applying [Cloudflare Turnstile](/learn/glossary#term-cloudflare-turnstile) to every form. Most agents can't pass; legitimate user flows get gated too. Turnstile belongs on signup and high-risk endpoints, not universally.
Diagnosing 'we're not in ChatGPT' without checking the CDN first. Most teams start by adjusting content or schema; the bottleneck is upstream. Probe the edge layer before optimizing anything downstream.

Frequently asked

My robots.txt allows GPTBot. Why aren't we appearing in ChatGPT?

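Almost always Bot Blocking Detection. Run a curl test with the GPTBot UA: if it gets a 403, 429, 503, or a challenge page where a browser UA gets 200, your CDN is blocking the request before it reaches your origin. Repeat the test with the ChatGPT-User and OAI-SearchBot UAs, since GPTBot is OpenAI's training crawler and those two fetch pages for ChatGPT answers. The robots.txt policy is a request to the bot; the CDN policy is a hard wall. Run a Spekto audit to confirm where the block lives.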
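
The curl test can also be scripted so several user agents are checked in one pass. What follows is a minimal Python sketch, not a Spekto tool: the `probe` and `probe_all` names are hypothetical, and the User-Agent strings are simplified placeholders, so substitute the exact strings each vendor publishes. Because the request leaves from your own IP rather than the vendor's, a CDN that verifies agent IPs may reject it even when the genuine agent is allowed; treat a blocked response as a signal to investigate, not as proof.

```python
import urllib.error
import urllib.request

# Simplified placeholder UA strings (assumption); use the vendor-published values.
USER_AGENTS = {
    "browser": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Chrome/124.0 Safari/537.36",
    "GPTBot": "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
    "ChatGPT-User": "Mozilla/5.0 (compatible; ChatGPT-User/1.0; +https://openai.com/bot)",
}


def probe(url, user_agent, timeout=10.0, opener=urllib.request.urlopen):
    """Return the final HTTP status served to this User-Agent (redirects followed)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx; the status code is the result we want.
        return error.code


def probe_all(url, user_agents=USER_AGENTS, opener=urllib.request.urlopen):
    """Probe one URL with every configured User-Agent and map name to status."""
    return {name: probe(url, ua, opener=opener) for name, ua in user_agents.items()}
```

Calling `probe_all` with a content URL on your own site returns one status per user agent. A browser row of 200 next to an agent row of 403 is the pattern worth chasing; network failures such as timeouts surface as `urllib.error.URLError` and are deliberately left unhandled here.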

How do I know if Cloudflare bot-fight mode is on?

Cloudflare dashboard → Security → Bots → look at the Bot Fight Mode toggle and Super Bot Fight Mode toggle states. Both default to On for new accounts, and many older accounts have them on without anyone remembering. Cloudflare's July 2025 policy change additionally flipped the default to block all AI crawlers across new zones — so a fresh setup blocks AI agents twice over unless you reverse both.

Should I disable bot-fight entirely or just scope it?

Scope it. Bot-fight provides genuine value for spam protection on auth and checkout paths. The right pattern is to turn it off on content paths (/, /blog/*, /products/*, /learn/*) while keeping it on for admin, checkout, and API paths. Pairing path scoping with the Allow verified AI bots toggle is the cleanest Cloudflare configuration in 2026.
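
Before editing vendor rules, it can help to write the intended scope down as data and test a handful of URLs against it. The sketch below is plain Python, not Cloudflare rule syntax. The content patterns restate the paths above; the protected patterns, the `bot_fight_applies` name, and the choice to keep unlisted paths protected are assumptions.

```python
from fnmatch import fnmatchcase

# Content paths where strict bot challenges are switched off.
CONTENT_PATTERNS = ["/", "/blog/*", "/products/*", "/learn/*"]
# Sensitive paths where challenges stay on (assumed patterns).
PROTECTED_PATTERNS = ["/admin*", "/checkout*", "/api*"]


def bot_fight_applies(path):
    """Return True if strict bot challenges should apply to this path."""
    if any(fnmatchcase(path, pattern) for pattern in PROTECTED_PATTERNS):
        return True
    if any(fnmatchcase(path, pattern) for pattern in CONTENT_PATTERNS):
        return False
    # Assumption: anything not explicitly listed as content stays protected.
    return True


for path in ["/", "/blog/launch-notes", "/admin/login", "/api/v1/orders", "/careers"]:
    print(path, "->", "challenge" if bot_fight_applies(path) else "open")
```

Checking protected patterns first means a sensitive path can never be opened by an overly broad content pattern. Defaulting unlisted paths to protected is the conservative choice; a content-heavy site might prefer the opposite default.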

Does the Allow verified AI bots toggle actually work?

Yes, and it's the cleanest fix for Cloudflare-hosted sites. It explicitly allows verified AI UAs (ChatGPT-User, Claude-User, PerplexityBot, etc.) through bot-fight. Caveat: verified means Cloudflare has confirmed the UA's IP belongs to the claimed vendor — fake UA strings still get caught, which is the intended behavior.
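
To illustrate why a spoofed UA fails verification, here is a minimal sketch of an IP-range check in Python. It shows the general technique only and is not Cloudflare's implementation. The ranges are reserved documentation blocks (RFC 5737) standing in for the lists vendors publish, and `is_verified` is a hypothetical helper.

```python
import ipaddress

# Placeholder ranges from RFC 5737 documentation blocks (assumption);
# real vendor ranges must come from each vendor's published list.
PUBLISHED_RANGES = {
    "ChatGPT-User": ["192.0.2.0/24"],
    "Claude-User": ["198.51.100.0/24"],
    "PerplexityBot": ["203.0.113.0/24"],
}


def is_verified(user_agent, client_ip, ranges=PUBLISHED_RANGES):
    """Return True only if the UA names a known agent AND the IP is in its ranges."""
    ip = ipaddress.ip_address(client_ip)
    for token, cidrs in ranges.items():
        if token.lower() in user_agent.lower():
            return any(ip in ipaddress.ip_network(cidr) for cidr in cidrs)
    # The UA does not claim to be any known agent.
    return False


claimed = "Mozilla/5.0 (compatible; ChatGPT-User/1.0)"
print(is_verified(claimed, "192.0.2.10"))   # True: IP is inside the claimed vendor's range
print(is_verified(claimed, "203.0.113.5"))  # False: right UA string, wrong network
```

The second call is the spoofing case: the UA string is copied correctly, but the request comes from a network the vendor does not operate, so it is treated like any other unverified bot.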

My WAF rules are too tangled to audit. Where do I start?

Start with the most recent 6 months of activity. Rules added in the last 6 months represent your active threat profile; older rules are usually orphaned. Pull the WAF event log, filter to rules that have actually triggered in the last 90 days, and consider deleting the rest. Pair this with a Spekto scan to confirm what the cleanup unblocks.
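
The 90-day filter is simple to apply once the event log is exported. A minimal sketch, assuming a hypothetical export in which each event is a dict with a `rule_id` and a timezone-aware `timestamp`; real WAF exports use vendor-specific field names, and the rule names below are invented for illustration.

```python
from datetime import datetime, timedelta, timezone


def stale_rules(rule_ids, events, now, window_days=90):
    """Return the rule IDs that have not triggered within the window."""
    cutoff = now - timedelta(days=window_days)
    recently_triggered = {e["rule_id"] for e in events if e["timestamp"] >= cutoff}
    return [rule_id for rule_id in rule_ids if rule_id not in recently_triggered]


# Hypothetical data for illustration only.
now = datetime(2026, 5, 16, tzinfo=timezone.utc)
rules = ["block-spam-ua", "rate-limit-login", "block-old-scraper"]
events = [
    {"rule_id": "rate-limit-login", "timestamp": now - timedelta(days=3)},
    {"rule_id": "block-spam-ua", "timestamp": now - timedelta(days=45)},
    {"rule_id": "block-old-scraper", "timestamp": now - timedelta(days=400)},
]

print(stale_rules(rules, events, now))  # ['block-old-scraper']
```

The output is a review list rather than a delete list: a rule that has not fired in 90 days may be orphaned, or it may guard against something rare, so disabling it first and watching for regressions is the safer sequence.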

What about Imperva, Fastly, Sucuri, and other CDN vendors?

Same logic, different config surface. All major WAF and CDN vendors ship bot management — the policies are vendor-specific but the failure modes are consistent. Run the curl probe regardless of vendor; the symptom (request never reaches origin, no log line, no citations) is the same everywhere.
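
Whatever the vendor, a probe result is easiest to read against a browser baseline for the same URL. The sketch below encodes that comparison in Python. The set of block statuses and the verdict labels are assumptions chosen for illustration, and a 200 can still hide a challenge page served with a success status, so inspect the response body when in doubt.

```python
# Statuses commonly returned by edge blocks and challenges (assumed, not exhaustive).
BLOCK_STATUSES = {401, 403, 406, 429, 503}


def diagnose(browser_status, agent_status):
    """Compare the status an agent UA receives with a browser baseline."""
    if 200 <= agent_status < 300:
        return "agent allowed"
    if not 200 <= browser_status < 300:
        # The page fails for browsers too, so the UA is not the cause.
        return "not agent-specific"
    if agent_status in BLOCK_STATUSES:
        return "likely edge block on agent"
    return "inconclusive"


print(diagnose(200, 403))  # likely edge block on agent
print(diagnose(200, 200))  # agent allowed
print(diagnose(500, 500))  # not agent-specific
```

The baseline matters because a 403 on its own says little: if a browser gets the same status, the problem is the page or the site, not the bot policy.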

Why is this a separate parameter from Bot Access Policy?

Because robots.txt and CDN policy fail in different ways and have different fixes. Bot Access Policy is a file you can read and edit yourself; Bot Blocking Detection lives in a vendor console you have to log into. Both have to be correct for an agent to reach your page. Scoring them separately is what makes the silent-CDN-block case visible at all.