Frontier spoke
Retry Safety — does your service survive an agent retrying the same POST?
Agents retry. Network blips, ambiguous responses, partial failures, and tool-loop ambiguity all trigger retries inside an agent's control loop — without idempotency support, each retry can create a duplicate order, a double charge, or a corrupted booking. Retry safety is whether your POST endpoints accept an idempotency key, return structured machine-parseable errors, and recover gracefully when something goes wrong mid-flight.
By Chris Mühlnickel · 2026-05-16
What is Retry Safety?
Retry Safety is whether your service supports an `Idempotency-Key` header (or equivalent) on POST endpoints, returns structured machine-parseable error responses with correct HTTP status codes, and recovers cleanly from partial failures — so an agent retrying the same request never produces a duplicate side effect.
By the numbers
- 24h — Stripe idempotency keys expire after 24 hours — the safe-retry window agents operate inside (Stripe API Reference)
- 0.8% — share of Claude agent actions on public API that appear irreversible — every retry of these can duplicate side effects (Anthropic — Measuring agent autonomy in practice)
- ~50% — of Claude public-API tool calls are software-engineering — high-side-effect actions needing idempotency (Anthropic — Measuring agent autonomy in practice)
Why it matters
Agents retry, and without idempotency every retry is a roll of the dice. This is the cleanest "ship this now" Frontier signal in the framework. The cost of supporting an Idempotency-Key header on POST endpoints is a small server-side change; the cost of not supporting it is duplicate orders, double charges, double bookings, and silent data corruption that surfaces days later as a customer support ticket. Spekto's calibration corpus already has the agent-traffic equivalent of a chargeback: the duplicate-order ticket. Same problem, different surface — and unlike a chargeback there's no network mechanism that automatically reconciles it.
Stripe's `Idempotency-Key` pattern is the de facto standard, and the cost to adopt it is small. Stripe has run idempotency keys on payment endpoints with a 24-hour persistence window since 2017; the pattern is now copied by every payment processor that takes agent traffic seriously. The protocol layer is built on this assumption — Mastercard Agent Pay, Visa Trusted Agent Protocol, and Google AP2 all expect merchant POSTs to be idempotent. If you ship payment endpoints that can't survive an agent retrying the same request, the agent-payment networks route around you.
Agent action surfaces are skewed toward high-side-effect tool calls. Anthropic's public-API data shows that about half of Claude's tool-call volume is software-engineering — file writes, deploys, infrastructure mutations — and 0.8% of agent actions appear irreversible. Both numbers say the same thing: the POST surface is where the damage compounds. The 0.8% is small, but multiply it by the volume of agent actions a single user session generates and you get a meaningful absolute count of irreversible side effects that need protection. Idempotency-on-receive plus structured errors is the cheapest insurance against the worst tail.
Idempotency is a reusable engineering primitive — not an agent-only feature. The work you do for agent retries also protects against your own client-side retry stack, your CDN's retry logic, intermittent network failures, and humans clicking submit twice on a slow checkout. It composes with circuit breakers, with at-least-once message queues, with eventual-consistency replication. The investment is durable in a way most agent-readiness work isn't.
Multi-agent concurrent action is the new default, not an edge case. A user can run Anthropic Computer Use and a Claude tool-call SDK pipeline simultaneously. A consumer app can trigger an agent retry while the user is themselves retrying via the UI. Two A2A agents can land on the same flow concurrently. Sites built for "one user, one tab, one click" silently break under any of these; sites with idempotency keys absorb all of them without duplicate side effects.
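To make the concurrent case concrete, here is a minimal single-process sketch of how two requests carrying the same key can be collapsed into one side effect. The class name `ConcurrentDeduper` and its method `run_once` are hypothetical, and the in-memory dictionaries stand in for whatever durable store a real service would use. A multi-process deployment would need a database unique constraint or a distributed lock instead.

```python
import threading
from typing import Callable, Dict, Tuple

Response = Tuple[int, dict]


class ConcurrentDeduper:
    """Hypothetical helper: runs an action at most once per idempotency key."""

    def __init__(self) -> None:
        self._guard = threading.Lock()               # protects the lock table
        self._locks: Dict[str, threading.Lock] = {}  # one lock per key
        self._results: Dict[str, Response] = {}      # completed responses

    def run_once(self, key: str, action: Callable[[], Response]) -> Response:
        # Fetch or create the per-key lock under the global guard.
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        # Callers with the same key queue here; the first one executes the
        # action and later ones read the stored response.
        with lock:
            if key not in self._results:
                self._results[key] = action()
            return self._results[key]
```

In this sketch, a waiting caller blocks until the first one finishes and then receives the same response. If the action raises, nothing is stored and the next caller re-executes it, so a production version would also need to record failures.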
Where it's heading
Promotion from Frontier watcher to scored Usability parameter. Retry Safety is the most likely Frontier watcher in the framework to make this jump. The trigger condition is agent-driven POST volume crossing a 5-10% threshold on a representative sample — at that point retry-safety stops being a "watch" and becomes a scored Usability check, very likely with a power-cap mechanic (sites without it can't earn an A in Usability). Plan accordingly; ship idempotency support now, not when the cap activates.
IETF `Idempotency-Key` draft promoting to RFC. The IETF HTTPAPI working group's idempotency-key header draft has been iterating since 2021 and is now at draft-07 (October 2025). Standardization removes the ambiguity around persistence windows, scope (per-account vs per-endpoint), and conflict semantics — once the RFC ships, agent SDKs can default to the standard behavior instead of falling back to per-vendor conventions.
Action-layer protocols emerging. Today, "is this action a duplicate?" is answered per-endpoint via idempotency keys. The next layer — multi-agent coordination (when two agents act on the same flow), action confirmation (the agent asks the user to confirm a high-stakes action before submitting), partial-failure recovery (the agent rolls back its own intermediate writes when something goes wrong) — is in active development across the same vendor coalition that ships MCP and A2A. Idempotency-on-receive is the precondition; the higher layers build on top.
"Self-healing UI" patterns surfacing retry state to the user. Stripe Checkout already does this — detect that a previous request is in flight, show "we're processing your previous payment, please wait..." instead of letting the user resubmit and risk a duplicate. The pattern generalizes: any POST surface that costs real money, real inventory, or real human time should surface in-flight state and rate-limit user retries from the client side as well. Agent retries plus human retries plus CDN retries are the new failure mode; the UX has to catch up.
Common mistakes
- No idempotency on POST endpoints. The single most common failure mode in Spekto's calibration corpus. The visible symptom is customer-support tickets about duplicate orders or double charges; the invisible symptom is agent platforms routing around you when they detect non-idempotent merchant behavior.
- Generic error messages that give the agent no signal. `{"error": "Something went wrong"}` with a 200 status code forces the agent to choose blindly between abandoning the user's action and risking a duplicate. Use correct HTTP status codes plus a stable machine-parseable error code in the body.
- Retry loops that compound across layers. Your own client-side retry, your CDN's automatic retry on 5xx, the agent's SDK retry, and the user clicking submit twice all stack. Without server-side idempotency, every layer multiplies the duplication risk.
- Persisting idempotency keys for too short a window. A 30-minute window catches the common retry cases but misses the long tail: agents that pause for human approval, queued retry-on-recovery, async webhook reconciliation. 24 hours is the Stripe standard for a reason.
- Treating 5xx errors as non-idempotency-eligible. Some implementations only return the cached response for 2xx responses, and re-execute on 5xx. This is backwards — 5xx is exactly when the client doesn't know if the action committed, and exactly when idempotency matters most. Cache and replay the 5xx response, every time.
- Mixing idempotency-key scope. Some servers scope the key per-account, some per-endpoint, some globally. Without a documented scope, agents guess wrong and either fail to retry safely or collide with unrelated requests. Document the scope explicitly in your API reference; align with Stripe's per-account scoping unless there's a strong reason not to.
- Skipping idempotency on non-payment POSTs. Bookings, message sends, document creation, support tickets, calendar invites: all of these break under retry without idempotency. The work scales linearly with endpoint count; do it across the whole POST surface, not just `POST /payments`.
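The key-scope mistake above can be illustrated with a short sketch. It assumes per-account scoping and adds one technique the list does not spell out: storing a fingerprint of the request so that reuse of a key with a different payload is flagged instead of silently replayed. The function names `fingerprint` and `classify` and the return labels are hypothetical.

```python
import hashlib
import json
from typing import Dict, Tuple

ScopedKey = Tuple[str, str]  # (account_id, idempotency_key): per-account scope


def fingerprint(method: str, path: str, payload: dict) -> str:
    """Hash the request so the same key cannot silently cover a different request."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{method} {path}\n{canonical}".encode()).hexdigest()


def classify(seen: Dict[ScopedKey, str], account_id: str, key: str,
             method: str, path: str, payload: dict) -> str:
    """Return "new", "replay", or "mismatch" for an incoming request."""
    scoped = (account_id, key)  # the same key from another account never collides
    fp = fingerprint(method, path, payload)
    previous = seen.get(scoped)
    if previous is None:
        seen[scoped] = fp
        return "new"       # first sighting: execute the action
    if previous == fp:
        return "replay"    # true retry: return the stored response
    return "mismatch"      # key reused for a different request: reject with a 4xx
```

A usage example, with made-up account and payload values:

```python
seen = {}
classify(seen, "acct_1", "k-123", "POST", "/orders", {"sku": "a", "qty": 1})  # "new"
classify(seen, "acct_1", "k-123", "POST", "/orders", {"qty": 1, "sku": "a"})  # "replay"
classify(seen, "acct_2", "k-123", "POST", "/orders", {"sku": "a", "qty": 1})  # "new"
```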
Frequently asked
What is idempotency and why do agents care?
Idempotency is the property that performing an operation more than once produces the same result as performing it once. Agents retry: agent loops typically have retry logic for network errors, 5xx responses, ambiguous tool-call outputs, and partial failures. Without idempotency support on the server side, each retry can create a duplicate order, a double charge, or a second booking for the same slot. Idempotency turns those retries into safe no-ops.
How do I support `Idempotency-Key` headers correctly?
Three rules: (1) accept the `Idempotency-Key` header on every POST endpoint that has a side effect; GET, PUT, and DELETE are already defined as idempotent by HTTP semantics. (2) Persist the key alongside the response status code and body for at least 24 hours, the Stripe-standard window. (3) Return the same response for the same key, including for 5xx errors, so a client retry never double-executes the action. The implementation is small; the failure mode without it is duplicate-write tickets days later.
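The three rules can be sketched in a few lines of framework-free Python. This is an illustration under stated assumptions, not a reference implementation: `IdempotencyStore` and `handle_post` are hypothetical names, the store is an in-memory dictionary standing in for a database, and the check-then-store sequence is not atomic, so concurrent requests would need additional locking.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, dict]
TTL_SECONDS = 24 * 60 * 60  # rule 2: keep keys for at least 24 hours


@dataclass
class StoredResponse:
    status: int
    body: dict
    stored_at: float


class IdempotencyStore:
    """In-memory stand-in for a durable store (hypothetical)."""

    def __init__(self, ttl: float = TTL_SECONDS,
                 clock: Callable[[], float] = time.time) -> None:
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, StoredResponse] = {}

    def get(self, key: str) -> Optional[StoredResponse]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        if self._clock() - entry.stored_at > self._ttl:
            del self._entries[key]  # expired: treat the key as unseen
            return None
        return entry

    def put(self, key: str, status: int, body: dict) -> None:
        self._entries[key] = StoredResponse(status, body, self._clock())


def handle_post(store: IdempotencyStore, key: Optional[str],
                action: Callable[[], Response]) -> Response:
    """Rule 1: accept a key. Rule 3: replay the stored response on retry."""
    if key is None:
        return action()  # no key supplied, so there is nothing to dedupe on
    cached = store.get(key)
    if cached is not None:
        return cached.status, cached.body
    try:
        status, body = action()
    except Exception:
        # Assumed error shape; the point is that the 5xx is stored too.
        status, body = 500, {"code": "internal_error", "retryable": False}
    store.put(key, status, body)  # stored for 2xx, 4xx, and 5xx alike
    return status, body
```

The `clock` parameter exists only so the expiry window can be exercised in tests without waiting 24 hours.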
Is idempotency only for payments?
No. Payment endpoints are the highest-stakes surface — duplicate charges are visible money loss — but the same primitive protects every POST that mutates state. Bookings, order placement, message sending, document creation, support-ticket submission, calendar invites: all of these break under retry without idempotency. The work you do for payment endpoints is reusable across the whole POST surface.
What does a structured error response look like?
A structured error response uses the correct HTTP status code (4xx for client errors, 5xx for server errors, never 200 with `{"error": ...}`), plus a JSON body with a stable machine-parseable error code (`payment.card_declined`, `inventory.out_of_stock`), a human-readable message, and any retry hints the client needs (`retryable: true`, `retry_after: 30`). A generic `{"error": "Something went wrong"}` gives the agent no signal: it has to choose blindly between abandoning the user's action and risking a duplicate.
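As an illustration, a small helper can enforce that shape. The field names `code`, `message`, `retryable`, and `retry_after` follow the description above; the outer `error` envelope, the function name `error_response`, and the specific status codes in the usage example are assumptions chosen for the sketch. `Retry-After` is the standard HTTP response header.

```python
import json
from typing import Dict, Optional, Tuple


def error_response(status: int, code: str, message: str, retryable: bool,
                   retry_after: Optional[int] = None) -> Tuple[int, Dict[str, str], str]:
    """Build (status, headers, body) for a machine-parseable error."""
    if not 400 <= status <= 599:
        raise ValueError("errors must use a 4xx or 5xx status, never 200")
    error = {"code": code, "message": message, "retryable": retryable}
    headers = {"Content-Type": "application/json"}
    if retry_after is not None:
        error["retry_after"] = retry_after         # hint in the body, in seconds
        headers["Retry-After"] = str(retry_after)  # same hint as a standard header
    return status, headers, json.dumps({"error": error})
```

Usage, with illustrative status codes:

```python
error_response(402, "payment.card_declined", "The card was declined.", retryable=False)
error_response(503, "service.unavailable", "Try again shortly.", retryable=True, retry_after=30)
```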
Do agents retry idempotently on their own?
The good ones try. Anthropic's SDK retries 5xx and rate-limit responses by default; OpenAI's SDK does the same. Both pass through any `Idempotency-Key` the application sets, but neither generates the key itself; that's the application developer's job. Computer-Use Agents and browser-based agents typically don't set the header at all, which is exactly why server-side idempotency-on-receive is the load-bearing protection.
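On the application side, the important detail is that the key is generated once per logical operation and reused on every attempt. The sketch below assumes a hypothetical `send` callable that takes headers and a payload and returns a status and body; it does not model any particular SDK. The retry policy (5xx, 429, and connection errors, with exponential backoff) is likewise an assumption.

```python
import time
import uuid
from typing import Callable, Dict, Tuple

Response = Tuple[int, dict]
Send = Callable[[Dict[str, str], dict], Response]  # hypothetical transport


def post_with_retries(send: Send, payload: dict, max_attempts: int = 3,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    # One key for the whole operation, not one per attempt.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    for attempt in range(1, max_attempts + 1):
        try:
            status, body = send(headers, payload)
        except ConnectionError:
            # Ambiguous outcome: the server may or may not have committed.
            if attempt == max_attempts:
                raise
        else:
            retryable = status == 429 or status >= 500
            if not retryable or attempt == max_attempts:
                return status, body
        sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    raise ValueError("max_attempts must be at least 1")
```

Because every attempt carries the same header value, a server that implements idempotency-on-receive can recognize the retries as one operation.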
How is this different from client-side retry logic I already have?
Client-side retry logic triggers retries; idempotency-on-receive makes them safe. The two compose: your own application retries, your CDN's retries, the agent's retries, the user clicking submit twice — they all stack. Without server-side idempotency, you compound the duplication risk; with it, every layer is safe to retry as aggressively as it wants.
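A quick worked calculation shows how stacking compounds. When retry layers are nested, each attempt at an outer layer can trigger the full retry budget of the layer beneath it, so the worst-case number of requests reaching the origin is the product of the per-layer attempt counts. The layer names and counts below are hypothetical, chosen only to illustrate the arithmetic.

```python
from math import prod
from typing import Dict


def worst_case_attempts(attempts_per_layer: Dict[str, int]) -> int:
    """Worst-case requests at the origin when retry layers are nested."""
    return prod(attempts_per_layer.values())


# Hypothetical budget: the user clicks twice, the app tries 3 times per click,
# and the CDN tries twice per app attempt.
layers = {"user_clicks": 2, "client_attempts": 3, "cdn_attempts": 2}
print(worst_case_attempts(layers))  # 2 * 3 * 2 = 12
```

Without server-side idempotency, each of those 12 requests is a potential duplicate side effect. If every attempt carries the same key, the server executes the action once and replays the stored response for the rest.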
Will Spekto score this in a future framework version?
Retry Safety is the most likely Frontier watcher in the framework to promote to a scored Usability parameter, likely when agent-driven POST volume crosses a 5-10% threshold on a representative sample. Sites that ship idempotency support now will not take a power-cap hit when the promotion happens; sites that don't will.