Webhook Retries and Idempotency: A Practical Guide

Why webhook deliveries fail, how providers retry with exponential backoff, why duplicate events happen, and how to build idempotent handlers that dedupe on event ID and survive at-least-once delivery.

Webhooks fail. Not often, but often enough that "fire it once and hope" is not a delivery strategy. A request times out, a deploy restarts your service for ten seconds, a load balancer hiccups — and the event is gone unless someone retries it. That's why every serious webhook provider retries, and why every serious receiver has to be idempotent.

This guide covers the two halves of reliable webhook delivery: how retries work on the sending side, and how idempotency protects you on the receiving side. If you only learn one thing, make it this — design your handler so processing the same event twice is harmless.

Why deliveries fail

A webhook is just an HTTP POST from a machine you don't control to a machine the sender can't see inside. Plenty can go wrong:

  • Timeouts. Your handler does slow work inline (charging a card, calling another API) and blows past the provider's timeout — often just a few seconds.
  • Transient errors. A 502/503 from your proxy during a deploy, a database that's briefly unreachable, a pod being rescheduled.
  • Network drops. The connection dies after you processed the event but before the provider got your 200. The provider assumes failure.
  • Bad responses. Returning a non-2xx for any reason — including an uncaught exception — tells the provider to retry.

The takeaway: failure is normal, and a chunk of "failures" are actually successes the sender never heard about. That's the root cause of duplicates.

How providers retry: exponential backoff

When a delivery fails, providers don't hammer your endpoint; they back off. Exponential backoff grows the wait multiplicatively after each failed attempt (classic schemes double it; the illustrative schedule below grows faster) so a struggling endpoint gets room to recover:

attempt 1 → fail → wait ~10s
attempt 2 → fail → wait ~30s
attempt 3 → fail → wait ~2m
attempt 4 → fail → wait ~10m
attempt 5 → fail → wait ~1h
...up to hours or days, often with jitter

The exact schedule varies by provider. Stripe retries for up to ~3 days; GitHub, by contrast, does not automatically redeliver failed deliveries and leaves redelivery to you. Among providers that do retry, the pattern is the same: increasing delays, capped attempts, then the provider gives up. Many add jitter (a small random offset) so a fleet of retries doesn't synchronize into a thundering herd.
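To make the shape of such a schedule concrete, here is a minimal sketch of exponential backoff with "full jitter". The function name and the base, factor, and cap values are illustrative assumptions, not any provider's real settings.

```python
import random

def backoff_delays(attempts, base=10.0, factor=2.0, cap=3600.0, rng=random):
    """Return one wait (in seconds) per failed attempt.

    Hypothetical parameters: base/factor/cap are illustrative only.
    Full jitter: each wait is drawn uniformly from
    [0, min(cap, base * factor**n)].
    """
    delays = []
    for n in range(attempts):
        ceiling = min(cap, base * factor ** n)  # exponential growth, capped
        delays.append(rng.uniform(0, ceiling))  # jitter spreads retries out
    return delays

if __name__ == "__main__":
    for i, wait in enumerate(backoff_delays(5), start=1):
        print(f"attempt {i} failed, waiting {wait:.1f}s")
```

The cap keeps late attempts from waiting arbitrarily long, and drawing from the whole interval (rather than adding a small offset) is one common way to keep many senders from retrying in lockstep.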

What "success" means to the sender is almost always the same: a 2xx status, returned quickly. Anything else is a retry. This is why the cardinal rule of webhook handlers is acknowledge fast, work later — return 200 as soon as you've durably stored the event, then do the heavy lifting in a background job.
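A minimal sketch of "acknowledge fast, work later", using Python's built-in sqlite3 as a stand-in for whatever durable store you actually run. The inbox table and the receive and drain function names are assumptions made up for this example.

```python
import json
import sqlite3

def open_inbox(path=":memory:"):
    """Create the inbox table. ':memory:' is for demonstration only and is
    NOT durable; point this at a real file or database in practice."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS inbox ("
        " id INTEGER PRIMARY KEY, body TEXT NOT NULL, done INTEGER DEFAULT 0)"
    )
    return conn

def receive(conn, raw_body):
    """Request path: store the raw payload, then acknowledge at once."""
    with conn:  # commits before we return, so the 200 means "stored"
        conn.execute("INSERT INTO inbox (body) VALUES (?)", (raw_body,))
    return 200

def drain(conn, process):
    """Background job: run the slow work outside the request path."""
    rows = conn.execute("SELECT id, body FROM inbox WHERE done = 0").fetchall()
    for row_id, body in rows:
        process(json.loads(body))  # slow work happens here, not in receive()
        with conn:
            conn.execute("UPDATE inbox SET done = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

The request handler does a single insert and returns, so its latency no longer depends on how long the real work takes.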

At-least-once delivery (and why exactly-once is a myth)

Because the network can drop the acknowledgement, no provider can promise exactly-once delivery over plain HTTP. What they offer is at-least-once: the event is delivered one or more times, and it's your job to make repeats harmless. Treat "I might see this event again" as a guarantee, not an edge case.

Why duplicates happen — concretely

You'll get the same event twice when:

  1. Your handler succeeds but responds too slowly, so the provider's timeout fires and it retries.
  2. The connection drops after processing but before the 2xx reaches the sender.
  3. You return a 500 from a non-critical bug after the important side effect already ran.

In every case the side effect (an order created, an email sent, a balance updated) happened once but the event arrives twice. Without protection, you double-charge, double-ship, or double-notify.
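The lost-acknowledgement case is easy to reproduce in a few lines. This is a toy simulation, not real networking: the deliver function plays a hypothetical sender whose first acknowledgement is dropped, and naive_handler is a deliberately non-idempotent receiver.

```python
balance = {"acct_1": 0}

def naive_handler(event):
    # Non-idempotent: every delivery applies the side effect again.
    balance[event["account"]] += event["amount"]
    return 200

def deliver(handler, event, max_attempts=3, drop_first_ack=True):
    """Hypothetical sender: retries until it actually sees a 2xx."""
    for attempt in range(1, max_attempts + 1):
        status = handler(event)  # the receiver processes the event
        ack_lost = drop_first_ack and attempt == 1  # response lost in transit
        if status == 200 and not ack_lost:
            return attempt
    return max_attempts

event = {"id": "evt_1", "account": "acct_1", "amount": 50}
attempts = deliver(naive_handler, event)
print(attempts, balance["acct_1"])  # 2 deliveries, balance credited twice
```

One event, two deliveries, and the account ends up credited with 100 instead of 50.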

Making handlers idempotent

Idempotency means processing an event N times has the same effect as processing it once. The pattern is short:

  1. Find a stable unique ID on the event. Most providers send one — Stripe's id (evt_...), GitHub's X-GitHub-Delivery, etc. Use the provider's ID, not a hash of the body, since payloads can vary.
  2. Record it before acting. Insert the ID into a processed_events table with a unique constraint.
  3. Skip if seen. If the insert hits the constraint, you've already handled this event — return 200 and stop.
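
-- one row per event; the primary key's unique index does the dedupe
CREATE TABLE processed_events (
  event_id   TEXT PRIMARY KEY,
  created_at TIMESTAMPTZ DEFAULT now()
);

# Assumes psycopg2 and an open connection `conn`; do_the_work is your own code.
from psycopg2.errors import UniqueViolation

def handle(conn, event):
    try:
        with conn:  # one transaction: commits on success, rolls back on error
            with conn.cursor() as cur:
                cur.execute(
                    "INSERT INTO processed_events (event_id) VALUES (%s)",
                    [event["id"]],
                )
                # Pass the cursor so the work's DB writes share the transaction.
                # If this raises, the marker is rolled back and a retry can succeed.
                do_the_work(cur, event)
    except UniqueViolation:
        return 200, "already processed"   # duplicate: no-op
    return 200, "ok"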

The critical detail: the dedupe check and the side effect must be atomic. If you check-then-act in two steps, two concurrent retries can both pass the check. Wrap them in one transaction, or rely on the database's unique constraint to be the gatekeeper.
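A runnable sketch of the single-transaction variant, using Python's built-in sqlite3 so it works without a database server. The orders table, the sku field, and the simulated failure are assumptions invented for the demo.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE orders (event_id TEXT, sku TEXT)")
    return conn

def handle(conn, event):
    """Dedupe marker and side effect commit (or roll back) together."""
    try:
        with conn:  # one transaction for both statements
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)",
                (event["id"],),
            )
            if event.get("sku") is None:
                raise ValueError("malformed event")  # simulated handler bug
            conn.execute(
                "INSERT INTO orders (event_id, sku) VALUES (?, ?)",
                (event["id"], event["sku"]),
            )
    except sqlite3.IntegrityError:
        return 200, "already processed"  # unique constraint caught a duplicate
    except ValueError:
        return 500, "failed"  # marker was rolled back, so a retry is not skipped
    return 200, "ok"
```

Because the marker and the order row live in the same transaction, a failure partway through leaves no trace, and the provider's retry gets a clean second chance. Side effects outside the database (emails, third-party API calls) cannot be rolled back this way.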

Idempotency keys for outbound calls

When your handler calls another API (charging a card, creating an order), pass an idempotency key — a value derived from the webhook's event ID — so the downstream service also dedupes. Stripe, for example, accepts an Idempotency-Key header and will return the original result instead of charging twice. Now the whole chain is safe end to end.
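One way to derive such a key, sketched under assumptions: the hashing scheme, the "wh-" prefix, and the payments.example.com endpoint are all hypothetical. The only requirement is that the same event and action always produce the same key.

```python
import hashlib

def idempotency_key(event_id, action):
    """Derive a stable key from the webhook event ID and an action name.

    Hypothetical scheme: the same event + action always yields the same
    key, while two different actions for one event get different keys.
    """
    digest = hashlib.sha256(f"{event_id}:{action}".encode("utf-8")).hexdigest()
    return f"wh-{digest[:32]}"

def charge_request(event):
    """Build (but do not send) a request for an assumed downstream API."""
    return {
        "method": "POST",
        "url": "https://payments.example.com/charges",  # placeholder URL
        "headers": {"Idempotency-Key": idempotency_key(event["id"], "charge")},
        "json": {"amount": event["amount"]},
    }
```

Including the action name matters when one event triggers several downstream calls: each call needs its own key, or the downstream service may treat the second call as a repeat of the first.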

Dead-letter patterns

Retries don't last forever. After a provider exhausts its attempts (or your own forwarder does), the event is effectively lost unless you've captured it. A dead-letter queue (DLQ) is where you park deliveries that never succeeded so a human or a job can replay them later. Good practice:

  • Persist the raw payload the moment it arrives, before any processing.
  • Route permanently-failing events to a DLQ with the error and attempt count.
  • Build a one-click replay so you can re-send the affected batch once the bug is patched.
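The practices above can be sketched in memory as follows. The Delivery and Pipeline names and the MAX_ATTEMPTS value are assumptions for illustration; a real pipeline would persist these records and wait between attempts.

```python
from dataclasses import dataclass, field

MAX_ATTEMPTS = 3  # illustrative limit, not a provider default

@dataclass
class Delivery:
    raw_payload: str      # stored untouched, before any parsing
    attempts: int = 0
    last_error: str = ""

@dataclass
class Pipeline:
    dead_letters: list = field(default_factory=list)

    def process(self, delivery, handler):
        """Try the handler up to MAX_ATTEMPTS times, then dead-letter."""
        while delivery.attempts < MAX_ATTEMPTS:
            delivery.attempts += 1
            try:
                handler(delivery.raw_payload)
                return True
            except Exception as exc:
                delivery.last_error = repr(exc)  # keep the reason for triage
        self.dead_letters.append(delivery)  # parked with error + attempt count
        return False

    def replay(self, handler):
        """Re-run everything in the DLQ, e.g. after a fix is deployed.

        Returns how many deliveries succeeded on replay."""
        pending, self.dead_letters = self.dead_letters, []
        for delivery in pending:
            delivery.attempts = 0
            self.process(delivery, handler)
        return len(pending) - len(self.dead_letters)
```

Replay only stays safe if the handler is idempotent, since a replayed payload may have partially succeeded before it was dead-lettered.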

How Webhook Relay helps

A managed forwarder takes a lot of this off your plate. Webhook Relay retries failed deliveries with backoff, so a brief blip in your service doesn't drop the event — it's redelivered when you recover. Because it captures every request, you can replay a payload against your handler as many times as you need (perfect for exercising your idempotency logic without re-triggering the source event). You can also filter and route only the events you care about, and transform them in flight — for example, normalizing the event ID into a header your handler dedupes on.

You still own idempotency on your side — no forwarder can make a non-idempotent handler safe — but combining provider retries, a forwarder that retries, and a handler that dedupes gives you genuinely reliable delivery.

The short version

  • Deliveries fail for boring, frequent reasons; retries with exponential backoff are the cure.
  • Delivery is at-least-once, so duplicates are guaranteed, not rare.
  • Make handlers idempotent: dedupe on the provider's event ID, do it atomically, and pass idempotency keys to downstream APIs.
  • Capture raw payloads and keep a dead-letter / replay path for what slips through.

Want to see this in action? Inspect and replay real payloads in Webhook Bin, read what a webhook is, or create a free account to forward and retry webhooks against your own code.