
Durable webhooks: reliable delivery with automatic retries

Make webhook delivery reliable with durable retries. Webhook Relay persists every event and retries failed deliveries with exponential backoff for up to 30 days — so a flaky or offline endpoint never loses an event.

A durable webhook is one that does not get lost when the receiving endpoint is having a bad moment. Instead of a single best-effort POST, the delivery is persisted first and then retried — patiently, with exponential backoff — until your endpoint accepts it or a deadline passes.

This guide explains how durable webhook delivery works, how to turn it on for any destination in Webhook Relay, and how to design a receiver that handles retries safely. For the product overview, see the durable retries feature page.

The timeline below shows what durability buys you: a batch of 50 webhooks sent to a flaky endpoint, all converging on delivered over time.

[Figure: Durable delivery · live convergence. 50 webhooks to a flaky endpoint on a 0 to 90 minute timeline, with the handoff and the 1-hour guarantee marked: 275 delivery attempts in total, 100% delivered by ~73 min.]

Why webhook delivery fails

Webhooks arrive whenever the sender decides to send them — rarely when your server is at its best. Common reasons a delivery fails:

  • A deploy or restart takes the endpoint down for a few seconds.
  • A database hiccup, a slow query or a dependency timing out returns a 500.
  • A short network blip or DNS failure between the sender and you.
  • The endpoint lives on private infrastructure that was briefly unreachable.

Webhooks are meant to be delivered at least once: a sender tries, and if it doesn't get a 2xx back it may try again, or it may not. The retry behaviour you actually get depends entirely on the provider, and most of them give up quickly.

How long do providers retry, really?

| Provider | Built-in retry behaviour |
|---|---|
| Stripe | Exponential backoff for up to ~3 days, then the event is dropped. |
| Shopify | ~19 attempts over 48 hours, after which the webhook subscription is removed. |
| GitHub | No automatic retries; failed deliveries must be redelivered by hand. |
| Most SaaS / internal senders | A handful of attempts over a few minutes, or none at all. |

Those windows are short, inconsistent, and impossible to change. Put Webhook Relay in front and every destination gets the same durable safety net regardless of who is sending — up to 30 days of retries — including endpoints on localhost or behind a firewall that the original provider could never reach.

How durable retries work

When durable delivery is enabled on a destination, every webhook goes through the same lifecycle:

  1. Persist first. The moment an event reaches Webhook Relay it is written to durable storage — before any delivery is attempted. It now survives crashes, restarts and deploys on both ends.
  2. Fast retries. Delivery is attempted immediately. Transient blips usually clear within the first few attempts, with no delay you'd notice. Fast retries run for the handoff window (15 minutes by default).
  3. Handoff to durable retry. If the destination is still failing after the handoff window, delivery is handed to the durable retry engine, which keeps trying on your chosen schedule.
  4. Exponential backoff. Each retry waits a little longer than the last, so a struggling server gets room to recover instead of being hammered while it's already down.
  5. Deadline. Retries continue until the event is delivered or the schedule's deadline is reached.
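The lifecycle above can be sketched as a function that lists when one event's delivery attempts happen. This is a minimal illustration under stated assumptions, not Webhook Relay's scheduler: apart from the 15-minute default handoff, every number in it (a 30-second gap between fast retries, a 1-minute first durable delay that doubles each time, a 6-hour cap on any single wait) is a hypothetical value chosen for readability, and attemptTimes is an invented name.

```javascript
// Sketch of one event's attempt timeline: fast retries until the handoff
// window closes, then exponential backoff until the deadline.
// Only the 15-minute handoff default comes from this page; the other
// numbers are illustrative assumptions.
const MIN = 60 * 1000

function attemptTimes({
  deadlineMs,                  // total time before giving up
  handoffMs = 15 * MIN,        // fast-retry window (page default)
  fastIntervalMs = 30 * 1000,  // assumed gap between fast retries
  firstDelayMs = 1 * MIN,      // assumed first durable-retry delay
  factor = 2,                  // assumed backoff multiplier
  maxDelayMs = 6 * 60 * MIN,   // assumed cap on a single wait
}) {
  const times = [0] // delivery is attempted immediately

  // Phase 1: fast retries at a short, fixed interval inside the handoff window.
  let t = fastIntervalMs
  while (t <= handoffMs && t <= deadlineMs) {
    times.push(t)
    t += fastIntervalMs
  }

  // Phase 2: durable retries; each wait is longer than the last, up to the cap.
  let delay = firstDelayMs
  t = handoffMs + delay
  while (t <= deadlineMs) {
    times.push(t)
    delay = Math.min(delay * factor, maxDelayMs)
    t += delay
  }
  return times
}

// Example: a Medium-like 16-hour deadline; print the count and the last
// four attempt times in minutes.
const times = attemptTimes({ deadlineMs: 16 * 60 * MIN })
console.log(times.length, times.slice(-4).map((ms) => ms / MIN))
// 41 [ 142, 270, 526, 886 ]
```

Each phase-2 wait doubles the previous one until it reaches the cap, which is the "room to recover" from step 4. Production schedulers commonly add random jitter to these delays so that many failed events don't retry in lockstep; it is left out here to keep the timeline easy to read.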

You can watch the whole thing happen: a delivery that is waiting for its next attempt shows up in your logs as stalled, with the time of the next retry, and flips to sent the moment it lands.

Retry schedules

Pick a schedule per destination based on how long you want Webhook Relay to keep trying:

| Schedule | Total window | Best for |
|---|---|---|
| Seconds | ~25 minutes | Endpoints that only ever blip; quick, persistent retries. |
| Medium | ~16 hours | Outages measured in hours. |
| Long | ~30 days | A destination that might come back next week and still needs its events. |

Two extra controls fine-tune the behaviour:

  • Handoff after — how long fast retries run before switching to durable retry (default 15 minutes).
  • Deadline — total time before giving up. Leave it at 0 to use the schedule default (~16 h for medium, ~30 d for long).
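The deadline rule can be read as a small helper. This sketch takes the approximate totals from the schedule table as the defaults that a deadline of 0 falls back to. The page states this for Medium and Long; the ~25-minute Seconds default, the millisecond units, measuring the deadline from when the event was received, and the helper names (effectiveDeadlineMs, shouldRetry) are all assumptions.

```javascript
const MINUTE_MS = 60 * 1000
const HOUR_MS = 60 * MINUTE_MS
const DAY_MS = 24 * HOUR_MS

// Approximate schedule defaults from the table above.
// Assumption: a 0 deadline on "seconds" falls back to its ~25-minute window.
const DEFAULT_DEADLINE_MS = {
  seconds: 25 * MINUTE_MS,
  medium: 16 * HOUR_MS,
  long: 30 * DAY_MS,
}

// An explicit deadline wins; 0 (or nothing) means "use the schedule default".
function effectiveDeadlineMs(schedule, deadlineMs = 0) {
  if (!Object.prototype.hasOwnProperty.call(DEFAULT_DEADLINE_MS, schedule)) {
    throw new Error(`unknown schedule: ${schedule}`)
  }
  return deadlineMs > 0 ? deadlineMs : DEFAULT_DEADLINE_MS[schedule]
}

// Assumption: the deadline is measured from when the event was received.
// Both timestamps are epoch milliseconds.
function shouldRetry(receivedAt, nextAttemptAt, schedule, deadlineMs = 0) {
  return nextAttemptAt - receivedAt <= effectiveDeadlineMs(schedule, deadlineMs)
}
```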

Turn on durable delivery

Durable delivery is configured per output destination.

  1. Open your bucket and select the output destination you want to make durable.
  2. In Delivery controls, make sure Retries is enabled.
  3. Open Durable delivery and switch it on.
  4. Choose a Retry schedule (Seconds, Medium or Long), optionally adjust Handoff after and Deadline, and Save.

That's it — new webhooks to that destination are now persisted and retried on the schedule you picked.

Pair durable retries with throttling to control how fast retries reach a recovering server. Durable retries decide how long to keep trying; throttling decides the pace, so you never finish off a server that's only just getting back on its feet.
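Throttles are often built on a token bucket. The textbook version below is only meant to make the "how long versus how fast" split concrete; it is not Webhook Relay's throttling implementation, and the TokenBucket class, rate and burst values are hypothetical. The bucket caps how many retries go out per second, while the retry schedule still decides how long an event keeps being retried.

```javascript
// Generic token bucket: bursts of up to `burst` attempts, then an average of
// at most `ratePerSec` attempts per second. The clock is injectable for tests.
class TokenBucket {
  constructor(ratePerSec, burst, now = () => Date.now()) {
    this.ratePerSec = ratePerSec
    this.burst = burst
    this.now = now
    this.tokens = burst // start with a full bucket
    this.last = now()
  }

  // Refill for the time elapsed since the last call, then try to spend a token.
  tryTake() {
    const t = this.now()
    const refill = ((t - this.last) / 1000) * this.ratePerSec
    this.tokens = Math.min(this.burst, this.tokens + refill)
    this.last = t
    if (this.tokens >= 1) {
      this.tokens -= 1
      return true // send this retry now
    }
    return false // over the limit: keep the event queued and try again later
  }
}
```

With new TokenBucket(2, 3), a recovering server would see at most three retries at once and then roughly two per second, however many events are waiting.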

The configuration is stored on the output and visible through the API as a durability block:

{
  "name": "my-destination",
  "destination": "https://api.example.com/webhooks",
  "durability": {
    "enabled": true,
    "schedule": "medium",
    "handoff_after": 900000000000
  }
}
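In the example, handoff_after is 900000000000, which equals 15 minutes if read as nanoseconds (900 seconds) and so matches the default handoff. Assuming that unit, a small helper can build the block from minutes instead of hand-typing large integers. The durabilityBlock helper and the lowercase spellings of the other schedules ("seconds", "long") are assumptions; the helper only emits the fields shown above.

```javascript
// Assumption: handoff_after is a duration in nanoseconds, inferred from the
// example above (900000000000 ns = 900 s = 15 minutes).
const NS_PER_MINUTE = 60 * 1e9

// Assumption: the API spells the schedules in lowercase, like "medium".
const SCHEDULES = ['seconds', 'medium', 'long']

function durabilityBlock({ schedule = 'medium', handoffMinutes = 15 } = {}) {
  if (!SCHEDULES.includes(schedule)) {
    throw new Error(`unknown schedule: ${schedule}`)
  }
  const handoffNs = Math.round(handoffMinutes * NS_PER_MINUTE)
  // JSON numbers are read as doubles by many clients; keep the value exact.
  if (!Number.isSafeInteger(handoffNs)) {
    throw new Error('handoff_after is too large to represent exactly')
  }
  return { enabled: true, schedule, handoff_after: handoffNs }
}

const output = {
  name: 'my-destination',
  destination: 'https://api.example.com/webhooks',
  durability: durabilityBlock({ schedule: 'medium', handoffMinutes: 15 }),
}

console.log(JSON.stringify(output, null, 2))
```

Running it prints the same object as the example above.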

Design your receiver for retries

Because durable delivery is at-least-once, the same event can legitimately arrive more than once — for example, your endpoint processed the request but the 200 was lost on the way back, so Webhook Relay retries. A correct receiver is idempotent: processing the same event twice has the same effect as processing it once.

The standard pattern is to deduplicate on a stable event id:

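// Express example
const express = require('express')

const app = express()
app.use(express.json()) // parse JSON bodies so req.body is populated

// In-memory stand-ins for illustration only. In production, use a durable store
// and a unique constraint on the id so concurrent retries can't both get through.
const processed = new Set()
const alreadyProcessed = async (id) => processed.has(id)
const markProcessed = async (id) => { processed.add(id) }
const handleEvent = async (event) => { /* your business logic */ }

app.post('/webhooks', async (req, res) => {
  const id = req.body.id // a stable id from the sender

  if (await alreadyProcessed(id)) {
    return res.sendStatus(200) // seen it: acknowledge and move on
  }

  try {
    await handleEvent(req.body)
    await markProcessed(id)
    res.sendStatus(200)        // 2xx => Webhook Relay marks it delivered
  } catch (err) {
    res.sendStatus(500)        // 5xx => Webhook Relay will retry later
  }
})

app.listen(3000) // port chosen for illustration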

Guidelines for a retry-friendly endpoint:

  • Return 2xx only when you've safely accepted the event. Any 5xx (or a timeout) tells Webhook Relay to keep the event and retry.
  • Acknowledge fast, work later. If processing is slow, store the event and return 200 immediately, then process out of band — otherwise the request may time out and be retried unnecessarily.
  • Key on the event id, not the payload contents, so retries of the exact same event are recognised.
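The last two guidelines combine naturally: record the event id, queue the payload, answer immediately, and do the slow work afterwards. The sketch below is framework-free so the logic is easy to follow. The in-memory Set and array stand in for a durable store and a job queue (both assumptions), receive() returns the status code an HTTP handler would send, and the 400 for a missing id is a choice of this sketch, since this page does not say how Webhook Relay treats 4xx responses.

```javascript
// In-memory stand-ins for illustration only. A real receiver should persist
// both before acknowledging, or an acknowledged event can be lost in a crash.
const seen = new Set() // ids of events already accepted
const queue = []       // accepted events waiting to be processed

// Call from the HTTP handler; returns the status code to send back.
function receive(event) {
  if (!event || event.id == null) {
    return 400 // no stable id to deduplicate on
  }
  if (seen.has(event.id)) {
    return 200 // a retry of an event we already accepted
  }
  seen.add(event.id)
  queue.push(event) // accept now, work later
  return 200
}

// Runs out of band, e.g. from a worker or timer. An event leaves the queue
// only after handleEvent succeeds, so a failure leaves it to be retried.
async function drain(handleEvent) {
  while (queue.length > 0) {
    await handleEvent(queue[0])
    queue.shift()
  }
}
```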

Watch deliveries converge

Open the bucket's request log to see delivery in real time. Each attempt shows its status:

  • sent — delivered, the endpoint returned 2xx.
  • stalled — failed so far, waiting for its next retry (the next attempt time is shown).

As endpoints recover, stalled deliveries flip to sent and the whole batch converges on successfully delivered — exactly the curve shown in the convergence timeline at the top of this page. No manual reconciliation, no lost events.

A live demo you can run

flakey-script is a small open-source receiver that fails ~80% of the time on purpose, then always succeeds once an event is an hour old. Point a Webhook Relay bucket at it with durable retries enabled and watch every webhook retry and eventually land — even if you turn the receiver off for a while, which durable retries simply treat as another outage to recover from. See the launch walkthrough for the full setup.
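The demo receiver's rule is simple enough to reproduce in a few lines, for example to test your own setup. This is a re-creation based only on the description above, not flakey-script's source: the function name, the way an event's age is measured, and the injectable random source are hypothetical, and "~80%" is modelled as a fixed 0.8 failure probability.

```javascript
const HOUR_MS = 60 * 60 * 1000
const FAILURE_RATE = 0.8 // "fails ~80% of the time on purpose"

// Status code for an event created at `createdAt` (epoch ms), judged at `now`.
// `random` is injectable so the behaviour can be tested deterministically.
function flakyStatus(createdAt, now = Date.now(), random = Math.random) {
  if (now - createdAt >= HOUR_MS) {
    return 200 // an event that is an hour old always succeeds
  }
  return random() < FAILURE_RATE ? 500 : 200
}
```

Because the Medium and Long schedules keep retrying well past the one-hour mark, every event eventually lands under either of them, as long as the receiver is running at some point after that mark.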

Works for public and internal destinations

Durable retries cover both kinds of destination:

  • Public HTTPS endpoints — your API, a partner's API, any SaaS webhook URL.
  • Internal destinations — services behind your firewall or on localhost reached through the Webhook Relay agent, even ones that were offline when the event arrived.

If a destination is unreachable for hours, the events simply wait in durable storage and deliver the moment it comes back.

Frequently asked questions

What is a durable webhook?

A durable webhook is a delivery that is saved to persistent storage before it is attempted and then retried automatically until the receiving endpoint accepts it (or a deadline passes). Unlike a plain best-effort POST, a durable webhook is not lost when the endpoint is briefly down, slow, or returning errors.

How long will Webhook Relay retry a failed webhook?

It depends on the schedule you choose per destination: about 25 minutes (Seconds), about 16 hours (Medium), or up to 30 days (Long). You can also set an explicit deadline.

What is exponential backoff?

Exponential backoff means each retry waits longer than the previous one (for example a few seconds, then minutes, then hours). It gives a struggling endpoint time to recover instead of hammering it with rapid retries while it's already failing.

How do I avoid processing the same webhook twice?

Durable delivery is at-least-once, so make your handler idempotent: deduplicate on a stable event id and skip events you've already processed. Return 2xx once an event is safely accepted, and a 5xx to ask for a retry.

Does this work for endpoints on localhost or a private network?

Yes. Webhook Relay forwards to internal destinations through its agent, and durable retries queue events while a private endpoint is offline, delivering them when it returns.
