Building Reliable Payment Webhooks: Lessons from Production

Payment webhooks are one of those things that look simple in the integration docs and turn out to be genuinely hard in production. A webhook delivery failure at the wrong moment — a settlement confirmation that never arrives, a failed-payment notification your system never receives — can cascade into reconciliation problems, duplicate payment attempts, and customer disputes that are expensive to resolve. We have seen all of these. Here is what we have learned.

The Fundamental Reliability Problem

HTTP webhooks are fundamentally at-least-once delivery systems. The payment platform sends a notification, waits for an HTTP 200 response, and if it does not receive one within a timeout window, it retries. Your endpoint receives the event and acknowledges it with a 200. Sounds clean.

In practice, the failure scenarios multiply fast:

Your endpoint returns 200, but the processing logic that follows throws an exception. The event is acknowledged but never processed.
Your endpoint is unavailable during a deployment, and the outage outlasts the sender's retry schedule. The event is dropped.
Network conditions cause the 200 response to time out from the sender's perspective even though you received and processed the event. The sender retries, and you process the event a second time.
Your endpoint receives two deliveries of the same event within milliseconds of each other (a legitimate race condition). Two reconciliation entries are created.

Every one of these is a real production scenario, not an edge case. You do not need high transaction volumes to see them: even at a few hundred events per day, each of these will happen, and more than once.

Idempotency Is Not Optional

The most important design principle for payment webhook handlers: every handler must be idempotent. Processing the same event twice must produce the same outcome as processing it once.

The implementation pattern that works:

1. Extract the event ID from the webhook payload (BackChannel includes a unique event_id in every webhook delivery).
2. Before processing, check your event log: has this event_id been processed before?
3. If yes: return 200 immediately and do nothing else.
4. If no: process the event, then write the event_id to your event log atomically with the processing outcome.

The atomic write in step 4 is critical. If you process the event and then write the log entry as two separate operations, a failure between them leaves you in a state where the event was processed but will be processed again on retry. Use a database transaction that wraps both operations.
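
The steps above can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module, not BackChannel's reference implementation. The table names and the payment_id and status fields are hypothetical; only event_id comes from the webhook description above. The sketch folds the check and the log write into a single INSERT OR IGNORE on a primary key, so two near-simultaneous deliveries cannot both pass the check.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: an event log keyed by event_id, plus the business table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_events (event_id TEXT PRIMARY KEY)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(payment_id TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    conn.commit()


def handle_event(conn: sqlite3.Connection, event: dict) -> bool:
    """Apply the event once. Returns True if applied, False if it was a duplicate."""
    # 'with conn' wraps both writes in one transaction:
    # commit if the block succeeds, roll back if it raises.
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
            (event["event_id"],),
        )
        if cur.rowcount == 0:
            # event_id already logged: acknowledge and do nothing else.
            return False
        # Processing step (assumed here to be a simple status update).
        conn.execute(
            "INSERT OR REPLACE INTO payments (payment_id, status) VALUES (?, ?)",
            (event["payment_id"], event["status"]),
        )
    return True
```

In both the True and False cases the HTTP handler would return 200. If the processing step raises, the transaction rolls back, the event_id is not recorded, and the sender's retry gets a clean second attempt.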

Signature Verification: Do Not Skip It

Every webhook request BackChannel sends includes an HMAC-SHA256 signature in the X-BackChannel-Signature header, computed over the raw request body using your endpoint secret. Verifying this signature before processing the payload is not optional — it is the only reliable way to confirm the event came from BackChannel and not from an attacker who discovered your webhook URL.

The verification is simple but has one common implementation error: the signature is computed over the raw request body bytes, not over the parsed JSON. If you parse the JSON before verifying the signature, the computed hash will not match for requests where the JSON serialization differs from the raw payload (whitespace, key ordering). Always verify against the raw body buffer.
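
As a rough sketch of the check described above, using only Python's standard hmac and hashlib modules: the function name is ours, and the sketch assumes the header carries a hex-encoded digest, which this article does not specify. Confirm the encoding against the BackChannel documentation before relying on it.

```python
import hashlib
import hmac


def verify_signature(raw_body: bytes, signature_header: str, endpoint_secret: bytes) -> bool:
    """Check an HMAC-SHA256 signature against the raw request body.

    Assumption: the X-BackChannel-Signature value is a hex-encoded digest.
    """
    expected = hmac.new(endpoint_secret, raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of a forged signature were correct.
    return hmac.compare_digest(expected.encode(), signature_header.encode())
```

The raw_body argument must be the bytes exactly as they arrived on the wire. Passing a re-serialized copy of the parsed JSON changes whitespace or key order and the check fails even for a genuine request.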

"We have seen integrations that skip signature verification in development and then forget to enable it in production. The first time someone sends a spoofed settlement webhook to trigger an early fund release, the cost is substantially higher than the integration time they saved."
— BackChannel Team

Retry Strategy and Alerting

BackChannel's webhook delivery system uses an exponential backoff retry schedule: immediate delivery, then retries at 1 minute, 5 minutes, 30 minutes, 2 hours, and 24 hours for unacknowledged events. Total retry window: 24 hours.

What this means for your infrastructure: your webhook endpoint needs to be available and returning 200 within 24 hours of event generation, or the event will be marked as permanently failed and require manual redelivery via the API. For settlement confirmations and payment failure notifications, 24 hours is a long time.

The right approach: alert on webhook failures, not just on missed events. Configure your monitoring to alert if your endpoint returns non-200 responses for more than 1% of webhook deliveries over a 10-minute window. That threshold catches real problems before they become the 24-hour dead-letter scenario.
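
To make the threshold concrete, here is a small sliding-window sketch of that rule. In practice this logic would normally live in your monitoring system's alert configuration; the class and method names below are illustrative only, and timestamps are plain seconds supplied by the caller.

```python
from collections import deque


class FailureRateMonitor:
    """Track webhook responses and flag when the non-200 rate exceeds a threshold."""

    def __init__(self, window_seconds: float = 600.0, threshold: float = 0.01):
        self.window_seconds = window_seconds  # 10-minute window
        self.threshold = threshold            # 1% of deliveries
        self._events = deque()                # (timestamp, is_failure) pairs, oldest first
        self._failures = 0

    def record(self, timestamp: float, status_code: int) -> None:
        is_failure = status_code != 200
        self._events.append((timestamp, is_failure))
        self._failures += is_failure
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Drop responses that have aged out of the window.
        while self._events and self._events[0][0] <= now - self.window_seconds:
            _, was_failure = self._events.popleft()
            self._failures -= was_failure

    def should_alert(self, now: float) -> bool:
        self._evict(now)
        if not self._events:
            return False
        return self._failures / len(self._events) > self.threshold
```

A ratio-based rule like this can be noisy when very few deliveries fall inside the window, so a minimum delivery count before alerting is a reasonable addition.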

Async Processing Pattern

Return 200 immediately, process asynchronously. Your webhook endpoint should do exactly three things: verify the signature, write the raw payload to a queue, return 200. All the actual processing — updating payment status, triggering downstream workflows, updating reconciliation records — happens asynchronously from the queue.

This decouples your webhook reliability from your application processing reliability. If your payment status update logic has a bug that causes an exception, the webhook has already been acknowledged. The raw payload is in your queue. You can fix the bug and replay events from the queue without needing BackChannel to redeliver.

The queue is your event log and your recovery mechanism. Design it that way from the start.
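
The shape of this pattern can be sketched in a few lines. The example below uses Python's in-memory queue.Queue purely as a stand-in: a real deployment needs a durable queue that survives restarts, or the recovery property described above is lost. The function names, the 401 response for a bad signature, and the hex-encoded signature are all assumptions made for illustration.

```python
import hashlib
import hmac
import json
import queue


def receive_webhook(raw_body: bytes, signature_header: str,
                    endpoint_secret: bytes, event_queue: queue.Queue) -> int:
    """Endpoint logic: verify, enqueue the raw payload, acknowledge."""
    expected = hmac.new(endpoint_secret, raw_body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), signature_header.encode()):
        return 401  # assumed status for a failed signature check
    event_queue.put(raw_body)  # store the raw bytes, not the parsed JSON
    return 200


def drain(event_queue: queue.Queue, process) -> list:
    """Worker logic: process queued payloads, returning the ones that failed."""
    failed = []
    while True:
        try:
            raw_body = event_queue.get_nowait()
        except queue.Empty:
            return failed
        try:
            process(json.loads(raw_body))
        except Exception:
            # Keep the raw payload so it can be replayed after the bug is fixed.
            failed.append(raw_body)
```

The worker's process callback is where the idempotent handler belongs, since replaying payloads from the queue produces the same duplicates that sender retries do.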

BackChannel's webhook documentation includes integration examples for Node.js, Python, and Java with idempotency patterns included.

