Webhook Architecture & Design — SKILL.md
---
name: "Webhook Architecture & Design"
description: "Design and consume webhooks: event naming, payload structure, HMAC-SHA256 signatures, retry policies, idempotent processing, delivery infrastructure (Svix, Hookdeck), testing (ngrok), and real-world patterns from Stripe/GitHub/Shopify."
version: "1.0.0"
author: "skynet"
category: "dev"
agents: ["claude-code", "codex", "gemini"]
tags: ["webhooks", "api", "events", "architecture", "security", "saas"]
---
# Webhook Architecture & Design
# Webhook Architecture: The Comprehensive Guide
This reference provides architectural patterns and implementation details for designing, delivering, and consuming webhooks in high-scale SaaS environments.
---
## 1. DESIGN (The Provider's Perspective)
When acting as a webhook provider, your goal is to provide a predictable, secure, and developer-friendly contract.
### Event Naming Convention
Use a `resource.action` or `resource.sub_resource.action` pattern.
* **Good:** `invoice.created`, `user.subscription.deleted`, `order.fulfilled`
* **Why:** It allows subscribers to easily filter and route events.
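As a rough illustration of that filtering, a dispatcher could match dotted event names against subscriber patterns. This is only a sketch: the `subscriptions` mapping and the `matching_subscribers` helper are hypothetical names, and shell-style wildcards are just one possible matching rule (note that `invoice.*` would also match deeper names such as `invoice.line.created`).

```python
from fnmatch import fnmatchcase


def matching_subscribers(event_type: str, subscriptions: dict) -> list:
    """Return subscriber IDs with at least one pattern matching the event type."""
    return [
        subscriber
        for subscriber, patterns in subscriptions.items()
        if any(fnmatchcase(event_type, pattern) for pattern in patterns)
    ]


# Hypothetical subscription table: subscriber ID -> event patterns.
subscriptions = {
    "sub_a": ["invoice.*"],
    "sub_b": ["user.subscription.deleted", "order.*"],
    "sub_c": ["*"],
}

print(matching_subscribers("invoice.created", subscriptions))  # prints ['sub_a', 'sub_c']
```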
### Payload Structure: The "Envelope"
Never send just the data. Always wrap it in an envelope that provides metadata.
```json
{
  "id": "evt_12345",
  "type": "invoice.paid",
  "created": 1672531200,
  "api_version": "2024-01-01",
  "data": {
    "object": {
      "id": "in_999",
      "amount": 5000,
      "currency": "usd"
    }
  },
  "request": {
    "id": "req_abc",
    "idempotency_key": "key_789"
  }
}
```
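A provider might assemble this envelope with a small helper like the one below. It is a minimal sketch, not a prescribed implementation: `build_envelope` is a hypothetical name, the `evt_` ID scheme is an assumption, and the optional `request` block is omitted for brevity.

```python
import json
import time
import uuid


def build_envelope(event_type: str, obj: dict, api_version: str = "2024-01-01") -> dict:
    """Wrap a resource snapshot in the metadata envelope shown above."""
    return {
        "id": f"evt_{uuid.uuid4().hex}",  # assumed ID scheme: unique per event
        "type": event_type,
        "created": int(time.time()),      # Unix timestamp in seconds
        "api_version": api_version,
        "data": {"object": obj},
    }


envelope = build_envelope(
    "invoice.paid", {"id": "in_999", "amount": 5000, "currency": "usd"}
)
# Serialize once and reuse these exact bytes for both signing and sending.
body = json.dumps(envelope, separators=(",", ":"))
```

Serializing once matters because a signature is computed over bytes; re-serializing the same dictionary later can produce a different byte string.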
### Security & Authentication
* **HMAC-SHA256 Signatures:** Generate a signature using a shared secret and the raw request body. Send this in a header (e.g., `X-Signature`).
* **Timestamp Validation:** Include a timestamp in the signed content (e.g., sign `timestamp.body`) so it cannot be altered in transit. This prevents **replay attacks**. Consumers should reject events older than 5 minutes.
* **Shared Secrets:** Provide a unique secret per webhook registration.
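On the receiving side, those three rules might be checked as sketched below. The header format `t=<timestamp>,v1=<hex>` and the signed string `<timestamp>.<body>` are assumptions chosen to match the signing example in section 6; `verify_signature` and `TOLERANCE_SECONDS` are hypothetical names.

```python
import hashlib
import hmac
import time
from typing import Optional

TOLERANCE_SECONDS = 300  # reject events older than 5 minutes


def verify_signature(body: bytes, header: str, secret: str,
                     now: Optional[float] = None) -> bool:
    """Check the replay window and the HMAC-SHA256 signature of a raw body."""
    parts = dict(item.split("=", 1) for item in header.split(",") if "=" in item)
    timestamp, received = parts.get("t", ""), parts.get("v1", "")
    if not timestamp.isdigit():
        return False

    current = time.time() if now is None else now
    if abs(current - int(timestamp)) > TOLERANCE_SECONDS:
        return False  # too old (or too far in the future): possible replay

    expected = hmac.new(
        secret.encode(), timestamp.encode() + b"." + body, hashlib.sha256
    ).hexdigest()
    # compare_digest runs in constant time, which avoids leaking match length.
    return hmac.compare_digest(expected.encode(), received.encode())
```

The `now` parameter exists only to make the function testable; in production it would be left unset.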
---
## 2. DELIVERY (The Outbound Engine)
Delivery is where reliability happens. You must assume the subscriber's endpoint is flaky.
* **Retry Policy:** Implement **Exponential Backoff**.
    * *Example:* 10 retries over 48 hours (e.g., after 1m, 5m, 1h, 5h, 10h...).
* **Dead Letter Queue (DLQ):** After max retries, move the event to a DLQ for manual inspection or customer notification.
* **Timeouts:** Hard timeout of 5–15 seconds. Webhooks should be lightweight; subscribers should process them asynchronously.
* **Guarantees:** Aim for **at-least-once delivery**. This means the subscriber *must* handle idempotency.
* **Concurrency:** Enforce per-customer concurrency limits, and add a circuit breaker that pauses delivery to an endpoint that keeps failing, so one slow subscriber doesn't back up the entire queue.
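A retry schedule along these lines could be generated as follows. This is a sketch under assumed parameters: a 1-minute base delay, a growth factor of 3, a 10-hour cap, and ±10% jitter are illustrative choices (they add up to roughly two days across 10 retries), and `retry_delays` is a hypothetical helper.

```python
import random


def retry_delays(max_retries: int = 10, base: float = 60.0, factor: float = 3.0,
                 cap: float = 10 * 3600, jitter: float = 0.1, rng=None) -> list:
    """Return the wait (in seconds) before each retry attempt."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_retries):
        delay = min(cap, base * factor ** attempt)   # exponential growth, capped
        delay *= 1 + rng.uniform(-jitter, jitter)    # jitter spreads out retry bursts
        delays.append(delay)
    return delays


schedule = retry_delays()
print(f"{len(schedule)} retries spanning {sum(schedule) / 3600:.1f} hours")
```

Once the last delay has elapsed without a successful response, the event would move to the DLQ rather than being retried again.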
---
## 3. CONSUMING (The Subscriber's Perspective)
As a consumer, your primary duty is to be defensive.
### The "Quick ACK" Pattern
1. Receive request.
2. **Validate Signature** (Immediate rejection if invalid).
3. **Persist to Queue** (Store the raw body).
4. **Respond 200 OK** (Immediately, before processing).
5. Process asynchronously.
### Implementation Requirements
* **Idempotency:** Track processed event `id`s in a fast store (e.g., Redis) for 24–48 hours, at least as long as the provider's retry window, to ignore duplicates.
* **Out-of-Order Events:** Check timestamps or version numbers. A `user.updated` event from 10:05 should not overwrite a change from 10:10 if it arrives late.
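Both requirements can be sketched in a few lines. The in-memory dictionary below is only a stand-in for a shared store (with Redis, the same check is typically a single `SET key value NX EX ttl`), and `EventDeduplicator` and `apply_if_newer` are hypothetical names. The sketch assumes each event carries the envelope's `created` timestamp.

```python
import time


class EventDeduplicator:
    """Remembers event IDs for a TTL; in-memory stand-in for a shared store."""

    def __init__(self, ttl_seconds: int = 48 * 3600):
        self.ttl = ttl_seconds
        self._seen = {}  # event_id -> expiry time

    def first_time(self, event_id: str, now=None) -> bool:
        """Return True only the first time an ID is seen within the TTL."""
        now = time.time() if now is None else now
        expiry = self._seen.get(event_id)
        if expiry is not None and expiry > now:
            return False  # duplicate delivery: skip processing
        self._seen[event_id] = now + self.ttl
        return True


def apply_if_newer(store: dict, resource_id: str, event: dict) -> bool:
    """Apply an event only if it is newer than the state already stored."""
    current = store.get(resource_id)
    if current is not None and current["created"] >= event["created"]:
        return False  # stale event arrived late: ignore it
    store[resource_id] = event
    return True
```

A per-process dictionary does not deduplicate across multiple workers, which is why a shared store is the usual choice in practice.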
---
## 4. INFRASTRUCTURE
* **Queue-Based Architecture:** Use SQS, RabbitMQ, or Redis Streams to buffer outbound webhooks.
* **Managed Services:** If building from scratch is too heavy, use specialized services:
    * **Svix / Hookdeck:** Managed "Webhooks-as-a-Service" for delivery and monitoring.
* **Local Testing:** Use **ngrok** to tunnel a public URL to a local development server so providers can reach your endpoint.
---
## 5. REAL-WORLD COMPARISONS
| Provider | Signature Header | Idempotency Key | Notable Feature |
| :--- | :--- | :--- | :--- |
| **Stripe** | `Stripe-Signature: t=<timestamp>,v1=<hex>` | Event `id` field | Excellent CLI for replaying events. |
| **GitHub** | `X-Hub-Signature-256: sha256=<hex>` | `X-GitHub-Delivery` | Provides granular event types (`pull_request`, `issues`). |
| **Slack** | `X-Slack-Signature: v0=<hex>` | `event_id` in the payload (`X-Slack-Retry-Num` only counts retries) | Uses `url_verification` challenge during setup. |
| **Shopify** | `X-Shopify-Hmac-Sha256: <base64>` | `X-Shopify-Webhook-Id` | Strict 5s timeout; 19 retries over 48h per older docs (verify the current schedule). |
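The encoding differences between providers are easy to miss. The sketch below contrasts a GitHub-style check (hex digest with a `sha256=` prefix, sent in `X-Hub-Signature-256`) with a Shopify-style check (base64 digest, sent in `X-Shopify-Hmac-Sha256`). Both are computed over the raw body. The function names are hypothetical, and each provider's current documentation should be treated as the authority.

```python
import base64
import hashlib
import hmac


def verify_github_style(body: bytes, header: str, secret: str) -> bool:
    """Header value looks like 'sha256=<hex digest of the raw body>'."""
    expected = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), header.encode())


def verify_shopify_style(body: bytes, header: str, secret: str) -> bool:
    """Header value is the base64-encoded HMAC-SHA256 digest of the raw body."""
    digest = hmac.new(secret.encode(), body, hashlib.sha256).digest()
    return hmac.compare_digest(base64.b64encode(digest), header.encode())
```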
---
## 6. CODE EXAMPLES
### Provider: Signing a Webhook (Python/FastAPI)
```python
import hashlib
import hmac
import time


def generate_signature(payload: str, secret: str) -> str:
    # Sign "<timestamp>.<payload>" so the timestamp is covered by the signature.
    timestamp = str(int(time.time()))
    signed_payload = f"{timestamp}.{payload}"
    signature = hmac.new(
        secret.encode(),
        signed_payload.encode(),
        hashlib.sha256,
    ).hexdigest()
    return f"t={timestamp},v1={signature}"


# Usage:
# headers = {"X-Signature": generate_signature(json_body, "whsec_abc123")}
```
### Consumer: Verifying & Async Processing (Node.js/Express)
```javascript
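const crypto = require('crypto');
const express = require('express');

const app = express();
const TOLERANCE_SECONDS = 300; // reject events older than 5 minutes

// `queue` is assumed to be a job-queue client created elsewhere.
app.post('/webhook', express.raw({ type: 'application/json' }), async (req, res) => {
  // 1. Parse the "t=<timestamp>,v1=<hex>" header produced by the provider
  const header = req.headers['x-signature'] || '';
  const parts = Object.fromEntries(header.split(',').map((kv) => kv.split('=')));

  // 2. Recompute the HMAC over "<timestamp>.<raw body>"
  const expected = crypto
    .createHmac('sha256', process.env.WEBHOOK_SECRET)
    .update(`${parts.t}.${req.body}`)
    .digest('hex');

  // 3. Compare in constant time and enforce the replay window
  const given = Buffer.from(parts.v1 || '', 'utf8');
  const valid =
    given.length === expected.length &&
    crypto.timingSafeEqual(given, Buffer.from(expected, 'utf8')) &&
    Math.abs(Date.now() / 1000 - Number(parts.t)) <= TOLERANCE_SECONDS;
  if (!valid) {
    return res.status(401).send('Invalid Signature');
  }

  // 4. Persist first, then ACK; a worker processes the job asynchronously
  await queue.add('process-webhook', JSON.parse(req.body.toString('utf8')));
  res.status(200).send({ received: true });
});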
```
---
## 7. SCALING TO MILLIONS
1. **Partitioning:** Partition your delivery workers by `customer_id`. This prevents one "noisy neighbor" (sending 1M events) from delaying events for other customers.
2. **Backpressure:** Monitor queue depth. If a customer's endpoint consistently returns 5xx errors, pause their delivery and notify them rather than wasting resources on retries.
3. **Fan-out Pattern:** If one event (e.g., `broadcast.sent`) triggers webhooks for 10,000 subscribers, use a fan-out worker to create 10,000 individual delivery tasks.
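Partitioning and fan-out can be combined, as in the sketch below. It makes several assumptions: a fixed number of worker partitions, subscriber records with hypothetical `customer_id` and `url` fields, and hypothetical helper names `partition_for` and `fan_out`. A stable hash is used instead of Python's built-in `hash()`, which is randomized per process for strings.

```python
import hashlib


def partition_for(customer_id: str, num_partitions: int) -> int:
    """Map a customer to a stable partition so their events share one lane."""
    digest = hashlib.sha256(customer_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def fan_out(event: dict, subscribers: list, num_partitions: int) -> list:
    """Expand one event into one delivery task per subscriber."""
    return [
        {
            "event_id": event["id"],
            "endpoint_url": sub["url"],
            "partition": partition_for(sub["customer_id"], num_partitions),
        }
        for sub in subscribers
    ]
```

Each task can then be retried, paused, or dead-lettered independently, so a failing endpoint affects only its own deliveries.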
---