fix: duplicate message_created webhooks for WhatsApp messages (#13523)

Some customers using WhatsApp inboxes with account-level webhooks
reported receiving duplicate `message_created` webhook deliveries for
every incoming message. Upon inspection, here's what we found:

- Both payloads are identical.
- No errors appear in the application logs.
- The webhook URL is configured in only one place.

This meant the system itself was sending the webhook twice. For context,
there is a known related issue: Meta's WhatsApp Business API can deliver
the same webhook notification multiple times for a single message. The
codebase already acknowledges this: a comment in
`IncomingMessageBaseService#process_messages` notes that "multiple
webhook events can be received against the same message due to
misconfigurations in the Meta business manager account." A deduplication
guard exists, but it doesn't actually work under concurrency.

### Rationale

The existing dedup was a three-step sequence: check Redis (`GET`), check
the database, then set a Redis flag (`SETEX`). Two Sidekiq workers
processing duplicate Meta webhooks simultaneously would both complete
the `GET` before either executed the `SETEX`, so both would proceed to
create a message. The `source_id` column has a non-unique index, so the
database wouldn't catch the duplicate either. Each message then
independently fires `after_create_commit`, dispatching two
`message_created` webhook events to the customer.

```
             Worker A                          Worker B
                │                                 │
                ▼                                 ▼
        Redis GET key ──► nil               Redis GET key ──► nil
                │                                 │
                │    ◄── both pass guard ──►      │
                │                                 │
                ▼                                 ▼
        Redis SETEX key                    Redis SETEX key
                │                                 │
                ▼                                 ▼
        BEGIN transaction               BEGIN transaction
        INSERT message                   INSERT message
        DELETE Redis key ◄─┐                      │
        COMMIT             │             DELETE Redis key
                           │             COMMIT
                           │                      │
                           └── key gone before ───┘
                              B's commit lands

                ▼                                 ▼
        after_create_commit              after_create_commit
        dispatch MESSAGE_CREATED         dispatch MESSAGE_CREATED
                │                                 │
                ▼                                 ▼
        WebhookJob ──► n8n               WebhookJob ──► n8n
                    (duplicate!)
```

There was a second, subtler problem visible in the diagram: the Redis
key was cleared *inside* the database transaction, before the
transaction committed. This opened a window where neither the Redis
check nor the database check would see the in-flight message.
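
To make the check-then-set race concrete, here is a minimal sketch using only the Ruby standard library. `FakeStore`, `racy_check_then_set`, and the key name are hypothetical stand-ins for illustration, not the actual `Redis::Alfred` calls. The two queues force the worst-case interleaving shown in the diagram, so the outcome is deterministic.

```ruby
# Hypothetical in-memory stand-in for Redis. Each individual call is atomic,
# as it is in Redis, but nothing ties a get to the set that follows it.
class FakeStore
  def initialize
    @data = {}
    @mutex = Mutex.new
  end

  def get(key)
    @mutex.synchronize { @data[key] }
  end

  def set(key, value)
    @mutex.synchronize { @data[key] = value }
  end
end

# Runs the old guard in two threads. Each worker reports when its GET is done
# and only continues once released, so both GETs finish before either SET.
def racy_check_then_set(store, key)
  checked = Queue.new
  release = Queue.new
  workers = Array.new(2) do
    Thread.new do
      seen = store.get(key)   # step 1: check
      checked << true
      release.pop             # wait until both workers have checked
      next false if seen      # guard: bail out if the key was already set
      store.set(key, true)    # step 2: set, too late to stop the other worker
      true                    # this worker would go on to create a message
    end
  end
  2.times { checked.pop }
  2.times { release << true }
  workers.map(&:value)
end

RACY_RESULTS = racy_check_then_set(FakeStore.new, 'MESSAGE_SOURCE_KEY::wamid.1')
puts RACY_RESULTS.inspect # prints [true, true]
```

Both workers pass the guard even though every store call is individually atomic, which is why the fix has to make the check and the set a single operation.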

The fix collapses the check-and-set into a single `SET NX EX` call,
which is atomic in Redis. The key is no longer eagerly cleared; it
expires naturally after 24 hours. The database lookup
(`find_message_by_source_id`) remains as a fallback for duplicates that
arrive after the lock has expired, when the message already exists in
the database.

```
             Worker A                          Worker B
                │                                 │
                ▼                                 ▼
        Redis SET NX ──► OK              Redis SET NX ──► nil
                │                                 │
                ▼                                 ▼
        proceeds to create              returns early
        message normally                (lock already held)
```
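
The semantics the fix relies on can be sketched with an in-memory model of `SET key value NX EX ttl`. `FakeNxStore`, `set_nx_ex`, and the injected clock are assumptions made for illustration; the real code goes through `Redis::Alfred.set` as shown in the diff.

```ruby
# Hypothetical in-memory model of Redis `SET key value NX EX ttl`.
class FakeNxStore
  def initialize(clock:)
    @expires_at = {}
    @mutex = Mutex.new
    @clock = clock
  end

  # Returns true if the key was absent (or expired) and is now set, false if
  # a live key already exists. The check and the write share one critical
  # section, so no other caller can slip in between them.
  def set_nx_ex(key, ttl)
    @mutex.synchronize do
      now = @clock.call
      expiry = @expires_at[key]
      next false if expiry && expiry > now
      @expires_at[key] = now + ttl
      true
    end
  end
end

def nx_demo
  now = 0
  store = FakeNxStore.new(clock: -> { now })
  ttl = 24 * 60 * 60
  first  = store.set_nx_ex('MESSAGE_SOURCE_KEY::wamid.1', ttl) # Worker A wins
  second = store.set_nx_ex('MESSAGE_SOURCE_KEY::wamid.1', ttl) # Worker B loses
  now = ttl + 1                                                # key has expired
  third  = store.set_nx_ex('MESSAGE_SOURCE_KEY::wamid.1', ttl) # lock is free again
  [first, second, third]
end

NX_RESULTS = nx_demo
puts NX_RESULTS.inspect # prints [true, false, true]
```

The third call shows why the database lookup is kept: once the key expires, the lock alone no longer remembers that the message was processed.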

### Implementation Notes

The lock logic is extracted into `Whatsapp::MessageDedupLock`, a small
class that wraps a single `Redis SET NX EX` call. This makes the
concurrency guarantee testable in isolation — the spec uses a
`CyclicBarrier` to race two threads against the same key and asserts
exactly one wins, without needing database writes,
`use_transactional_tests = false`, or monkey-patching.
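
The shape of that race test can be sketched without Redis or RSpec. This is not the actual spec: `SimpleBarrier` is a hand-rolled stand-in for the `CyclicBarrier` the spec uses, and `InMemoryDedupLock` is a hypothetical in-memory substitute for `Whatsapp::MessageDedupLock`, assumed here so the example runs on the standard library alone.

```ruby
# Minimal one-shot barrier: `wait` blocks until `parties` threads have arrived.
class SimpleBarrier
  def initialize(parties)
    @parties = parties
    @arrived = 0
    @mutex = Mutex.new
    @cond = ConditionVariable.new
  end

  def wait
    @mutex.synchronize do
      @arrived += 1
      if @arrived >= @parties
        @cond.broadcast
      else
        @cond.wait(@mutex) until @arrived >= @parties
      end
    end
  end
end

# Hypothetical in-memory lock with the same contract as the real class:
# acquire! returns true for exactly one caller per source_id.
class InMemoryDedupLock
  STORE = {}
  STORE_MUTEX = Mutex.new

  def initialize(source_id)
    @key = "MESSAGE_SOURCE_KEY::#{source_id}"
  end

  def acquire!
    STORE_MUTEX.synchronize do
      next false if STORE.key?(@key)
      STORE[@key] = true
      true
    end
  end
end

# Lines both threads up at the barrier, then lets them race for the same key.
def race_for_lock(source_id, threads: 2)
  barrier = SimpleBarrier.new(threads)
  Array.new(threads) do
    Thread.new do
      barrier.wait
      InMemoryDedupLock.new(source_id).acquire!
    end
  end.map(&:value)
end

RACE_RESULTS = race_for_lock('wamid.race')
puts RACE_RESULTS.count(true) # prints 1
```

Which thread wins is not deterministic, so the assertion is on the number of winners rather than on their order.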

Because the Redis lock now persists (instead of being cleared
mid-transaction), existing WhatsApp specs needed an `after` hook to
clean up `MESSAGE_SOURCE_KEY::*` keys between examples. Transactional
fixtures only roll back the database, not Redis.

Author: Shivam Mishra
Date: 2026-02-17 14:01:10 +05:30
Committed by GitHub
Parent: fb2f5e1d42
Commit: 39243b9e71
6 changed files with 90 additions and 30 deletions


@@ -28,19 +28,19 @@ class Whatsapp::IncomingMessageBaseService
     # if the webhook event is a reaction or an ephermal message or an unsupported message.
     return if unprocessable_message_type?(message_type)
-    # Multiple webhook event can be received against the same message due to misconfigurations in the Meta
-    # business manager account. While we have not found the core reason yet, the following line ensure that
-    # there are no duplicate messages created.
-    return if find_message_by_source_id(messages_data.first[:id]) || message_under_process?
+    # Multiple webhook events can be received for the same message due to
+    # misconfigurations in the Meta business manager account.
+    # We use an atomic Redis SET NX to prevent concurrent workers from both
+    # processing the same message simultaneously.
+    return if find_message_by_source_id(messages_data.first[:id])
+    return unless lock_message_source_id!
-    cache_message_source_id_in_redis
     set_contact
     return unless @contact
     ActiveRecord::Base.transaction do
       set_conversation
       create_messages
-      clear_message_source_id_from_redis
     end
   end


@@ -69,20 +69,9 @@ module Whatsapp::IncomingMessageServiceHelpers
     @message = Message.find_by(source_id: source_id)
   end
-  def message_under_process?
-    key = format(Redis::RedisKeys::MESSAGE_SOURCE_KEY, id: messages_data.first[:id])
-    Redis::Alfred.get(key)
-  end
+  def lock_message_source_id!
+    return false if messages_data.blank?
-  def cache_message_source_id_in_redis
-    return if messages_data.blank?
-    key = format(Redis::RedisKeys::MESSAGE_SOURCE_KEY, id: messages_data.first[:id])
-    ::Redis::Alfred.setex(key, true)
-  end
-  def clear_message_source_id_from_redis
-    key = format(Redis::RedisKeys::MESSAGE_SOURCE_KEY, id: messages_data.first[:id])
-    ::Redis::Alfred.delete(key)
+    Whatsapp::MessageDedupLock.new(messages_data.first[:id]).acquire!
   end
 end


@@ -0,0 +1,19 @@
+# Atomic dedup lock for WhatsApp incoming messages.
+#
+# Meta can deliver the same webhook event multiple times. This lock uses
+# Redis SET NX EX to ensure only one worker processes a given source_id.
+class Whatsapp::MessageDedupLock
+  KEY_PREFIX = Redis::RedisKeys::MESSAGE_SOURCE_KEY
+  DEFAULT_TTL = 1.day.to_i
+  def initialize(source_id, ttl: DEFAULT_TTL)
+    @key = format(KEY_PREFIX, id: source_id)
+    @ttl = ttl
+  end
+  # Returns true when the lock is acquired (caller should proceed).
+  # Returns false when another worker already holds the lock.
+  def acquire!
+    ::Redis::Alfred.set(@key, true, nx: true, ex: @ttl)
+  end
+end