I don't have the code in front of me, but if I remember correctly, matching keys for the internal DLRs are created even before the SMS is sent to the SMSC. Returned DLRs are then just matched against those keys.
BR,
Nikos

----- Original Message -----
From: Alejandro Guerrieri
To: Nikos Balkanas
Cc: users@kannel.org
Sent: Thursday, April 30, 2009 12:47 AM
Subject: Re: Possible race condition with dlr-mysql

Well, the DB latency makes it worse, but it is theoretically possible (though quite unlikely) to have it on internal DLRs as well, isn't it?

Regards,
Alejandro

2009/4/29 Nikos Balkanas <nbalka...@gmail.com>

Definitely. DLRs are not synchronous and therefore a little extra delay wouldn't hurt. To make it better I would suggest the delay only in the case of DB storage for DLRs.

Nikos

----- Original Message -----
From: Alejandro Guerrieri
To: users@kannel.org
Sent: Thursday, April 30, 2009 12:28 AM
Subject: Possible race condition with dlr-mysql

Hi,

I'm doing some tests with DLRs (mysql storage) and I've come across a weird problem.

I'm using 2 Kannel servers, each one having an SMPP link with a carrier. Messages may come and go over either link, so an MT may go out from server #1 and its DLR come back on server #2. To solve that issue, I'm using a central DB and the mysql storage for DLRs.

The problem is, sometimes (about 1 in 5-6 messages) the DLR arrives before the row is inserted, so Kannel ignores it and the record then remains untouched forever. This usually happens when the MT and the DLR are processed on different servers, though most of the time it just works (even when the MT and DLR are processed on different servers, the DLR is found, processed and deleted).

Here's an example:

Server #1:

2009-04-29 16:44:45 [14318] [7] DEBUG: DLR[mysql]: Adding DLR smsc=my-smsc, ts=5073a07e, src=OOOO, dst=XXXXXXXXXXX, mask=31, boxc=
2009-04-29 16:44:45 [14318] [7] DEBUG: sql: INSERT INTO dlr (smsc, ts, source, destination, service, url, mask, boxc, status) VALUES ('my-smsc', '5073a07e', 'OOOO', 'XXXXXXXXXXX', 'kannel', 'http://my-host-name/dlr?id=f59d4249-65d8-4969-a2d9-636c881b9de7&code=%d&scode=%B', '31', '', '0');

Server #2:

2009-04-29 16:44:45 [8395] [9] DEBUG: DLR[mysql]: Looking for DLR smsc=my-smsc, ts=5073a07e, dst=XXXXXXXXXXX, type=2
2009-04-29 16:44:45 [8395] [9] DEBUG: sql: SELECT mask, service, url, source, destination, boxc FROM dlr WHERE smsc='my-smsc' AND ts='5073a07e';
2009-04-29 16:44:45 [8395] [9] ERROR: SMPP[my-smsc]: got DLR but could not find message or was not interested in it id<5073a07e> dst<XXXXXXXXXXX>

I think that's because there's a possible race condition inherent in SQL latency: the DLR can only be inserted after the submit_sm_resp is received, but perhaps the SMSC starts delivering the message on a separate thread right after receiving the submit_sm. Add some SQL latency and there's a possible race condition:

1. Kannel sends a submit_sm.
2. The SMSC starts delivering the message on another thread.
3. The SMSC starts delivering the submit_sm_resp.
4. The SMSC finishes delivering the DLR.
5. Kannel receives the DLR and searches for it in the DB. Not found, so the DLR is ignored.
6. The SMSC finishes delivering the submit_sm_resp.
7. Kannel parses the receipted_message_id and inserts the DLR.
8. The DLR row is never searched for again and remains in the queue forever.

A possible solution would be to implement a (configurable, disabled by default) retry mechanism for missing DLRs. For example, retrying one or two times after a few milliseconds if the DLR is not found.

Opinions? Insights?

Regards,
Alejandro
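
The retry idea proposed above could be sketched roughly as follows. This is only an illustration in Python, not Kannel code (Kannel itself is written in C), and every name in it (`find_dlr_with_retry`, `lookup`, `retries`, `delay_ms`, `sleep`) is hypothetical. It assumes the DLR lookup is available as a callable that returns the stored row, or `None` when the row has not been inserted yet.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def find_dlr_with_retry(
    lookup: Callable[[], Optional[T]],
    retries: int = 0,
    delay_ms: int = 50,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Look up a DLR row, retrying a few times if it is not there yet.

    lookup   -- hypothetical callable doing the SELECT; returns None if no row
    retries  -- extra attempts after the first one (0 = disabled, the default)
    delay_ms -- pause between attempts, in milliseconds
    sleep    -- injectable sleep function, so the delay can be stubbed in tests
    """
    attempts = 1 + max(retries, 0)
    for attempt in range(attempts):
        row = lookup()
        if row is not None:
            return row
        # Only wait if another attempt is still coming.
        if attempt < attempts - 1:
            sleep(delay_ms / 1000.0)
    # Still missing after all attempts: the caller treats the DLR as unknown.
    return None
```

With `retries=0` this behaves like the current single lookup, which matches the "disabled by default" part of the proposal. A caller handling an incoming DLR would pass a `lookup` that runs the SELECT shown in the log above and would only log the "could not find message" error when the function returns `None`.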