Hi everyone,

Yes, I have experienced this before too.
It also occurs if you are running separate TX/RX pair configurations on the same server for the same binds (one with just receive-port and the other with just port).

As has been mentioned, some use of the store system would be great because, as someone suggested, the database could lose its connection during operation and these DLRs would be lost.

I think an 'upper level' retry system (not in the specific SMSC, possibly in the bbox code) would be a solution, with pending retries put into the store. It should be configurable, though, as it might cost some performance for the sake of 'consistency', and some users might not feel it's worth it. There is obviously a lot to think about in terms of max retries, etc. ;)

Any thoughts on this?

Thanks
Donald

2009/4/29 Alejandro Guerrieri <alejandro.guerri...@gmail.com>:
> Well, the DB latency makes it worse, but it is theoretically possible
> (despite quite unlikely) to have it on internal dlr's as well, isn't it?
>
> Regards,
> Alejandro
> 2009/4/29 Nikos Balkanas <nbalka...@gmail.com>
>>
>> Definitely. DLRs are not synchronous and therefore a little extra delay
>> wouldn't hurt. To make it better I would suggest the delay only in the case
>> of DB storage for DLRs.
>>
>> Nikos
>>
>> ----- Original Message -----
>> From: Alejandro Guerrieri
>> To: users@kannel.org
>> Sent: Thursday, April 30, 2009 12:28 AM
>> Subject: Possible race condition with dlr-mysql
>> Hi,
>> I'm doing some tests with DLR's (mysql storage) and I've come across a
>> weird problem.
>> I'm using 2 Kannel servers, each one having an SMPP with a carrier.
>> Messages may come and go over either link, so maybe an MT goes from server
>> #1 and a DLR comes back on server #2. To solve that issue, I'm using a
>> central DB and the mysql storage for DLR's.
>> The problem is, sometimes (about 1 in 5-6 messages) the DLR arrives before
>> the row is inserted, so kannel ignores it and the record then remains
>> untouched forever.
>> This usually happens when the MT and the DLR are
>> processed on different servers, though most of the time it just works (even
>> when the MT and DLR are processed on different servers, the DLR is found,
>> processed and deleted).
>> Here's an example:
>>
>> Server #1:
>>
>> 2009-04-29 16:44:45 [14318] [7] DEBUG: DLR[mysql]: Adding DLR
>> smsc=my-smsc, ts=5073a07e, src=OOOO, dst=XXXXXXXXXXX, mask=31, boxc=
>>
>> 2009-04-29 16:44:45 [14318] [7] DEBUG: sql: INSERT INTO dlr (smsc, ts,
>> source, destination, service, url, mask, boxc, status) VALUES ('my-smsc',
>> '5073a07e', 'OOOO', 'XXXXXXXXXXX', 'kannel',
>> 'http://my-host-name/dlr?id=f59d4249-65d8-4969-a2d9-636c881b9de7&code=%d&scode=%B',
>> '31', '', '0');
>>
>> Server #2:
>>
>> 2009-04-29 16:44:45 [8395] [9] DEBUG: DLR[mysql]: Looking for DLR
>> smsc=my-smsc, ts=5073a07e, dst=XXXXXXXXXXX, type=2
>>
>> 2009-04-29 16:44:45 [8395] [9] DEBUG: sql: SELECT mask, service, url,
>> source, destination, boxc FROM dlr WHERE smsc='my-smsc' AND ts='5073a07e';
>>
>> 2009-04-29 16:44:45 [8395] [9] ERROR: SMPP[my-smsc]: got DLR but could not
>> find message or was not interested in it id<5073a07e> dst<XXXXXXXXXXX>
>>
>> I think that's because there's a possible race condition inherent on SQL
>> latency: The dlr only could be inserted after the submit_sm_resp is
>> received, but perhaps the smsc starts delivering the message on a separate
>> thread right after receiving the submit_sm. Add some SQL latency and there's
>> a possible race condition:
>>
>> 1. Kannel sends a submit_sm
>>
>> 2. SMSC starts delivering the message on another thread
>>
>> 3. SMSC starts delivering the submit_sm_resp
>>
>> 4. The SMSC ends delivering the DLR.
>>
>> 5. Kannel receives the DLR and searches for it on the DB. Not found - DLR
>> is ignored.
>>
>> 6. The SMSC ends delivering the submit_sm_resp
>>
>> 7. Kannel parses the receipted_message_id and inserts the DLR.
>>
>> 8. The DLR row is not searched again and remains forever on the queue.
>>
>> A possible solution would be to implement a (configurable/disabled by
>> default) retry mechanism for missing DLR's. For example, retrying one or two
>> times after a few milliseconds if the dlr is not found.
>>
>> Opinions? Insights?
>>
>> Regards,
>>
>> Alejandro
>
>

--
Donald Jackson
http://www.ddj.co.za/
donaldjster(a)gmail.com
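
P.S. To make the retry idea from this thread a bit more concrete, here is a rough Python sketch of a "retry the lookup a couple of times before giving up" loop. It is only an illustration and not Kannel code: DlrStore, find_dlr_with_retry, max_retries and delay are all made-up names, and the in-memory dict merely stands in for the shared dlr table.

```python
import threading
import time


class DlrStore:
    """Hypothetical in-memory stand-in for the shared dlr table."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def add(self, smsc, ts, url):
        # Corresponds to the INSERT done after submit_sm_resp is parsed.
        with self._lock:
            self._rows[(smsc, ts)] = url

    def find_and_remove(self, smsc, ts):
        # Corresponds to the SELECT (and later delete) done when a DLR arrives.
        with self._lock:
            return self._rows.pop((smsc, ts), None)


def find_dlr_with_retry(store, smsc, ts, max_retries=2, delay=0.05):
    """Look up a DLR row; if it is missing, wait `delay` seconds and retry.

    max_retries=0 gives the plain single-lookup behaviour (retries disabled).
    Returns the stored URL, or None if the row never showed up.
    """
    for attempt in range(max_retries + 1):
        row = store.find_and_remove(smsc, ts)
        if row is not None:
            return row
        if attempt < max_retries:
            time.sleep(delay)
    return None


if __name__ == "__main__":
    store = DlrStore()
    # Simulate the race: the row is inserted 30 ms after the DLR arrives.
    threading.Timer(
        0.03, store.add, args=("my-smsc", "5073a07e", "http://my-host-name/dlr")
    ).start()
    print(find_dlr_with_retry(store, "my-smsc", "5073a07e"))
```

In a real setup the sleep would block the thread handling that SMSC link, which is part of the performance cost mentioned above, so the defaults would need to stay small and retries would probably be off unless configured.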