Hi Chen-Che,

OpenSIPS is not crashing at all, but I see the dialog module is generating some queries with bogus values (for updating the dialog info) - I see the queries are setting state 4 which means the dialog just received the ACK for a 200 OK - so this is not related to BYE handling, but to ACK handling.

We will check the dlg_db_sync function in combination with postgres DB to see if we can replicate the issue.

Thanks and Regards,

Bogdan-Andrei Iancu
OpenSIPS Founder and Developer
http://www.opensips-solutions.com


On 04/16/2013 10:26 AM, microx wrote:
Hi all,

In my environment, I use two active SIP proxy servers A and B that share one
backend database for high performance and failover. When active proxy server
A goes down, proxy server B will fetch the dialog information (originally
processed by the proxy server A) from the database into its local memory
when receiving a BYE. Specifically, proxy server B executes
avp_db_query("update dialog set caller_sock='udp:127.0.0.1:5060',
callee_sock='udp:127.0.0.1:5060' where caller_sock='udp:10.100.13.12:5060'")
and then calls exec_msg("/usr/local/sbin/opensipsctl fifo dlg_db_sync").
Note that udp:10.100.13.12:5060 is the processing socket of proxy server A.

With the above approach, proxy server B can handle BYE requests whose
associated dialogs are established by proxy server A in the beginning.
However, when a number of BYE requests reach proxy server B simultaneously
(60 BYES within one second), proxy server B may crash, resulting from an
ineligible query "Query: update dialog set
state=4,timeout=1366090793,caller_cseq=NULL,callee_cseq=NULL,caller_ping_cseq=0,callee_ping_cseq=0,vars=NULL,profiles=NULL,script_flags=0,flags=0
where dlg_id=2972999775959". A part of the log is below.

Although the query is not my used avp_db_query, I don't know whether the
root cause is wrong setting of my config for failover. Any comment is
greatly appreciated.

Best regards,
Chen-Che

: DBG:db_postgres:db_postgres_submit_query: 0x7f0dec096010
PQsendQuery(update dialog set
state=4,timeout=1366102836,caller_cseq='0',callee_cseq='1',caller_ping_cseq=0,callee_ping_cseq=0,vars=NULL,profiles='O1S1R1#<null>|',script_flags=0,flags=144
where dlg_id=2822250379813)
: DBG:core:db_new_result: allocate 48 bytes for result set at 0x7f0dec0961a8
: DBG:db_postgres:db_postgres_store_result: 0x7f0dec096010
PQresultStatus(PGRES_COMMAND_OK) PQgetResult(0x10f0720)
: DBG:db_postgres:free_query: PQclear(0x10f0720) result set
: DBG:core:db_free_rows: freeing 0 rows
: DBG:core:db_free_result: freeing result set at 0x7f0dec0961a8
: DBG:dialog:dialog_update_db: updating existing dialog 0x7f0dc4271750
: DBG:dialog:set_final_update_cols: DLG vals and profiles should not be
saved[0:0]
: DBG:db_postgres:db_postgres_submit_query: 0x7f0dec096010
PQsendQuery(update dialog set
state=4,timeout=1366090793,caller_cseq=NULL,callee_cseq=NULL,caller_ping_cseq=0,callee_ping_cseq=0,vars=NULL,profiles=NULL,script_flags=0,flags=0
where dlg_id=2972999775959)
: DBG:db_postgres:db_postgres_submit_query: 0x7f0dec096010 PQsendQuery
failed: ERROR:  null value in column "caller_cseq" violates not-null
constraint#012DETAIL:  Failing row contains (2972999775959,
20-31706@192.168.8.222, sip:100000...@thcloud.com, 31706SIPpTag00Caller20,
sip:100000...@thcloud.com, 22109SIPpTag01Callee20, null, null, null, null,
0, 0, null, null, sip:100000039@192.168.8.222:6060,
sip:10005@192.168.8.225:6060;transport=UDP, udp:127.0.0.1:5060,
udp:127.0.0.1:5060, 4, 1366091036, 1366090793, null, null, 0, 0).#012 Query:
update dialog set
state=4,timeout=1366090793,caller_cseq=NULL,callee_cseq=NULL,caller_ping_cseq=0,callee_ping_cseq=0,vars=NULL,profiles=NULL,script_flags=0,flags=0
where dlg_id=2972999775959
: DBG:db_postgres:free_query: PQclear(0x10f1360) result set
: ERROR:db_postgres:db_postgres_submit_query: 0x7f0dec096010 PQsendQuery
Error: ERROR:  null value in column "caller_cseq" violates not-null
constraint#012DETAIL:  Failing row contains (2972999775959,
20-31706@192.168.8.222, sip:100000...@thcloud.com, 31706SIPpTag00Caller20,
sip:100000...@thcloud.com, 22109SIPpTag01Callee20, null, null, null, null,
0, 0, null, null, sip:100000039@192.168.8.222:6060,
sip:10005@192.168.8.225:6060;transport=UDP, udp:127.0.0.1:5060,
udp:127.0.0.1:5060, 4, 1366091036, 1366090793, null, null, 0, 0).#012 Query:
update dialog set
state=4,timeout=1366090793,caller_cseq=NULL,callee_cseq=NULL,caller_ping_cseq=0,callee_ping_cseq=0,vars=NULL,profiles=NULL,script_flags=0,flags=0
where dlg_id=2972999775959
: ERROR:core:db_do_update: error while submitting query
: DBG:core:db_new_result: allocate 48 bytes for result set at 0x7f0dec0961a8
: DBG:db_postgres:db_postgres_store_result: 0x7f0dec096010
PQresultStatus(PGRES_FATAL_ERROR) PQgetResult((nil))
: ERROR:db_postgres:db_postgres_store_result: 0x7f0dec096010 - invalid
query, execution aborted
: ERROR:db_postgres:db_postgres_store_result: 0x7f0dec096010:
PGRES_FATAL_ERROR
: ERROR:db_postgres:db_postgres_store_result: 0x7f0dec096010:
: DBG:core:db_free_rows: freeing 0 rows
: DBG:core:db_free_result: freeing result set at 0x7f0dec0961a8
: WARNING:db_postgres:db_postgres_update: unexpected result returned
: ERROR:dialog:dialog_update_db: could not update database info
: DBG:core:pool_remove: connection still kept in the pool
Apr 16 13:44:46  /usr/sbin/opensips[18220]: last message repeated 2 times
: DBG:db_postgres:db_postgres_submit_query: 0x7f0dec095bc0
PQsendQuery(delete from location where expires<'2013-04-16 00:44:47' AND
expires!='1969-12-31 18:00:00')
: DBG:core:db_new_result: allocate 48 bytes for result set at 0x7f0dec0961a8
: DBG:db_postgres:db_postgres_store_result: 0x7f0dec095bc0
PQresultStatus(PGRES_COMMAND_OK) PQgetResult(0x10f0720)
: DBG:db_postgres:free_query: PQclear(0x10f0720) result set
: DBG:core:db_free_rows: freeing 0 rows
: DBG:core:db_free_result: freeing result set at 0x7f0dec0961a8
: DBG:core:pool_remove: removing connection from the pool
: DBG:db_postgres:db_postgres_free_connection: PQfinish(0x10e7d50)
: DBG:db_postgres:db_postgres_free_connection: pkg_free(0x7f0dec095cc0)
: DBG:tm:tm_shutdown: tm_shutdown : start
: DBG:tm:unlink_timer_lists: emptying DELETE list
: DBG:tm:tm_shutdown: emptying hash table
: DBG:tm:run_trans_callbacks: trans=0x7f0dc42f86f8, callback type 4096, id 1
entered
: DBG:tm:run_trans_callbacks: trans=0x7f0dc42fcfb0, callback type 4096, id 1
entered
: DBG:tm:run_trans_callbacks: trans=0x7f0dc42fe9f8, callback type 4096, id 1
entered
: DBG:tm:run_trans_callbacks: trans=0x7f0dc42fa140, callback type 4096, id 1
entered
: DBG:tm:run_trans_callbacks: trans=0x7f0dc4272290, callback type 4096, id 1
entered
: DBG:tm:tm_shutdown: releasing timers
: DBG:tm:tm_shutdown: removing semaphores
: DBG:tm:tm_shutdown: destroying callback lists
: DBG:tm:tm_shutdown: tm_shutdown : done
Apr 16 13:45:46 qri-cdrc-application-node1 /usr/sbin/opensips[18220]:
CRITICAL:core:sig_alarm_abort: BUG - shutdown timeout triggered, dying...



--
View this message in context: 
http://opensips-open-sip-server.1449251.n2.nabble.com/crash-after-failover-tp7585845.html
Sent from the OpenSIPS - Users mailing list archive at Nabble.com.

_______________________________________________
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users


_______________________________________________
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users

Reply via email to