It is strange, as many people are using the postgres, but is the first time for such a report ....
Have you manage to get some support from the postgres project about this (whne not using pgpool) ? Regards, Bogdan Bobby Smith wrote: > From the pgpool2 forums, and after digesting the log files between postgres, > pgpool2, and opensips, I was wondering if this is a possibility? It seems > only to happen cyclically every few hours as described below. > > > Also, I'll work on getting you the gdb trace when the child gets stuck on the > query again. For now, we have a workable "work around", but in general it's > not pretty and involves a quick scripted restart/failover on opensips. > > > Thanks, > > Bobby > > > Hi, > > From: Adrian Moisey <[EMAIL PROTECTED]> > Subject: [Pgpool-general] disconnects > Date: Thu, 03 Jan 2008 10:07:02 +0200 > > > We have a single postgres server that was under load. To try help with > > > the load I stuck pgpool in front of postgres (using connection caching) > > > > But every few hours everything seems to go pear shaped. pgpool seems to > > get "Connection reset by peer" and postgres gets some errors and kills > > > all its sessions then starts again. > > > > Here are the logs: > > > > Jan 3 08:00:29 ips01 pgpool: 2008-01-03 08:00:29 ERROR: pid 8117: > > pool_read: read failed (Connection reset by peer) > > > Jan 3 08:00:29 ips01 pgpool: 2008-01-03 08:00:29 ERROR: pid 8117: > > ProcessFrontendResponse: failed to read kind from frontend. fronend > > abnormally exited > > It seems that a client program call exit() before doing PQfinish(). > > Do you use persistent connection (pconnect() in PHP)? > > Regards, > -- > Yoshiyuki Asaba > [EMAIL PROTECTED > > > On Tue, Apr 21, 2009 at 4:16 AM, Bogdan-Andrei Iancu > <[email protected] <mailto:[email protected]>> wrote: > > Hi Bobby, > > you mean the current process is stucked in something.. I suspect > in PQgetResult() trying to read the result. Can you attach with > gdb to one of this blocked processes and get the backtrace (to see > where they are blocked) ? > > Regards, > Bogdan > > Bobby Smith wrote: > > Removing pgpool from the equation and simply doing a direct > postgres connection, it seems like the issue still occurs. If > opensips is idle on the avpops db for some time (sounds like > about 30 minutes), and you try to execute an arbitrary query > to that connection from the routing script, it just keeps > passing to the next child without returning data from the > query until it runs out of children to execute the process, > and so it just kind of "hangs". > > This is with maximum logging enabled. You can see where the > first process attempts to parse the message. When we get to > the db lookup, the task immediately goes to the next process > without actually having the first process stop, and it hangs > on the "SELECT" statement. I can at this point however > directly connect to my postgres backend and do the query manually. > > This repeats for all the SIP receiving processes, until none > are left. At this point, the proxy stops responding to any > messages. > > Is there some sort of connection timeout like parameter > involving avp_ops that I'm missing? It seems like I can > troubleshoot so far that a) if the proxy is never idle, this > doesn't seem to happen, and b) if the proxy is idle > (processing 0 messages), it happens after about 30 minutes and > requires a restart of opensips to function correctly. > > > > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_msg: SIP Request: > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_msg: method: <INVITE> > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_msg: uri: > <sip:[email protected]:5060;transport=udp> > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_msg: version: <SIP/2.0> > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_headers: flags=2 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_via_param: found param type 232, <branch> = > <z9hG4bK4dbf.e8b09c62.0>; state=16 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_via: end of header reached, state=5 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_headers: via found, flags=2 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_headers: this is the first via > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:receive_msg: After parse_msg... > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:receive_msg: preparing to run routing scripts... > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: Starting > processing for method INVITE > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_headers: flags=100 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_via_param: found param type 232, <branch> = > <z9hG4bK4dbf.2d03d153.0>; state=16 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_via: end of header reached, state=5 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_headers: via found, flags=100 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_headers: parse_headers: this is the second via > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_via_param: found param type 232, <branch> = > <z9hG4bK506071629460-1207609494153>; state=16 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_via: end of header reached, state=5 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_headers: via found, flags=100 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_to: end of header reached, state=10 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_to: display={}, > ruri={sip:[email protected]:5060 > <http://sip:[email protected]:5060> > <http://sip:[email protected]:5060>} > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:get_hdr_field: <To> [38]; > uri=[sip:[email protected]:5060 > <http://sip:[email protected]:5060> > <http://sip:[email protected]:5060>] > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:get_hdr_field: to body > [<sip:[email protected]:5060 > <http://sip:[email protected]:5060> > <http://sip:[email protected]:5060>>#015#012] > > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:get_hdr_field: cseq <CSeq>: <1> <INVITE> > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:maxfwd:is_maxfwd_present: value = 67 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: Entering > rewrite-ruri block: ruri-user is +13129576830 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: regex > search turned up true!! > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: request > user: 13129576830 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: this is the > phone number: 13129576830 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: Original > User is +13129576830 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: Is the avp > phone_number still set? 13129576830 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:avpops:ops_dbquery_avps: query [SELECT account_id FROM > customer.phone_numbers WHERE phone_number = '13129576830'] > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:db_postgres:db_postgres_submit_query: 0x76e170 > PQsendQuery(SELECT account_id FROM customer.phone_numbers > WHERE phone_number = '13129576830') > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:db_new_result: allocate 48 bytes for result set at > 0x776640 > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_msg: SIP Request: > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_msg: method: <INVITE> > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_msg: uri: > <sip:[email protected]:5060;transport=udp> > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_msg: version: <SIP/2.0> > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_headers: flags=2 > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_via_param: found param type 232, <branch> = > <z9hG4bK4dbf.e8b09c62.0>; state=16 > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_via: end of header reached, state=5 > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_headers: via found, flags=2 > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_headers: this is the first via > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:receive_msg: After parse_msg... > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:receive_msg: preparing to run routing scripts... > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: Starting > processing for method INVITE > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_headers: flags=100 > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_via_param: found param type 232, <branch> = > <z9hG4bK4dbf.2d03d153.0>; state=16 > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_via: end of header reached, state=5 > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_headers: via found, flags=100 > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_headers: parse_headers: this is the second via > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_via_param: found param type 232, <branch> = > <z9hG4bK506071629460-1207609494153>; state=16 > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_via: end of header reached, state=5 > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_headers: via found, flags=100 > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_to: end of header reached, state=10 > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_to: display={}, > ruri={sip:[email protected]:5060 > <http://sip:[email protected]:5060> > <http://sip:[email protected]:5060>} > > > > > > On Fri, Apr 17, 2009 at 5:32 PM, Bobby Smith > <[email protected] <mailto:[email protected]> > <mailto:[email protected] <mailto:[email protected]>>> > wrote: > > An example of what happens -- on processing a call right > before a > database lookup (the select query below in the previous > message): > > 00:00:00 pgpool: opensips my_database localhost(46362) SELECT > postgres 445 438 0 09:00 ? 00:00:00 pgpool: > opensips > my_database localhost(46383) SELECT > postgres 446 438 0 09:00 ? 00:00:00 pgpool: > wait for > connection request > postgres 447 438 0 09:00 ? 00:00:00 pgpool: > wait for > connection request > postgres 448 438 0 09:00 ? 00:00:00 pgpool: > wait for > connection request > postgres 449 438 0 09:00 ? 00:00:00 pgpool: > wait for > connection request > postgres 450 438 0 09:00 ? 00:00:00 pgpool: > wait for > connection request > postgres 451 438 0 09:00 ? 00:00:00 pgpool: > opensips > my_database localhost(46366) SELECT > postgres 452 438 0 09:00 ? 00:00:00 pgpool: > opensips > my_database localhost(46371) SELECT > postgres 453 438 0 09:00 ? 00:00:00 pgpool: > wait for > connection request > postgres 454 438 0 09:00 ? 00:00:00 pgpool: > opensips > my_database localhost(46367) SELECT > postgres 455 438 0 09:00 ? 00:00:00 pgpool: > opensips > my_database localhost(46372) SELECT > postgres 456 438 0 09:00 ? 00:00:00 pgpool: > opensips > my_database localhost(46393) idle > postgres 457 438 0 09:00 ? 00:00:00 pgpool: > opensips > my_database localhost(46391) SELECT > postgres 458 438 0 09:00 ? 00:00:00 pgpool: > opensips > my_database localhost(46377) SELECT > postgres 459 438 0 09:00 ? 00:00:00 pgpool: > opensips > my_database localhost(46378) SELECT > postgres 460 438 0 09:00 ? 00:00:00 pgpool: > opensips > my_database localhost(46374) SELECT > postgres 461 438 0 09:00 ? 00:00:00 pgpool: > wait for > connection request > postgres 462 438 0 09:00 ? 00:00:00 pgpool: > opensips > my_database localhost(46387) SELECT > postgres 463 438 0 09:00 ? 00:00:00 pgpool: > wait for > connection request > postgres 464 438 0 09:00 ? 00:00:00 pgpool: > opensips > my_database localhost(46380) SELECT > postgres 465 438 0 09:00 ? 00:00:00 pgpool: > wait for > connection request > postgres 466 438 0 09:00 ? 00:00:00 pgpool: > opensips > my_database localhost(46385) SELECT > postgres 468 438 0 09:00 ? 00:00:00 pgpool: > opensips > my_database localhost(46389) SELECT > postgres 469 438 0 09:00 ? 00:00:00 pgpool: > opensips > my_database localhost(46361) SELECT > postgres 471 438 0 09:00 ? 00:00:00 pgpool: > wait for > connection request > postgres 473 438 0 09:00 ? 00:00:00 pgpool: > wait for > connection request > postgres 474 438 0 09:00 ? 00:00:00 pgpool: > wait for > connection request > postgres 475 438 0 09:00 ? 00:00:00 pgpool: > wait for > connection request > postgres 476 438 0 09:00 ? 00:00:00 pgpool: > wait for > connection request > postgres 477 438 0 09:00 ? 00:00:00 pgpool: > wait for > connection request > postgres 478 438 0 09:00 ? 00:00:00 pgpool: > opensips > my_database localhost(46365) SELECT > > I have 32 "available" connections, 16 of which are occupied by > opensips children processes. On this one query, it looks like > "every" opensips process does the same processing to *all* the > pgpool connections. This is just for a single INVITE. I don't > know if the database lookup is failing or not, but it looks > like > the next child processes the message until none are left? > > It will kind of just float here in this state for some time > now -- > I've got a heartbeat script that can come back and clean > this up > (transparent to the endusers as we're in a failover setup), > but I > would like to know how to go about troubleshooting this > particular > type of issue. > > I know it's not the database dying or pgpool losing it's > connections, as I have logging enabled on both. > > Any suggestions on troubleshooting, or has anyone tried > this sort > of configuration before? > > Thanks much in advance. > > > On Thu, Apr 16, 2009 at 4:39 PM, Bobby Smith > <[email protected] <mailto:[email protected]> > <mailto:[email protected] <mailto:[email protected]>>> > wrote: > > All, > > We're currently (successfully) testing a configuration of > Opensips as a stateless proxy. I wanted to be able to > execute > an arbitrary database query to a database that's NOT the > opensips database, when I realized the power of the > avp_db_query( ) function in avpops. The database is a > backend > postgres database in an basic failover configuration > (16 max > connections cached, primary db, failover to secondary > db for > those connections). > > Currently, opensips has 16 children processes. In the > configuration, we have the following: > > modparam("avpops", "db_url", > "postgres://opensips:opens...@localhost:9999/my_database") > modparam("avpops", "avp_table", "") > > A few of the issues I'm concerned about or experiencing > with > the operation of it. > > a) Why is that second parameter necessary (avp_table)? It > feels like if I have to set this that it's always going to > keep open connections with the database, and indeed, it > is. When I ps -ef, it shows I have 16 pgpool > connections that are > idle to my_database. The problem that this is causing is > that, if for some reason one of the connections has an > issue > and dies or times out, this happens in the logs: > > Apr 16 05:46:02 serinbound1 /sbin/opensips[30773]: > ERROR:db_postgres:db_postgres_store_result: 0x76e008 - > invalid > query, execution aborted > Apr 16 05:46:02 serinbound1 /sbin/opensips[30773]: > ERROR:db_postgres:db_postgres_store_result: 0x76e008: > PGRES_FATAL_ERROR > Apr 16 05:46:02 serinbound1 /sbin/opensips[30773]: > ERROR:db_postgres:db_postgres_store_result: 0x76e008: > server > closed the connection unexpectedly#012#011This probably > means > the server terminated abnormally#012#011before or while > processing the request.#012 > Apr 16 05:46:02 serinbound1 /sbin/opensips[30773]: > ERROR:core:db_do_raw_query: error while storing result > Apr 16 05:46:02 serinbound1 /sbin/opensips[30773]: > ERROR:avpops:db_query_avp: cannot do the query > Apr 16 05:46:02 serinbound1 /sbin/opensips[30773]: Database > Error! No Lookup! > Apr 16 05:46:02 serinbound1 /sbin/opensips[30773]: Query > Executed: Account Number is <null> > > The query I'm executing (from the config): > avp_db_query("SELECT value FROM schema.table WHERE value = > '$avp(s:string_a)'", "$avp(s:string_b)"); > > When these error messages pop up, pgpool still has active > database connections to postgres, but not held by > opensips. I > can use one of the pooled connections to connect to the > backend database and execute a query, but for some reason > opensips cannot. > > And, having a restart work (scripted), I occasionally > see the > following: > > Apr 16 03:59:05 serinbound2 /sbin/opensips[23105]: > ERROR:db_postgres:db_postgres_new_connection: server closed > the connection unexpectedly#012#011This probably means the > server terminated abnormally#012#011before or while > processing > the request.#012 > Apr 16 03:59:05 serinbound2 /sbin/opensips[23105]: > ERROR:db_postgres:db_postgres_new_connection: cleaning up > 0x76e090=pkg_free() > Apr 16 03:59:05 serinbound2 /sbin/opensips[23105]: > ERROR:core:db_do_init: could not add connection to the pool > Apr 16 03:59:05 serinbound2 /sbin/opensips[23105]: > ERROR:avpops:avpops_db_init: cannot initialize database > connection > Apr 16 03:59:05 serinbound2 /sbin/opensips[23105]: > ERROR:core:init_mod_child: failed to initializing module > avpops, rank -1 > Apr 16 03:59:05 serinbound2 /sbin/opensips[23105]: > ERROR:core:start_timer_processes: init_child failed for > timer proc > Apr 16 03:59:05 serinbound2 /sbin/opensips[23083]: > INFO:core:handle_sigs: child process 23105 exited normally, > status=255 > Apr 16 03:59:05 serinbound2 /sbin/opensips[23083]: > INFO:core:handle_sigs: terminating due to SIGCHLD > Apr 16 03:59:05 serinbound2 /sbin/opensips[23107]: > INFO:core:sig_usr: signal 15 received > > At this point, the application crashes. > > Any suggestions or workarounds for this? More > specifically, > I'd like opensips to not have to grab the DB connection > if it > doesn't need it at that time performing a lookup (as > i'm not > really using avp's in the lookup, just to save the > results), > also, I'd like to see it not crash completely if it > loses that > connection. Instead, just send a error message back > and allow > me to insert a statement like: > > if(! what i expect the value of the > saved avp > from teh database query to be) > { > sl_send_reply("500", "Internal > Server > Error"); > exit; > } > > Thanks for your help, I know this was fairly detailed but > without much in terms of logging. > > > > > > ------------------------------------------------------------------------ > > _______________________________________________ > Users mailing list > [email protected] <mailto:[email protected]> > http://lists.opensips.org/cgi-bin/mailman/listinfo/users > > > > _______________________________________________ Users mailing list [email protected] http://lists.opensips.org/cgi-bin/mailman/listinfo/users
