This is exactly what happened to us, and I should have been clearer. I wasn’t 
referring to the default Linux kernel settings killing the 
connection; it was a network device between our application servers and the 
database server. It only affected certain applications: some were hit 
hundreds of times per second and were never disconnected, while the ones that 
did get disconnected were only hit a few times per hour. I -believe- we just 
dropped the keepalive interval on both sides of the firewall below its idle 
timeout.  
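For reference, here is a minimal Python sketch (Linux socket option names) of what "keepalive below the firewall's idle timeout" means for a single socket. It assumes the client can turn keepalive on for its own socket; whether the Oracle driver exposes such a hook is not established in this thread. On Linux the `net.ipv4.tcp_keepalive_*` sysctls only apply to sockets that have `SO_KEEPALIVE` enabled, and the default `tcp_keepalive_time` is 7200 seconds, so out of the box the first probe would come long after a 30-minute firewall timeout. The 10-minute idle time, 60-second interval and count of 5 are illustrative values. What matters is that the idle time stays below 30 minutes.

```python
import socket

# Idle session timeout of the firewall described in this thread (30 minutes).
FIREWALL_IDLE_TIMEOUT = 30 * 60  # seconds


def enable_keepalive(sock, idle=10 * 60, interval=60, count=5):
    """Enable TCP keepalive on `sock` using the Linux socket options.

    idle:     seconds without traffic before the first probe is sent
    interval: seconds between probes that go unanswered
    count:    unanswered probes before the kernel reports the peer as dead
    """
    if idle >= FIREWALL_IDLE_TIMEOUT:
        raise ValueError("keepalive idle time must be below the firewall idle timeout")
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock


if __name__ == "__main__":
    s = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    # s.connect(("db-host.example", 1521))  # hypothetical DB host and port
    print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE))  # 600
    s.close()
```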


On Wednesday, August 6, 2014 at 7:05 AM, Tony Devlin wrote:

> Eric,
>  
> The problem is a firewall that sits between the servers and the database.  It 
> has an idle session timeout of 30 minutes, so it is silently killing the 
> connection.  I have reached out to our Network Engineering department, but 
> they say they cannot change that idle session timeout, nor create a 
> special rule to let this connection bypass it.   
>  
> Currently, I have set up a polling device that calls the application's URL every 20 
> minutes.  This causes the connection between the server and the DB to refresh 
> its idle timeout.  This is obviously a very hacky way to handle it, so I am 
> trying to look into AR and Oracle_Enhanced to see if they have some sort of 
> keepalive option for the database.  I thought reaping_frequency would handle it, 
> but apparently that does not work as I had expected 
> when you are not running with pools or threads.  So I'm still on the lookout 
> for something to handle that.  
>  
>  
>  
>  
> On Wed, Aug 6, 2014 at 5:45 AM, Eric Wong <e...@80x24.org> wrote:
> >  
> > Any update?  It looks like your DB driver is not using/respecting any
> > timeout at all[1].  It is bad to not have a timeout there.  There should
> > be a way to set a timeout so you can at least tell the user the DB
> > connection dropped or maybe get your app to disconnect+retry once.
> >  
> > A better looking strace would be something like:
> >  
> >     write(fd, ...); => success
> >     (poll|select|ppoll) syscall ...
> >     read(fd, ...); /* only if (poll|select|ppoll) was successful[2] */
> >  
> > This goes for configuring all connections/services for any app.
> >  
> > [1] or if it's relying on the SO_RCVTIMEO socket option (rare), that's set
> >     way too high.  Any timeout set for any external connection should
> >     be lower than the unicorn (last-resort) timeout feature.
> >  
> > [2] any read() syscall after (poll|select|ppoll) should be non-blocking,
> >     because (poll|select|ppoll) may wake up spuriously.  
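The write, then poll, then non-blocking read shape described above can be sketched in Python as follows. This illustrates the pattern only and is not the Oracle driver's API: `recv_with_deadline` and `request` are hypothetical helpers, a `socketpair()` stands in for the database connection in the demo, and `UNICORN_TIMEOUT = 60.0` is an assumed worker timeout (use whatever your unicorn config actually sets). The socket is switched to non-blocking before `recv()`, so a spurious wakeup from poll() loops back to poll instead of hanging in the read.

```python
import select
import socket
import time

# Assumed unicorn worker timeout in seconds; every external-call timeout
# should stay below it so the app, not unicorn, handles the failure.
UNICORN_TIMEOUT = 60.0


def recv_with_deadline(sock, nbytes, timeout):
    """Wait up to `timeout` seconds for data, then read without blocking.

    Returns b"" if the peer closed the connection; raises TimeoutError if
    nothing arrived before the deadline.
    """
    sock.setblocking(False)  # recv() must never block, even after a spurious wakeup
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"no data within {timeout}s")
        if not poller.poll(remaining * 1000):  # poll() takes milliseconds
            continue  # poll timed out; the deadline check raises once time is up
        try:
            return sock.recv(nbytes)
        except BlockingIOError:
            continue  # spurious wakeup: nothing readable yet, poll again


def request(sock, payload, nbytes, timeout):
    """write(), then poll(), then a non-blocking read()."""
    if timeout >= UNICORN_TIMEOUT:
        raise ValueError("timeout must be below the unicorn worker timeout")
    sock.settimeout(timeout)  # bounds the write as well
    sock.sendall(payload)
    return recv_with_deadline(sock, nbytes, timeout)


if __name__ == "__main__":
    app_end, db_end = socket.socketpair()  # stand-in for a real DB connection
    db_end.sendall(b"pong")
    print(request(app_end, b"ping", 4, timeout=5.0))  # b'pong'
    try:
        request(app_end, b"ping", 4, timeout=0.1)  # nobody answers this time
    except TimeoutError as exc:
        print("gave up:", exc)
```

When `TimeoutError` is raised, the app can tell the user the DB connection dropped, or disconnect and retry once, instead of waiting for unicorn to kill the worker.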


