On Mon, Jan 5, 2015 at 2:10 PM, Jeremy Evans <[email protected]> wrote:
> Note that just because multiple children have the same file descriptor
> integer for the connection as the parent did before disconnect, does not
> mean that the socket is shared. Most operating systems will use the lowest
> unused integer for a new file descriptor, so if the parent closes the
> connection, multiple children are forked, and each child opens a new
> connection, the same integer will be used for all children, even though the
> file descriptors themselves are independent.
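To make the point about descriptor numbers concrete, here is a minimal sketch in plain Ruby. It assumes a POSIX-like system that hands out the lowest unused descriptor number; the method name `fd_numbers_after_reopen` is made up for illustration and is not part of any library.

```ruby
# Open a handle, note its descriptor number, close it, then open another
# handle and note the number again. On systems that reuse the lowest free
# descriptor number, both values are typically the same even though the two
# handles are unrelated.
def fd_numbers_after_reopen
  first = File.open(File::NULL)
  first_fd = first.fileno
  first.close

  second = File.open(File::NULL)
  second_fd = second.fileno
  second.close

  [first_fd, second_fd]
end

first_fd, second_fd = fd_numbers_after_reopen
puts "first handle fd: #{first_fd}, second handle fd: #{second_fd}"
```

The same reasoning applies to forked children that each open their own connection: matching integers say nothing about whether the underlying sockets are shared.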

Oh, good to know. Thank you. So matching FD numbers are not proof of shared
file descriptors. I guess I should look for the cause of the sudden
disconnects instead of for how the FDs ended up shared.

> My first guess is somewhere in your app (or in one of your app's
> dependencies), something is calling fork, then calling exit (instead of
> Process.exit!). Sometimes this is done to get cheap background jobs. In
> ruby, this will cause all of the sockets in the child process to be
> disconnected. Since the parent shares the database connection sockets with
> the child, this also disconnects the connection in the parent (this parent
> would be the unicorn worker process, not the unicorn master process).

I was excited when I read this, and I searched for /\bfork\b/ through all the
dependencies. No luck though :( All the calls to fork were in tests, and I
believe none of them are called in production. I guess the lesson here is
that I should be very careful with fork and at_exit. I was not aware of this
behaviour.

> If that isn't the case, unless you are able to come up with a reliable way
> to reproduce the error, it's going to be pretty hard to debug. As a shot in
> the dark, you could try using pg 0.17.1 instead of pg 0.18.0 and see if that
> has any effect.

I just downgraded to pg 0.17.1 and had no luck either :( Too bad I can't
downgrade Rails to test this...

It reproduces pretty reliably *on production*: I just restart the server and
the errors show up. I have never managed to reproduce it locally or on the
staging server, though, so I guess I can only debug this on production.

Would it be a good idea to tell Sequel to reconnect while I debug this, so
that at least people are not getting 500 error pages in the meantime?

The other thing I am wondering about: we are also using ActiveRecord, so why
don't we see the same errors there?

> Thanks,
> Jeremy

Thank you!
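
For reference, the difference between `exit` and `exit!` in a forked child that Jeremy describes can be sketched as follows. This is only an illustration, not code from the app: the `at_exit` hook is a hypothetical stand-in for whatever cleanup runs on a normal exit (such as closing inherited sockets), the method name `cleanup_runs_in_forked_job?` is made up, and it assumes a platform where `fork` is available.

```ruby
# Returns true if a cleanup hook inherited from the parent ran inside a
# forked "job" process, false otherwise.
def cleanup_runs_in_forked_job?(use_bang)
  reader, writer = IO.pipe

  # The worker stands in for a process that has registered cleanup hooks.
  worker = fork do
    reader.close
    # Hypothetical stand-in for cleanup that touches inherited resources.
    at_exit { writer.syswrite("cleanup ran in pid #{Process.pid}\n") }

    # The job stands in for a cheap background job forked off the worker.
    job = fork do
      use_bang ? exit!(0) : exit(0)
    end
    Process.wait(job)

    # Leave the worker without running its own hooks, so that anything
    # written to the pipe can only have come from the job.
    exit!(0)
  end

  writer.close
  Process.wait(worker)
  output = reader.read
  reader.close
  !output.empty?
end

puts "exit  runs inherited cleanup: #{cleanup_runs_in_forked_job?(false)}"
puts "exit! runs inherited cleanup: #{cleanup_runs_in_forked_job?(true)}"
```

With `exit`, the job runs the hook it inherited from the worker; with `exit!`, it skips it. That is why a forked child that only needs to finish quietly is usually ended with `exit!`.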
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sequel-talk.
For more options, visit https://groups.google.com/d/optout.
