On 10 February 2010 13:53, Poul-Henning Kamp <[email protected]> wrote:
> In message <[email protected]>, Paul
> Wright writes:
>
>>Thanks for the explanation of what's going on. Looking at those
>>tickets there are suggestions to try the poll waiter which we're
>>already using - are there any further tests we could try to help
>>narrow down this issue? I'm happy to assist trying out patches.
>
> I can see three ways to nail this issue:
>
> 1. Catch a tcpdump, when it happens, showing that the client side
>    did close, and Solaris (incorrectly) returns EBADF.
>
> 2. Catch a ktrace/systrace/dtrace, when it happens, that show
>    that Varnish incorrectly closes the fd.
>
> 3. Setup some synthetic test to show that solaris returns EBADF
>    when it shouldn't
>
> If either of those are in your reach, by all means go for it...

I've had a go at 1.) and have two verbose `snoop` traces taken during child panics. I used sp.client from the backtrace to find the port number and then looked at just the matching packets. From my (limited) Wireshark comprehension, they show the client establishing a connection to Varnish, issuing a GET and receiving the response (200 OK). The client then sends a RST packet, and from there the connection disappears. Would this cause the child to panic?

I can't post these traces publicly, but are there any other details that would help?

I've been racking my brains to think whether there is anything special in our setup, and the only thing that springs to mind is the firewall. We have an OpenBSD firewall using PF to redirect HTTP traffic from the public IP address to the internal web servers, which has worked without issue for a number of years. During testing, the only firewall change I've made redirects this traffic to Varnish instead.

Paul.
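
P.S. On 3.), here is a rough sketch of what a synthetic test might look like. It is only an illustration under assumptions: it runs a throwaway server on loopback, does not involve Varnish or PF at all, and the names (`reset_close`, `serve_once`, `run_reset_test`) are made up for this example. The client mimics the sequence in the traces (connect, GET, read the 200, then reset) by enabling SO_LINGER with a zero timeout before closing, which makes the kernel send a RST instead of a FIN. The server side then records what its next read reports. I would normally expect ECONNRESET (or possibly a plain EOF) there, so seeing EBADF would be the interesting result.

```python
import errno
import socket
import struct
import threading


def reset_close(sock):
    """Close sock so the kernel sends RST rather than FIN.

    Assumes struct linger is two C ints (true on Linux and Solaris;
    Windows uses two shorts instead).
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, 0))
    sock.close()


def serve_once(listener, result):
    """Accept one connection, answer one request, then record the next read."""
    conn, _ = listener.accept()
    with conn:
        conn.recv(4096)  # the client's GET
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
        try:
            data = conn.recv(4096)  # the client resets while we wait here
            result["server_saw"] = "EOF" if data == b"" else "data"
        except OSError as exc:
            # Store the symbolic errno name, e.g. ECONNRESET or EBADF.
            result["server_saw"] = errno.errorcode.get(exc.errno, str(exc.errno))


def run_reset_test():
    """Run the connect / GET / 200 / RST sequence once over loopback."""
    result = {}
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    server = threading.Thread(target=serve_once, args=(listener, result))
    server.start()

    client = socket.create_connection(listener.getsockname())
    client.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
    result["client_got"] = client.recv(4096)
    reset_close(client)

    server.join(timeout=5)
    listener.close()
    return result


if __name__ == "__main__":
    print(run_reset_test())
```

Running this in a loop on the Solaris box and watching for any `server_saw` value of EBADF would be one way to go about it, though a real test would probably need to use the same waiter and socket options as Varnish to be conclusive.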
