On Mon, 2012-03-19 at 23:26 -0600, Alex Rousskov wrote:
> On 03/18/2012 11:07 PM, Amos Jeffries wrote:
> > On 13/03/2012 10:14 p.m., Alexander Komyagin wrote:
> >> Hello. We're now trying to give a chance to the new Squid 3.2 on our
> >> server, mainly because of its SMP feature. But our tests are showing
> >> that 3.2 (3.2.0.14 and 3.2.0.16 were tested) performance is noticeably
> >> lower than 3.1 (3.1.15).
> >>
> >> We're using "httperf --client=0/1 --hog --server x.x.x.x --rate=100
> >> --num-conns=1000 --timeout=5 --num-calls=10" for testing. And for 3.2
> >> it's showing about 140 client timeouts (from 1000), while for 3.1 there
> >> are no errors at all.
> >>
> >> Different numbers of workers were checked (1, 2, 4), but results are
> >> still the same -- completely unchanged -- which is rather _strange_,
> >> since as far as I know (by squid website and source browsing), in our
> >> configuration workers shall NOT share anything but one listening socket
> >> (y.y.y.y:3128).
> >> More than that, CPU use is _only_ about 20% per worker (2 CPUs - 2
> >> workers), vmstat reports no high memory consumption and iostat reports
> >> 0% on iowait.
> >>
> >> Also, according to the logs, those client timeouts are caused by some
> >> of the new connections not being spotted and accepted at all (not gone
> >> through the doAccept() routine from TcpAcceptor.cc).
> >
> > That is sounding very much like a kernel issue, or TCP accept rate
> > limiting issue.
>
> Why would Squid v3.1 results differ from single-worker Squid v3.2
> results then? I assume both v3.1 and v3.2 use the same kernel and the
> same OS configuration (including ulibc).

Actually, I don't know for sure. That's why I'm asking for help ;)
I have even tried running Squid 3.2 in non-daemon mode (pure single
thread) - still no luck.

> > Once a TCP connection is picked up by oldAccept() in the doAccept()
> > sequence the results can be attributed to Squid, but if they never
> > actually arrive there something is wrong at a deeper level down around
> > the TCP stack or sockets libraries.
> >
> > So from your results I conclude that one worker grabbed almost all the
> > traffic and responded OK. But there is insufficient data about the
> > interesting part of the traffic. What was going on there? which kid
> > serviced it?

Nope. Both workers are doing their job. Just not very well.

> I agree that making one of the workers super fast essentially
> invalidates the test (unless you do the same to v3.1 too, but then you
> just removed or scaled up the problem so it may not be the best test
> direction anyway).
>
> My recommendation is to use a single v3.2 worker for now and figure out
> why a single v3.2 worker is dropping or ignoring connections when v3.1
> does not. There could be bugs in the new accept code that we need to
> fix. Use no-daemon mode for both versions.
>
> I would start by trying to understand whether those connection errors
> result from connections never seen by Squid or from connections accepted
> but later ignored/forgotten by Squid. I do not know much about httperf,
> but with just 1000 transactions, that should be relatively easy to
> determine because you can record and match each transaction on both
> sides of the test.
>
> HTH,
>
> Alex.

Alex, I have performed some more tests (including oprofile profiling,
no-daemon mode, 1 worker, 2 workers, etc.).

For now, it seems that the problem is closely related to RSBAC
Networking, which is enabled in our kernel. When I disabled it, the
performance issue _was gone_. According to the RSBAC logs, not a single
operation is denied.
With RSBAC-Net enabled, 3.2 shows the problem with 1 worker, with 2
workers, and in no-daemon mode. However, 3.1 works fine. Without
RSBAC-Net everything is fine.

Comparing the oprofile results for 3.2 with and without RSBAC-Net, I
suspect that the RSBAC-Net subsystem performs some internal operations
on list structures that are protected by locks. In my view, this may
block simultaneous Squid socket operations and affect performance.

Also, when I enable full RSBAC logging for the squid process, the 3.2
and 3.1 logs differ in two points:
- 3.2 has some mysterious IOCTL operations on TCP sockets, right after
  create, while 3.1 doesn't;
- 3.1 produces BIND requests, while 3.2 doesn't.

So I agree that the problem probably resides at the socket level, but I
still wonder what the significant difference between the 3.1 and 3.2
socket operations is. I'll check the Squid sources again in the hope of
finding the answers.

-- 
Best wishes,
Alexander Komyagin
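
P.S. A minimal sketch of how the request types seen in the two RSBAC
logs could be compared mechanically, to make differences such as the
IOCTL and BIND ones stand out. It assumes the request types have already
been extracted from each log into a plain list. The function name, the
list contents and the request-type spellings are hypothetical
placeholders, not real log output.

```python
from collections import Counter


def diff_request_types(ops_a, ops_b):
    """Return {request_type: (count_in_a, count_in_b)} for every
    request type whose count differs between the two runs."""
    count_a, count_b = Counter(ops_a), Counter(ops_b)
    return {
        op: (count_a[op], count_b[op])
        for op in sorted(set(count_a) | set(count_b))
        if count_a[op] != count_b[op]
    }


# Hypothetical, hand-written sequences standing in for the request
# types extracted from the 3.1 and 3.2 logs (not real data).
ops_31 = ["CREATE", "BIND", "LISTEN", "ACCEPT", "CLOSE"]
ops_32 = ["CREATE", "IOCTL", "LISTEN", "ACCEPT", "CLOSE"]

print(diff_request_types(ops_31, ops_32))
```

Counting occurrences rather than only checking presence would also show
a request type that appears in both versions but far more often in one
of them.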
