> On 2014-02-21 06:10, Simon Beale wrote:
>> I've got a problem at the moment with our general squid proxies where
>> occasionally requests take a long time that shouldn't do. (i.e. 5+
>> seconds or timeout, instead of milliseconds).
>>
>> This is most common on our proxies doing 100 reqs/sec, but happens
>> overnight too when they're running at 10 reqs/sec. I've got this
>> happening with both v3.4.2 and also with a box I've downgraded back to
>> v3.1.10. For v3.4.2, it's happening in both multiple worker and single
>> worker modes.

As a follow-up, we've narrowed this down to the internal DNS resolver.
When I deploy a 3.4.2 build (which is what we're running elsewhere) that's
been recompiled with "--disable-internal-dns", the problem goes away
completely.

> What sort of CPU loading do you have at ~100req/sec?
>   is that at or near your local installations req/sec capacity?

For the box running with a single worker, it consumes 50% of one core at
100 req/sec.
For the boxes running with 9 workers, each worker consumes 5% of a core at
the same rate.

>> The test is not reproducible, sadly, but I've got a cronjob running on
>> localhost on these boxes testing access times to various URLs covering:
>> HTTPS, non-HTTPS static content, using IP not hostname over both HTTP
>> and HTTPS, and a URL on the same vlan as the proxies. All of these test
>> cases have it happen occasionally, but not repeatedly/reliably.
>
> Some ideas:
>   * DNS lookup delays ?

Yes, that became apparent once I enabled DNS resolution time logging in
Squid.
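
For anyone wanting to look for the same pattern, here is a minimal Python
sketch for picking slow lookups out of an access log. It assumes a custom
logformat in which the DNS lookup time (Squid's %dt code, in milliseconds)
is the last field on each line. The field position, the 1000 ms threshold
and the function name are illustrative assumptions, not a standard layout.

```python
def slow_dns_lines(lines, threshold_ms=1000):
    """Yield (dns_ms, line) for entries whose DNS time is >= threshold_ms.

    Assumption: the DNS lookup time in milliseconds is the last
    whitespace-separated field of each log line. Lines where that field
    is not an integer (for example "-") are skipped.
    """
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank line
        try:
            dns_ms = int(fields[-1])
        except ValueError:
            continue  # no numeric DNS time recorded for this request
        if dns_ms >= threshold_ms:
            yield dns_ms, line.rstrip("\n")
```

Feeding it an open log file, e.g. slow_dns_lines(open(path)) with path
pointing at the access log, would list only the requests whose DNS time
reached the threshold.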

Quite why the internal DNS resolver shows this but the external one
doesn't, I don't know. The DNS server query logs show both DNS servers in
/etc/resolv.conf getting the request in turn and answering it (though 5
seconds apart). It's happening for us in multiple datacentres, so it is
unlikely to be port errors or internal packet loss.
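
One way to check each server independently of Squid is to time a plain
UDP query against every nameserver in turn. The following is a rough
sketch under stated assumptions: IPv4 nameservers only, an A-record query,
no check of the response ID, and a 6 second timeout chosen only so that a
5 second delay would still be measured. All function names are
hypothetical.

```python
import socket
import struct
import time


def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet for an A record, recursion desired."""
    # Header: id, flags (RD set), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def time_lookup(server, hostname, timeout=6.0):
    """Return seconds until any reply arrives from server, or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(build_query(hostname), (server, 53))
            sock.recvfrom(512)  # reply content is not validated in this sketch
        except socket.timeout:
            return None
        return time.monotonic() - start
```

Calling time_lookup(server, "example.com") for each address returned by
parse_nameservers() on the contents of /etc/resolv.conf gives a per-server
figure that can be compared against the delays Squid logs.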

It's only (or mostly?) apparent on our squid servers that do desktop
proxying, and so make lots of DNS requests for hosts everywhere; the squid
servers that handle just our datacentre servers don't show this problem,
but they only really go to about 40 hosts in total.

Thanks

Simon
