Greetings,

I have just taken charge of a Squid machine that has been showing some strange behaviour for the last few weeks. For one thing, average CPU usage is above 80%, which seems abnormally high even taking into account the high traffic (about 2000 hits per minute).
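In case it helps anyone check a traffic figure like the one above, here is a minimal sketch of how hits per minute could be counted from access.log. It assumes Squid's native log format, where the first field on each line is a Unix timestamp with milliseconds. The function names and the log path given on the command line are illustrative only, not part of our setup.

```python
import sys
from collections import Counter


def hits_per_minute(lines):
    """Count requests per one-minute bucket from native-format access.log lines."""
    counts = Counter()
    for line in lines:
        fields = line.split(None, 1)
        if not fields:
            continue  # blank line
        try:
            ts = float(fields[0])  # e.g. "1100000000.123"
        except ValueError:
            continue  # line does not start with a timestamp
        bucket = int(ts // 60) * 60  # start of the minute, in epoch seconds
        counts[bucket] += 1
    return counts


def summarize(counts):
    """Return (average hits per active minute, peak hits in any one minute)."""
    if not counts:
        return 0.0, 0
    return sum(counts.values()) / len(counts), max(counts.values())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python hits.py /path/to/access.log
    with open(sys.argv[1], errors="replace") as f:
        avg, peak = summarize(hits_per_minute(f))
    print("average hits/minute: %.1f, peak: %d" % (avg, peak))
```

Minutes with no requests at all do not appear in the counts, so the average is taken over active minutes only. For scale, 2000 hits per minute works out to roughly 33 requests per second.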
Secondly, all traffic randomly stops for anywhere between a few seconds and 20 minutes. Restarting Squid during this time brings it back up, but this isn't feasible when we have to do it a few times a day. Even if I don't restart it manually, it becomes available again by itself, but the downtime is too costly.

Now for some details. The hardware is as follows:

  2 x 2.7GHz Xeon CPUs
  2GB RAM
  2 x 36GB 10,000 rpm SCSI drives

One drive is used for the OS (Fedora Core 2, default kernel) while the other is used for the cache (32GB cache size, reiserfs with the notail and noatime options).

Until a few days ago, the machine was running Squid 2.5-STABLE6 (without patches). It was configured to use diskd and various options which I don't have available right at this minute. Quite a few OS settings recommended for Squid are present, such as port range (1024-65000), file descriptors (32768), tcp_max_syn_backlog (8192), etc.

I tried switching to 2.5-STABLE5 with the 3 major patches, since we have this setup on another proxy, but got the same results. This morning, I upgraded to the latest 2.5-STABLE6 version with all the current patches and set the cache type to aufs (a simple 'squid -z', not a reformat of the underlying filesystem), but it solved neither the CPU usage issue nor the random drops.

Nothing unusual shows up in cache_log, access_log or any of the system logs. Our network monitor also confirms that this isn't a network issue and that requests are reaching the proxy during the outages.

Unless someone can identify the problem here, I'll consider installing Slackware and a custom kernel to rule out an OS issue.

Any help will be appreciated.

Thanks,

--
A. Sajjad Zaidi
GnuPG Key ID: 0xD7AD0E13