On 07/03/2017 20:26, Alex Rousskov wrote:
> These stuck disker responses probably explain why your disks do not
> receive any traffic. It is potentially important that both disker
> responses shown in your logs got stuck at approximately the same
> absolute time ~13 days ago (around 2017-02-22, give or take a day;
> subtract 1136930911 milliseconds from 15:53:05.255 in your Squid time
> zone to know the "exact" time when those stuck requests were queued).
>
> How can a disker response get stuck? Most likely, something unusual
> happened ~13 days ago. This could be a Squid bug and/or a kid restart.
>
> * Do all currently running Squid kid processes have about the same start
> time? [1]
>
> * Do you see ipcIo6.381049w7 or ipcIo6.153009r8 mentioned in any old
> non-debugging messages/warnings?
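
(For reference, that subtraction works out as below with GNU date; the 2017-03-07 date is an assumption taken from when the report was posted.)

# 1136930911 ms before the timestamp of the stuck-I/O log line
date -d "2017-03-07 15:53:05 $((1136930911 / 1000)) seconds ago"
# => roughly 2017-02-22 12:04, i.e. ~13 days earlier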

I searched the log files from those days and found nothing unusual; grep returns nothing for ipcIo6.381049w7 or ipcIo6.153009r8.
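
(For the record, the search was along these lines; the log path and rotation naming are assumptions, adjust to the actual cache_log setting.)

# Search current and rotated logs for the two stuck request IDs
zgrep -h -e 'ipcIo6\.381049w7' -e 'ipcIo6\.153009r8' /var/log/squid/cache.log*
# (no output)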

On that day I couldn't verify whether the kids still had the same uptime; I had already reformatted the /cache2, /cache3 and /cache4 partitions and started fresh with squid -z. But looking at ps right now, I think I can answer that question:

root@proxy:~# ps auxw |grep squid-
proxy 10225  0.0  0.0 13964224   21708 ?  S  Mar10   0:10 (squid-coord-10) -s
proxy 10226  0.1 12.5 14737524 8268056 ?  S  Mar10   7:14 (squid-disk-9) -s
proxy 10227  0.0 11.6 14737524 7686564 ?  S  Mar10   3:08 (squid-disk-8) -s
proxy 10228  0.1 14.9 14737540 9863652 ?  S  Mar10   7:30 (squid-disk-7) -s
proxy 18348  3.5 10.3 17157560 6859904 ?  S  Mar13  48:44 (squid-6) -s
proxy 18604  2.8  9.0 16903948 5977728 ?  S  Mar13  37:28 (squid-4) -s
proxy 18637  1.7 10.8 16836872 7163392 ?  R  Mar13  23:03 (squid-1) -s
proxy 20831 15.3 10.3 17226652 6838372 ?  S  08:50  39:51 (squid-2) -s
proxy 21189  5.3  2.8 16538064 1871788 ?  S  12:29   2:12 (squid-5) -s
proxy 21214  3.8  1.5 16448972 1012720 ?  S  12:43   1:03 (squid-3) -s
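
(A more direct way to compare kid start times, a sketch using standard procps output fields:)

# Full start timestamp and elapsed time for every kid process;
# the [s] keeps grep from matching itself
ps -eo pid,lstart,etime,args | grep '[s]quid-'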

The diskers aren't dying, but the workers are, a lot, with that "assertion failed: client_side_reply.cc:1167: http->storeEntry()->objectLen() >= headers_sz" thing.
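
(A quick way to gauge how often that assertion is killing workers, assuming the default cache.log location:)

# Count occurrences of the assertion in the current log
grep -c 'assertion failed: client_side_reply.cc:1167' /var/log/squid/cache.log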

Looking at df and iostat, it seems /cache3 isn't being accessed anymore right now. (I think it is squid-disk-8 above; look at its CPU time usage.)
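
(The check was roughly this, using sysstat's iostat; the device-to-mount mapping comes from df:)

# Map the cache partitions to block devices, then watch per-device I/O
df -h /cache2 /cache3 /cache4
iostat -x 5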

Another weird thing: lots of the timeouts and overflows are happening during off-peak hours. From 0h to 7h we have something like 1-2% of the clients we usually have from 8h to 17h (business hours).

2017/03/14 00:26:50 kid3| WARNING: abandoning 23 /cache4/rock I/Os after at least 7.00s timeout
2017/03/14 00:26:53 kid1| WARNING: abandoning 1 /cache4/rock I/Os after at least 7.00s timeout
2017/03/14 02:14:48 kid5| ERROR: worker I/O push queue for /cache4/rock overflow: ipcIo5.68259w9
2017/03/14 06:33:43 kid3| ERROR: worker I/O push queue for /cache4/rock overflow: ipcIo3.55919w9
2017/03/14 06:57:53 kid3| ERROR: worker I/O push queue for /cache4/rock overflow: ipcIo3.58130w9
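
(To confirm these really do cluster in off-peak hours, the timeouts/overflows can be bucketed per hour straight from cache.log; the log path is an assumption:)

# Count abandon/overflow events per hour of day
grep -E 'abandoning .* I/Os|push queue .* overflow' /var/log/squid/cache.log |
  awk '{ split($2, t, ":"); print t[1] ":00" }' | sort | uniq -c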

This cache4 partition is where huge files would be stored:
maximum_object_size 4 GB
cache_dir rock /cache2 110000 min-size=0 max-size=65536 max-swap-rate=150 swap-timeout=360
cache_dir rock /cache3 110000 min-size=65537 max-size=262144 max-swap-rate=150 swap-timeout=380
cache_dir rock /cache4 110000 min-size=262145 max-swap-rate=150 swap-timeout=500

I still don't understand how /cache3 stopped being used while /cache4 is still active, even with all those warnings and errors.. :/
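
(One way to see, from Squid's own point of view, whether /cache3 is still taking traffic is the cache manager store directory report; the host/port below are assumptions:)

# Per-cache_dir statistics: current size, entry counts, pending I/O requests
squidclient -h 127.0.0.1 -p 3128 mgr:storedir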

--
Atenciosamente / Best Regards,

Heiler Bemerguy
Network Manager - CINBESA
55 91 98151-4894/3184-1751

_______________________________________________
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev
