Hello list,

I've run a simple test here.

cache_dir rock /cache2 110000 min-size=0 max-size=65536
#max-swap-rate=150 swap-timeout=360
cache_dir rock /cache3 110000 min-size=65537 max-size=262144
#max-swap-rate=150 swap-timeout=380
cache_dir rock /cache4 110000 min-size=262145
#max-swap-rate=150 swap-timeout=500

I commented out all the max-swap-rate and swap-timeout options. Guess what? No more errors in cache.log.

before:
root@proxy:/var/log/squid# cat cache.log.2 |grep overflow |wc -l
53

now:
root@proxy:/var/log/squid# cat cache.log |grep overflow |wc -l
1

before:
root@proxy:/var/log/squid# cat cache.log.2 |grep "7.00s timeout" |wc -l
86

now:
root@proxy:/var/log/squid# cat cache.log |grep "7.00s timeout" |wc -l
0

It seems the code that "shapes" the rock store I/Os is somewhat buggy. This is Squid 4.0.18.

--
Atenciosamente / Best Regards,

Heiler Bemerguy
Network Manager - CINBESA
55 91 98151-4894/3184-1751


On 07/03/2017 20:26, Alex Rousskov wrote:
On 03/07/2017 01:08 PM, Heiler Bemerguy wrote:

Some log from right now...

Here is my analysis:

15:53:05.255| ipc/Queue.h findOldest: peeking from 7 to 6 at 1
Squid worker (kid6) is looking at the queue of disker (kid7) responses.
There is just one response in the queue.


IpcIoFile.cc canWait: cannot wait: 1136930911 oldest: ipcIo6.381049w7
Squid worker is trying to estimate whether it has enough time to queue
(and eventually perform) more disk I/O. The expected wait is
1'136'930'911 milliseconds or ~13 days. That is longer than the
configured cache_dir swap-timeout (a few hundred milliseconds) so Squid
refuses to queue this disk I/O request:

store_io.cc storeCreate: no swapdirs for e:=sw1p2RDV/0x206e17d0*4
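
For illustration only, here is a minimal C++ sketch of that check. It is
not the real IpcIoFile::canWait() code; the function name, parameters and
millisecond math below are assumptions chosen just to show the shape of
the decision:

// Minimal sketch (NOT the real IpcIoFile::canWait()): can we afford to
// queue one more disk I/O request for this cache_dir?
#include <cstdint>
#include <iostream>

// expectedWaitMs: how long the oldest already-queued request has been
//                 waiting, used as an estimate of how long a new request
//                 would have to wait.
// swapTimeoutMs:  the cache_dir swap-timeout option, in milliseconds.
static bool canWait(int64_t expectedWaitMs, int64_t swapTimeoutMs)
{
    // If the estimated wait already exceeds the configured timeout,
    // queueing another request is pointless: it would time out anyway.
    return expectedWaitMs <= swapTimeoutMs;
}

int main()
{
    const int64_t swapTimeoutMs = 500;         // e.g. swap-timeout=500
    const int64_t expectedWaitMs = 1136930911; // ~13 days, from the log above

    if (!canWait(expectedWaitMs, swapTimeoutMs))
        std::cout << "cannot wait -> storeCreate: no swapdirs\n";
    return 0;
}

With a swap-timeout of a few hundred milliseconds and an estimated wait of
~13 days, that check can never pass, so every new I/O request for the
affected cache_dir is rejected.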

The same story happens in your other log snippets AFAICT:

15:53:05.302| ipc/Queue.h findOldest: peeking from 8 to 6 at 1
IpcIoFile.cc canWait: cannot wait: 1136930958 oldest: ipcIo6.153009r8
store_io.cc storeCreate: no swapdirs for e:=msw1DV/0x3a4d5870*2
and

15:53:05.318| ipc/Queue.h findOldest: peeking from 7 to 6 at 1
IpcIoFile.cc canWait: cannot wait: 1136930974 oldest: ipcIo6.381049w7
store_io.cc storeCreate: no swapdirs for e:m381432=w1p2DV/0x3ace6740*4
and

15:53:05.793| ipc/Queue.h findOldest: peeking from 7 to 6 at 1
IpcIoFile.cc canWait: cannot wait: 1136931448 oldest: ipcIo6.381049w7
store_io.cc storeCreate: no swapdirs for e:=sw1V/0x206e17d0*1

Focusing on one disker (kid7), we can see that the oldest response does
not change: It is always ipcIo6.381049w7. This stuck response results in
a gradual increase of the expected wait time with every check, matching
the wall-clock time increase:

15:53:05.255| cannot wait: 1136930911
15:53:05.318| cannot wait: 1136930974
15:53:05.793| cannot wait: 1136931448
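
For instance, between the first two samples the wall clock advances by
15:53:05.318 - 15:53:05.255 = 63 ms while the reported wait grows by
1136930974 - 1136930911 = 63 ms; between the last two, 475 ms of wall
clock versus 474 ms of reported wait, the same to within a millisecond.
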
These stuck disker responses probably explain why your disks do not
receive any traffic. It is potentially important that both disker
responses shown in your logs got stuck at approximately the same
absolute time ~13 days ago (around 2017-02-22, give or take a day;
subtract 1136930911 milliseconds from 15:53:05.255 in your Squid time
zone to know the "exact" time when those stuck requests were queued).
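
(Working that subtraction out, and assuming the snippet above was captured
on 2017-03-07 in local time: 1136930911 ms is roughly 13 days 3 h 49 min,
so 2017-03-07 15:53:05.255 minus that interval lands around 2017-02-22
12:04 local time.)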

How can a disker response get stuck? Most likely, something unusual
happened ~13 days ago. This could be a Squid bug and/or a kid restart.

* Do all currently running Squid kid processes have about the same start
time? [1]

* Do you see ipcIo6.381049w7 or ipcIo6.153009r8 mentioned in any old
non-debugging messages/warnings?

[1]
http://stackoverflow.com/questions/5731234/how-to-get-the-start-time-of-a-long-running-linux-process


Thank you,

Alex.


