On 5/08/2013 11:14 p.m., babajaga wrote:
> Sorry, Amos, not to waste too much time here for an off-topic issue, but
> interesting matter anyways:

Okay. I am running out of time and this is slightly old info I'm basing all this on, so shall we finish up? Measurements and testing are really what is required to go further and demonstrate anything.


Disclaimer: some of what I "know" and say below may be complete FUD with modern disks. I have not done any testing since 10-20GB was a widely available storage device size and SSD layers on drives had not even been invented. Shop-talk with people doing testing more recently tells me that the basics are probably still completely valid, even if the tricks added to solve the problems are changing rapidly. The key take-away should be that Squid's disk I/O pattern for small objects blows most of those new tricks into uselessness.

> I ACK your remarks regarding disk controller activity. But, AFAIK, squid
> does NOT directly access the disk controller for raw disk I/O, the FS is
> always in-between instead. And that means that a (lot of) buffering can
> occur before real disk I/O is done.

This depends on two factors:
1) There is RAM available for the buffering required.
-> The higher the traffic load, the less memory is available to the system for this.

2) The OS has a chance of buffering in advance (read-ahead).
-> Objects up to 64KB (often 4KB or 8KB) can be completely loaded into Squid I/O buffers in a single read(), and there is no way for the OS to identify which of the surrounding sectors/blocks hold objects related to the one just loaded (if it guesses and gets it wrong, things go even worse than not guessing at all).
-> Also, remember AUFS is preferred for large (over-32KB) objects - the ones which will require multiple read()s - and Rock is best for small (under-32KB) objects. This OS buffering prediction is a significant part of the reason why.
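
To make that size split concrete, a minimal cache_dir sketch along those lines could look like the following (paths and sizes are invented for illustration; tune them to your own hardware):

  # small objects (up to 32KB) go to the rock store
  cache_dir rock /var/spool/squid/rock 8000 max-size=32768
  # everything larger goes to an AUFS store
  cache_dir aufs /var/spool/squid/aufs 32000 16 256 min-size=32768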

> Which might even lead to spurious high
> response times, when all of a sudden the OS decides to really flush large
> disk buffers to disk.

Note that this will result in a bursty disk I/O traffic pattern, with waves of alternating high and low disk access speeds. The aim with high performance is to flatten the low-speed troughs out as much as possible, raising them up towards a constant peak rate of I/O.
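
If you want to watch that pattern for yourself, the standard Linux tools are enough (assuming a Linux box; the device name below is only an example):

  # extended per-device statistics, refreshed every second
  iostat -x 1
  # or trace the individual requests hitting the cache disk (assumed here to be /dev/sdb)
  blktrace -d /dev/sdb -o - | blkparse -i -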

> In a good file system (or disk controller,
> downstream), request-reordering should happen, to allow elevator-style head
> movements. Or merging file accesses, referencing the same disk blocks.

Exactly. And this is where Squid being partially *network* I/O event driven comes into play, affecting the disk I/O pattern. Squid is managing N concurrent connections, each of which is potentially servicing a distinct *unique* client file fetch (well, mostly; when collapsed forwarding is ready for Squid-3 it will be unique). Every I/O loop Squid cycles through all N in order and schedules a cross-sectional slice for any which need a disk read/write. So each I/O cycle Squid delivers at most one read (HIT/MISS send to client) and one write (MISS received from server) for any given file, with up to N possibly vastly separate files on disk being accessed.

The logic doing that elevator calculation is therefore *not* faced with a single set of file operations in one area, but with a cross-sectional read/write over potentially the entire disk. At most it can reorder those into an elevator up/down sweep across the disk. But in passing those completion events back to Squid it triggers another I/O cycle for Squid over the network sockets, and thus another sweep over the entire disk space. Worst-case (and best) the spindle heads are sweeping the platter from end to end, reading everything needed 1-cycle:1-sweep.

That is with _one_ cache_dir sitting on the spindle.

Now if you pay close attention to the elevator sweep, there is a lot of time spent scanning between areas of the disk and not so much doing I/O. To optimize around this effect and allow even more concurrent file reads, Squid load-balances between cache_dirs when it places files. AFAIK the theory is that one head can be seeking while another is doing its I/O, for the overall effect of a more steady flow of bytes back to Squid after the FS software abstraction layer, raising those troughs again to a smooth flow. That said, "theory" is not practice. Place both cache_dirs on the one disk and the FS logic will of course reorder and interleave the I/O for each cache_dir such that the disk behaviour is a single sweep, as for one cache_dir. BUT, as a result, the seek lag and bursty nature of the read() bytes returned is fully exposed to Squid - by the very mechanisms supposedly minimizing it. In turn this reflects in the network I/O, as bytes are relayed directly there by Squid and TCP gets a bursty peak/trough pattern appearing.

Additionally, and probably more importantly, that reordering of 2 cache_dirs on one disk spindle down to the behaviour of 1 cache_dir caps the I/O limit for *both* of those cache_dirs at the single disk's I/O threshold (after optimization). Whereas having them on separate spindles would allow each to have that full capacity and effectively double the disk I/O threshold (after optimization).
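
So the shape to aim for is one cache_dir per physical spindle, roughly like this (mount points invented for illustration; Squid balances placement between them, and store_dir_select_algorithm picks the policy):

  # each cache_dir on its own physical disk
  cache_dir aufs /cache1/squid 32000 16 256
  cache_dir aufs /cache2/squid 32000 16 256
  # least-load is the default placement policy; round-robin is the alternative
  store_dir_select_algorithm least-load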

Why we say Rock can share with UFS/AUFS/diskd is that the I/O block size being requested is larger, so there are fewer disk-sweep movements even if many files/blocks are being loaded concurrently. Loading a few hundred objects in one Rock block of I/O, most of which will then get memory-HIT speeds, is just as efficient as loading _one_ more file out of the UFS/AUFS/diskd cache_dir.

> And all this should happen after Squid's activities are completed, but before
> the real disk driver/controller starts its work.
> BTW, I did some private measurements, not regarding response times because
> of various types of cache_dirs, but regarding response times/disk throughput
> because of various FS and options thereof. And found that a "crippled" ext4
> works best for me. Default journaling etc. in ext4 has a definite hit on
> disk I/O. Giving up some safety features has a drastic positive influence.
> Should be valid for all types of cache_dirs, though.

I hazard a guess that if you go through them, those "some" will all be features which involve doing some form of additional read/write to the disk for each chunk of written bytes. Yes? Things such as file access timestamping, journal recording, checksum writing, checksum validation post-write, dedup block registration, RAID protection, etc.

The logic behind that guess:
As mentioned above, the I/O presented by Squid will already be sliced across the network I/O streams and just needs reordering for the "elevator sweep" of quite a large number of base operations. Adding a second sweep to perform all the follow-up operations, OR causing the elevator to jump slightly forward/back to do them mid-sweep (I *hope* no disks do this anymore), will only harm the presented I/O sweep and delay the point at which its completion can be notified to Squid. Worst-case it halves the I/O limit the disk can provide to the FS layer, let alone Squid. I imagine that worst case is rare, but some "drastic" amount of difference is fully expected.
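
For the record, the usual ext4 "crippling" along those lines looks roughly like the fstab entry below. The device, mount point and exact option mix are just an example of reducing per-write metadata traffic; every one of these trades away some crash safety:

  # hypothetical cache partition; noatime/nodiratime drop access-time writes,
  # data=writeback relaxes data journaling, barrier=0 drops write barriers
  /dev/sdb1  /cache1  ext4  noatime,nodiratime,data=writeback,barrier=0  0 0

  # going further: remove the journal entirely (run on the unmounted filesystem)
  tune2fs -O ^has_journal /dev/sdb1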


Amos
