Hi all,

I've been having some problems with squid in reverse proxy mode. I'll preface this by saying that I realize I'm pushing squid to its limits, and I've been following the recent discussions on performance. I'll try to be as clear as possible about the crux of my problem, but please bear with me, as the explanation is rather long.

Hardware:
Dual-core Opteron 270, 1800MHz
8GB RAM
2x73GB 10k RPM SCSI drives (no raid)

Software:
RedHat Enterprise Linux 4 (64-bit) (kernel 2.6.9-22.0.2.ELsmp) (some RedHat frankenkernel)
squid-2.5-patch12
added the epoll patch (I verified that epoll is actually being used; I had to tweak the Makefile and a constant or two). Configured with:

./configure --with-maxfd=32768 --disable-poll --disable-select --enable-epoll --enable-async-io=48 --prefix=/usr/local/encap/squid-2.5.12-64bit-epoll
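
One easy way to confirm that epoll is actually in effect is to watch the syscalls of the running process for a few seconds (the PID below is a placeholder):

strace -p <squid_pid> -e trace=epoll_wait,epoll_ctl,poll,select

If epoll_wait/epoll_ctl show up instead of poll/select, the patch took.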

I have two AUFS cache dirs:
cache_dir aufs /cache1 24000 8 256
cache_dir aufs /mnt/drive2/cache2 24000 8 256

I've overridden the refresh:
refresh_pattern . 525600 20% 525600 override-expire override-lastmod ignore-reload

This machine basically only serves a huge set of images, which do not change over time, so refresh isn't really relevant.

OK, so now for the good news. I have about 17,000 active connections (keep-alive is set to 30 seconds) doing 1700 requests/sec. Disk I/O is negligible: the cache is partitioned, and essentially all reads come from the page cache. There is CPU to spare - a vmstat over 60 seconds shows plenty of idle time, 0 iowait, and an even split between user and sys CPU time. All this yields about 43Mbps of traffic on a 100Mbps NIC. Most files are ~4k.
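
For anyone comparing numbers, nothing fancy is involved - something along these lines (the 60-second interval matches what I quoted above; the netstat line is just a rough cross-check on the connection count):

vmstat 60 2
netstat -tan | grep -c ESTABLISHED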

This would be fantastic, except that the machines "fall over" after several hours. I have 4 machines, each configured identically. They last a few hours: squid slowly consumes more and more CPU, all in user space, until it starts affecting the median HTTP response time, and then throughput drops precipitously.

If I kill the process and start a new one, everything returns to normal - much lower CPU load, response times drop, and throughput comes back. The store log reveals nothing - just the usual RELEASE messages. Running strace on a good and a bad machine simultaneously doesn't reveal anything obviously wrong. There aren't any "queue congestion" messages, and I'm not seeing SO_FAIL messages (I upped the number of threads to 48 after seeing those reported on an earlier machine). Additionally, the cache manager's general info doesn't show any particularly distressed metric.
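
In case it helps to reproduce the comparison, the checks above are roughly the following (the PID is a placeholder; adjust squidclient's port if your cache manager isn't on the default):

strace -c -p <squid_pid>    # attach for a while, then Ctrl-C for the summary table
squidclient mgr:info        # cache manager general runtime info
squidclient mgr:5min        # 5-minute averages, including median service times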

Basically, something in squid is chewing up more and more user-space CPU over time, and I can't figure out what. I'm pretty new to squid, and I'm pretty much stalled.

As I recall, turning down persistent_request_timeout did not seem to have any effect. The relevant settings are:
request_timeout 30 seconds
persistent_request_timeout 30 seconds
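
These can be changed without a full restart, by the way - after editing squid.conf, a plain reload picks up the new values, which makes it easy to test different timeouts against a live process:

squid -k reconfigure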

I've included a bit more data below in case it helps.

I'd really appreciate any advice - squid is by far the best thing I've tried, and if I can just make it stable, I'd be done. :-)

Thanks,

-Mike




Sample data from the cache manager:

File descriptor usage for squid:
        Maximum number of file descriptors:   32768
        Largest file desc currently in use:   18675
        Number of file desc currently in use: 17869
        Files queued for open:                   4
        Available number of file descriptors: 14895
        Reserved number of file descriptors:   100
        Store Disk files open:                  15
Internal Data Structures:
        885561 StoreEntries
          2916 StoreEntries with MemObjects
          2891 Hot Object Cache Items
        884513 on-disk objects


Results of cumulative strace:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 37.45   56.910200          13   4332990         3 futex
 14.76   22.434863          11   1973577           epoll_ctl
 14.30   21.724293          26    846843       550 write
 13.02   19.786709          12   1642753    830428 read
 11.45   17.393603           9   1920158           fcntl
  6.63   10.075447          12    829458           close
  1.15    1.751153         126     13919           epoll_wait
  0.82    1.244980           9    139200           accept
  0.30    0.461178           3    139200           getsockname
  0.08    0.114513           4     27840           gettimeofday
  0.01    0.019765          29       693       693 connect
  0.01    0.013483          19       693           socket
  0.01    0.008909          49       180           getdents64
  0.00    0.005713          13       430           stat
  0.00    0.005326           8       693           bind
  0.00    0.003456           5       693           getsockopt
  0.00    0.003367           5       693           setsockopt
  0.00    0.000936          20        48           open
  0.00    0.000347           3       103           lseek
  0.00    0.000241           5        48           fstat
  0.00    0.000110          22         5           brk
  0.00    0.000067           5        13           getrusage
------ ----------- ----------- --------- --------- ----------------
100.00  151.958659              11870230    831674 total
