Hi,

I'm running two Varnish servers in production (version 2.1.5). Both use the same hardware and have the same amount of RAM (48GB). Last night one of the Varnish servers stopped responding on port 80. Since we use HAproxy in front of both Varnish servers for load balancing, this did not have much effect on our end users. The symptoms were one of two things: either a client (HAproxy, telnet) could not establish a layer 4 connection to Varnish at all, or the client could connect and issue an HTTP GET, but Varnish returned nothing, not even HTTP headers.
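To tell these two symptoms apart from another machine, a small probe along the following lines could be used. This is only a sketch in Python: the `probe` function, its three return labels, and the 5 second default timeout are made up for this example and are not part of Varnish or HAproxy.

```python
import socket


def probe(host, port, timeout=5.0):
    """Classify a listener as 'connect_failed', 'no_response', or 'ok'.

    'connect_failed' : the TCP (layer 4) connection could not be established.
    'no_response'    : the connection succeeded but no HTTP reply came back.
    'ok'             : something starting with 'HTTP/' was received.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, unreachable, or timed out during the TCP handshake.
        return "connect_failed"

    with sock:
        try:
            request = (
                "GET / HTTP/1.1\r\n"
                "Host: {}\r\n"
                "Connection: close\r\n\r\n"
            ).format(host)
            sock.sendall(request.encode("ascii"))
            data = sock.recv(4096)
        except OSError:
            # socket.timeout is a subclass of OSError: the server accepted
            # the connection but never answered within the timeout.
            return "no_response"

    return "ok" if data.startswith(b"HTTP/") else "no_response"
```

Calling `probe("127.0.0.1", 80)` on the Varnish host itself would check the same listener that HAproxy talks to; the address here is just a placeholder.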
Running "ps -efL | grep varnish | wc -l" revealed about 500 varnish threads. I am using the default thread configuration (max of 500). It looked to me as though no thread was available when a client tried to connect, so the client just hung until either it or Varnish timed out and disconnected. Unfortunately I didn't have the good sense to capture a "varnishstat -1" while this was happening. I was focused on getting the server back to a working state, so I ended up restarting varnishd.

Here is my varnishd command line, followed by a current "varnishstat -1". (I have set the HAproxy weight for this server lower than for the other Varnish instance so that the cache can "warm up". It typically gets about 4x as much traffic.)

/usr/local/sbin/varnishd -s file,/tmp/varnish-cache,60G -T 127.0.0.1:2000 -a 0.0.0.0:80 -t 604800 -f /usr/local/etc/varnish/default.vcl -p http_headers 384 -p connect_timeout 4.0

client_conn             4985179     120.45  Client connections accepted
client_drop                   0       0.00  Connection dropped, no sess/wrk
client_req              4907077     118.56  Client requests received
cache_hit               3356368      81.09  Cache hits
cache_hitpass                 0       0.00  Cache hits for pass
cache_miss              1550606      37.46  Cache misses
backend_conn            1530014      36.97  Backend conn. success
backend_unhealthy             0       0.00  Backend conn. not attempted
backend_busy                  0       0.00  Backend conn. too many
backend_fail                  0       0.00  Backend conn. failures
backend_reuse             20690       0.50  Backend conn. reuses
backend_toolate               0       0.00  Backend conn. was closed
backend_recycle           20691       0.50  Backend conn. recycles
backend_unused                0       0.00  Backend conn. unused
fetch_head                    1       0.00  Fetch head
fetch_length              33270       0.80  Fetch with Length
fetch_chunked           1517362      36.66  Fetch chunked
fetch_eof                     0       0.00  Fetch EOF
fetch_bad                     0       0.00  Fetch had bad headers
fetch_close                  70       0.00  Fetch wanted close
fetch_oldhttp                 0       0.00  Fetch pre HTTP/1.1 closed
fetch_zero                    0       0.00  Fetch zero len
fetch_failed                  0       0.00  Fetch failed
n_sess_mem                  262          .  N struct sess_mem
n_sess                       68          .  N struct sess
n_object                1550439          .  N struct object
n_vampireobject               0          .  N unresurrected objects
n_objectcore            1550458          .  N struct objectcore
n_objecthead            1550412          .  N struct objecthead
n_smf                   3100879          .  N struct smf
n_smf_frag                    0          .  N small free smf
n_smf_large                   1          .  N large free smf
n_vbe_conn                    1          .  N struct vbe_conn
n_wrk                        29          .  N worker threads
n_wrk_create                870       0.02  N worker threads created
n_wrk_failed                  0       0.00  N worker threads not created
n_wrk_max                  3128       0.08  N worker threads limited
n_wrk_queue                   0       0.00  N queued work requests
n_wrk_overflow             4696       0.11  N overflowed work requests
n_wrk_drop                    0       0.00  N dropped work requests
n_backend                     2          .  N backends
n_expired                   157          .  N expired objects
n_lru_nuked                   0          .  N LRU nuked objects
n_lru_saved                   0          .  N LRU saved objects
n_lru_moved             3077705          .  N LRU moved objects
n_deathrow                    0          .  N objects on deathrow
losthdr                       0       0.00  HTTP header overflows
n_objsendfile                 0       0.00  Objects sent with sendfile
n_objwrite              4817364     116.39  Objects sent with write
n_objoverflow                 0       0.00  Objects overflowing workspace
s_sess                  4985176     120.45  Total Sessions
s_req                   4907077     118.56  Total Requests
s_pipe                        0       0.00  Total pipe
s_pass                      102       0.00  Total pass
s_fetch                 1550703      37.47  Total fetch
s_hdrbytes           1590643697   38431.56  Total header bytes
s_bodybytes         17647134982  426372.59  Total body bytes
sess_closed             4522198     109.26  Session Closed
sess_pipeline                 4       0.00  Session Pipeline
sess_readahead                8       0.00  Session Read Ahead
sess_linger              469810      11.35  Session Linger
sess_herd                476189      11.51  Session herd
shm_records           297887487    7197.26  SHM records
shm_writes             23469767     567.05  SHM writes
shm_flushes                   0       0.00  SHM flushes due to overflow
shm_cont                  51830       1.25  SHM MTX contention
shm_cycles                  137       0.00  SHM cycles through buffer
sm_nreq                 3101298      74.93  allocator requests
sm_nobj                 3100878          .  outstanding allocations
sm_balloc           13670006784          .  bytes allocated
sm_bfree            50754502656          .  bytes free
sma_nreq                      0       0.00  SMA allocator requests
sma_nobj                      0          .  SMA outstanding allocations
sma_nbytes                    0          .  SMA outstanding bytes
sma_balloc                    0          .  SMA bytes allocated
sma_bfree                     0          .  SMA bytes free
sms_nreq                      5       0.00  SMS allocator requests
sms_nobj                      0          .  SMS outstanding allocations
sms_nbytes                    0          .  SMS outstanding bytes
sms_balloc                 2090          .  SMS bytes allocated
sms_bfree                  2090          .  SMS bytes freed
backend_req             1550708      37.47  Backend requests made
n_vcl                         1       0.00  N vcl total
n_vcl_avail                   1       0.00  N vcl available
n_vcl_discard                 0       0.00  N vcl discarded
n_purge                       1          .  N total active purges
n_purge_add                   1       0.00  N new purges added
n_purge_retire                0       0.00  N old purges deleted
n_purge_obj_test              0       0.00  N objects tested
n_purge_re_test               0       0.00  N regexps tested against
n_purge_dups                  0       0.00  N duplicate purges removed
hcb_nolock              4906976     118.56  HCB Lookups without lock
hcb_lock                1550518      37.46  HCB Lookups with lock
hcb_insert              1550517      37.46  HCB Inserts
esi_parse                     0       0.00  Objects ESI parsed (unlock)
esi_errors                    0       0.00  ESI parse errors (unlock)
accept_fail                   0       0.00  Accept failures
client_drop_late              0       0.00  Connection dropped late
uptime                    41389       1.00  Client uptime
backend_retry                 0       0.00  Backend conn. retry
dir_dns_lookups               0       0.00  DNS director lookups
dir_dns_failed                0       0.00  DNS director failed lookups
dir_dns_hit                   0       0.00  DNS director cached lookups hit
dir_dns_cache_full            0       0.00  DNS director full dnscache
fetch_1xx                     0       0.00  Fetch no body (1xx)
fetch_204                     0       0.00  Fetch no body (204)
fetch_304                     0       0.00  Fetch no body (304)

Could there be something wrong with my configuration that caused this problem?

Thanks,

Matt Schurenko
Systems Administrator
airG(r) Share Your World
Suite 710, 1133 Melville Street
Vancouver, BC V6E 4E5
P: +1.604.408.2228
F: +1.866.874.8136
W: www.airg.com
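For reference, the counters most relevant to the thread question can be pulled out of a "varnishstat" dump with a short script. The sketch below is in Python and assumes the four-column "name value rate description" layout shown in this message; the function names `parse_varnishstat` and `summarize` are invented for the example, and `SAMPLE` is just a handful of lines copied from the output above.

```python
def parse_varnishstat(text):
    """Map counter name -> integer value from 'name value rate description' lines."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank or malformed lines; the second column must be the raw count.
        if len(fields) >= 2 and fields[1].isdigit():
            counters[fields[0]] = int(fields[1])
    return counters


def summarize(counters):
    """Pick out the hit ratio and the worker-thread pressure counters."""
    hits = counters.get("cache_hit", 0)
    misses = counters.get("cache_miss", 0)
    lookups = hits + misses
    return {
        "hit_ratio": hits / lookups if lookups else None,
        "worker_threads": counters.get("n_wrk", 0),
        "thread_limit_hits": counters.get("n_wrk_max", 0),
        "overflowed_requests": counters.get("n_wrk_overflow", 0),
        "dropped_requests": counters.get("n_wrk_drop", 0),
    }


# A few lines copied from the varnishstat output in this message.
SAMPLE = """\
cache_hit 3356368 81.09 Cache hits
cache_miss 1550606 37.46 Cache misses
n_wrk 29 . N worker threads
n_wrk_max 3128 0.08 N worker threads limited
n_wrk_overflow 4696 0.11 N overflowed work requests
n_wrk_drop 0 0.00 N dropped work requests
"""

if __name__ == "__main__":
    for key, value in summarize(parse_varnishstat(SAMPLE)).items():
        print(key, value)
```

Going by the counter descriptions alone, a non-zero n_wrk_max ("N worker threads limited") and n_wrk_overflow ("N overflowed work requests") after only 41389 seconds of uptime would be consistent with the thread limit being reached, though that reading is an interpretation rather than something the output states outright.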
_______________________________________________
varnish-misc mailing list
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
