Bryan, thanks for your advice.
I changed the value of proxy.config.cache.min_average_object_size to 4 times
the original value, and no slow log entries appear now. The period between
slow responses also became 1/4 as long, so this period seems proportional to
the number of directory entries. I'm wondering what kind of work is running
during this slow-response period. Could you explain it?

Running numastat -v showed memory allocated almost evenly across the NUMA
nodes, and I am not using a large RAM cache.

I'll try configuring 4 volumes. With your default volume.config, is the
following hosting.config valid?

hostname=* volume=1,2,3,4

With these configs, will ATS use the volumes evenly for requests? How does
ATS choose a volume?

Thanks

2016-08-23 3:21 GMT+09:00 Bryan Call <[email protected]>:
> Are you seeing the CPU spike during this time too? What does dstat show you
> during this time? It might be helpful to do a sudo perf record when you are
> seeing latency issues to see what function calls are using the CPU during
> this time.
>
> Do you have a large RAM cache and multiple NUMA nodes? Running numastat -v
> will show you if you have memory allocated unevenly across the NUMA nodes.
> Also, setting transparent_hugepage=never on boot has helped people with
> large RAM caches. This is a problem that Steve might be having.
>
> What is your request rate during these spikes? It is possible you are
> running into lock contention in the cache if the request rate is high. If
> you have only one storage device and no volumes, then you will only have 1
> lock for the cache. If you configure 4 volumes this will reduce the
> amount of lock contention. This is our default volume.config:
>
> volume=1 scheme=http size=25%
> volume=2 scheme=http size=25%
> volume=3 scheme=http size=25%
> volume=4 scheme=http size=25%
>
> Hopefully, one of these pointers will fix the issue you are having.
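The proportionality described above can be sanity-checked with quick shell
arithmetic. This is a sketch with assumed numbers, not the poster's real
values: a hypothetical 512 GiB raw cache partition and a default-ish
min_average_object_size of 8000 bytes. ATS sizes the cache directory at
roughly one entry per min_average_object_size bytes of cache, so quadrupling
the setting divides the entry count by about 4:

```shell
# Assumed example: 512 GiB raw cache partition (illustrative size only).
cache_bytes=$((512 * 1024 * 1024 * 1024))

# Directory entries ~= cache size / min_average_object_size, so a 4x larger
# average object size means roughly 1/4 as many entries to walk and sync.
for avg in 8000 32000; do
  echo "min_average_object_size=$avg -> ~$((cache_bytes / avg)) directory entries"
done
```

Fewer directory entries means less metadata to scan during periodic directory
maintenance, which matches the observation that the interval between slow
responses shrank by the same factor.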
>
> -Bryan
>
>
>> On Aug 21, 2016, at 9:00 PM, Hiroaki Nakamura <[email protected]> wrote:
>>
>> I turned on debug logging and got both a slow log and a debug log.
>>
>> slow log
>> [Aug 22 11:32:48.103] Server {0x2af64b6c1700} ERROR: <HttpSM.cc:6820
>> (update_stats)> [6072850] Slow Request: client_ip: 133.167.4.17:48228
>> url: **masked**/7270.png status: 200 unique id: redirection_tries: 0
>> bytes: 43 fd: 0 client state: 0 server state: 9 ua_begin: 0.000
>> ua_first_read: 0.000 ua_read_header_done: 0.000 cache_open_read_begin:
>> 0.003 cache_open_read_end: 2.913 dns_lookup_begin: 2.913
>> dns_lookup_end: 2.913 server_connect: 2.925 server_first_read: 2.950
>> server_read_header_done: 2.950 server_close: 2.953 ua_close: 2.953
>> sm_finish: 2.954 plugin_active: 0.000 plugin_total: 0.000
>>
>> debug log
>>
>> [Aug 22 11:32:45.151] Server {0x2af64b6c1700} DEBUG: <HttpSM.cc:4403
>> (do_cache_lookup_and_read)> (http_seq)
>> [HttpSM::do_cache_lookup_and_read] [6072850] Issuing cache lookup for
>> URL **masked**/7270.png
>> ...
>> [Aug 22 11:32:47.942] Server {0x2af64bcc7700} DEBUG:
>> <HttpTransactCache.cc:183 (SelectFromAlternates)> (http_match)
>> [SelectFromAlternates] # alternates = 1
>> ...
>> [Aug 22 11:32:48.061] Server {0x2af64b6c1700} DEBUG:
>> <HttpCacheSM.cc:102 (state_cache_open_read)> (http_cache) [6072850]
>> [&HttpCacheSM::state_cache_open_read, CACHE_EVENT_OPEN_READ]
>>
>> I noticed the gap, which is roughly equal to (cache_open_read_end -
>> cache_open_read_begin) above.
>>
>> 2016-08-22 9:25 GMT+09:00 Hiroaki Nakamura <[email protected]>:
>>> Hi Bryan,
>>> Thanks for your reply!
>>>
>>> I checked my server and found that it is already set to a small value.
>>>
>>> $ sudo sysctl -a | grep vm.dirty_writeback_centisecs
>>> vm.dirty_writeback_centisecs = 500
>>>
>>> Is there another config to check?
>>> Thanks
>>>
>>> 2016-08-20 1:36 GMT+09:00 Bryan Call <[email protected]>:
>>>> I have seen issues with the write buffers filling and being flushed
>>>> while it stalls the process.
>>>> This happened to me when doing large file transfers on Linux.
>>>>
>>>> I ended up changing the value of vm.dirty_writeback_centisecs to 1500.
>>>>
>>>> $ sudo sysctl -a | grep writeback
>>>> vm.dirty_writeback_centisecs = 1500
>>>>
>>>> Here is an article on the issue:
>>>> https://lonesysadmin.net/2013/12/22/better-linux-disk-caching-performance-vm-dirty_ratio/
>>>>
>>>> -Bryan
>>>>
>>>>
>>>>> On Aug 18, 2016, at 10:56 PM, Hiroaki Nakamura <[email protected]> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I forgot to mention the Traffic Server version.
>>>>> I'm using ATS 6.1.1 with a raw partition for the cache storage.
>>>>>
>>>>> I also noticed that the vmstat disk writ values remain 0 for one
>>>>> minute, then non-zero values follow and the slow responses happen.
>>>>>
>>>>> 2016-08-19 14:27 GMT+09:00 Hiroaki Nakamura <[email protected]>:
>>>>>> Hi,
>>>>>>
>>>>>> I'm having a problem where a request sometimes takes 3 to 5 seconds,
>>>>>> approximately every 39 minutes.
>>>>>>
>>>>>> I turned on the slow log and found that cache_open_read_end is the
>>>>>> most time-consuming step.
>>>>>>
>>>>>> status: 200 unique id: redirection_tries: 0 bytes: 43 fd: 0 client
>>>>>> state: 0 server state: 9 ua_begin: 0.000 ua_first_read: 0.000
>>>>>> ua_read_header_done: 0.000 cache_open_read_begin: 0.000
>>>>>> cache_open_read_end: 2.536 dns_lookup_begin: 2.536 dns_lookup_end:
>>>>>> 2.536 server_connect: 2.537 server_first_read: 2.589
>>>>>> server_read_header_done: 2.589 server_close: 2.589 ua_close: 2.589
>>>>>> sm_finish: 2.589 plugin_active: 0.000 plugin_total: 0.000
>>>>>>
>>>>>> Could you give me advice on why cache_open_read_end takes so long,
>>>>>> and how to fix the slow responses?
>>>>>>
>>>>>> Thanks,
>>>>>> Hiroaki
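One practical follow-up to the vm.dirty_writeback_centisecs discussion in
the thread: `sysctl -w` only changes the running kernel, so the value reverts
on reboot. A sysctl.d drop-in makes it persistent. This is a sketch; the
filename 99-ats-writeback.conf is illustrative, and 1500 (15 seconds) is the
value from Bryan's mail, not a recommendation verified here:

```
# /etc/sysctl.d/99-ats-writeback.conf  (hypothetical filename)
# Wake the writeback threads every 15 s, per the suggestion in this thread.
vm.dirty_writeback_centisecs = 1500
```

The file is applied on boot, or immediately with `sudo sysctl --system`.
Whether raising or lowering this interval helps depends on how fast dirty
pages accumulate relative to disk throughput (see the lonesysadmin article
linked above).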
