What is your average object size in cache?  I wouldn’t think over-allocating 
directory entries would have any noticeable effect besides using more memory.  
Did you clear your cache?  Maybe it is a side effect of that, or something 
else changed.
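
For reference, that setting lives in records.config; the value below is only a 
placeholder for illustration, not a recommendation:

CONFIG proxy.config.cache.min_average_object_size INT 32768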

You don’t have to set up hosting.config if you are going to have all of the hosts 
use all of the volumes; that is the default behavior.  You would only use it if you 
want specific domains to use a certain volume.
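
For example, if you did want to pin a domain to a specific volume, hosting.config 
would look something like this (the domain name is just an example):

domain=example.com volume=1
hostname=* volume=2,3,4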

-Bryan




> On Aug 24, 2016, at 3:32 AM, Hiroaki Nakamura <[email protected]> wrote:
> 
> Bryan,
> 
> Thanks for your advice.
> 
> I changed the value of proxy.config.cache.min_average_object_size to 4
> times the original value, and there are no slow log entries now.
> The interval between slow responses also became 1/4 of what it was,
> so this interval seems proportional to the number of directory entries.
> I'm wondering what kind of work is running during this slow response period.
> Could you explain it?
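> 
> As a rough illustration, assuming the number of directory entries is roughly
> the cache size divided by min_average_object_size (the numbers below are made
> up, not my actual values):
> 
>   1 TB cache / 8 KB  min_average_object_size -> ~134 million entries
>   1 TB cache / 32 KB min_average_object_size -> ~34 million entries (about 1/4)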
> 
> Running numastat -v showed memory allocated almost evenly across the
> NUMA nodes.
> And I am not using a large RAM cache.
> 
> I'll try configuring 4 volumes. With your default volume.config, is
> the following hosting.config valid?
> 
> hostname=* volume=1,2,3,4
> 
> With these configs, will ATS spread requests evenly across the volumes?
> How will ATS choose a volume?
> 
> Thanks
> 
> 
> 2016-08-23 3:21 GMT+09:00 Bryan Call <[email protected]>:
>> Are you seeing a CPU spike during this time too?  What does dstat show you
>> during this time?  It might be helpful to run sudo perf record when you are
>> seeing latency issues, to see which function calls are using the CPU during
>> that time.
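>> 
>> Something along these lines should work (process name and duration are just
>> examples; adjust as needed):
>> 
>> sudo perf record -g -p $(pidof traffic_server) -- sleep 30
>> sudo perf report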
>> 
>> Do you have a large RAM cache and multiple NUMA nodes?  Running numastat -v
>> will show you if you have memory allocated unevenly across the NUMA nodes.
>> Also, setting transparent_hugepage=never on boot has helped people with large
>> RAM caches.  This is a problem that Steve might be having.
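>> 
>> For example, to turn it off at runtime (on boot you would add
>> transparent_hugepage=never to the kernel command line instead):
>> 
>> echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled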
>> 
>> What is your request rate during these spikes?  It is possible you are
>> running into lock contention in the cache if the request rate is high.  If
>> you have only one storage device and no volumes, then you will only have one
>> lock for the cache.  If you configure 4 volumes, this will reduce the
>> amount of lock contention.  This is our default volume.config:
>> 
>> volume=1 scheme=http size=25%
>> volume=2 scheme=http size=25%
>> volume=3 scheme=http size=25%
>> volume=4 scheme=http size=25%
>> 
>> Hopefully, one of these pointers will fix the issue you are having.
>> 
>> -Bryan
>> 
>> 
>> 
>> 
>> On Aug 21, 2016, at 9:00 PM, Hiroaki Nakamura <[email protected]> wrote:
>> 
>> I turned on debug log and got slow log and debug log.
>> 
>> slow log
>> [Aug 22 11:32:48.103] Server {0x2af64b6c1700} ERROR: <HttpSM.cc:6820
>> (update_stats)> [6072850] Slow Request: client_ip: 133.167.4.17:48228
>> url: **masked**/7270.png status: 200 unique id:  redirection_tries: 0
>> bytes: 43 fd: 0 client state: 0 server state: 9 ua_begin: 0.000
>> ua_first_read: 0.000 ua_read_header_done: 0.000 cache_open_read_begin:
>> 0.003 cache_open_read_end: 2.913 dns_lookup_begin: 2.913
>> dns_lookup_end: 2.913 server_connect: 2.925 server_first_read: 2.950
>> server_read_header_done: 2.950 server_close: 2.953 ua_close: 2.953
>> sm_finish: 2.954 plugin_active: 0.000 plugin_total: 0.000
>> 
>> 
>> debug log
>> 
>> [Aug 22 11:32:45.151] Server {0x2af64b6c1700} DEBUG: <HttpSM.cc:4403
>> (do_cache_lookup_and_read)> (http_seq)
>> [HttpSM::do_cache_lookup_and_read] [6072850] Issuing cache lookup for
>> URL  **masked**/7270.png
>> ...
>> [Aug 22 11:32:47.942] Server {0x2af64bcc7700} DEBUG:
>> <HttpTransactCache.cc:183 (SelectFromAlternates)> (http_match)
>> [SelectFromAlternates] # alternates = 1
>> ...
>> [Aug 22 11:32:48.061] Server {0x2af64b6c1700} DEBUG:
>> <HttpCacheSM.cc:102 (state_cache_open_read)> (http_cache) [6072850]
>> [&HttpCacheSM::state_cache_open_read, CACHE_EVENT_OPEN_READ]
>> 
>> I noticed that the gap between the debug log timestamps is roughly equal to
>> (cache_open_read_end - cache_open_read_begin) above.
>> 
>> 2016-08-22 9:25 GMT+09:00 Hiroaki Nakamura <[email protected]>:
>> 
>> Hi Bryan,
>> Thanks for your reply!
>> 
>> I checked my server and found it is already set to a small value.
>> 
>> $ sudo sysctl -a | grep vm.dirty_writeback_centisecs
>> vm.dirty_writeback_centisecs = 500
>> 
>> Is there another config to check?
>> Thanks
>> 
>> 2016-08-20 1:36 GMT+09:00 Bryan Call <[email protected]>:
>> 
>> I have seen issues where the write buffers fill up and are then flushed, which
>> stalls the process.  This happened to me when doing large file transfers on
>> Linux.
>> 
>> I ended up changing the value of vm.dirty_writeback_centisecs to 1500.
>> 
>> $ sudo sysctl -a | grep writeback
>> vm.dirty_writeback_centisecs = 1500
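>> 
>> To keep the change across reboots you can put it in /etc/sysctl.conf (or a
>> file under /etc/sysctl.d/) and reload it, for example:
>> 
>> vm.dirty_writeback_centisecs = 1500
>> $ sudo sysctl -p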
>> 
>> Here is an article on the issue:
>> https://lonesysadmin.net/2013/12/22/better-linux-disk-caching-performance-vm-dirty_ratio/
>> 
>> -Bryan
>> 
>> 
>> 
>> 
>> On Aug 18, 2016, at 10:56 PM, Hiroaki Nakamura <[email protected]> wrote:
>> 
>> Hi,
>> 
>> I forgot to mention the Traffic Server version.
>> I'm using ATS 6.1.1 with a raw partition for the cache storage.
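>> 
>> For reference, pointing the cache at a raw device is done in storage.config,
>> with just the device path on a line (the path below is only an example, not
>> my actual device):
>> 
>> /dev/sdb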
>> 
>> I also noticed that the vmstat disk write (writ) values remain 0 for one minute,
>> and then non-zero values follow and the slow responses happen.
>> 
>> 
>> 2016-08-19 14:27 GMT+09:00 Hiroaki Nakamura <[email protected]>:
>> 
>> Hi,
>> 
>> I'm having a problem where a request sometimes takes 3 to 5 seconds,
>> approximately every 39 minutes.
>> 
>> I turned on the slow log and found that cache_open_read_end is the most
>> time-consuming step.
>> 
>> status: 200 unique id:  redirection_tries: 0 bytes: 43 fd: 0 client
>> state: 0 server state: 9 ua_begin: 0.000 ua_first_read: 0.000
>> ua_read_header_done: 0.000 cache_open_read_begin: 0.000
>> cache_open_read_end: 2.536 dns_lookup_begin: 2.536 dns_lookup_end:
>> 2.536 server_connect: 2.537 server_first_read: 2.589
>> server_read_header_done: 2.589 server_close: 2.589 ua_close: 2.589
>> sm_finish: 2.589 plugin_active: 0.000 plugin_total: 0.000
>> 
>> Could you give me advice on why cache_open_read_end takes so long,
>> and how to fix the slow responses?
>> 
>> Thanks,
>> Hiroaki
>> 
>> 
>> 
