Bryan, thanks for your advice.
I changed the value of proxy.config.cache.min_average_object_size to 4 times
the original value, and no slow log entries appear now. The period between
slow responses also became 1/4 as long, so this period seems proportional to
the number of directory entries. I'm wondering what kind of work is running
during this slow-response period. Could you explain it?

Running numastat -v showed memory allocated almost evenly across the NUMA
nodes, and I am not using a large RAM cache.

I'll try configuring 4 volumes. With your default volume.config, is the
following hosting.config valid?

hostname=* volume=1,2,3,4

With these configs, will ATS use the volumes evenly for requests? How does
ATS choose a volume?

Thanks

2016-08-23 3:21 GMT+09:00 Bryan Call <[email protected]>:
> Are you seeing the CPU spike during this time too? What does dstat show you
> during this time? It might be helpful to do a sudo perf record when you are
> seeing latency issues to see what function calls are using the CPU during
> this time.
>
> Do you have a large RAM cache and multiple NUMA nodes? Running numastat -v
> will show you if you have memory allocated unevenly across the NUMA nodes.
> Also, setting transparent_hugepage=never on boot has helped people with
> large RAM caches. This is a problem that Steve might be having.
>
> What is your request rate during these spikes? It is possible you are
> running into lock contention in the cache if the request rate is high. If
> you have only one storage device and no volumes, then you will only have 1
> lock for the cache. If you configure 4 volumes this will reduce the
> amount of lock contention. This is our default volume.config:
>
> volume=1 scheme=http size=25%
> volume=2 scheme=http size=25%
> volume=3 scheme=http size=25%
> volume=4 scheme=http size=25%
>
> Hopefully, one of these pointers will fix the issue you are having.
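The proportionality described above can be sanity-checked with quick shell
arithmetic. This is a sketch with assumed numbers, not the poster's real
values: a hypothetical 512 GiB raw cache partition and a default-ish
min_average_object_size of 8000 bytes. ATS sizes the cache directory at
roughly one entry per min_average_object_size bytes of cache, so quadrupling
the setting divides the entry count by about 4:

```shell
# Assumed example: 512 GiB raw cache partition (illustrative size only).
cache_bytes=$((512 * 1024 * 1024 * 1024))

# Directory entries ~= cache size / min_average_object_size, so a 4x larger
# average object size means roughly 1/4 as many entries to walk and sync.
for avg in 8000 32000; do
  echo "min_average_object_size=$avg -> ~$((cache_bytes / avg)) directory entries"
done
```

Fewer directory entries means less metadata to scan during periodic directory
maintenance, which matches the observation that the interval between slow
responses shrank by the same factor.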
>
> -Bryan
>
>
>> On Aug 21, 2016, at 9:00 PM, Hiroaki Nakamura <[email protected]> wrote:
>>
>> I turned on debug logging and got both a slow log and a debug log.
>>
>> slow log
>> [Aug 22 11:32:48.103] Server {0x2af64b6c1700} ERROR: <HttpSM.cc:6820
>> (update_stats)> [6072850] Slow Request: client_ip: 133.167.4.17:48228
>> url: **masked**/7270.png status: 200 unique id: redirection_tries: 0
>> bytes: 43 fd: 0 client state: 0 server state: 9 ua_begin: 0.000
>> ua_first_read: 0.000 ua_read_header_done: 0.000 cache_open_read_begin:
>> 0.003 cache_open_read_end: 2.913 dns_lookup_begin: 2.913
>> dns_lookup_end: 2.913 server_connect: 2.925 server_first_read: 2.950
>> server_read_header_done: 2.950 server_close: 2.953 ua_close: 2.953
>> sm_finish: 2.954 plugin_active: 0.000 plugin_total: 0.000
>>
>> debug log
>>
>> [Aug 22 11:32:45.151] Server {0x2af64b6c1700} DEBUG: <HttpSM.cc:4403
>> (do_cache_lookup_and_read)> (http_seq)
>> [HttpSM::do_cache_lookup_and_read] [6072850] Issuing cache lookup for
>> URL **masked**/7270.png
>> ...
>> [Aug 22 11:32:47.942] Server {0x2af64bcc7700} DEBUG:
>> <HttpTransactCache.cc:183 (SelectFromAlternates)> (http_match)
>> [SelectFromAlternates] # alternates = 1
>> ...
>> [Aug 22 11:32:48.061] Server {0x2af64b6c1700} DEBUG:
>> <HttpCacheSM.cc:102 (state_cache_open_read)> (http_cache) [6072850]
>> [&HttpCacheSM::state_cache_open_read, CACHE_EVENT_OPEN_READ]
>>
>> I noticed the gap, which is roughly equal to (cache_open_read_end -
>> cache_open_read_begin) above.
>>
>> 2016-08-22 9:25 GMT+09:00 Hiroaki Nakamura <[email protected]>:
>>> Hi Bryan,
>>> Thanks for your reply!
>>>
>>> I checked my server and found that it is already set to a small value.
>>>
>>> $ sudo sysctl -a | grep vm.dirty_writeback_centisecs
>>> vm.dirty_writeback_centisecs = 500
>>>
>>> Is there another config to check?
>>> Thanks
>>>
>>> 2016-08-20 1:36 GMT+09:00 Bryan Call <[email protected]>:
>>>> I have seen issues with the write buffers filling and being flushed
>>>> while it stalls the process.
>>>> This happened to me when doing large file transfers on Linux.
>>>>
>>>> I ended up changing the value of vm.dirty_writeback_centisecs to 1500.
>>>>
>>>> $ sudo sysctl -a | grep writeback
>>>> vm.dirty_writeback_centisecs = 1500
>>>>
>>>> Here is an article on the issue:
>>>> https://lonesysadmin.net/2013/12/22/better-linux-disk-caching-performance-vm-dirty_ratio/
>>>>
>>>> -Bryan
>>>>
>>>>
>>>>> On Aug 18, 2016, at 10:56 PM, Hiroaki Nakamura <[email protected]> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I forgot to mention the Traffic Server version.
>>>>> I'm using ATS 6.1.1 with a raw partition for the cache storage.
>>>>>
>>>>> I also noticed that the vmstat disk writ values remain 0 for one
>>>>> minute, then non-zero values follow and the slow responses happen.
>>>>>
>>>>> 2016-08-19 14:27 GMT+09:00 Hiroaki Nakamura <[email protected]>:
>>>>>> Hi,
>>>>>>
>>>>>> I'm having a problem where a request sometimes takes 3 to 5 seconds,
>>>>>> approximately every 39 minutes.
>>>>>>
>>>>>> I turned on the slow log and found that cache_open_read_end is the
>>>>>> most time-consuming step.
>>>>>>
>>>>>> status: 200 unique id: redirection_tries: 0 bytes: 43 fd: 0 client
>>>>>> state: 0 server state: 9 ua_begin: 0.000 ua_first_read: 0.000
>>>>>> ua_read_header_done: 0.000 cache_open_read_begin: 0.000
>>>>>> cache_open_read_end: 2.536 dns_lookup_begin: 2.536 dns_lookup_end:
>>>>>> 2.536 server_connect: 2.537 server_first_read: 2.589
>>>>>> server_read_header_done: 2.589 server_close: 2.589 ua_close: 2.589
>>>>>> sm_finish: 2.589 plugin_active: 0.000 plugin_total: 0.000
>>>>>>
>>>>>> Could you give me advice on why cache_open_read_end takes so long,
>>>>>> and how to fix the slow responses?
>>>>>>
>>>>>> Thanks,
>>>>>> Hiroaki
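One practical follow-up to the vm.dirty_writeback_centisecs discussion in
the thread: `sysctl -w` only changes the running kernel, so the value reverts
on reboot. A sysctl.d drop-in makes it persistent. This is a sketch; the
filename 99-ats-writeback.conf is illustrative, and 1500 (15 seconds) is the
value from Bryan's mail, not a recommendation verified here:

```
# /etc/sysctl.d/99-ats-writeback.conf  (hypothetical filename)
# Wake the writeback threads every 15 s, per the suggestion in this thread.
vm.dirty_writeback_centisecs = 1500
```

The file is applied on boot, or immediately with `sudo sysctl --system`.
Whether raising or lowering this interval helps depends on how fast dirty
pages accumulate relative to disk throughput (see the lonesysadmin article
linked above).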
