Hi gentlemen,
I'm trying to test the performance of ATS v4.0.2.
The server under test has a quad-core CPU with HT disabled. During the
test (1k user agents, 1k origin servers, up to 6k requests per second
with an average response size of 8 KB), at around 2-2.5k requests per
second I start to see signs of overload (growing delays, missed
responses). The problem is that according to top, the CPU is not under
heavy load (which is strange for an overloaded system). All other
resources (RAM, I/O, network) are far from saturation too. top shows the
[ET_NET 0] thread at about 50-60% of one core. The traffic_server
threads seem to be spread across all the cores, even when I try to
force-bind them to one or two of the cores using taskset.
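For what it's worth, here is how I check where each thread actually runs
(a quick sketch; it assumes a single traffic_server instance, and psr is
the processor a thread last ran on):

  # list every traffic_server thread with its last-used core and CPU usage
  $ ps -L -p "$(pidof traffic_server)" -o tid,psr,pcpu,comm

This is what shows the threads scattered over all four cores despite
taskset.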
My changes to the default ATS configuration (mostly following this
guide: http://www.ogre.com/node/392):
Cache is fully disabled:
CONFIG proxy.config.http.cache.http INT 0
Threads:
CONFIG proxy.config.exec_thread.autoconfig INT 0
CONFIG proxy.config.exec_thread.autoconfig.scale FLOAT 1
CONFIG proxy.config.exec_thread.limit INT 4
CONFIG proxy.config.accept_threads INT 2
CONFIG proxy.config.cache.threads_per_disk INT 1
CONFIG proxy.config.task_threads INT 4
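To confirm those settings actually took effect, I count the event
threads after a full restart (a sanity check only; as far as I
understand, the thread settings are not picked up by a plain reload, and
the thread names are assumed to match what top shows):

  # with autoconfig off, the number of [ET_NET *] threads should
  # follow proxy.config.exec_thread.limit
  $ ps -L -p "$(pidof traffic_server)" -o comm | grep -c ET_NET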
So my questions are:
1) Is there any known strategy for distributing ATS processes/threads
across CPU cores? E.g. bind all traffic_server threads to cpu0 and cpu1,
all traffic_manager threads to cpu2, and network interrupts to cpu3?
2) If so, how can this be done? I see that some threads ignore 'taskset
-a -c -p 1,2 <traffic_server pid>' and are executed on any CPU core.
Maybe there are configuration directives for this? (See the sketch after
this list for what I'm attempting.)
3) What is the best strategy for thread configuration? Should the sum of
task, accept, and network threads equal the number of CPU cores + 1? Or
something else? Maybe it is better to use 40 threads in total on a
quad-core machine?
4) Are the *thread* config options taken into account if
proxy.config.http.cache.http is set to '1'?
5) What other options could influence system performance in a cache-off
test?
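Regarding questions 1 and 2, the scheme I have in mind looks roughly
like the sketch below. This is only an attempt, not a known-good recipe:
the CPU numbers and eth0 are placeholders for my setup, writing
smp_affinity needs root, irqbalance would have to be stopped so it does
not undo the masks, and the pinning loop only covers threads that
already exist when it runs (which may be why taskset appears to be
ignored for threads created later):

  # pin every existing traffic_server thread to cpu0 and cpu1
  for task in /proc/"$(pidof traffic_server)"/task/*; do
      taskset -c -p 0,1 "${task##*/}"
  done

  # pin traffic_manager and all of its threads to cpu2
  taskset -a -c -p 2 "$(pidof traffic_manager)"

  # steer the NIC's interrupts to cpu3 (affinity mask 0x8);
  # IRQ numbers are read from /proc/interrupts for the actual NIC
  awk -F: '/eth0/ { gsub(/ /, "", $1); print $1 }' /proc/interrupts |
  while read -r irq; do
      echo 8 > "/proc/irq/${irq}/smp_affinity"
  done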
TIA,
Pavel