Re: High load average under 1.8 with multiple draining processes

2018-01-15 Thread Willy Tarreau
Hi Lukas, On Mon, Jan 15, 2018 at 11:30:49PM +0100, Lukas Tribus wrote: > Also consider this report from discourse: > https://discourse.haproxy.org/t/haproxy1-8-3-dynamic-dns-resolvers-problem/1997 > > Haproxy 1.8.3 with nbthread > 1 and while reloading, DNS resolutions fails. > Its possibile

Re: High load average under 1.8 with multiple draining processes

2018-01-15 Thread Lukas Tribus
Hey guys, On 15 January 2018 at 20:49, Willy Tarreau wrote: > Samuel, > > While running a few tests with Christopher's patch in order to integrate > it, I managed to find a case where I'm still seeing quite a number of > calls to epoll_wait(0)=0. Studying the patch, I found that

Re: High load average under 1.8 with multiple draining processes

2018-01-15 Thread Willy Tarreau
Samuel, While running a few tests with Christopher's patch in order to integrate it, I managed to find a case where I'm still seeing quite a number of calls to epoll_wait(0)=0. Studying the patch, I found that there's a corner case which it doesn't address, which is where an fd is processed and

Re: High load average under 1.8 with multiple draining processes

2018-01-15 Thread Willy Tarreau
On Mon, Jan 15, 2018 at 09:39:25AM -0600, Samuel Reed wrote: > Reload is done via the sysvinit wrapper, which executes: > > $HAPROXY -f "$CONFIG" -p $PIDFILE -sf $(cat $PIDFILE) -D $EXTRAOPTS > > No $EXTRAOPTS are specified. We don't use unix sockets and we don't use > master-worker. I'll send

Re: High load average under 1.8 with multiple draining processes

2018-01-15 Thread Samuel Reed
Reload is done via the sysvinit wrapper, which executes: $HAPROXY -f "$CONFIG" -p $PIDFILE -sf $(cat $PIDFILE) -D $EXTRAOPTS No $EXTRAOPTS are specified. We don't use unix sockets and we don't use master-worker. I'll send an anonymized haproxy config directly to you two along with the results of

Re: High load average under 1.8 with multiple draining processes

2018-01-15 Thread Willy Tarreau
On Mon, Jan 15, 2018 at 03:28:12PM +0100, Christopher Faulet wrote: > Could you provide the output of "show > sess all" command on the CLI (and maybe "show fd" too) ? By the way, be careful, "show sess all" will be huge and will disclose some potentially sensitive information (source/destination

Re: High load average under 1.8 with multiple draining processes

2018-01-15 Thread Christopher Faulet
Le 15/01/2018 à 15:14, Samuel Reed a écrit : Thank you for the patch and your quick attention to this issue. Results after a few reloads, 8 threads on 16 core machine, both draining and new process have patches. New process: % time seconds  usecs/call calls    errors syscall --

Re: High load average under 1.8 with multiple draining processes

2018-01-15 Thread Willy Tarreau
On Mon, Jan 15, 2018 at 08:14:40AM -0600, Samuel Reed wrote: > Thank you for the patch and your quick attention to this issue. Results > after a few reloads, 8 threads on 16 core machine, both draining and new > process have patches. > > New process: > > % time seconds  usecs/call calls  

Re: High load average under 1.8 with multiple draining processes

2018-01-15 Thread Samuel Reed
Thank you for the patch and your quick attention to this issue. Results after a few reloads, 8 threads on 16 core machine, both draining and new process have patches. New process: % time seconds  usecs/call calls    errors syscall -- --- --- - -

Re: High load average under 1.8 with multiple draining processes

2018-01-15 Thread Christopher Faulet
Le 12/01/2018 à 18:51, Willy Tarreau a écrit : On Fri, Jan 12, 2018 at 11:06:32AM -0600, Samuel Reed wrote: On 1.8-git, similar results on the new process: % time seconds  usecs/call calls    errors syscall -- --- --- - -  93.75   

Re: High load average under 1.8 with multiple draining processes

2018-01-12 Thread Willy Tarreau
On Fri, Jan 12, 2018 at 11:06:32AM -0600, Samuel Reed wrote: > On 1.8-git, similar results on the new process: > > % time seconds  usecs/call calls    errors syscall > -- --- --- - - >  93.75    0.265450  15 17805  

Re: High load average under 1.8 with multiple draining processes

2018-01-12 Thread Samuel Reed
On 1.8-git, similar results on the new process:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 93.75    0.265450          15     17805           epoll_wait
  4.85    0.013730          49       283           write

Re: High load average under 1.8 with multiple draining processes

2018-01-12 Thread Willy Tarreau
On Fri, Jan 12, 2018 at 10:13:55AM -0600, Samuel Reed wrote: > Excellent! Please let me know if there's any other output you'd like > from this machine. > > Strace on that new process shows thousands of these types of syscalls, > which vary slightly, > > epoll_wait(3, {{EPOLLIN, {u32=206,

Re: High load average under 1.8 with multiple draining processes

2018-01-12 Thread Samuel Reed
Excellent! Please let me know if there's any other output you'd like from this machine. Strace on that new process shows thousands of these types of syscalls, which vary slightly, epoll_wait(3, {{EPOLLIN, {u32=206, u64=206}}}, 200, 239) = 1 and these: epoll_wait(3, {}, 200, 0)   =
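The two epoll_wait patterns quoted in this message (a call with a non-zero timeout that returns an event, and a call with a timeout of 0) can be told apart mechanically in a saved strace log. The following is only a minimal sketch, not anything posted in the thread: the function name `tally_epoll_wait` and the sample lines are hypothetical, and the `= 0` return value in the sample is an assumption based on the "epoll_wait(0)=0" wording used elsewhere in the thread, since the quoted line is cut off before its result.

```python
import re

# Matches strace lines such as:  epoll_wait(3, {}, 200, 0) = 0
# Group 1 is the timeout in milliseconds (last argument),
# group 2 is the syscall's return value (number of ready events).
EPOLL_RE = re.compile(
    r"epoll_wait\(\d+,\s*.*,\s*\d+,\s*(-?\d+)\)\s*=\s*(-?\d+)"
)


def tally_epoll_wait(lines):
    """Count epoll_wait calls that polled with timeout 0 and found nothing.

    Returns a dict with two counters:
      busy_poll: timeout == 0 and return value == 0
      other:     every other epoll_wait line that could be parsed
    Lines that are not complete epoll_wait records are ignored.
    """
    counts = {"busy_poll": 0, "other": 0}
    for line in lines:
        match = EPOLL_RE.search(line)
        if match is None:
            continue
        timeout_ms = int(match.group(1))
        returned = int(match.group(2))
        if timeout_ms == 0 and returned == 0:
            counts["busy_poll"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    # Illustrative input only; the "= 0" results are assumed, not quoted.
    sample = [
        "epoll_wait(3, {{EPOLLIN, {u32=206, u64=206}}}, 200, 239) = 1",
        "epoll_wait(3, {}, 200, 0)   = 0",
        "epoll_wait(3, {}, 200, 0)   = 0",
    ]
    print(tally_epoll_wait(sample))
```

A high share of `busy_poll` entries relative to `other` would be consistent with the wake-up-without-work behaviour being discussed, though the ratio alone does not identify the cause.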

Re: High load average under 1.8 with multiple draining processes

2018-01-12 Thread Willy Tarreau
On Fri, Jan 12, 2018 at 09:50:58AM -0600, Samuel Reed wrote: > To accelerate the process, I've increased the number of threads from 4 > to 8 on a 16-core machine. Ran strace for about 5s on each. > > Single process (8 threads): > > $ strace -cp 16807 > % time seconds  usecs/call calls   

Re: High load average under 1.8 with multiple draining processes

2018-01-12 Thread Samuel Reed
To accelerate the process, I've increased the number of threads from 4 to 8 on a 16-core machine. Ran strace for about 5s on each. Single process (8 threads): $ strace -cp 16807 % time seconds  usecs/call calls    errors syscall -- --- --- - -

Re: High load average under 1.8 with multiple draining processes

2018-01-12 Thread Willy Tarreau
On Fri, Jan 12, 2018 at 09:28:54AM -0600, Samuel Reed wrote: > Thanks for your quick answer, Willy. > > That's a shame to hear but makes sense. We'll try out some ideas for > reducing contention. We don't use cpu-map with nbthread; I considered it > best to let the kernel take care of this,

Re: High load average under 1.8 with multiple draining processes

2018-01-12 Thread Samuel Reed
Thanks for your quick answer, Willy. That's a shame to hear but makes sense. We'll try out some ideas for reducing contention. We don't use cpu-map with nbthread; I considered it best to let the kernel take care of this, especially since there are some other processes on that box. I don't really

Re: High load average under 1.8 with multiple draining processes

2018-01-12 Thread Willy Tarreau
Hi Samuel, On Thu, Jan 11, 2018 at 08:29:15PM -0600, Samuel Reed wrote: > Is there a regression in the 1.8 series with SO_REUSEPORT and nbthread > (we didn't see this before with nbproc) or somewhere we should start > looking? In fact no, nbthread is simply new so it's not a regression but we're

High load average under 1.8 with multiple draining processes

2018-01-11 Thread Samuel Reed
We've recently upgraded to HAProxy 1.8.3, which we run with `nbthread 4` (we used to run nbproc 4 with older releases). This has generally been good, especially for stick tables & stats. We terminate SSL and proxy a large number of long-running TCP connections (websockets). When configuration