Re: bug in memcg oom-killer results in a hung syscall in another process in the same cgroup

2016-07-18 Thread Oleg Nesterov
On 07/15, Shayan Pooya wrote: > > >> --- x/kernel/sched/core.c > >> +++ x/kernel/sched/core.c > >> @@ -2793,8 +2793,11 @@ asmlinkage __visible void schedule_tail(struct > >> task_struct *prev) > >> balance_callback(rq); > >> preempt_enable(); > >> > >> - if (current->set_chil

Re: bug in memcg oom-killer results in a hung syscall in another process in the same cgroup

2016-07-15 Thread Shayan Pooya
>> I am just curious... can you reproduce the problem reliably? If yes, can you >> try >> the patch below ? Just in case, this is not the real fix in any case... > > Yes. It deterministically results in hung processes in vanilla kernel. > I'll try this patch. I'll have to correct this. I can repr

Re: bug in memcg oom-killer results in a hung syscall in another process in the same cgroup

2016-07-14 Thread Shayan Pooya
> Well, but we can't do this. And "as expected" is actually just wrong. I still > think that the whole FAULT_FLAG_USER logic is not right. This needs another > email. I meant as expected from the content of the patch :) I think Konstantin agrees that this patch cannot be merged upstream. > fork(

Re: bug in memcg oom-killer results in a hung syscall in another process in the same cgroup

2016-07-14 Thread Oleg Nesterov
On 07/12, Shayan Pooya wrote: > > > Yep. Bug still not fixed in upstream. In our kernel I've plugged it with > > this: > > > > --- a/kernel/sched/core.c > > +++ b/kernel/sched/core.c > > @@ -2808,8 +2808,9 @@ asmlinkage __visible void schedule_tail(struct > > task_struct *prev) > > balance_

Re: bug in memcg oom-killer results in a hung syscall in another process in the same cgroup

2016-07-13 Thread Michal Hocko
On Tue 12-07-16 08:35:06, Shayan Pooya wrote: > >> With strace, when running 500 concurrent mem-hog tasks on the same > >> kernel, 33 of them failed with: > >> > >> strace: ../sysdeps/nptl/fork.c:136: __libc_fork: Assertion > >> `THREAD_GETMEM (self, tid) != ppid' failed. > >> > >> Which is: https:

Re: bug in memcg oom-killer results in a hung syscall in another process in the same cgroup

2016-07-12 Thread Shayan Pooya
> Yep. Bug still not fixed in upstream. In our kernel I've plugged it with > this: > > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -2808,8 +2808,9 @@ asmlinkage __visible void schedule_tail(struct > task_struct *prev) > balance_callback(rq); > preempt_enable(); > > -

Re: bug in memcg oom-killer results in a hung syscall in another process in the same cgroup

2016-07-12 Thread Oleg Nesterov
On 07/12, Konstantin Khlebnikov wrote: > > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -2808,8 +2808,9 @@ asmlinkage __visible void schedule_tail(struct > task_struct *prev) > balance_callback(rq); > preempt_enable(); > > - if (current->set_child_tid) > -

Re: bug in memcg oom-killer results in a hung syscall in another process in the same cgroup

2016-07-12 Thread Konstantin Khlebnikov
On 12.07.2016 18:35, Shayan Pooya wrote: With strace, when running 500 concurrent mem-hog tasks on the same kernel, 33 of them failed with: strace: ../sysdeps/nptl/fork.c:136: __libc_fork: Assertion `THREAD_GETMEM (self, tid) != ppid' failed. Which is: https://sourceware.org/bugzilla/show_bug.c

Re: bug in memcg oom-killer results in a hung syscall in another process in the same cgroup

2016-07-12 Thread Shayan Pooya
>> With strace, when running 500 concurrent mem-hog tasks on the same >> kernel, 33 of them failed with: >> >> strace: ../sysdeps/nptl/fork.c:136: __libc_fork: Assertion >> `THREAD_GETMEM (self, tid) != ppid' failed. >> >> Which is: https://sourceware.org/bugzilla/show_bug.cgi?id=15392 >> And discu

Re: bug in memcg oom-killer results in a hung syscall in another process in the same cgroup

2016-07-12 Thread Michal Hocko
On Mon 11-07-16 11:33:19, Shayan Pooya wrote: > >> Could you post the stack trace of the hung oom victim? Also could you > >> post the full kernel log? > > With strace, when running 500 concurrent mem-hog tasks on the same > kernel, 33 of them failed with: > > strace: ../sysdeps/nptl/fork.c:136:

Re: bug in memcg oom-killer results in a hung syscall in another process in the same cgroup

2016-07-12 Thread Michal Hocko
On Mon 11-07-16 10:40:55, Shayan Pooya wrote: > > > > Could you post the stack trace of the hung oom victim? Also could you > > post the full kernel log? > > Here is the stack of the process that lives (it is *not* the > oom-victim) in a run with 100 processes and *without* strace: > > # cat /pro

Re: bug in memcg oom-killer results in a hung syscall in another process in the same cgroup

2016-07-11 Thread Shayan Pooya
>> Could you post the stack trace of the hung oom victim? Also could you >> post the full kernel log? With strace, when running 500 concurrent mem-hog tasks on the same kernel, 33 of them failed with: strace: ../sysdeps/nptl/fork.c:136: __libc_fork: Assertion `THREAD_GETMEM (self, tid) != ppid' f

Re: bug in memcg oom-killer results in a hung syscall in another process in the same cgroup

2016-07-11 Thread Shayan Pooya
> > Could you post the stack trace of the hung oom victim? Also could you > post the full kernel log? Here is the stack of the process that lives (it is *not* the oom-victim) in a run with 100 processes and *without* strace: # cat /proc/7688/stack [] futex_wait_queue_me+0xc2/0x120 [] futex_wait+0

Re: bug in memcg oom-killer results in a hung syscall in another process in the same cgroup

2016-07-10 Thread Michal Hocko
On Sat 09-07-16 16:49:32, Shayan Pooya wrote: > I came across the following issue in kernel 3.16 (Ubuntu 14.04) which > was then reproduced in kernels 4.4 LTS: > After a couple of memcg oom-kills in a cgroup, a syscall in > *another* process in the same cgroup hangs indefinitely. > > Reproducin

bug in memcg oom-killer results in a hung syscall in another process in the same cgroup

2016-07-09 Thread Shayan Pooya
I came across the following issue in kernel 3.16 (Ubuntu 14.04) which was then reproduced in kernels 4.4 LTS: After a couple of memcg oom-kills in a cgroup, a syscall in *another* process in the same cgroup hangs indefinitely. Reproducing: # mkdir -p strace_run # mkdir /sys/fs/cgroup/memory/1