Re: [PATCH 1/1] kasan: fix livelock in qlist_move_cache

2017-11-29 Thread Dmitry Vyukov
On Wed, Nov 29, 2017 at 5:54 AM, Zhouyi Zhou wrote: > Hi, > There are new discoveries! > > When qlist_move_cache reappeared in my environment, > I used kgdb to break into the function qlist_move_cache. I found > that this function is called because of cgroup release. > > I also found that libvirt allocates a

Re: [PATCH 1/1] kasan: fix livelock in qlist_move_cache

2017-11-28 Thread Zhouyi Zhou
Hi, There are new discoveries! When qlist_move_cache reappeared in my environment, I used kgdb to break into the function qlist_move_cache. I found that this function is called because of cgroup release. I also found that libvirt allocates a memory cgroup for each qemu it starts; in my system, it looks like

Re: [PATCH 1/1] kasan: fix livelock in qlist_move_cache

2017-11-28 Thread Zhouyi Zhou
Hi, I will try to re-establish the environment and design a proof-of-concept experiment. Cheers On Wed, Nov 29, 2017 at 1:57 AM, Dmitry Vyukov wrote: > On Tue, Nov 28, 2017 at 6:56 PM, Dmitry Vyukov wrote: >> On Tue, Nov 28, 2017 at 12:30 PM, Zhouyi Zhou wrote: >>> Hi, >>> By using perf

Re: [PATCH 1/1] kasan: fix livelock in qlist_move_cache

2017-11-28 Thread Dmitry Vyukov
On Tue, Nov 28, 2017 at 6:56 PM, Dmitry Vyukov wrote: > On Tue, Nov 28, 2017 at 12:30 PM, Zhouyi Zhou wrote: >> Hi, >> According to perf top, qlist_move_cache really did occupy 100% CPU >> in my environment yesterday; otherwise I >> would not have noticed the kasan code. >> Currently I have difficulty

Re: [PATCH 1/1] kasan: fix livelock in qlist_move_cache

2017-11-28 Thread Dmitry Vyukov
On Tue, Nov 28, 2017 at 12:30 PM, Zhouyi Zhou wrote: > Hi, > According to perf top, qlist_move_cache really did occupy 100% CPU > in my environment yesterday; otherwise I > would not have noticed the kasan code. > Currently I have difficulty reproducing it because the frontend > developer modified some us

Re: [PATCH 1/1] kasan: fix livelock in qlist_move_cache

2017-11-28 Thread Zhouyi Zhou
Hi, According to perf top, qlist_move_cache really did occupy 100% CPU in my environment yesterday; otherwise I would not have noticed the kasan code. Currently I have difficulty reproducing it because the frontend developer modified some user mode code. What I can reproduce again and again now is kgdb_breakpoi

Re: [PATCH 1/1] kasan: fix livelock in qlist_move_cache

2017-11-28 Thread Dmitry Vyukov
On Tue, Nov 28, 2017 at 10:17 AM, Zhouyi Zhou wrote: > Hi, > Imagine that every one of the QUARANTINE_BATCHES elements of the > global_quarantine array is of size 4MB + 1MB. When quarantine_put > is called again, one of the elements will grow to 4MB + > 1MB + 1MB, and so on. I see what you mea

Re: [PATCH 1/1] kasan: fix livelock in qlist_move_cache

2017-11-28 Thread Zhouyi Zhou
Hi, Imagine that every one of the QUARANTINE_BATCHES elements of the global_quarantine array is of size 4MB + 1MB. When quarantine_put is called again, one of the elements will grow to 4MB + 1MB + 1MB, and so on. On Tue, Nov 28, 2017 at 4:58 PM, Dmitry Vyukov wrote: > On Tue, Nov 28, 2017 at 9:
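
The growth scenario in the message above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel source: the struct quarantine_model type, the model_put function, the BATCHES constant and the rule that the tail does not advance onto the head are all hypothetical names and simplifications chosen for illustration. Sizes are counted in abstract units (think of one unit as 1MB).

```c
#include <stddef.h>

/* Hypothetical number of batches in the model ring; not the kernel's value. */
#define BATCHES 4

/* Simplified stand-in for a ring of quarantine batches (assumption). */
struct quarantine_model {
    size_t batch_bytes[BATCHES]; /* bytes currently held by each batch */
    size_t batch_limit;          /* intended upper bound for one batch */
    int head;                    /* oldest batch, next to be drained */
    int tail;                    /* batch that receives new objects */
};

/*
 * Add 'bytes' to the tail batch, then try to advance the tail.
 * Assumption of this sketch: the tail may only advance when the next
 * slot is not the head. If nothing drains the head, the tail stays
 * put and its batch keeps growing past batch_limit.
 */
static void model_put(struct quarantine_model *m, size_t bytes)
{
    m->batch_bytes[m->tail] += bytes;

    if (m->batch_bytes[m->tail] >= m->batch_limit) {
        int new_tail = m->tail + 1;

        if (new_tail == BATCHES)
            new_tail = 0;
        if (new_tail != m->head)
            m->tail = new_tail;
    }
}
```

With a limit of 4 units and repeated puts of 1 unit, the first three batches stop at the limit, while the last one keeps growing once the tail can no longer advance. In this model the limit is only checked after the bytes have already been added, so it does not bound the batch size by itself.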

Re: [PATCH 1/1] kasan: fix livelock in qlist_move_cache

2017-11-28 Thread Dmitry Vyukov
On Tue, Nov 28, 2017 at 9:33 AM, Zhouyi Zhou wrote: > Hi, > Please take a look at the function quarantine_put. I don't think the following > code limits the batch size to below quarantine_batch_size; it only advances > quarantine_tail after qlist_move_all. > > qlist_move_all(q, &temp);

Re: [PATCH 1/1] kasan: fix livelock in qlist_move_cache

2017-11-28 Thread Zhouyi Zhou
Hi, Please take a look at the function quarantine_put. I don't think the following code limits the batch size to below quarantine_batch_size; it only advances quarantine_tail after qlist_move_all. qlist_move_all(q, &temp); spin_lock(&quarantine_lock); WR

Re: [PATCH 1/1] kasan: fix livelock in qlist_move_cache

2017-11-28 Thread Dmitry Vyukov
On Tue, Nov 28, 2017 at 9:00 AM, Zhouyi Zhou wrote: > Thanks for reviewing. > My machine has 128G of RAM and runs many KVM virtual machines. > libvirtd always > reports "internal error: received hangup / error event on socket" under > heavy memory load. > When I then ran perf top -g, qlist_move_ca

Re: [PATCH 1/1] kasan: fix livelock in qlist_move_cache

2017-11-28 Thread Zhouyi Zhou
Thanks for reviewing. My machine has 128G of RAM and runs many KVM virtual machines. libvirtd always reports "internal error: received hangup / error event on socket" under heavy memory load. When I then ran perf top -g, qlist_move_cache was consuming 100% CPU for several minutes. On Tue, Nov 28, 2017

Re: [PATCH 1/1] kasan: fix livelock in qlist_move_cache

2017-11-27 Thread Dmitry Vyukov
On Tue, Nov 28, 2017 at 5:05 AM, Zhouyi Zhou wrote: > When there is a huge number of quarantined cache allocations in the system, > the number of entries in global_quarantine[i] becomes very large. Meanwhile, > the while loop in qlist_move_cache, which runs with > quarantine_lock held, never relaxes. As a result,

Re: [PATCH 1/1] kasan: fix livelock in qlist_move_cache

2017-11-27 Thread Zhouyi Zhou
When there is a huge number of quarantined cache allocations in the system, the number of entries in global_quarantine[i] becomes very large. Meanwhile, the while loop in qlist_move_cache, which runs with quarantine_lock held, never relaxes. As a result, some userspace programs, for example libvirt, will complain.
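
For readers unfamiliar with the problem described above, the following user-space sketch shows one possible way a long list walk can give up its lock at intervals. It is not the patch under discussion and not kernel code: struct node, struct list, struct fake_lock (a counter standing in for a spinlock), move_matching and the relax_every parameter are all hypothetical names introduced for illustration, and the sketch is single-threaded.

```c
#include <stddef.h>

/* Hypothetical list node tagged with the cache it belongs to. */
struct node {
    struct node *next;
    int cache_id;
};

struct list {
    struct node *head;
    struct node *tail;
};

/* Stand-in for a spinlock: it only records state and acquisitions. */
struct fake_lock {
    int held;
    unsigned long acquisitions;
};

static void fake_lock_acquire(struct fake_lock *l)
{
    l->held = 1;
    l->acquisitions++;
}

static void fake_lock_release(struct fake_lock *l)
{
    l->held = 0;
}

static void list_append(struct list *l, struct node *n)
{
    n->next = NULL;
    if (l->tail)
        l->tail->next = n;
    else
        l->head = n;
    l->tail = n;
}

/*
 * Move every node with a matching cache_id from 'from' to 'to'.
 * After each 'relax_every' visited nodes the lock is dropped and
 * re-taken, which is where a waiter could make progress in a real
 * concurrent setting. relax_every == 0 disables relaxing.
 * Returns the number of nodes moved.
 */
static size_t move_matching(struct list *from, struct list *to, int cache_id,
                            struct fake_lock *lock, size_t relax_every)
{
    struct list keep = { NULL, NULL };
    struct node *curr;
    size_t visited = 0, moved = 0;

    fake_lock_acquire(lock);
    curr = from->head;
    from->head = from->tail = NULL; /* detach: the chain is now private */

    while (curr) {
        struct node *next = curr->next;

        if (curr->cache_id == cache_id) {
            list_append(to, curr);
            moved++;
        } else {
            list_append(&keep, curr);
        }
        curr = next;

        if (relax_every && ++visited % relax_every == 0 && curr) {
            fake_lock_release(lock);
            fake_lock_acquire(lock);
        }
    }

    /* Put the kept nodes back in front of anything added meanwhile. */
    if (keep.head) {
        keep.tail->next = from->head;
        if (!from->tail)
            from->tail = keep.tail;
        from->head = keep.head;
    }
    fake_lock_release(lock);
    return moved;
}
```

Whether dropping a lock inside such a loop is safe depends on what other code may do to both lists in the meantime, so this should be read as an illustration of the idea only.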