On Thu, 9 Feb 2017, Michal Hocko wrote:
> Christoph, you are completely ignoring the reality and the code. There
> is no need for stop_machine, nor is it helping anything. As a matter
> of fact there is a synchronization with the cpu hotplug needed if you
> want to make a per-cpu specific operation
On Thu 09-02-17 11:22:49, Christoph Lameter wrote:
> On Thu, 9 Feb 2017, Thomas Gleixner wrote:
>
> > You are just not getting it, really.
> >
> > The problem is that this for_each_online_cpu() is racy against a concurrent
> > hot unplug and therefore can queue stuff for a no longer online cpu. That's
On Thu, 9 Feb 2017, Christoph Lameter wrote:
> On Thu, 9 Feb 2017, Thomas Gleixner wrote:
>
> > You are just not getting it, really.
> >
> > The problem is that this for_each_online_cpu() is racy against a concurrent
> > hot unplug and therefore can queue stuff for a no longer online cpu. That's
>
On Thu, 9 Feb 2017, Thomas Gleixner wrote:
> You are just not getting it, really.
>
> The problem is that this for_each_online_cpu() is racy against a concurrent
> hot unplug and therefore can queue stuff for a no longer online cpu. That's
> what the mm folks tried to avoid by preventing a CPU hotplug
On Thu, 9 Feb 2017, Christoph Lameter wrote:
> On Thu, 9 Feb 2017, Thomas Gleixner wrote:
>
> > > The stop_machine would need to ensure that all cpus cease processing
> > > before proceeding.
> >
> > Ok. I'll try again:
> >
> > CPU 0                        CPU 1
> > for_each_online_cpu(cpu)
> >
On Thu, 9 Feb 2017, Thomas Gleixner wrote:
> > The stop_machine would need to ensure that all cpus cease processing
> > before proceeding.
>
> Ok. I'll try again:
>
> CPU 0                        CPU 1
> for_each_online_cpu(cpu)
>   ==> cpu = 1
>                              stop_machine()
>
> Stops processing
On Thu, 9 Feb 2017, Christoph Lameter wrote:
> On Thu, 9 Feb 2017, Thomas Gleixner wrote:
>
> > And how does that solve the problem at hand? Not at all:
> >
> > CPU 0                        CPU 1
> >
> > for_each_online_cpu(cpu)
> >   ==> cpu = 1
> >                              stop_machine()
> >
On Thu, 9 Feb 2017, Thomas Gleixner wrote:
> And how does that solve the problem at hand? Not at all:
>
> CPU 0                        CPU 1
>
> for_each_online_cpu(cpu)
>   ==> cpu = 1
>                              stop_machine()
>                              set_cpu_online(1, false)
> queue_work(cpu1)
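The race in this diagram can be written out in C. A minimal sketch, not the
mm/ code as merged (the per-cpu work items are assumed to be INIT_WORK()ed
elsewhere): the first walk snapshots the online mask without hotplug
protection, so work can be queued on a CPU that goes offline mid-walk; the
second variant holds off hot-unplug for the duration of the walk.

static DEFINE_PER_CPU(struct work_struct, drain_work);

static void racy_drain(void)
{
	int cpu;

	/* Racy: the online mask can change between the check and the queue. */
	for_each_online_cpu(cpu)
		queue_work_on(cpu, system_wq, &per_cpu(drain_work, cpu));
}

static void hotplug_safe_drain(void)
{
	int cpu;

	get_online_cpus();	/* blocks hot-unplug until put_online_cpus() */
	for_each_online_cpu(cpu)
		queue_work_on(cpu, system_wq, &per_cpu(drain_work, cpu));
	put_online_cpus();
}

The get_online_cpus() variant is exactly what the rest of the thread is trying
to avoid: taking the hotplug lock from the allocator path is what produced the
reported deadlock in the first place.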
On Wed, 8 Feb 2017, Christoph Lameter wrote:
> On Wed, 8 Feb 2017, Thomas Gleixner wrote:
>
> > There is a world outside yours. Hotplug is actually used frequently for
> > power purposes in some scenarios.
>
> The usual case does not involve hotplug.
We do not care about your definition of "usual"
On Wed, 8 Feb 2017, Thomas Gleixner wrote:
> There is a world outside yours. Hotplug is actually used frequently for
> power purposes in some scenarios.
The usual case does not involve hotplug.
> It will improve nothing. The stop machine context is extremely limited and
> you cannot do complex things
On Wed, 8 Feb 2017, Christoph Lameter wrote:
> On Wed, 8 Feb 2017, Michal Hocko wrote:
>
> > I have no idea what you are trying to say and how this is related to the
> > deadlock we are discussing here. We certainly do not need to add
> > stop_machine to the problem. And yeah, dropping get_online_cpus was
On Wed 08-02-17 09:11:06, Christoph Lameter wrote:
> On Wed, 8 Feb 2017, Michal Hocko wrote:
>
> > > Huch? stop_machine() is horrible and heavy weight. Don't go there, there
> > > must be simpler solutions than that.
> >
> > Absolutely agreed. We are in the page allocator path so using the
> > stop_machine* is just ridiculous.
On Wed, Feb 08, 2017 at 02:03:32PM +0000, Mel Gorman wrote:
> > Yeah, we'll sort that out once it hits Linus tree and we move RT forward.
> > Though I have once complaint right away:
> >
> > + preempt_enable_no_resched();
> >
> > This is a no-no, even in mainline. You effectively disable a preemption point.
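For context on the complaint: preempt_enable() is also a scheduling point,
while preempt_enable_no_resched() is not. A short illustrative sketch:

	preempt_disable();
	/* ... touch per-cpu state ... */
	preempt_enable();
	/* Re-enables preemption and checks need_resched(): a task that
	 * became runnable meanwhile gets to run here. */

	preempt_disable();
	/* ... touch per-cpu state ... */
	preempt_enable_no_resched();
	/* Re-enables preemption but skips the resched check, so a pending
	 * reschedule is silently delayed until some later preemption point.
	 * That is the "disabled preemption point" objected to above, which
	 * hurts RT latencies in particular. */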
On Wed, 8 Feb 2017, Michal Hocko wrote:
> I have no idea what you are trying to say and how this is related to the
> deadlock we are discussing here. We certainly do not need to add
> stop_machine the problem. And yeah, dropping get_online_cpus was
> possible after considering all fallouts.
This
On Wed, 8 Feb 2017, Michal Hocko wrote:
> > Huch? stop_machine() is horrible and heavy weight. Don't go there, there
> > must be simpler solutions than that.
>
> Absolutely agreed. We are in the page allocator path so using the
> stop_machine* is just ridiculous. And, in fact, there is a much simp
On Tue, 7 Feb 2017, Thomas Gleixner wrote:
> > Yep. Hotplug events are pretty significant. Using stop_machine() etc
> > would be advisable and that would avoid the taking of locks and get rid
> > of all the complexity, reduce the code size and make the overall system
> > much more reliable
On Wed, 8 Feb 2017, Mel Gorman wrote:
> It may be worth noting that patches in Andrew's tree no longer disable
> interrupts in the per-cpu allocator and now per-cpu draining will
> be from workqueue context. The reasoning was due to the overhead of
> the page allocator with figures included. Interrupts
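A minimal sketch of the scheme Mel describes, with hypothetical names and
details elided (the merged patches differ): the drain is queued as per-CPU
work instead of being executed via IPIs with interrupts disabled, then
flushed. The work items are assumed to have been INIT_WORK()ed at boot.

static DEFINE_PER_CPU(struct work_struct, pcpu_drain);

static void drain_all_pages_sketch(void)
{
	int cpu;

	/* Queue one drain on every online CPU ... */
	for_each_online_cpu(cpu)
		schedule_work_on(cpu, &per_cpu(pcpu_drain, cpu));

	/* ... then wait for all of them to complete. */
	for_each_online_cpu(cpu)
		flush_work(&per_cpu(pcpu_drain, cpu));
}

Without hotplug protection around the two loops, this is exactly the
for_each_online_cpu() race debated later in the thread.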
On Wed, Feb 08, 2017 at 02:23:19PM +0100, Thomas Gleixner wrote:
> On Wed, 8 Feb 2017, Mel Gorman wrote:
> > It may be worth noting that patches in Andrew's tree no longer disable
> > interrupts in the per-cpu allocator and now per-cpu draining will
> > be from workqueue context. The reasoning was
On Wed, Feb 08, 2017 at 01:02:07PM +0100, Thomas Gleixner wrote:
> On Wed, 8 Feb 2017, Michal Hocko wrote:
> > On Tue 07-02-17 23:25:17, Thomas Gleixner wrote:
> > > On Tue, 7 Feb 2017, Christoph Lameter wrote:
> > > > On Tue, 7 Feb 2017, Michal Hocko wrote:
> > > >
> > > > > I am always nervous when
On Wed 08-02-17 13:02:07, Thomas Gleixner wrote:
> On Wed, 8 Feb 2017, Michal Hocko wrote:
[...]
> > [1] http://lkml.kernel.org/r/20170207201950.20482-1-mho...@kernel.org
>
> Well, yes. It's simple, but from an RT point of view I really don't like
> it as we have to fix it up again.
I thought that
On Wed, 8 Feb 2017, Michal Hocko wrote:
> On Tue 07-02-17 23:25:17, Thomas Gleixner wrote:
> > On Tue, 7 Feb 2017, Christoph Lameter wrote:
> > > On Tue, 7 Feb 2017, Michal Hocko wrote:
> > >
> > > > I am always nervous when seeing hotplug locks being used in low level
> > > > code. It has bitten
On Tue 07-02-17 23:25:17, Thomas Gleixner wrote:
> On Tue, 7 Feb 2017, Christoph Lameter wrote:
> > On Tue, 7 Feb 2017, Michal Hocko wrote:
> >
> > > I am always nervous when seeing hotplug locks being used in low level
> > > code. It has bitten us several times already and those deadlocks are
> >
On Mon, 6 Feb 2017, Dmitry Vyukov wrote:
> On Mon, Jan 30, 2017 at 4:48 PM, Dmitry Vyukov wrote:
> Unfortunately it does not seem to help.
> Fuzzer now runs on 510948533b059f4f5033464f9f4a0c32d4ab0c08 of
> mmotm/auto-latest
> (git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git):
>
> commit
On Tue, 7 Feb 2017, Christoph Lameter wrote:
> On Tue, 7 Feb 2017, Michal Hocko wrote:
>
> > I am always nervous when seeing hotplug locks being used in low level
> > code. It has bitten us several times already and those deadlocks are
> > quite hard to spot when reviewing the code and very rare to hit so they tend to live for a long time.
On Tue 07-02-17 12:03:19, Tejun Heo wrote:
> Hello,
>
> Sorry about the delay.
>
> On Tue, Feb 07, 2017 at 04:34:59PM +0100, Michal Hocko wrote:
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index c3358d4f7932..b6411816787a 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2343,7 +2343,16 @@ void drain_local_pages(struct zone *zone)
Hello,
Sorry about the delay.
On Tue, Feb 07, 2017 at 04:34:59PM +0100, Michal Hocko wrote:
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c3358d4f7932..b6411816787a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2343,7 +2343,16 @@ void drain_local_pages(struct zone *zone)
On Tue, 7 Feb 2017, Michal Hocko wrote:
> I am always nervous when seeing hotplug locks being used in low level
> code. It has bitten us several times already and those deadlocks are
> quite hard to spot when reviewing the code and very rare to hit so they
> tend to live for a long time.
Yep. Hotplug events are pretty significant.
On Tue 07-02-17 16:22:24, Mel Gorman wrote:
> On Tue, Feb 07, 2017 at 04:34:59PM +0100, Michal Hocko wrote:
> > > But we do not care about the whole cpu hotplug code. The only part we
> > > really do care about is the race inside drain_pages_zone and that will
> > > run in an atomic context on the specific CPU.
On Tue, Feb 07, 2017 at 04:34:59PM +0100, Michal Hocko wrote:
> > But we do not care about the whole cpu hotplug code. The only part we
> > really do care about is the race inside drain_pages_zone and that will
> > run in an atomic context on the specific CPU.
> >
> > You are absolutely right that
On Tue 07-02-17 15:19:11, Michal Hocko wrote:
> On Tue 07-02-17 13:58:46, Mel Gorman wrote:
> > On Tue, Feb 07, 2017 at 01:37:08PM +0100, Michal Hocko wrote:
> [...]
> > > Anyway, shouldn't it be sufficient to disable preemption
> > > on drain_local_pages_wq?
> >
> > That would be sufficient for a hot-removed CPU moving the drain request to another CPU
On Tue 07-02-17 13:58:46, Mel Gorman wrote:
> On Tue, Feb 07, 2017 at 01:37:08PM +0100, Michal Hocko wrote:
[...]
> > Anyway, shouldn't it be sufficient to disable preemption
> > on drain_local_pages_wq?
>
> That would be sufficient for a hot-removed CPU moving the drain request
> to another CPU a
On Tue, Feb 07, 2017 at 01:37:08PM +0100, Michal Hocko wrote:
> > You cannot put a sleepable lock inside the preempt disabled section...
> > We can make it a spinlock, right?
>
> Scratch that! For some reason I thought that cpu notifiers are run in an
> atomic context. Now that I am checking the code
On 02/07/2017 01:48 PM, Michal Hocko wrote:
On Tue 07-02-17 13:43:39, Vlastimil Babka wrote:
[...]
> Anyway, shouldn't it be sufficient to disable preemption
> on drain_local_pages_wq? The CPU hotplug callback will not preempt us
> and so we cannot work on the same cpus, right?
I thought the problem here was that the callback race
On Tue 07-02-17 13:03:50, Mel Gorman wrote:
> On Tue, Feb 07, 2017 at 12:43:27PM +0100, Michal Hocko wrote:
> > > Right. The unbind operation can set a mask that is any allowable CPU and
> > > the final process_work is not done in a context that prevents
> > > preemption.
> > >
> > > diff --git a/
On Tue, Feb 07, 2017 at 12:43:27PM +0100, Michal Hocko wrote:
> > Right. The unbind operation can set a mask that is any allowable CPU and
> > the final process_work is not done in a context that prevents
> > preemption.
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 3b93879990fd..7af165d308c4 100644
On Tue 07-02-17 13:43:39, Vlastimil Babka wrote:
[...]
> > Anyway, shouldn't it be sufficient to disable preemption
> > on drain_local_pages_wq? The CPU hotplug callback will not preempt us
> > and so we cannot work on the same cpus, right?
>
> I thought the problem here was that the callback race
On 02/07/2017 01:37 PM, Michal Hocko wrote:
> @@ -6711,7 +6714,16 @@ static int page_alloc_cpu_dead(unsigned int cpu)
> {
>
>lru_add_drain_cpu(cpu);
> +
> +	/*
> +	 * A per-cpu drain via a workqueue from drain_all_pages can be
> +	 * rescheduled onto an unrelated CPU. That allows the hotplug
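For orientation, the shape of the hotplug "dead" callback this hunk amends,
reduced to the two calls relevant here (a sketch; the real callback also
folds per-cpu counter and event state):

static int page_alloc_cpu_dead(unsigned int cpu)
{
	lru_add_drain_cpu(cpu);	/* flush the dead CPU's LRU pagevecs */
	drain_pages(cpu);	/* flush its per-cpu page lists */
	return 0;
}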
On Tue 07-02-17 12:43:27, Michal Hocko wrote:
> On Tue 07-02-17 11:34:35, Mel Gorman wrote:
> > On Tue, Feb 07, 2017 at 11:35:52AM +0100, Michal Hocko wrote:
> > > On Tue 07-02-17 10:28:09, Mel Gorman wrote:
> > > > On Tue, Feb 07, 2017 at 10:49:28AM +0100, Vlastimil Babka wrote:
> > > > > On 02/07
On Tue 07-02-17 12:54:48, Vlastimil Babka wrote:
> On 02/07/2017 12:43 PM, Michal Hocko wrote:
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 3b93879990fd..7af165d308c4 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -2342,7 +2342,14 @@ void drain_local_pages(struct zone *zone)
On 02/07/2017 12:43 PM, Michal Hocko wrote:
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3b93879990fd..7af165d308c4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2342,7 +2342,14 @@ void drain_local_pages(struct zone *zone)
static void drain_local_pages_wq(struct work_struct *work)
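The direction this hunk takes can be sketched as follows; a sketch of the idea
under discussion, not necessarily the patch as merged. Because the workqueue
can move the item off its pinned worker during hot-remove, the handler pins
itself with preempt_disable() and drains whichever CPU it lands on:

static void drain_local_pages_wq(struct work_struct *work)
{
	/*
	 * drain_all_pages() walks the online CPUs without hotplug
	 * protection, so this work can be handed from a CPU-pinned
	 * worker to an unbound one during hot-remove. Disabling
	 * preemption keeps us on one CPU for the whole drain; in the
	 * worst case we drain a different CPU's lists, which is
	 * harmless here.
	 */
	preempt_disable();
	drain_local_pages(NULL);
	preempt_enable();
}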
On Tue 07-02-17 11:34:35, Mel Gorman wrote:
> On Tue, Feb 07, 2017 at 11:35:52AM +0100, Michal Hocko wrote:
> > On Tue 07-02-17 10:28:09, Mel Gorman wrote:
> > > On Tue, Feb 07, 2017 at 10:49:28AM +0100, Vlastimil Babka wrote:
> > > > On 02/07/2017 10:43 AM, Mel Gorman wrote:
> > > > > If I'm reading
On Tue, Feb 07, 2017 at 11:35:52AM +0100, Michal Hocko wrote:
> On Tue 07-02-17 10:28:09, Mel Gorman wrote:
> > On Tue, Feb 07, 2017 at 10:49:28AM +0100, Vlastimil Babka wrote:
> > > On 02/07/2017 10:43 AM, Mel Gorman wrote:
> > > > If I'm reading this right, a hot-remove will set the pool
> > > >
On 2017/02/07 7:05, Mel Gorman wrote:
> On Mon, Feb 06, 2017 at 08:13:35PM +0100, Dmitry Vyukov wrote:
>> On Mon, Jan 30, 2017 at 4:48 PM, Dmitry Vyukov wrote:
>>> On Sun, Jan 29, 2017 at 6:22 PM, Vlastimil Babka wrote:
On 29.1.2017 13:44, Dmitry Vyukov wrote:
> Hello,
>
> I've got
On Tue, Feb 07, 2017 at 10:42:49AM +, Mel Gorman wrote:
> On Tue, Feb 07, 2017 at 10:23:31AM +0100, Vlastimil Babka wrote:
> > > cpu offlining. I have to check the code but my impression was that WQ
> > > code will ignore the cpu requested by the work item when the cpu is
> > > going offline. I
On Tue, Feb 07, 2017 at 10:23:31AM +0100, Vlastimil Babka wrote:
> > cpu offlining. I have to check the code but my impression was that WQ
> > code will ignore the cpu requested by the work item when the cpu is
> > going offline. If the offline happens while the worker function already
> > executes
On Tue 07-02-17 10:28:09, Mel Gorman wrote:
> On Tue, Feb 07, 2017 at 10:49:28AM +0100, Vlastimil Babka wrote:
> > On 02/07/2017 10:43 AM, Mel Gorman wrote:
> > > If I'm reading this right, a hot-remove will set the pool
> > > POOL_DISASSOCIATED and unbound. A workqueue queued for draining gets
> > > migrated during hot-remove
On Tue, Feb 07, 2017 at 10:49:28AM +0100, Vlastimil Babka wrote:
> On 02/07/2017 10:43 AM, Mel Gorman wrote:
> > If I'm reading this right, a hot-remove will set the pool POOL_DISASSOCIATED
> > and unbound. A workqueue queued for draining gets migrated during hot-remove
> > and a drain operation will execute twice on a CPU
On Tue 07-02-17 10:49:28, Vlastimil Babka wrote:
> On 02/07/2017 10:43 AM, Mel Gorman wrote:
> > If I'm reading this right, a hot-remove will set the pool POOL_DISASSOCIATED
> > and unbound. A workqueue queued for draining gets migrated during hot-remove
> > and a drain operation will execute twice on a CPU
On Tue 07-02-17 10:23:31, Vlastimil Babka wrote:
> On 02/07/2017 09:48 AM, Michal Hocko wrote:
> > On Mon 06-02-17 22:05:30, Mel Gorman wrote:
> >>> Unfortunately it does not seem to help.
> >>
> >> I'm a little stuck on how to best handle this. get_online_cpus() can
> >> halt forever if the hotplug operation is holding the mutex when calling pcpu_alloc.
On 02/07/2017 10:43 AM, Mel Gorman wrote:
> If I'm reading this right, a hot-remove will set the pool POOL_DISASSOCIATED
> and unbound. A workqueue queued for draining gets migrated during hot-remove
> and a drain operation will execute twice on a CPU -- one for what was
> queued and a second time for
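One way to make the handler robust against that migration, sketched with a
hypothetical pcpu_drain wrapper (a variant of this idea later went upstream):
record what to drain in the work item itself rather than inferring it from
whichever CPU the worker ends up running on.

struct pcpu_drain {
	struct zone *zone;		/* what to drain ... */
	struct work_struct work;	/* ... travels with the work item */
};

static DEFINE_PER_CPU(struct pcpu_drain, pcpu_drain);

static void drain_local_pages_wq(struct work_struct *work)
{
	struct pcpu_drain *drain = container_of(work, struct pcpu_drain, work);

	/* Pin to one CPU so the drain itself cannot be preempted or moved. */
	preempt_disable();
	drain_local_pages(drain->zone);
	preempt_enable();
}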
On Tue, Feb 07, 2017 at 10:23:31AM +0100, Vlastimil Babka wrote:
>
> > cpu offlining. I have to check the code but my impression was that WQ
> > code will ignore the cpu requested by the work item when the cpu is
> > going offline. If the offline happens while the worker function already
> > executes
On Tue, Feb 07, 2017 at 09:48:56AM +0100, Michal Hocko wrote:
> > +
> > +	/*
> > +	 * Only drain from contexts allocating for user allocations.
> > +	 * Kernel allocations could be holding a CPU hotplug-related
> > +	 * mutex, particularly hot-add allocating
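The quoted comment implies a gate roughly like the following sketch; the
helper name is hypothetical, only the shape follows from the comment:

	/*
	 * Hypothetical illustration: skip the drain for kernel-context
	 * allocations, which may already hold hotplug-related mutexes,
	 * so that taking get_online_cpus() below cannot deadlock.
	 */
	if (gfp_allows_user_drain(gfp_mask))	/* hypothetical helper */
		drain_all_pages(NULL);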
On 02/07/2017 09:48 AM, Michal Hocko wrote:
> On Mon 06-02-17 22:05:30, Mel Gorman wrote:
>>> Unfortunately it does not seem to help.
>>
>> I'm a little stuck on how to best handle this. get_online_cpus() can
>> halt forever if the hotplug operation is holding the mutex when calling
>> pcpu_alloc.
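The deadlock Mel is stuck on forms a classic lock cycle. A simplified sketch
reconstructed from this thread (the actual lockdep chain in the report is
longer):

/*
 * Allocator side:                      Hotplug side:
 *   pcpu_alloc()                         cpu_up()/cpu_down()
 *     mutex_lock(&pcpu_alloc_mutex)        takes the hotplug lock
 *     needs pages -> drain_all_pages()     a hotplug callback allocates
 *       get_online_cpus()                    per-cpu memory -> pcpu_alloc()
 *         waits on the hotplug lock            waits on pcpu_alloc_mutex
 *
 * Each side holds the lock the other is waiting for: a circular
 * dependency, which is what lockdep reported.
 */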
On Mon 06-02-17 22:05:30, Mel Gorman wrote:
> On Mon, Feb 06, 2017 at 08:13:35PM +0100, Dmitry Vyukov wrote:
> > On Mon, Jan 30, 2017 at 4:48 PM, Dmitry Vyukov wrote:
> > > On Sun, Jan 29, 2017 at 6:22 PM, Vlastimil Babka wrote:
> > >> On 29.1.2017 13:44, Dmitry Vyukov wrote:
> > >>> Hello,
> > >
On Mon 06-02-17 20:13:35, Dmitry Vyukov wrote:
[...]
> Fuzzer now runs on 510948533b059f4f5033464f9f4a0c32d4ab0c08 of
> mmotm/auto-latest
> (git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git):
>
> commit 510948533b059f4f5033464f9f4a0c32d4ab0c08
> Date: Thu Feb 2 10:08:47 2017 +0100
>
On Mon, Feb 06, 2017 at 08:13:35PM +0100, Dmitry Vyukov wrote:
> On Mon, Jan 30, 2017 at 4:48 PM, Dmitry Vyukov wrote:
> > On Sun, Jan 29, 2017 at 6:22 PM, Vlastimil Babka wrote:
> >> On 29.1.2017 13:44, Dmitry Vyukov wrote:
> >>> Hello,
> >>>
> >>> I've got the following deadlock report while running syzkaller fuzzer
On Mon, Jan 30, 2017 at 4:48 PM, Dmitry Vyukov wrote:
> On Sun, Jan 29, 2017 at 6:22 PM, Vlastimil Babka wrote:
>> On 29.1.2017 13:44, Dmitry Vyukov wrote:
>>> Hello,
>>>
>>> I've got the following deadlock report while running syzkaller fuzzer
>>> on f37208bc3c9c2f811460ef264909dfbc7f605a60:
>>>
On Sun, Jan 29, 2017 at 6:22 PM, Vlastimil Babka wrote:
> On 29.1.2017 13:44, Dmitry Vyukov wrote:
>> Hello,
>>
>> I've got the following deadlock report while running syzkaller fuzzer
>> on f37208bc3c9c2f811460ef264909dfbc7f605a60:
>>
>> [ INFO: possible circular locking dependency detected ]
>>
On 29.1.2017 13:44, Dmitry Vyukov wrote:
> Hello,
>
> I've got the following deadlock report while running syzkaller fuzzer
> on f37208bc3c9c2f811460ef264909dfbc7f605a60:
>
> [ INFO: possible circular locking dependency detected ]
> 4.10.0-rc5-next-20170125 #1 Not tainted
> --
Hello,
I've got the following deadlock report while running syzkaller fuzzer
on f37208bc3c9c2f811460ef264909dfbc7f605a60:
[ INFO: possible circular locking dependency detected ]
4.10.0-rc5-next-20170125 #1 Not tainted
---
syz-executor3/14255 is trying to acquire lock: