On Tue, Dec 6, 2016 at 12:16 AM, Peter Zijlstra wrote:
>>
>> Of course, I'm really hoping that this shmem.c use is the _only_ such
>> case. But I doubt it.
>
> $ git grep DECLARE_WAIT_QUEUE_HEAD_ONSTACK | wc -l
> 28
Hmm. Most of them seem to be ok, because they use "wait_event()",
which will
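For reference, a minimal sketch of the usage presumably being referred to, with a hypothetical completion flag (not taken from any of the quoted patches): the on-stack wait queue head belongs to the task that sleeps on it, and wait_event() unlinks its wait entry again before returning.

	#include <linux/wait.h>

	static void wait_for_done_onstack(bool *done)
	{
		DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);

		/* ... hand &wq to whoever will call wake_up(&wq) ... */

		/*
		 * Sleeps via prepare_to_wait()/finish_wait() internally and only
		 * returns once *done is true, with the wait entry unlinked again.
		 */
		wait_event(wq, READ_ONCE(*done));
	}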
On 5 December 2016 at 22:33, Vegard Nossum wrote:
> On 5 December 2016 at 21:35, Linus Torvalds wrote:
>> Note for Ingo and Peter: this patch has not been tested at all. But
>> Vegard did test an earlier patch of mine that just verified that yes,
>> the issue really was that wait queue entries
* Peter Zijlstra wrote:
> $ git grep DECLARE_WAIT_QUEUE_HEAD_ONSTACK | wc -l
> 28
This debug facility looks sensible. A couple of minor suggestions:
> --- a/include/linux/wait.h
> +++ b/include/linux/wait.h
> @@ -39,6 +39,9 @@ struct wait_bit_queue {
> struct __wait_queue_head {
>
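For context, the structure named in the hunk above looked roughly like this at the time (paraphrased from include/linux/wait.h of that era; the lines the patch actually adds are cut off in the preview):

	struct __wait_queue_head {
		spinlock_t		lock;
		struct list_head	task_list;
	};
	typedef struct __wait_queue_head wait_queue_head_t;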
On Mon, Dec 05, 2016 at 12:35:52PM -0800, Linus Torvalds wrote:
> Adding the scheduler people to the participants list, and re-attaching
> the patch, because while this patch is internal to the VM code, the
> issue itself is not.
>
> There might well be other cases where somebody goes
On 5 December 2016 at 21:35, Linus Torvalds wrote:
> Note for Ingo and Peter: this patch has not been tested at all. But
> Vegard did test an earlier patch of mine that just verified that yes,
> the issue really was that wait queue entries remained on the wait
> queue head just as we were about
Adding the scheduler people to the participants list, and re-attaching
the patch, because while this patch is internal to the VM code, the
issue itself is not.
There might well be other cases where somebody goes "wake_up_all()"
will wake everybody up, so I can put the wait queue head on the
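A minimal sketch of the problematic pattern being described, with hypothetical names (this is not the shmem code itself): the wait queue head lives on the waker's stack, and wake_up_all() does not wait for the woken tasks to run finish_wait() and unlink their entries.

	#include <linux/wait.h>

	static void waker_with_onstack_head(void)
	{
		DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);

		/* ... other tasks are handed &wq and block on it ... */

		wake_up_all(&wq);	/* wakes the waiters, does not wait for them */

		/*
		 * Returning now destroys the stack frame that holds wq.  A woken
		 * waiter that has not yet reached finish_wait() still has its
		 * entry linked into wq.task_list, i.e. into dead stack memory.
		 */
	}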
On 5 December 2016 at 20:11, Vegard Nossum wrote:
> On 5 December 2016 at 18:55, Linus Torvalds wrote:
>> On Mon, Dec 5, 2016 at 9:09 AM, Vegard Nossum wrote:
>> Since you apparently can recreate this fairly easily, how about trying
>> this stupid patch?
>>
>> NOTE! This is entirely
On Mon, Dec 5, 2016 at 11:11 AM, Vegard Nossum wrote:
>
> [ cut here ]
> WARNING: CPU: 22 PID: 14012 at mm/shmem.c:2668 shmem_fallocate+0x9a7/0xac0
Ok, good. So that's confirmed as the cause of this problem.
And the call chain that I wanted is obviously completely
On 5 December 2016 at 18:55, Linus Torvalds wrote:
> On Mon, Dec 5, 2016 at 9:09 AM, Vegard Nossum wrote:
>>
>> The warning shows that it made it past the list_empty_careful() check
>> in finish_wait() but then bugs out on the ->task_list
>> dereference.
>>
>> Anything stick out?
>
> I hate that
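For reference, finish_wait() at the time looked roughly like this (paraphrased from kernel/sched/wait.c of that era, not verbatim):

	void finish_wait(wait_queue_head_t *q, wait_queue_t *wait)
	{
		unsigned long flags;

		__set_current_state(TASK_RUNNING);
		/*
		 * Lockless peek at the entry's own list pointers; if the entry
		 * still looks linked, take the head's lock (a dereference of q,
		 * i.e. of the possibly on-stack head) and unlink it for real.
		 */
		if (!list_empty_careful(&wait->task_list)) {
			spin_lock_irqsave(&q->lock, flags);
			list_del_init(&wait->task_list);
			spin_unlock_irqrestore(&q->lock, flags);
		}
	}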
On 5 December 2016 at 19:11, Andy Lutomirski wrote:
> On Sun, Dec 4, 2016 at 3:04 PM, Vegard Nossum wrote:
>> On 23 November 2016 at 20:58, Dave Jones wrote:
>>> On Wed, Nov 23, 2016 at 02:34:19PM -0500, Dave Jones wrote:
>>>
>>> > [ 317.689216] BUG: Bad page state in process kworker/u8:8
On Mon, Dec 5, 2016 at 10:11 AM, Andy Lutomirski wrote:
>
> So your kernel has been smp-alternatived. That 3e comes from
> alternatives_smp_unlock. If you're running on SMP with UP
> alternatives, things will break.
I'm assuming he's just running in a VM with a single CPU.
The problem that I
On Sun, Dec 4, 2016 at 3:04 PM, Vegard Nossum wrote:
> On 23 November 2016 at 20:58, Dave Jones wrote:
>> On Wed, Nov 23, 2016 at 02:34:19PM -0500, Dave Jones wrote:
>>
>> > [ 317.689216] BUG: Bad page state in process kworker/u8:8 pfn:4d8fd4
>> > trace from just before this happened. Does
On Mon, Dec 5, 2016 at 9:09 AM, Vegard Nossum wrote:
>
> The warning shows that it made it past the list_empty_careful() check
> in finish_wait() but then bugs out on the ->task_list
> dereference.
>
> Anything stick out?
I hate that shmem waitqueue garbage. It's really subtle.
I think the
On Mon, Dec 05, 2016 at 06:09:29PM +0100, Vegard Nossum wrote:
> On 5 December 2016 at 12:10, Vegard Nossum wrote:
> > On 5 December 2016 at 00:04, Vegard Nossum wrote:
> >> FWIW I hit this as well:
> >>
> >> BUG: unable to handle kernel paging request at 81ff08b7
> >> IP: []
On 5 December 2016 at 12:10, Vegard Nossum wrote:
> On 5 December 2016 at 00:04, Vegard Nossum wrote:
>> FWIW I hit this as well:
>>
>> BUG: unable to handle kernel paging request at 81ff08b7
>> IP: [] __lock_acquire.isra.32+0xda/0x1a30
>> CPU: 0 PID: 21744 Comm: trinity-c56 Tainted: G
On 5 December 2016 at 00:04, Vegard Nossum wrote:
> FWIW I hit this as well:
>
> BUG: unable to handle kernel paging request at 81ff08b7
> IP: [] __lock_acquire.isra.32+0xda/0x1a30
> CPU: 0 PID: 21744 Comm: trinity-c56 Tainted: GB 4.9.0-rc7+ #217
[...]
> I think you can
On 23 November 2016 at 20:58, Dave Jones wrote:
> On Wed, Nov 23, 2016 at 02:34:19PM -0500, Dave Jones wrote:
>
> > [ 317.689216] BUG: Bad page state in process kworker/u8:8 pfn:4d8fd4
> > trace from just before this happened. Does this shed any light ?
> >
> >
On Wed, Nov 23, 2016 at 02:34:19PM -0500, Dave Jones wrote:
> [ 317.689216] BUG: Bad page state in process kworker/u8:8 pfn:4d8fd4
> trace from just before this happened. Does this shed any light ?
>
> https://codemonkey.org.uk/junk/trace.txt
crap, I just noticed the timestamps in the
On Mon, Oct 31, 2016 at 01:44:55PM -0600, Chris Mason wrote:
> On Mon, Oct 31, 2016 at 12:35:16PM -0700, Linus Torvalds wrote:
> >On Mon, Oct 31, 2016 at 11:55 AM, Dave Jones wrote:
> >>
> >> BUG: Bad page state in process kworker/u8:12 pfn:4e0e39
> >> page:ea0013838e40 count:0
On Mon, Oct 31, 2016 at 12:35:16PM -0700, Linus Torvalds wrote:
On Mon, Oct 31, 2016 at 11:55 AM, Dave Jones wrote:
BUG: Bad page state in process kworker/u8:12 pfn:4e0e39
page:ea0013838e40 count:0 mapcount:0 mapping:8804a20310e0 index:0x100c
flags:
On Mon, Oct 31, 2016 at 11:55 AM, Dave Jones wrote:
>
> BUG: Bad page state in process kworker/u8:12 pfn:4e0e39
> page:ea0013838e40 count:0 mapcount:0 mapping:8804a20310e0 index:0x100c
> flags: 0x400c(referenced|uptodate)
> page dumped because: non-NULL mapping
Hmm. So this
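The "non-NULL mapping" reason comes from the page allocator's free-path sanity checks; roughly (paraphrased from mm/page_alloc.c of that era, not verbatim):

	static void free_page_check_sketch(struct page *page)
	{
		const char *bad_reason = NULL;

		if (unlikely(atomic_read(&page->_mapcount) != -1))
			bad_reason = "nonzero mapcount";
		if (unlikely(page->mapping != NULL))
			bad_reason = "non-NULL mapping";	/* the case reported above */
		if (bad_reason)
			bad_page(page, bad_reason, 0);	/* prints "BUG: Bad page state ..." */
	}

In other words, a page handed back to the allocator still had page->mapping set when it should already have been cleared.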
On Wed, Oct 26, 2016 at 07:47:51PM -0400, Dave Jones wrote:
> On Wed, Oct 26, 2016 at 07:38:08PM -0400, Chris Mason wrote:
>
> > >-hctx->queued++;
> > >-data->hctx = hctx;
> > >-data->ctx = ctx;
> > >+data->hctx = alloc_data.hctx;
> > >+
On Thu, Oct 27, 2016 at 04:41:33PM +1100, Dave Chinner wrote:
> And that's indicative of a delalloc metadata reservation being
> being too small and so we're allocating unreserved blocks.
>
> Different symptoms, same underlying cause, I think.
>
> I see the latter assert from time to
On 10/27/2016 10:34 AM, Linus Torvalds wrote:
On Wed, Oct 26, 2016 at 11:33 PM, Christoph Hellwig wrote:
Dave, can you hit the warnings with this? Totally untested...
Can we just kill off the unhelpful blk_map_ctx structure, e.g.:
Yeah, I found that hard to read too. The difference between
On Wed, Oct 26, 2016 at 11:33 PM, Christoph Hellwig wrote:
>> Dave, can you hit the warnings with this? Totally untested...
>
> Can we just kill off the unhelpful blk_map_ctx structure, e.g.:
Yeah, I found that hard to read too. The difference between
blk_map_ctx and blk_mq_alloc_data is
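For context, a sketch of the two structures being compared, with fields recalled from that era (not verbatim): blk_map_ctx only carried the (hctx, ctx) pair that blk_mq_alloc_data already tracks, which is why dropping it is suggested above.

	struct blk_map_ctx {			/* the wrapper proposed for removal */
		struct blk_mq_hw_ctx	*hctx;
		struct blk_mq_ctx	*ctx;
	};

	struct blk_mq_alloc_data {		/* already carries the same state */
		struct request_queue	*q;
		unsigned int		flags;
		struct blk_mq_ctx	*ctx;
		struct blk_mq_hw_ctx	*hctx;
	};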
On 10/26/2016 08:00 PM, Jens Axboe wrote:
> On 10/26/2016 05:47 PM, Dave Jones wrote:
>> On Wed, Oct 26, 2016 at 07:38:08PM -0400, Chris Mason wrote:
>>
>> > >-hctx->queued++;
>> > >-data->hctx = hctx;
>> > >-data->ctx = ctx;
>> > >+data->hctx = alloc_data.hctx;
>> > >+
> Dave, can you hit the warnings with this? Totally untested...
Can we just kill off the unhelpful blk_map_ctx structure, e.g.:
diff --git a/block/blk-mq.c b/block/blk-mq.c
index ddc2eed..d74a74a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1190,21 +1190,15 @@ static inline bool
On Tue, Oct 25, 2016 at 08:27:52PM -0400, Dave Jones wrote:
> DaveC: Do these look like real problems, or is this more "looks like
> random memory corruption" ? It's been a while since I did some stress
> testing on XFS, so these might not be new..
>
> XFS: Assertion failed: oldlen > newlen,
On 10/26/2016 05:47 PM, Dave Jones wrote:
On Wed, Oct 26, 2016 at 07:38:08PM -0400, Chris Mason wrote:
> >- hctx->queued++;
> >- data->hctx = hctx;
> >- data->ctx = ctx;
> >+ data->hctx = alloc_data.hctx;
> >+ data->ctx = alloc_data.ctx;
> >+ data->hctx->queued++;
On Wed, Oct 26, 2016 at 07:38:08PM -0400, Chris Mason wrote:
> >- hctx->queued++;
> >- data->hctx = hctx;
> >- data->ctx = ctx;
> >+ data->hctx = alloc_data.hctx;
> >+ data->ctx = alloc_data.ctx;
> >+ data->hctx->queued++;
> >return rq;
> > }
>
> This made it through
On Wed, Oct 26, 2016 at 05:20:01PM -0600, Jens Axboe wrote:
On 10/26/2016 05:08 PM, Linus Torvalds wrote:
On Wed, Oct 26, 2016 at 4:03 PM, Jens Axboe wrote:
Actually, I think I see what might trigger it. You are on nvme, iirc,
and that has a deep queue.
Yes. I have long since moved on from
On 10/26/2016 05:19 PM, Chris Mason wrote:
On Wed, Oct 26, 2016 at 05:03:45PM -0600, Jens Axboe wrote:
On 10/26/2016 04:58 PM, Linus Torvalds wrote:
On Wed, Oct 26, 2016 at 3:51 PM, Linus Torvalds wrote:
Dave: it might be a good idea to split that "WARN_ON_ONCE()" in
blk_mq_merge_queue_io()
On Wed, Oct 26, 2016 at 05:03:45PM -0600, Jens Axboe wrote:
On 10/26/2016 04:58 PM, Linus Torvalds wrote:
On Wed, Oct 26, 2016 at 3:51 PM, Linus Torvalds wrote:
Dave: it might be a good idea to split that "WARN_ON_ONCE()" in
blk_mq_merge_queue_io() into two
I did that myself too, since
On 10/26/2016 05:08 PM, Linus Torvalds wrote:
On Wed, Oct 26, 2016 at 4:03 PM, Jens Axboe wrote:
Actually, I think I see what might trigger it. You are on nvme, iirc,
and that has a deep queue.
Yes. I have long since moved on from slow disks, so all my systems are
not just flash, but m.2
On Wed, Oct 26, 2016 at 03:07:10PM -0700, Linus Torvalds wrote:
On Wed, Oct 26, 2016 at 1:00 PM, Chris Mason wrote:
Today I turned off every CONFIG_DEBUG_* except for list debugging, and
ran dbench 2048:
[ 2759.118711] WARNING: CPU: 2 PID: 31039 at lib/list_debug.c:33
__list_add+0xbe/0xd0
[
On Wed, Oct 26, 2016 at 4:03 PM, Jens Axboe wrote:
>
> Actually, I think I see what might trigger it. You are on nvme, iirc,
> and that has a deep queue.
Yes. I have long since moved on from slow disks, so all my systems are
not just flash, but m.2 nvme ssd's.
So at least that could explain why
On Wed, Oct 26, 2016 at 05:03:45PM -0600, Jens Axboe wrote:
> On 10/26/2016 04:58 PM, Linus Torvalds wrote:
> > On Wed, Oct 26, 2016 at 3:51 PM, Linus Torvalds wrote:
> >>
> >> Dave: it might be a good idea to split that "WARN_ON_ONCE()" in
> >> blk_mq_merge_queue_io() into two
> >
>
On 10/26/2016 04:58 PM, Linus Torvalds wrote:
On Wed, Oct 26, 2016 at 3:51 PM, Linus Torvalds wrote:
Dave: it might be a good idea to split that "WARN_ON_ONCE()" in
blk_mq_merge_queue_io() into two
I did that myself too, since Dave sees this during boot.
But I'm not getting the warning ;(
On 10/26/2016 05:01 PM, Dave Jones wrote:
On Wed, Oct 26, 2016 at 03:51:01PM -0700, Linus Torvalds wrote:
> Dave: it might be a good idea to split that "WARN_ON_ONCE()" in
> blk_mq_merge_queue_io() into two, since right now it can trigger both
> for the
>
>
On Wed, Oct 26, 2016 at 03:51:01PM -0700, Linus Torvalds wrote:
> Dave: it might be a good idea to split that "WARN_ON_ONCE()" in
> blk_mq_merge_queue_io() into two, since right now it can trigger both
> for the
>
> blk_mq_bio_to_request(rq, bio);
>
> path _and_ for the
>
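The point of the split: a single WARN_ON_ONCE() shared by two code paths reports the same file:line either way, so the warning alone can't tell you which path fired. A sketch of the idea with a placeholder condition and helpers (not the actual blk_mq_merge_queue_io() code):

	/* before: one check, ambiguous report */
	WARN_ON_ONCE(bad_state);
	if (merge_path)
		blk_mq_bio_to_request(rq, bio);
	else
		insert_request_path();

	/* after: one check per path, each warning carries its own line number */
	if (merge_path) {
		WARN_ON_ONCE(bad_state);	/* fires for the blk_mq_bio_to_request() path */
		blk_mq_bio_to_request(rq, bio);
	} else {
		WARN_ON_ONCE(bad_state);	/* fires for the other path */
		insert_request_path();
	}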
On Wed, Oct 26, 2016 at 3:51 PM, Linus Torvalds wrote:
>
> Dave: it might be a good idea to split that "WARN_ON_ONCE()" in
> blk_mq_merge_queue_io() into two
I did that myself too, since Dave sees this during boot.
But I'm not getting the warning ;(
Dave gets it with ext4, and that's what I
On 10/26/2016 04:51 PM, Linus Torvalds wrote:
On Wed, Oct 26, 2016 at 3:40 PM, Dave Jones wrote:
I gave it a shot too for shits & giggles.
This falls out during boot.
[9.278420] WARNING: CPU: 0 PID: 1 at block/blk-mq.c:1181
blk_sq_make_request+0x465/0x4a0
Hmm. That's the
On 10/26/2016 04:40 PM, Dave Jones wrote:
On Wed, Oct 26, 2016 at 03:21:53PM -0700, Linus Torvalds wrote:
> Could you try the attached patch? It adds a couple of sanity tests:
>
> - a number of tests to verify that 'rq->queuelist' isn't already on
> some queue when it is added to a queue
On Wed, Oct 26, 2016 at 3:40 PM, Dave Jones wrote:
>
> I gave it a shot too for shits & giggles.
> This falls out during boot.
>
> [9.278420] WARNING: CPU: 0 PID: 1 at block/blk-mq.c:1181
> blk_sq_make_request+0x465/0x4a0
Hmm. That's the
WARN_ON_ONCE(rq->mq_ctx != ctx);
that I added
On Wed, Oct 26, 2016 at 03:21:53PM -0700, Linus Torvalds wrote:
> Could you try the attached patch? It adds a couple of sanity tests:
>
> - a number of tests to verify that 'rq->queuelist' isn't already on
> some queue when it is added to a queue
>
> - one test to verify that rq->mq_ctx
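A sketch of what such sanity tests look like at a queueing site, with placeholder placement (this is not the actual patch): check that the request is not already linked before queueing it, and that its software queue matches the one it is being inserted into.

	/* the request must not already sit on some other queue */
	WARN_ON_ONCE(!list_empty(&rq->queuelist));

	/* the request's software queue must match the one we insert into */
	WARN_ON_ONCE(rq->mq_ctx != ctx);

	list_add_tail(&rq->queuelist, &ctx->rq_list);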
On Wed, Oct 26, 2016 at 2:52 PM, Chris Mason wrote:
>
> This one is special because CONFIG_VMAP_STACK is not set. Btrfs triggers in
> < 10 minutes.
> I've done 30 minutes each with XFS and Ext4 without luck.
Ok, see the email I wrote that crossed yours - if it's really some
list corruption on
On Wed, Oct 26, 2016 at 1:00 PM, Chris Mason wrote:
>
> Today I turned off every CONFIG_DEBUG_* except for list debugging, and
> ran dbench 2048:
>
> [ 2759.118711] WARNING: CPU: 2 PID: 31039 at lib/list_debug.c:33
> __list_add+0xbe/0xd0
> [ 2759.119652] list_add corruption. prev->next should be
On 10/26/2016 04:00 PM, Chris Mason wrote:
>
>
> On 10/26/2016 03:06 PM, Linus Torvalds wrote:
>> On Wed, Oct 26, 2016 at 11:42 AM, Dave Jones wrote:
>>>
>>> The stacks show nearly all of them are stuck in sync_inodes_sb
>>
>> That's just wb_wait_for_completion(), and it means that some IO
On 10/26/2016 03:06 PM, Linus Torvalds wrote:
> On Wed, Oct 26, 2016 at 11:42 AM, Dave Jones wrote:
>>
>> The stacks show nearly all of them are stuck in sync_inodes_sb
>
> That's just wb_wait_for_completion(), and it means that some IO isn't
> completing.
>
> There's also a lot of processes
On Wed, Oct 26, 2016 at 11:42 AM, Dave Jones wrote:
>
> The stacks show nearly all of them are stuck in sync_inodes_sb
That's just wb_wait_for_completion(), and it means that some IO isn't
completing.
There's also a lot of processes waiting for inode_lock(), and a few
waiting for