If a completion occurs after blk_mq_rq_timed_out() has reset
rq->aborted_gstate and the request is again in flight when the timeout
expires then a request will be completed twice: a first time by the
timeout handler and a second time when the regular completion occurs.
Additionally, the blk-mq
On Sun, 2018-04-08 at 12:21 +0800, Ming Lei wrote:
> The following kernel oops is triggered by 'removing scsi device' during
> heavy IO.
Is the below patch sufficient to fix this?
Thanks,
Bart.
Subject: blk-mq: Avoid that submitting a bio concurrently with device removal
triggers a crash
On Sun, 2018-04-08 at 12:08 -0700, Matthew Wilcox wrote:
> On Sun, Apr 08, 2018 at 04:40:59PM +, Bart Van Assche wrote:
> > Do you perhaps want me to prepare a patch that makes blk_get_request() again
> > respect the full gfp mask passed as third argument to blk_get_request()?
>
> I think
On Mon, Apr 09, 2018 at 09:33:08AM +0800, Joseph Qi wrote:
> Hi Bart,
>
> On 18/4/8 22:50, Bart Van Assche wrote:
> > On Sun, 2018-04-08 at 12:21 +0800, Ming Lei wrote:
> >> The following kernel oops is triggered by 'removing scsi device' during
> >> heavy IO.
> >
> > How did you trigger this
On Sun, Apr 08, 2018 at 04:35:59PM +0300, Sagi Grimberg wrote:
>
>
> On 04/08/2018 03:57 PM, Ming Lei wrote:
> > On Sun, Apr 08, 2018 at 02:53:03PM +0300, Sagi Grimberg wrote:
> > >
> > > > > > > > > Hi Sagi
> > > > > > > > >
> > > > > > > > > Still can reproduce this issue with the change:
>
On Fri, Apr 6, 2018 at 11:09 AM, Douglas Gilbert wrote:
>
> On 2018-04-06 02:42 AM, Christoph Hellwig wrote:
>>
>> On Fri, Apr 06, 2018 at 08:24:18AM +0200, Hannes Reinecke wrote:
>>>
>>> Ah. Far better.
>>> What about delegating FORMAT UNIT to the control LUN, and not
>>>
Hi Bart,
On 18/4/8 22:50, Bart Van Assche wrote:
> On Sun, 2018-04-08 at 12:21 +0800, Ming Lei wrote:
>> The following kernel oops is triggered by 'removing scsi device' during
>> heavy IO.
>
> How did you trigger this oops?
>
I can reproduce this oops by the following steps:
1) start a fio
Hi.
Cc'ing linux-block people (mainly, Christoph) too because of 17cb960f29c2.
Also, duplicating the initial statement for them.
With v4.16 (and now with v4.16.1) it is possible to trigger a usercopy
whitelist warning and/or bug while running smartctl on a SATA disk with blk-mq
and BFQ enabled.
On Sun, Apr 08, 2018 at 04:40:59PM +, Bart Van Assche wrote:
> __GFP_KSWAPD_RECLAIM wasn't stripped off on purpose for non-atomic
> allocations. That was an oversight.
OK, good.
> Do you perhaps want me to prepare a patch that makes blk_get_request() again
> respect the full gfp mask passed
On Mon, 30 Oct 2017 12:32:58 -0700, syzbot wrote:
> Hello,
>
> syzkaller hit the following crash on
> 36ef71cae353f88fd6e095e2aaa3e5953af1685d
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console
On Sat, 2018-04-07 at 23:54 -0700, Matthew Wilcox wrote:
> Please explain:
>
> commit 6a15674d1e90917f1723a814e2e8c949000440f7
> Author: Bart Van Assche
> Date: Thu Nov 9 10:49:54 2017 -0800
>
> block: Introduce blk_get_request_flags()
>
> A side effect of
Wakko Warner wrote:
> Bart Van Assche wrote:
> > Have you tried to modify the kernel Makefile as indicated in the following
> > e-mail? This should make the kernel build:
> >
> > https://lists.ubuntu.com/archives/kernel-team/2016-May/077178.html
>
> Thanks. That helped.
>
> I finished with git
Bart Van Assche wrote:
> On Sat, 2018-04-07 at 12:53 -0400, Wakko Warner wrote:
> > Bart Van Assche wrote:
> > > On Thu, 2018-04-05 at 22:06 -0400, Wakko Warner wrote:
> > > > I know now why scsi_print_command isn't doing anything. cmd->cmnd is
> > > > null.
> > > > I added a dev_printk in
On Sun, 2018-04-08 at 16:11 +0800, Joseph Qi wrote:
> This is because scsi_remove_device() will call blk_cleanup_queue(), and
> then all blkgs have been destroyed and root_blkg is NULL.
> Thus tg is NULL, which triggers a NULL pointer dereference when getting td
> from tg (tg->td).
> It seems that we cannot
On Sun, 2018-04-08 at 12:21 +0800, Ming Lei wrote:
> The following kernel oops is triggered by 'removing scsi device' during
> heavy IO.
How did you trigger this oops?
Bart.
On 04/08/2018 03:57 PM, Ming Lei wrote:
On Sun, Apr 08, 2018 at 02:53:03PM +0300, Sagi Grimberg wrote:
Hi Sagi
Still can reproduce this issue with the change:
Thanks for validating Yi,
Would it be possible to test the following:
--
diff --git a/block/blk-mq.c b/block/blk-mq.c
index
On Sun, Apr 08, 2018 at 02:53:03PM +0300, Sagi Grimberg wrote:
>
> > > > > > > Hi Sagi
> > > > > > >
> > > > > > > Still can reproduce this issue with the change:
> > > > > >
> > > > > > Thanks for validating Yi,
> > > > > >
> > > > > > Would it be possible to test the following:
> > > > > >
Hi Sagi
Still can reproduce this issue with the change:
Thanks for validating Yi,
Would it be possible to test the following:
--
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 75336848f7a7..81ced3096433 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -444,6 +444,10 @@ struct request
On Sun, Apr 08, 2018 at 01:58:49PM +0300, Sagi Grimberg wrote:
>
> > > > > Hi Sagi
> > > > >
> > > > > Still can reproduce this issue with the change:
> > > >
> > > > Thanks for validating Yi,
> > > >
> > > > Would it be possible to test the following:
> > > > --
> > > > diff --git
Hi Sagi
Still can reproduce this issue with the change:
Thanks for validating Yi,
Would it be possible to test the following:
--
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 75336848f7a7..81ced3096433 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -444,6 +444,10 @@ struct request
On Sun, Apr 08, 2018 at 06:44:33PM +0800, Ming Lei wrote:
> On Sun, Apr 08, 2018 at 01:36:27PM +0300, Sagi Grimberg wrote:
> >
> > > Hi Sagi
> > >
> > > Still can reproduce this issue with the change:
> >
> > Thanks for validating Yi,
> >
> > Would it be possible to test the following:
> > --
On Sun, Apr 08, 2018 at 01:36:27PM +0300, Sagi Grimberg wrote:
>
> > Hi Sagi
> >
> > Still can reproduce this issue with the change:
>
> Thanks for validating Yi,
>
> Would it be possible to test the following:
> --
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index
Hi Sagi
Still can reproduce this issue with the change:
Thanks for validating Yi,
Would it be possible to test the following:
--
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 75336848f7a7..81ced3096433 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -444,6 +444,10 @@ struct request
On Sun, Apr 08, 2018 at 05:25:42PM +0800, Ming Lei wrote:
> On Sun, Apr 08, 2018 at 04:11:51PM +0800, Joseph Qi wrote:
> > This is because scsi_remove_device() will call blk_cleanup_queue(), and
> > then all blkgs have been destroyed and root_blkg is NULL.
> > Thus tg is NULL, which triggers a NULL
Firstly, since commit 4b855ad37194 ("blk-mq: Create hctx for each present CPU"),
blk-mq no longer remaps queues after the CPU topology changes.
Secondly, set->nr_hw_queues can't be bigger than nr_cpu_ids, and now we map
all possible CPUs to hw queues, so at least one CPU is mapped to each hctx.
So
There are several reasons for removing the check:
1) blk_mq_hw_queue_mapped() now always returns true, since each hctx
is mapped by at least one CPU
2) when there isn't any online CPU mapped to this hctx, there won't
be any IO queued to this CPU, blk_mq_run_hw_queue() only runs queue
if there
A queue now counts as mapped only if at least one online CPU is mapped
to this hctx, so implement blk_mq_hw_queue_mapped() accordingly.
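
The "mapped" test described above can be sketched as a bitmask check. This is a minimal user-space model, not the kernel code: struct model_hctx, model_hw_queue_mapped() and the 64-bit CPU masks are all hypothetical stand-ins for the real hctx and cpumask types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a hardware queue context: bit N set in
 * cpumask means CPU N is mapped to this queue. */
struct model_hctx {
	uint64_t cpumask;
};

/* A queue counts as mapped only if at least one of its CPUs is online.
 * online_mask uses the same one-bit-per-CPU encoding (an assumption). */
static bool model_hw_queue_mapped(const struct model_hctx *hctx,
				  uint64_t online_mask)
{
	return (hctx->cpumask & online_mask) != 0;
}
```

With this model, a queue whose CPUs are all offline reports itself as unmapped, whereas a check based only on the static mapping would report every queue as mapped.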
Cc: Christian Borntraeger
Cc: Christoph Hellwig
Cc: Stefan Haberland
Signed-off-by:
This patch introduces the helper blk_mq_hw_queue_first_cpu() for
figuring out the hctx's first CPU, so that code duplication can be
avoided.
Cc: Christian Borntraeger
Cc: Christoph Hellwig
Cc: Stefan Haberland
Signed-off-by: Ming Lei
This patch figures out the final selected CPU and writes it to
hctx->next_cpu only once, so that other dispatch paths never observe
an intermediate next CPU.
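
The "select first, store once" idea can be sketched as follows. This is a single-threaded user-space model under stated assumptions, not the patch itself: struct model_hctx, the model_* helpers, the 64-bit masks and the fallback to the first CPU of the mask when no mapped CPU is online are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a hardware queue context. */
struct model_hctx {
	uint64_t cpumask;	/* bit N set: CPU N is mapped to this queue */
	int next_cpu;		/* CPU the next dispatch should run on */
};

/* Lowest-numbered CPU in mask, or -1 if the mask is empty. */
static int model_first_cpu(uint64_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}

/* Compute the final choice without touching hctx->next_cpu. */
static int model_pick_next_cpu(const struct model_hctx *hctx,
			       uint64_t online_mask)
{
	uint64_t usable = hctx->cpumask & online_mask;

	/* Assumed fallback: no mapped CPU is online. */
	if (!usable)
		return model_first_cpu(hctx->cpumask);

	/* Round-robin over the usable CPUs, starting after next_cpu. */
	for (int step = 1; step <= 64; step++) {
		int cpu = (hctx->next_cpu + step) % 64;

		if (usable & (UINT64_C(1) << cpu))
			return cpu;
	}
	return -1;	/* not reached while usable is non-zero */
}

/* The only store to next_cpu: readers see the old or the final value. */
static void model_advance_next_cpu(struct model_hctx *hctx,
				   uint64_t online_mask)
{
	int cpu = model_pick_next_cpu(hctx, online_mask);

	hctx->next_cpu = cpu;
}
```

Keeping the search in a local variable is what removes the intermediate values; a real concurrent implementation would additionally need whatever store and load annotations the kernel requires, which this sketch does not model.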
Cc: Christian Borntraeger
Cc: Christoph Hellwig
Cc: Stefan Haberland
Since commit 4b855ad37194 ("blk-mq: Create hctx for each present CPU"),
blk-mq doesn't remap queues after the CPU topology changes. That means
that when some of these offline CPUs come online, they are still mapped
to hctx 0, so hctx 0 may become the bottleneck of IO dispatch and
completion.
This patch
Since commit 20e4d81393196 ("blk-mq: simplify queue mapping & schedule
with each possible CPU"), one hctx can be mapped from all offline CPUs,
so hctx->next_cpu can be set incorrectly.
This patch fixes this issue by making hctx->next_cpu point to the
first CPU in hctx->cpumask if all CPUs in
Hi Jens,
The first two patches fix issues about queue mapping.
The other 6 patches improve queue mapping for blk-mq.
Christian, these patches should fix your issue, so please give them
a test. The patches can be found in the following tree:
On Sun, Apr 08, 2018 at 04:11:51PM +0800, Joseph Qi wrote:
> This is because scsi_remove_device() will call blk_cleanup_queue(), and
> then all blkgs have been destroyed and root_blkg is NULL.
> Thus tg is NULL, which triggers a NULL pointer dereference when getting td
> from tg (tg->td).
> It seems that we
Signed-off-by: Joseph Qi
---
block/blk-cgroup.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 1c16694..87367d4 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -640,7 +640,7 @@ int
This is because scsi_remove_device() will call blk_cleanup_queue(), and
then all blkgs have been destroyed and root_blkg is NULL.
Thus tg is NULL, which triggers a NULL pointer dereference when getting td
from tg (tg->td).
It seems that we cannot simply move blkcg_exit_queue() up to
blk_cleanup_queue().
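
The failure mode described above can be modelled in user space. The sketch below is illustrative only and is not the fix from the patch: the structures are stripped-down stand-ins for the kernel ones, and model_cleanup_queue(), model_lookup_tg() and model_get_td() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins, not the real kernel definitions. */
struct throtl_data { int id; };
struct throtl_grp { struct throtl_data *td; };
struct blkcg_gq { struct throtl_grp *tg; };
struct request_queue { struct blkcg_gq *root_blkg; };

/* Models queue cleanup destroying all blkgs: root_blkg becomes NULL. */
static void model_cleanup_queue(struct request_queue *q)
{
	q->root_blkg = NULL;
}

/* Returns the throttle group, or NULL once the blkgs are gone. */
static struct throtl_grp *model_lookup_tg(const struct request_queue *q)
{
	return q->root_blkg ? q->root_blkg->tg : NULL;
}

/* Guarded access: reports failure instead of dereferencing a NULL tg.
 * An unguarded tg->td here is the crash described in the report. */
static bool model_get_td(const struct request_queue *q,
			 struct throtl_data **td)
{
	struct throtl_grp *tg = model_lookup_tg(q);

	if (!tg)
		return false;
	*td = tg->td;
	return true;
}
```

A NULL check like this only shows where the dereference goes wrong; whether the right fix is a check, a reordering of the teardown, or something else is exactly what the thread is discussing.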
Please explain:
commit 6a15674d1e90917f1723a814e2e8c949000440f7
Author: Bart Van Assche
Date: Thu Nov 9 10:49:54 2017 -0800
block: Introduce blk_get_request_flags()
A side effect of this patch is that the GFP mask that is passed to
several allocation