On Fri, 2018-02-02 at 09:02 +0800, Joseph Qi wrote:
> We triggered this race when using single queue. I'm not sure if it
> exists in multi-queue.
Regarding the races between modifying the queue_lock pointer and the code that
uses that pointer, I think the following construct in
On Fri, 2018-02-02 at 15:08 +0100, Roman Pen wrote:
> Since the first version the following was changed:
>
>- Load-balancing and IO fail-over using multipath features were added.
>- Major parts of the code were rewritten, simplified and overall code
> size was reduced by a quarter.
On Fri, 2018-02-02 at 15:09 +0100, Roman Pen wrote:
> +Entries under /sys/kernel/ibnbd_client/
> +===
> [ ... ]
You will need Greg KH's permission to add new entries directly under
/sys/kernel.
Since I think that it is unlikely that he will give that
worker_thread+0x35/0x380
kthread+0x117/0x130
ret_from_fork+0x24/0x30
Fixes: 1d9bd5161ba3 ("blk-mq: replace timeout synchronization with a RCU and generation based scheme")
Signed-off-by: Bart Van Assche <bart.vanass...@wdc.com>
Cc: Tejun Heo <t...@kernel.org>
---
block/blk-core.c
On 01/31/18 00:23, Ming Lei wrote:
After KPTI is merged, extra load is introduced to context switches
between user space and kernel space. It is observed on my laptop that one
syscall takes an extra ~0.15us[1] compared with 'nopti'.
IO performance is affected too, it is observed that IOPS
On Thu, 2018-02-01 at 09:53 +0800, Joseph Qi wrote:
> I'm afraid the risk may also exist in blk_cleanup_queue, which will
> set queue_lock to the default internal lock.
>
> spin_lock_irq(lock);
> if (q->queue_lock != &q->__queue_lock)
> 	q->queue_lock = &q->__queue_lock;
>
- Dropped blk_alloc_queue_node2() and modified all block drivers that call
blk_alloc_queue_node().
Bart Van Assche (2):
block: Add a third argument to blk_alloc_queue_node()
block: Fix a race between the throttling code and request queue
initialization
block/blk-core.c | 29
() call in the driver into a
blk_alloc_queue_node() call and remove the explicit .queue_lock
initialization. Additionally, initialize the spin lock that will
be used as queue lock earlier if necessary.
Reported-by: Joseph Qi <joseph...@linux.alibaba.com>
Signed-off-by: Bart Van Assche <bart.vanass...@
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanass...@wdc.com>
Cc: Christoph Hellwig <h...@lst.de>
Cc: Joseph Qi <joseph...@linux.alibaba.com>
Cc: Philipp Reisner <philipp.reis...@linbit.com>
Cc: Ulf Hansson <ulf.hans...@linar
ization. Additionally, initialize the spin lock that will
be used as queue lock earlier if necessary.
Reported-by: Joseph Qi <joseph...@linux.alibaba.com>
Signed-off-by: Bart Van Assche <bart.vanass...@wdc.com>
Cc: Christoph Hellwig <h...@lst.de>
Cc: Joseph Qi <joseph...@linux.alib
On Tue, 2018-01-30 at 20:22 -0700, Jens Axboe wrote:
> On 1/30/18 8:21 PM, Bart Van Assche wrote:
> > On Tue, 2018-01-30 at 20:17 -0700, Jens Axboe wrote:
> > > BLK_STS_RESOURCE should always be safe to return, and it should work
> > > the same as STS_DEV_RESOURCE, e
On Tue, 2018-01-30 at 19:21 +0800, Joseph Qi wrote:
> Hi Jens and Folks,
>
> Recently we've gotten a hard LOCKUP issue. After investigating the issue
> we've found a race between blk_init_queue_node and blkcg_print_blkgs.
> The race is described below.
>
> blk_init_queue_node
On Tue, 2018-01-30 at 14:33 -0500, Mike Snitzer wrote:
> On Tue, Jan 30 2018 at 12:52pm -0500,
> Bart Van Assche <bart.vanass...@wdc.com> wrote:
>
> > - This patch does not fix any bugs nor makes block drivers easier to
> > read or to implement. So why is t
On 01/30/18 06:24, Mike Snitzer wrote:
+	 *
+	 * If driver returns BLK_STS_RESOURCE and SCHED_RESTART
+	 * bit is set, run queue after a delay to avoid IO stalls
+	 * that could otherwise occur if the queue is idle.
 	 */
-
On Tue, 2018-01-30 at 09:07 +0800, Ming Lei wrote:
> On Mon, Jan 29, 2018 at 04:48:31PM +0000, Bart Van Assche wrote:
> > - It is easy to fix this race inside the block layer, namely by using
> > call_rcu() inside the blk_mq_delay_run_hw_queue() implementation to
> >
On 01/19/18 07:24, Jens Axboe wrote:
That's what I thought. So for a low queue depth underlying queue, it's
quite possible that this situation can happen. Two potential solutions
I see:
1) As described earlier in this thread, having a mechanism for being
notified when the scarce resource
On Mon, 2018-01-29 at 17:14 -0500, Mike Snitzer wrote:
> On Mon, Jan 29 2018 at 4:51pm -0500,
> Bart Van Assche <bart.vanass...@wdc.com> wrote:
>
> > On Mon, 2018-01-29 at 16:44 -0500, Mike Snitzer wrote:
> > > But regardless of which wins the race, the queue w
On Mon, 2018-01-29 at 16:44 -0500, Mike Snitzer wrote:
> But regardless of which wins the race, the queue will have been run.
> Which is all we care about right?
Running the queue is not sufficient. With this patch applied it can happen
that the block driver returns BLK_STS_DEV_RESOURCE, that the
On Mon, 2018-01-29 at 15:33 -0500, Mike Snitzer wrote:
> +	 * If driver returns BLK_STS_RESOURCE and SCHED_RESTART
> +	 * bit is set, run queue after 10ms to avoid IO stalls
> +	 * that could otherwise occur if the queue is idle.
> 	 */
> -
On Sun, 2018-01-28 at 07:41 +0800, Ming Lei wrote:
> Not to mention, the request isn't added to the dispatch list yet in .queue_rq(),
> strictly speaking, it is not correct to call blk_mq_delay_run_hw_queue() in
> .queue_rq(), so the current block layer API can't handle it well enough.
I disagree that
On Sat, 2018-01-27 at 19:23 -0500, Mike Snitzer wrote:
> On Sat, Jan 27 2018 at 5:12pm -0500,
> Bart Van Assche <bart.vanass...@wdc.com> wrote:
> > - The patch at the start of this thread introduces a regression in the
> > SCSI core, namely a queue stall if a r
On Sat, 2018-01-27 at 23:58 -0500, Mike Snitzer wrote:
> On Sat, Jan 27 2018 at 10:00pm -0500,
> Bart Van Assche <bart.vanass...@wdc.com> wrote:
>
> > On Sat, 2018-01-27 at 21:03 -0500, Mike Snitzer wrote:
> > > You cannot even be forthcoming about the t
On Sat, 2018-01-27 at 21:03 -0500, Mike Snitzer wrote:
> You cannot even be forthcoming about the technical merit of a change you
> authored (commit 6077c2d70) that I'm left to clean up in the face of
> performance bottlenecks it unwittingly introduced? If you were being
> honest: you'd grant
On Sat, 2018-01-27 at 19:23 -0500, Mike Snitzer wrote:
> Your contributions do _not_ make up for your inability to work well with
> others. Tiresome doesn't begin to describe these interactions.
>
> Life is too short to continue enduring your bullshit.
>
> But do let us know when you have
On Sat, 2018-01-27 at 14:09 -0500, Mike Snitzer wrote:
> Ming let me know that he successfully tested this V3 patch using both
> your test (fio to both mpath and underlying path) and Bart's (02-mq with
> can_queue in guest).
>
> Would be great if you'd review and verify this fix works for you
On Sat, 2018-01-13 at 23:54 +0800, Ming Lei wrote:
> Thanks for your mentioning, then I can find the following comment in
> srp_queuecommand():
>
> 	/*
> 	 * The SCSI EH thread is the only context from which srp_queuecommand()
> 	 * can get invoked for
On Tue, 2018-01-23 at 11:55 +0100, Johannes Thumshirn wrote:
> Use pr_debug instead of hand crafted macros. This way it is not needed to
> re-compile the kernel to enable bsg debug outputs and it's possible to
> selectively enable specific prints.
Reviewed-by: Bart Van Assche <
On Tue, 2018-01-23 at 04:45 -0800, Joe Perches wrote:
> Perhaps the email subject could be improved to describe
> the new macro and as well, this macro, without a pr_fmt
> define somewhat above it loses the __func__ output too.
Hmm ... I thought that the pr_debug() output can be configured to
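For reference, pr_debug() call sites can indeed be enabled at run time via the kernel's dynamic debug facility (when CONFIG_DYNAMIC_DEBUG=y); the file name below is an assumption for illustration:

```shell
# Enable all pr_debug() sites in a given source file (requires root;
# the debugfs mount point may differ):
echo 'file bsg-lib.c +p' > /sys/kernel/debug/dynamic_debug/control

# The 'f' flag additionally prepends __func__ to each message:
echo 'file bsg-lib.c +pf' > /sys/kernel/debug/dynamic_debug/control
```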
On Wed, 2018-01-24 at 00:59 +0800, Ming Lei wrote:
> How is that enough to fix the IO hang when driver returns STS_RESOURCE
> and the queue is idle? If you want to follow previous dm-rq's way of
> call blk_mq_delay_run_hw_queue() in .queue_rq(), the same trick need
> to be applied to other drivers
On Wed, 2018-01-24 at 00:37 +0800, Ming Lei wrote:
> On Tue, Jan 23, 2018 at 04:24:20PM +0000, Bart Van Assche wrote:
> > My opinion about this patch is as follows:
> > * Changing a blk_mq_delay_run_hw_queue() call followed by return
> > BLK_STS_DEV_RESOURCE into return BLK_
On Wed, 2018-01-24 at 00:49 +0800, Ming Lei wrote:
> On Tue, Jan 23, 2018 at 04:47:11PM +0000, Bart Van Assche wrote:
> > On Wed, 2018-01-24 at 00:41 +0800, Ming Lei wrote:
> > > Could you explain where to call call_rcu()? call_rcu() can't be used in
> > > IO path at al
On Wed, 2018-01-24 at 00:41 +0800, Ming Lei wrote:
> Could you explain where to call call_rcu()? call_rcu() can't be used in
> IO path at all.
Can you explain what makes you think that call_rcu() can't be used in the
I/O path? As you know call_rcu() invokes a function asynchronously. From
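For context on the call_rcu() pattern being debated here: call_rcu() only queues a callback to run after a grace period, so the caller never sleeps and the call is safe from atomic context. A generic kernel-style sketch (the struct and function names are illustrative, not from any patch in this thread):

```c
/* Deferred cleanup with call_rcu(): the caller returns immediately;
 * foo_free_rcu() runs asynchronously once a grace period has elapsed
 * and all pre-existing RCU readers are done. */
struct foo {
	struct rcu_head rcu;
	/* ... payload ... */
};

static void foo_free_rcu(struct rcu_head *head)
{
	kfree(container_of(head, struct foo, rcu));
}

static void foo_release(struct foo *f)
{
	/* Only queues the callback; does not block or sleep. */
	call_rcu(&f->rcu, foo_free_rcu);
}
```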
On Tue, 2018-01-23 at 10:22 +0100, Mike Snitzer wrote:
> On Thu, Jan 18 2018 at 5:20pm -0500,
> Bart Van Assche <bart.vanass...@wdc.com> wrote:
>
> > On Thu, 2018-01-18 at 17:01 -0500, Mike Snitzer wrote:
> > > And yet Laurence cannot reproduce any such lockups
On 01/23/18 08:26, Ming Lei wrote:
On Tue, Jan 23, 2018 at 08:17:02AM -0800, Bart Van Assche wrote:
On 01/22/18 16:57, Ming Lei wrote:
Even though RCU lock is held during dispatch, preemption or interrupt
can happen too, so it is simply wrong to depend on the timing to make
sure
On Wed, 2018-01-24 at 00:16 +0800, Ming Lei wrote:
> @@ -1280,10 +1282,18 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
> 	 * - Some but not all block drivers stop a queue before
> 	 *   returning BLK_STS_RESOURCE. Two exceptions are
On Mon, 2018-01-22 at 16:14 -0700, Jens Axboe wrote:
> On 1/22/18 3:25 PM, Elliott, Robert (Persistent Memory) wrote:
> > fio engines/sg.c fio_sgio_rw_doio() has that pattern:
> >
> > ret = write(f->fd, hdr, sizeof(*hdr));
> > if (ret < 0)
> > return ret;
> > ...
> >
On Mon, 2018-01-22 at 12:23 -0800, Joe Perches wrote:
> On Mon, 2018-01-22 at 08:53 +0100, Johannes Thumshirn wrote:
> > Use pr_debug instead of hand crafted macros. This way it is not needed to
> > re-compile the kernel to enable bsg debug outputs and it's possible to
> > selectively enable
On Sun, 2018-01-21 at 12:39 -0500, David Zarzycki wrote:
> Hi Bart,
>
> I can do [1] and [2] but I won’t be able to provide the command output
> because the hang is almost total. No new commands can run because the
> scheduler is hung. For example, see this backtrace:
>
>
On Mon, 2018-01-22 at 11:35 +0800, Ming Lei wrote:
> DM-MPATH needs to allocate a request from the underlying queue, but when the
> allocation fails, there is no way to make the underlying queue's RESTART
> to restart DM's queue.
>
> This patch introduces blk_get_request_notify() for this purpose, and
>
On Mon, 2018-01-22 at 11:35 +0800, Ming Lei wrote:
> @@ -1280,10 +1282,18 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
> 	 * - Some but not all block drivers stop a queue before
> 	 *   returning BLK_STS_RESOURCE. Two exceptions are
On 01/21/18 23:53, Johannes Thumshirn wrote:
-	dprintk("map hdr %llx/%u %llx/%u\n", (unsigned long long) hdr->dout_xferp,
+	pr_debug("map hdr %llx/%u %llx/%u\n",
+		 (unsigned long long) hdr->dout_xferp,
 		 hdr->dout_xfer_len, (unsigned long long)
On Sun, 2018-01-21 at 11:19 -0500, David Zarzycki wrote:
> How can I help debug this further?
Please apply patch [1] on top of Jens' for-next branch [2], reproduce the
hang and provide the output of the following command:
(cd /sys/kernel/debug/block && find . -type f -exec grep -aH . {} \;)
On Fri, 2018-01-19 at 15:34 +0800, Ming Lei wrote:
> Could you explain a bit when SCSI target replies with BUSY very often?
>
> Inside initiator, we have limited the max per-LUN requests and per-host
> requests already before calling .queue_rq().
That's correct. However, when a SCSI initiator
ret_from_fork+0x1f/0x30
Fixes: e80a0af4759a ("lib/scatterlist: Introduce sgl_alloc() and sgl_free()")
Reported-by: Laurence Oberman <lober...@redhat.com>
Signed-off-by: Bart Van Assche <bart.vanass...@wdc.com>
Cc: Nicholas A. Bellinger <n...@linux-iscsi.org>
Cc: Laurence
The previous patch removed all users of these two functions. Hence
also remove the functions themselves.
Signed-off-by: Bart Van Assche <bart.vanass...@wdc.com>
---
block/blk-core.c | 14 --
include/linux/blkdev.h | 2 --
2 files changed, 16 deletions(-)
diff --git a
Hello Jens,
Earlier today it was agreed that a blk_mq_delay_run_hw_queue() call followed
by a blk_mq_run_hw_queue() call should result in an immediate queue run. Hence
this patch series. Please consider this patch series for kernel v4.16.
Thanks,
Bart.
Bart Van Assche (3):
blk-mq: Rename
Make sure that calling blk_mq_run_hw_queue() or
blk_mq_kick_requeue_list() triggers a queue run without delay even
if blk_mq_delay_run_hw_queue() has been called recently and if its
delay has not yet expired.
Signed-off-by: Bart Van Assche <bart.vanass...@wdc.com>
---
block/blk-mq
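The guarantee requested above lines up with the semantics of the workqueue API: mod_delayed_work() re-arms the timer of an already-pending delayed work item, so a zero-delay run overrides a longer delay queued earlier. A kernel-style sketch of the idea (the struct and function names are illustrative, not the actual blk-mq code):

```c
/* Requesting an immediate queue run must override a pending delayed
 * run.  mod_delayed_work() modifies the timer of a pending item
 * instead of silently keeping the earlier (longer) delay. */
struct my_queue {
	struct delayed_work run_work;
};

static void my_queue_run(struct my_queue *q, unsigned long delay_ms)
{
	/* If a run was already scheduled e.g. 100 ms out, calling this
	 * with delay_ms == 0 moves that run to "now". */
	mod_delayed_work(system_wq, &q->run_work, msecs_to_jiffies(delay_ms));
}
```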
Most blk-mq functions have a name that follows the pattern blk_mq_${action}.
However, the function name blk_mq_request_direct_issue is an exception.
Hence rename this function. This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanass...@wdc.com>
---
blo
On Fri, 2018-01-19 at 23:33 +0800, Ming Lei wrote:
> On Fri, Jan 19, 2018 at 03:20:13PM +0000, Bart Van Assche wrote:
> > On Fri, 2018-01-19 at 15:26 +0800, Ming Lei wrote:
> > > Please see queue_delayed_work_on(), hctx->run_work is shared by all
> > > scheduling,
On Fri, 2018-01-19 at 10:32 +0800, Ming Lei wrote:
> Now most of the time both NVMe and SCSI won't return BLK_STS_RESOURCE, and
> it should be DM-only which returns STS_RESOURCE so often.
That's wrong at least for SCSI. See also
https://marc.info/?l=linux-block=151578329417076.
Bart.
On Fri, 2018-01-19 at 08:26 +0800, Ming Lei wrote:
> No, this case won't return BLK_STS_RESOURCE.
It's possible that my attempt to reverse engineer the latest blk-mq changes
was wrong. But the queue stall is real and needs to be fixed.
Bart.
On Fri, 2018-01-19 at 08:18 +0800, Ming Lei wrote:
> On Fri, Jan 19, 2018 at 12:14:24AM +0000, Bart Van Assche wrote:
> > On Fri, 2018-01-19 at 08:11 +0800, Ming Lei wrote:
> > > On Thu, Jan 18, 2018 at 08:37:07AM -0800, Bart Van Assche wrote:
> > > > diff --git a/d
On Fri, 2018-01-19 at 08:11 +0800, Ming Lei wrote:
> On Thu, Jan 18, 2018 at 08:37:07AM -0800, Bart Van Assche wrote:
> > diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
> > index f16096af879a..c59c59cfd2a5 100644
> > --- a/drivers/md/dm-rq.c
> > +++ b/drivers/md/d
On Thu, 2018-01-18 at 15:39 -0700, Jens Axboe wrote:
> When you do have a solid test case, please please submit a blktests
> test case for it! This needs to be something we can run regularly in
> testing.
Hello Jens,
That sounds like a good idea to me. BTW, I think the reason why so far I
can
On Thu, 2018-01-18 at 17:18 -0500, Laurence Oberman wrote:
> OK, I ran 5 at once of 5 separate mount points.
> I am using 4k block sizes
> Its solid consistent for me. No stalls no gaps.
Hi Laurence,
That's great news and thank you for having shared this information but I think
it should be
On Thu, 2018-01-18 at 17:01 -0500, Mike Snitzer wrote:
> And yet Laurence cannot reproduce any such lockups with your test...
Hmm ... maybe I misunderstood Laurence but I don't think that Laurence has
already succeeded at running an unmodified version of my tests. In one of the
e-mails Laurence
On Thu, 2018-01-18 at 16:23 -0500, Mike Snitzer wrote:
> On Thu, Jan 18 2018 at 3:58P -0500,
> Bart Van Assche <bart.vanass...@wdc.com> wrote:
>
> > On Thu, 2018-01-18 at 15:48 -0500, Mike Snitzer wrote:
> > > For Bart's test the underlying scsi-mq driver is w
On Thu, 2018-01-18 at 15:48 -0500, Mike Snitzer wrote:
> For Bart's test the underlying scsi-mq driver is what is regularly
> hitting this case in __blk_mq_try_issue_directly():
>
> if (blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q))
Hello Mike,
That code path is not the code path
On Thu, 2018-01-18 at 13:30 -0500, Mike Snitzer wrote:
> 1%!? Where are you getting that number? Ming has detailed more
> significant performance gains than 1%.. and not just on lpfc (though you
> keep seizing on lpfc because of the low queue_depth of 3).
That's what I derived from the numbers
On Thu, 2018-01-18 at 12:03 -0500, Mike Snitzer wrote:
> On Thu, Jan 18 2018 at 11:50am -0500,
> Bart Van Assche <bart.vanass...@wdc.com> wrote:
> > My comments about the above are as follows:
> > - It can take up to q->rq_timeout jiffies after a .queue_rq()
On Thu, 2018-01-18 at 11:50 -0500, Mike Snitzer wrote:
> The issue you say it was originally intended to fix _should_ be
> addressed with this change:
> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.16=4dd6edd23e7ea971efddc303f9e67eb79e95808e
Hello
On 01/17/18 18:41, Ming Lei wrote:
BLK_STS_RESOURCE can be returned from a driver when any resource is
running out, and the resource may not be related to tags, such as
kmalloc(GFP_ATOMIC). When the queue is idle under this kind of
BLK_STS_RESOURCE, restart can't work any more, and then an IO hang may
is to call blk_mq_delay_run_hw_queue(). Hence
this patch.
Fixes: ec3eaf9a6731 ("dm mpath: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE")
Signed-off-by: Bart Van Assche <bart.vanass...@wdc.com>
Cc: Ming Lei <ming@redhat.com>
---
drivers/md/dm-rq.c | 1 +
1 file cha
On Wed, 2018-01-17 at 18:43 -0500, Laurence Oberman wrote:
> On Wed, 2018-01-17 at 23:31 +0000, Bart Van Assche wrote:
> > On Wed, 2018-01-17 at 11:58 -0500, Mike Snitzer wrote:
> > > On Wed, Jan 17 2018 at 11:50am -0500,
> > > Jens Axboe <ax...@kernel.dk> wrote:
On Wed, 2018-01-17 at 11:58 -0500, Mike Snitzer wrote:
> On Wed, Jan 17 2018 at 11:50am -0500,
> Jens Axboe wrote:
>
> > On 1/17/18 9:25 AM, Mike Snitzer wrote:
> > > Hi Jens,
> > >
> > > Think this finally takes care of it! ;)
> > >
> > > Thanks,
> > > Mike
> > >
> > > Mike
() and also in an
error path of blk_register_queue(): it is not allowed to hold
sysfs_lock around the kobject_del(&q->kobj) call.
Signed-off-by: Bart Van Assche <bart.vanass...@wdc.com>
---
block/blk-sysfs.c | 37 -
1 file changed, 28 insertions(+), 9
changing the scheduler
from sysfs and unregistering a block layer queue.
Bart Van Assche (3):
block: Unexport elv_register_queue() and elv_unregister_queue()
block: Document scheduler modification locking requirements
block: Protect less code with sysfs_lock in blk_{un,}register_queue()
block
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanass...@wdc.com>
---
block/elevator.c | 8
1 file changed, 8 insertions(+)
diff --git a/block/elevator.c b/block/elevator.c
index 4f00b53cd5fd..e87e9b43aba0 100644
--- a/block/elevator.c
+++ b
These two functions are only called from inside the block layer so
unexport them.
Signed-off-by: Bart Van Assche <bart.vanass...@wdc.com>
---
block/blk.h | 3 +++
block/elevator.c | 2 --
include/linux/elevator.h | 2 --
3 files changed, 3 insertions(+), 4 deletions(-)
On Wed, 2018-01-17 at 21:04 +0800, Ming Lei wrote:
> On Wed, Jan 17, 2018 at 10:17 AM, Bart Van Assche
> <bart.vanass...@wdc.com> wrote:
> > Sorry but I think what you wrote is wrong. kobject_del(&q->kobj) waits
> > until all
> > ongoing sysfs callback methods, in
On Wed, 2018-01-17 at 09:23 +0800, Ming Lei wrote:
> On Wed, Jan 17, 2018 at 8:03 AM, Bart Van Assche <bart.vanass...@wdc.com>
> wrote:
> > On Tue, 2018-01-16 at 17:32 -0500, Mike Snitzer wrote:
> > > Therefore it seems to me that all queue_att
On Tue, 2018-01-16 at 06:52 -0800, Matthew Wilcox wrote:
> I see the improvements that Facebook have been making to the nbd driver,
> and I think that's a wonderful thing. Maybe the outcome of this topic
> is simply: "Shut up, Matthew, this is good enough".
>
> It's clear that there's an
On Tue, 2018-01-16 at 17:32 -0500, Mike Snitzer wrote:
> Therefore it seems to me that all queue_attr_{show,store} are racey vs
> blk_unregister_queue() removing the 'queue' kobject.
>
> And it was just that __elevator_change() was myopically fixed to address
> the race whereas a more generic
On Tue, 2018-01-16 at 15:28 -0800, James Bottomley wrote:
> On Tue, 2018-01-16 at 18:23 -0500, Theodore Ts'o wrote:
> > On Tue, Jan 16, 2018 at 06:52:40AM -0800, Matthew Wilcox wrote:
> > >
> > >
> > > I see the improvements that Facebook have been making to the nbd
> > > driver, and I think
Fixes: 4246a0b63bd8 ("block: add a bi_error field to struct bio")
Signed-off-by: Bart Van Assche <bart.vanass...@wdc.com>
---
block/bio-integrity.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 23b42e8aa03e..9cfdd6c83b5b 10
nce check, adding
.dispatch_wait to a wait queue and removing the wait queue entry
must all be protected by both the hctx lock and the wait queue
lock.
Signed-off-by: Bart Van Assche <bart.vanass...@wdc.com>
Cc: Christoph Hellwig <h...@lst.de>
Cc: Omar Sandoval <osan...@fb.com>
C
On Mon, 2018-01-15 at 18:13 -0500, Mike Snitzer wrote:
> On Mon, Jan 15 2018 at 6:10P -0500,
> Mike Snitzer <snit...@redhat.com> wrote:
>
> > On Mon, Jan 15 2018 at 5:51pm -0500,
> > Bart Van Assche <bart.vanass...@wdc.com> wrote:
> >
> > > On Mo
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanass...@wdc.com>
---
block/elevator.c | 8
1 file changed, 8 insertions(+)
diff --git a/block/elevator.c b/block/elevator.c
index 4f00b53cd5fd..e87e9b43aba0 100644
--- a/block/elevator.c
+++ b
() and also in an
error path of blk_register_queue(): it is not allowed to hold
sysfs_lock around the kobject_del(&q->kobj) call.
Signed-off-by: Bart Van Assche <bart.vanass...@wdc.com>
---
block/blk-sysfs.c | 13 -
1 file changed, 4 insertions(+), 9 deletions(-)
diff --git a/
Hello Jens,
The three patches in this series are what I came up with after having
analyzed a lockdep complaint triggered by blk_unregister_queue. Please
consider these patches for kernel v4.16.
Thanks,
Bart.
Bart Van Assche (3):
block: Unexport elv_register_queue() and elv_unregister_queue
These two functions are only called from inside the block layer so
unexport them.
Signed-off-by: Bart Van Assche <bart.vanass...@wdc.com>
---
block/blk.h | 3 +++
block/elevator.c | 2 --
include/linux/elevator.h | 2 --
3 files changed, 3 insertions(+), 4 deletions(-)
On Mon, 2018-01-15 at 17:15 -0500, Mike Snitzer wrote:
> sysfs write op calls kernfs_fop_write which takes:
> of->mutex then kn->count#213 (no idea what that is)
> then q->sysfs_lock (via queue_attr_store)
>
> vs
>
> blk_unregister_queue takes:
> q->sysfs_lock then
> kernfs_mutex (via
On Mon, 2018-01-15 at 12:48 -0500, Mike Snitzer wrote:
> Do I need to do something more to enable lockdep aside from set
> CONFIG_LOCKDEP_SUPPORT=y ?
Hello Mike,
I think you also need to set CONFIG_PROVE_LOCKING=y.
Bart.
On Mon, 2018-01-15 at 12:29 -0500, Mike Snitzer wrote:
> So you replied to v5, I emailed a v6 out for the relevant patch. Just
> want to make sure you're testing with either Jens' latest tree or are
> using my v6 that fixed blk_mq_unregister_dev() to require caller holds
> q->sysfs_lock ?
Hello
On Fri, 2018-01-12 at 10:06 -0500, Mike Snitzer wrote:
> I'm submitting this v5 with more feeling now ;)
Hello Mike,
Have these patches been tested with lockdep enabled? The following appeared in
the kernel log after I started testing Jens' for-next tree of this morning:
On Fri, 2018-01-12 at 19:52 -0500, Mike Snitzer wrote:
> It was 50 ms before it was 100 ms. No real explanation for these
> values other than they seem to make Bart's IB SRP testbed happy?
But that constant was not introduced by me in the dm code. See e.g. the
following commits:
commit
On Fri, 2018-01-12 at 18:17 -0500, Mike Snitzer wrote:
> @@ -1570,7 +1570,10 @@ static int multipath_end_io(struct dm_target *ti,
> struct request *clone,
> if (error && blk_path_error(error)) {
> struct multipath *m = ti->private;
>
> - r = DM_ENDIO_REQUEUE;
> +
On Thu, 2018-01-11 at 10:23 +0800, Ming Lei wrote:
> > not sufficient to prevent .queuecommand() calls from scsi_send_eh_cmnd().
>
> Given it is error handling, do we need to prevent the .queuecommand() call
> in scsi_send_eh_cmnd()? Could you share us what the actual issue
> observed is from
On Fri, 2018-01-12 at 15:05 -0700, Jens Axboe wrote:
> On 1/12/18 3:00 PM, Bart Van Assche wrote:
> > On Fri, 2018-01-12 at 14:55 -0700, Jens Axboe wrote:
> > > On 1/12/18 2:52 PM, Bart Van Assche wrote:
> > > > When debugging e.g. the SCSI timeout handler it is i
NULL pointer
dereference").
Signed-off-by: Bart Van Assche <bart.vanass...@wdc.com>
Cc: Ming Lei <ming@redhat.com>
Cc: Christoph Hellwig <h...@lst.de>
Cc: Hannes Reinecke <h...@suse.com>
Cc: Johannes Thumshirn <jthumsh...@suse.de>
Cc: Martin K. Petersen <
On Fri, 2018-01-12 at 14:07 -0700, Jens Axboe wrote:
> You're really not making it easy for folks to run this :-)
My hope is that the ib_srp and ib_srpt patches will be accepted upstream soon.
As long as these are not upstream, anyone who wants to retrieve these patches
is welcome to clone
On Tue, 2018-01-09 at 08:29 -0800, Tejun Heo wrote:
> Currently, blk-mq timeout path synchronizes against the usual
> issue/completion path using a complex scheme involving atomic
> bitflags, REQ_ATOM_*, memory barriers and subtle memory coherence
> rules. Unfortunately, it contains quite a few
On Thu, 2018-01-11 at 14:01 +0800, Ming Lei wrote:
> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> index 86bf502a8e51..fcddf5a62581 100644
> --- a/drivers/md/dm-mpath.c
> +++ b/drivers/md/dm-mpath.c
> @@ -533,8 +533,20 @@ static int multipath_clone_and_map(struct dm_target *ti,
>
On Fri, 2018-01-12 at 13:06 -0500, Mike Snitzer wrote:
> OK, you have the stage: please give me a pointer to your best
> explaination of the several.
Since the previous discussion about this topic occurred more than a month
ago it could take more time to look up an explanation than to explain it
On Fri, 2018-01-12 at 12:40 -0500, Mike Snitzer wrote:
> You've not explained it many times.
That's not correct. I have already posted a detailed and easy to understand
explanation several times
> We cannot get a straight answer from you.
That is a gross and incorrect statement. Please calm
On Fri, 2018-01-12 at 12:18 -0500, Mike Snitzer wrote:
> This is going upstream for 4.16:
> https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/commit/?h=dm-4.16=5b18cff4baedde77e0d69bd62a13ae78f9488d89
That is really gross. I have explained many times in detail on the dm-devel
list
On Thu, 2018-01-11 at 15:14 -0500, Mike Snitzer wrote:
> -void device_add_disk(struct device *parent, struct gendisk *disk)
> +void device_add_disk_no_queue_reg(struct device *parent, struct gendisk
> *disk)
> {
> dev_t devt;
> int retval;
> @@ -682,7 +682,6 @@ void
a823fb34a8b ("block: fix warning when I/O elevator is changed as
> request_queue is being removed")
> Cc: sta...@vger.kernel.org # 4.14+
> Reported-by: Bart Van Assche <bart.vanass...@wdc.com>
> Signed-off-by: Mike Snitzer <snit...@redhat.com>
> ---
> block/blk-
On Thu, 2018-01-11 at 17:58 -0500, Mike Snitzer wrote:
> The changes are pretty easy to review. This notion that these changes
> are problematic rings very hollow given your lack of actual numbers (or
> some other concerning observation rooted in testing fact) to back up
> your position.
It's
On Thu, 2018-01-11 at 17:07 -0500, Mike Snitzer wrote:
> Bart, if for some reason we regress for some workload you're able to
> more readily test we can deal with it. But I'm too encouraged by Ming's
> performance improvements to hold these changes back any further.
Sorry Mike but I think Ming's