On 12/21/22 06:05, Gulam Mohamed wrote:
Change the data type of the start and end time I/O accounting variables
in the block layer from "unsigned long" to "u64". This enables
nanosecond granularity, in the next commit, for devices whose latency is
below one millisecond.
Changes from V2 to
Reviewed-by: Sagi Grimberg
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Well, when it fails over, it will probably be directed to the poll
queues. Maybe I'm missing something...
In this patchset, because the bio isn't submitted directly from the FS,
there is no polling context associated with it, so its HIPRI flag will
be cleared and it falls back to irq mode.
+static void blk_bio_poll_post_submit(struct bio *bio, blk_qc_t cookie)
+{
+ bio->bi_iter.bi_private_data = cookie;
+}
+
Hey Ming, thinking about nvme-mpath, this should probably be an
exported function for failover. nvme-mpath updates bio.bi_dev
when re-submitting I/Os to an
[ .. ]
Originally nvme multipath would update/change the size of the multipath
device according to the underlying path devices.
With this patch the size of the multipath device will _not_ change if
there is a change on the underlying devices.
Yes, it will. Take a close look at
Reviewed-by: Sagi Grimberg
+ switch (nvme_req_disposition(req)) {
+ case COMPLETE:
+ nvme_complete_req(req);
nvme_complete_rq calling nvme_complete_req... Maybe call it
__nvme_complete_rq instead?
That's what I had first, but it felt so strangely out of place next
to the other nvme_*_req
+static inline enum nvme_disposition nvme_req_disposition(struct request *req)
+{
+ if (likely(nvme_req(req)->status == 0))
+ return COMPLETE;
+
+ if (blk_noretry_request(req) ||
+ (nvme_req(req)->status & NVME_SC_DNR) ||
+ nvme_req(req)->retries
Hey Mike,
The point is: blk_path_error() has nothing to do with NVMe errors.
This is dm-multipath logic stuck in the middle of the NVMe error
handling code.
No, it is a means to have multiple subsystems (to this point both SCSI
and NVMe) doing the correct thing when translating subsystem
V14:
- drop patch (patch 4 in V13) for renaming bvec helpers, as suggested
by Jens
- use mp_bvec_* as the multi-page bvec helper name
- fix one build issue, caused by a missing conversion of
bio_for_each_segment_all in fs/gfs2
- fix one 32bit ARCH
Wait, I see that the bvec is still a single array per bio. When you said
a table, I thought you meant a 2-dimensional array...
I mean a new 1-d table A has to be created for multiple bios in one rq,
and build it in the following way
rq_for_each_bvec(tmp, rq, rq_iter)
Yeah, that is the most common example, given that merging is enabled in
most cases. If the driver or device doesn't care about merging, you can
disable it and always get single-bio requests; then the bio's bvec table
can be reused for send().
Does bvec_iter span bvecs with your patches? I didn't see that
I would like to avoid growing bvec tables and keep everything
preallocated. Plus, a bvec_iter operates on a bvec which means
we'll need a table there as well... Not liking it so far...
In the case of multiple bios in one request, we can't know how many
bvecs there are without calling rq_bvecs(), so it
Not sure I understand the 'blocking' problem in this case.
We can build a bvec table from this req, and send them all
in send(),
The only user in your final tree seems to be the loop driver, and
even that one only uses the helper for read/write bios.
I think something like this would be much simpler in the end:
The recently submitted nvme-tcp host driver should also be a user
of this. Does it make sense to keep it as
But interestingly, with my "mptest" link failure test
(test_01_nvme_offline) I'm not actually seeing NVMe trigger a failure
that needs a multipath layer (be it NVMe multipath or DM multipath) to
fail a path and retry the IO. The pattern is that the link goes down,
and nvme waits for it to come
I'm fine with the path selectors getting moved out; maybe it'll
encourage new path selectors to be developed.
But there will need to be some userspace interface stood up to support
your native NVMe multipathing (you may not think it's needed, but in
time there will be a need to configure
On 01/07/16 01:52, Mike Snitzer wrote:
On Thu, Jun 30 2016 at 5:57pm -0400,
Ming Lin wrote:
On Thu, 2016-06-30 at 14:08 -0700, Ming Lin wrote:
Hi Mike,
I'm trying to test NVMeoF multi-path.
root@host:~# lsmod |grep dm_multipath
dm_multipath 24576 0
The perf report is very similar to the one that started this effort..
I'm afraid we'll need to resolve the per-target m->lock in order
to scale with NUMA...
Could be. Just for testing, you can try the 2 topmost commits I've put
here (once applied both __multipath_map and multipath_busy
Hello Sagi,
Hey Bart,
Did you run your test on a NUMA system ?
I did.
If so, can you check with e.g.
perf record -ags -e LLC-load-misses sleep 10 && perf report
whether this workload perhaps triggers lock contention? What you need to
look for in the perf output is whether any functions occupy more than
10% CPU time.
I will, thanks for the tip!
The perf
Hi Mike,
So I gave your patches a go (dm-4.6) but I still don't see the
improvement you reported (while I do see a minor improvement).
null_blk queue_mode=2 submit_queues=24
dm_mod blk_mq_nr_hw_queues=24 blk_mq_queue_depth=4096 use_blk_mq=Y
I see 620K IOPs on dm_mq vs. 1750K IOPs on raw