We should be registering the ns_id attribute as default sysfs
attribute groups, otherwise we have a race condition between
the uevent and the attributes appearing in sysfs.
Suggested-by: Bart van Assche
Signed-off-by: Hannes Reinecke
---
drivers/nvme/host/core.c | 21 -
drivers/nv
Update device_add_disk() to take an 'groups' argument so that
individual drivers can register a device with additional sysfs
attributes.
This avoids race condition the driver would otherwise have if these
groups were to be created with sysfs_add_groups().
Signed-off-by: Martin Wilck
Signed-off-by
Register default sysfs groups during device_add_disk() to avoid a
race condition with udev during startup.
Signed-off-by: Hannes Reinecke
Cc: Minchan Kim
Cc: Nitin Gupta
Reviewed-by: Christoph Hellwig
Reviewed-by: Bart Van Assche
---
drivers/block/zram/zram_drv.c | 28 +++
Use new-style DEVICE_ATTR_RO/DEVICE_ATTR_RW to create the sysfs attributes
and register the disk with default sysfs attribute groups.
Signed-off-by: Hannes Reinecke
Reviewed-by: Christoph Hellwig
Acked-by: Michael S. Tsirkin
Reviewed-by: Bart Van Assche
---
drivers/block/virtio_blk.c | 68 +++
device_add_disk() doesn't allow to register with default sysfs groups,
which introduces a race with udev as these groups have to be created after
the uevent has been sent.
This patchset updates device_add_disk() to accept a 'groups' argument to
avoid this race and updates the obvious drivers to use
Register default sysfs groups during device_add_disk() to avoid a
race condition with udev during startup.
Signed-off-by: Hannes Reinecke
Reviewed-by: Christoph Hellwig
Acked-by: Ed L. Cachin
Reviewed-by: Bart Van Assche
---
drivers/block/aoe/aoe.h| 1 -
drivers/block/aoe/aoeblk.c | 21 +
On 9/27/18 4:55 PM, Omar Sandoval wrote:
> From: Omar Sandoval
>
> Hi,
>
> This is my series to improve the heuristics used by Kyber. Patches 1 and
> 2 are preparation. Patch 3 is a minor optimization. Patch 4 is the main
> change, and includes a detailed description of the new heuristics. Patch
On Tue, 2018-09-18 at 17:18 -0700, Omar Sandoval wrote:
> On Tue, Sep 18, 2018 at 05:02:47PM -0700, Bart Van Assche wrote:
> > On 9/18/18 4:24 PM, Omar Sandoval wrote:
> > > On Tue, Sep 18, 2018 at 02:20:59PM -0700, Bart Van Assche wrote:
> > > > Can you have a look at the updated master branch of
From: Omar Sandoval
Hi,
This is my series to improve the heuristics used by Kyber. Patches 1 and
2 are preparation. Patch 3 is a minor optimization. Patch 4 is the main
change, and includes a detailed description of the new heuristics. Patch
5 adds tracepoints for debugging. This is basically th
From: Omar Sandoval
Commit 4bc6339a583c ("block: move blk_stat_add() to
__blk_mq_end_request()") consolidated some calls using ktime_get() so
we'd only need to call it once. Kyber's ->completed_request() hook also
calls ktime_get(), so let's move it to the same place, too.
Signed-off-by: Omar Sa
From: Omar Sandoval
Kyber will need this in a future change if it is built as a module.
Signed-off-by: Omar Sandoval
---
block/blk-stat.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/block/blk-stat.c b/block/blk-stat.c
index 7587b1c3caaf..90561af85a62 100644
--- a/block/blk-stat.c
+++ b
From: Omar Sandoval
When debugging Kyber, it's really useful to know what latencies we've
been having, how the domain depths have been adjusted, and if we've
actually been throttling. Add three tracepoints, kyber_latency,
kyber_adjust, and kyber_throttled, to record that.
Signed-off-by: Omar San
From: Omar Sandoval
Kyber's current heuristics have a few flaws:
- It's based on the mean latency, but p99 latency tends to be more
meaningful to anyone who cares about latency. The mean can also be
skewed by rare outliers that the scheduler can't do anything about.
- The statistics calculat
From: Omar Sandoval
The domain token sbitmaps are currently initialized to the device queue
depth or 256, whichever is larger, and immediately resized to the
maximum depth for that domain (256, 128, or 64 for read, write, and
other, respectively). The sbitmap is never resized larger than that, so
On 9/26/18 6:35 AM, Ilya Dryomov wrote:
> trace_block_unplug() takes true for explicit unplugs and false for
> implicit unplugs. schedule() unplugs are implicit and should be
> reported as timer unplugs. While correct in the legacy code, this has
> been inverted in blk-mq since 4.11.
That's pret
On Wed, Sep 26, 2018 at 02:35:50PM +0200, Ilya Dryomov wrote:
> trace_block_unplug() takes true for explicit unplugs and false for
> implicit unplugs. schedule() unplugs are implicit and should be
> reported as timer unplugs. While correct in the legacy code, this has
> been inverted in blk-mq si
On Fri, 2018-09-21 at 07:48 +0200, Christoph Hellwig wrote:
> Can you resend this with the one easy fixup pointed out? It would
> be good to finally get the race fix merged.
Seconded. I also would like to see these patches being merged upstream.
Bart.
On 2018-09-27 11:12 AM, Keith Busch wrote:
> Reviewed-by: Keith Busch
Thanks for the reviews Keith!
Logan
On Thu, Sep 27, 2018 at 10:54:20AM -0600, Logan Gunthorpe wrote:
> We create a configfs attribute in each nvme-fabrics target port to
> enable p2p memory use. When enabled, the port will only then use the
> p2p memory if a p2p memory device can be found which is behind the
> same switch hierarchy a
On Thu, Sep 27, 2018 at 10:54:16AM -0600, Logan Gunthorpe wrote:
> Register the CMB buffer as p2pmem and use the appropriate allocation
> functions to create and destroy the IO submission queues.
>
> If the CMB supports WDS and RDS, publish it for use as P2P memory
> by other devices.
>
> Kernels
On Thu, Sep 27, 2018 at 10:54:17AM -0600, Logan Gunthorpe wrote:
> For P2P requests, we must use the pci_p2pmem_map_sg() function
> instead of the dma_map_sg functions.
>
> With that, we can then indicate PCI_P2P support in the request queue.
> For this, we create an NVME_F_PCI_P2P flag which tell
On Thu, Sep 27, 2018 at 10:54:18AM -0600, Logan Gunthorpe wrote:
> Introduce a quirk to use CMB-like memory on older devices that have
> an exposed BAR but do not advertise support for using CMBLOC and
> CMBSIZE.
>
> We'd like to use some of these older cards to test P2P memory.
>
> Signed-off-by
Add helpers to allocate and free the SGL in a struct nvmet_req:
int nvmet_req_alloc_sgl(struct nvmet_req *req, struct nvmet_sq *sq)
void nvmet_req_free_sgl(struct nvmet_req *req)
This will be expanded in a future patch to implement peer-to-peer
memory DMAs and should be common with all target dri
The DMA address used when mapping PCI P2P memory must be the PCI bus
address. Thus, introduce pci_p2pmem_map_sg() to map the correct
addresses when using P2P memory.
Memory mapped in this way does not need to be unmapped and thus if we
provided pci_p2pmem_unmap_sg() it would be empty. This breaks
In order to use PCI P2P memory the pci_p2pmem_map_sg() function must be
called to map the correct PCI bus address.
To do this, check the first page in the scatter list to see if it is P2P
memory or not. At the moment, scatter lists that contain P2P memory must
be homogeneous so if the first page i
Add a sysfs group to display statistics about P2P memory that is
registered in each PCI device.
Attributes in the group display the total amount of P2P memory, the
amount available and whether it is published or not.
Signed-off-by: Logan Gunthorpe
Acked-by: Bjorn Helgaas
---
Documentation/ABI/
QUEUE_FLAG_PCI_P2P is introduced meaning a driver's request queue
supports targeting P2P memory. This will be used by P2P providers and
orchestrators (in subsequent patches) to ensure block devices can
support P2P memory before submitting P2P backed pages to submit_bio().
Signed-off-by: Logan Gunt
Add a restructured text file describing how to write drivers
with support for P2P DMA transactions. The document describes
how to use the APIs that were added in the previous few
commits.
Also adds an index for the PCI documentation tree even though this
is the only PCI document that has been conv
We create a configfs attribute in each nvme-fabrics target port to
enable p2p memory use. When enabled, the port will only then use the
p2p memory if a p2p memory device can be found which is behind the
same switch hierarchy as the RDMA port and all the block devices in
use. If the user enabled it
Introduce a quirk to use CMB-like memory on older devices that have
an exposed BAR but do not advertise support for using CMBLOC and
CMBSIZE.
We'd like to use some of these older cards to test P2P memory.
Signed-off-by: Logan Gunthorpe
Reviewed-by: Sagi Grimberg
---
drivers/nvme/host/nvme.h |
Add a new directory in the driver API guide for PCI specific
documentation.
This is in preparation for adding a new PCI P2P DMA driver writers
guide which will go in this directory.
Signed-off-by: Logan Gunthorpe
Cc: Jonathan Corbet
Cc: Mauro Carvalho Chehab
Cc: Greg Kroah-Hartman
Cc: Vinod K
Users of the P2PDMA infrastructure will typically need a way for
the user to tell the kernel to use P2P resources. Typically
this will be a simple on/off boolean operation but sometimes
it may be desirable for the user to specify the exact device to
use for the P2P operation.
Add new helpers for a
Hi Everyone,
Here is version 6 of the PCI P2PDMA patch set. This version makes
a few minor changes from v6 and is based on v4.19-rc5. A git repo is here:
https://github.com/sbates130272/linux-p2pmem pci-p2p-v7
Now that we have Bjorn's Acks, I'd preferably like to get Jens's Ack for
Patch 7 and t
Some PCI devices may have memory mapped in a BAR space that's
intended for use in peer-to-peer transactions. In order to enable
such transactions the memory must be registered with ZONE_DEVICE pages
so it can be used by DMA interfaces in existing drivers.
Add an interface for other subsystems to f
Register the CMB buffer as p2pmem and use the appropriate allocation
functions to create and destroy the IO submission queues.
If the CMB supports WDS and RDS, publish it for use as P2P memory
by other devices.
Kernels without CONFIG_PCI_P2PDMA will also no longer support NVMe CMB.
However, seein
For P2P requests, we must use the pci_p2pmem_map_sg() function
instead of the dma_map_sg functions.
With that, we can then indicate PCI_P2P support in the request queue.
For this, we create an NVME_F_PCI_P2P flag which tells the core to
set QUEUE_FLAG_PCI_P2P in the request queue.
Signed-off-by:
On 9/27/18 9:57 AM, 国炬方 wrote:
> Yes, Guoju Fang. Thx. :)
OK, I made that change and committed it. Just be sure to use your full
name in the future for signoffs, etc.
--
Jens Axboe
On 9/27/18 9:41 AM, Coly Li wrote:
> From: guoju
This, and the signed-off, should use the full name. I can fix that up,
assuming Guoju Fang is the full name?
--
Jens Axboe
Hi Jens,
Guoju Fang just posts a bug fix to solve a bcache journal deadlock.
This bug probably happens when system memory is in heavy usage,
the deadlock is observed occasionally for a long while.
If it is too late to go into Linux 4.19, I will submit to you in
4.20 merge window, but it will be co
From: guoju
After write SSD completed, bcache schedules journal_write work to
system_wq, which is a public workqueue in system, without WQ_MEM_RECLAIM
flag. system_wq is also a bound wq, and there may be no idle kworker on
current processor. Creating a new kworker may unfortunately need to
reclai
Possible changes folded into this series.
(1) (I guess) no need to use _nested version.
(2) Use mutex_lock_killable() where possible.
(3) Move fput() to after mutex_unlock().
(4) Don't return 0 upon invalid loop_control_ioctl().
(5) No need to mutex_lock()/mutex_unlock() on each loop
On Thu 27-09-18 20:35:27, Tetsuo Handa wrote:
> On 2018/09/27 20:27, Jan Kara wrote:
> > Hi,
> >
> > On Wed 26-09-18 00:26:49, Tetsuo Handa wrote:
> >> syzbot is reporting circular locking dependency between bdev->bd_mutex
> >> and lo->lo_ctl_mutex [1] which is caused by calling blkdev_reread_part
loop_clr_fd() has a weird locking convention that is expects
loop_ctl_mutex held, releases it on success and keeps it on failure.
Untangle the mess by moving locking of loop_ctl_mutex into
loop_clr_fd().
Signed-off-by: Jan Kara
---
drivers/block/loop.c | 49 +-
Now that loop_ctl_mutex is global, just get rid of loop_index_mutex as
there is no good reason to keep these two separate and it just
complicates the locking.
Signed-off-by: Jan Kara
---
drivers/block/loop.c | 38 ++
1 file changed, 18 insertions(+), 20 deleti
Calling loop_reread_partitions() under loop_ctl_mutex causes lockdep to
complain about circular lock dependency between bdev->bd_mutex and
lo->lo_ctl_mutex. The problem is that on loop device open or close
lo_open() and lo_release() get called with bdev->bd_mutex held and they
need to acquire loop_
__loop_release() has a single call site. Fold it there. This is
currently not a huge win but it will make following replacement of
loop_index_mutex more obvious.
Signed-off-by: Jan Kara
---
drivers/block/loop.c | 16 +++-
1 file changed, 7 insertions(+), 9 deletions(-)
diff --git a/
Calling blkdev_reread_part() under loop_ctl_mutex causes lockdep to
complain about circular lock dependency between bdev->bd_mutex and
lo->lo_ctl_mutex. The problem is that on loop device open or close
lo_open() and lo_release() get called with bdev->bd_mutex held and they
need to acquire loop_ctl_
Push loop_ctl_mutex down to loop_set_status(). We will need this to be
able to call loop_reread_partitions() without loop_ctl_mutex.
Signed-off-by: Jan Kara
---
drivers/block/loop.c | 51 +--
1 file changed, 25 insertions(+), 26 deletions(-)
diff
Push lo_ctl_mutex down to loop_set_fd(). We will need this to be able to
call loop_reread_partitions() without lo_ctl_mutex.
Signed-off-by: Jan Kara
---
drivers/block/loop.c | 26 ++
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/drivers/block/loop.c b/dr
The call of __blkdev_reread_part() from loop_reread_partition() happens
only when we need to invalidate partitions from loop_release(). Thus
move a detection for this into loop_clr_fd() and simplify
loop_reread_partition().
This makes loop_reread_partition() safe to use without loop_ctl_mutex
beca
Push loop_ctl_mutex down to loop_change_fd(). We will need this to be
able to call loop_reread_partitions() without loop_ctl_mutex.
Signed-off-by: Jan Kara
---
drivers/block/loop.c | 22 +++---
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/drivers/block/loop.c b
Push acquisition of lo_ctl_mutex down into individual ioctl handling
branches. This is a preparatory step for pushing the lock down into
individual ioctl handling functions so that they can release the lock as
they need it. We also factor out some simple ioctl handlers that will
not need any specia
Hi,
this patch series fixes oops and possible deadlocks as reported by syzbot [1]
[2]. The second patch in the series (from Tetsuo) fixes the oops, the remaining
patches are cleaning up the locking in the loop driver so that we can in the
end reasonably easily switch to rereading partitions withou
Push loop_ctl_mutex down to loop_get_status() to avoid the unusual
convention that the function gets called with loop_ctl_mutex held and
releases it.
Signed-off-by: Jan Kara
---
drivers/block/loop.c | 37 ++---
1 file changed, 10 insertions(+), 27 deletions(-)
di
Move setting of lo_state to Lo_rundown out into the callers. That will
allow us to unlock loop_ctl_mutex while the loop device is protected
from other changes by its special state.
Signed-off-by: Jan Kara
---
drivers/block/loop.c | 52 +++-
1 file
From: Tetsuo Handa
syzbot is reporting NULL pointer dereference [1] which is caused by
race condition between ioctl(loop_fd, LOOP_CLR_FD, 0) versus
ioctl(other_loop_fd, LOOP_SET_FD, loop_fd) due to traversing other
loop devices at loop_validate_file() without holding corresponding
lo->lo_ctl_mute
From: Tetsuo Handa
vfs_getattr() needs "struct path" rather than "struct file".
Let's use path_get()/path_put() rather than get_file()/fput().
Signed-off-by: Tetsuo Handa
Reviewed-by: Jan Kara
Signed-off-by: Jan Kara
---
drivers/block/loop.c | 10 +-
1 file changed, 5 insertions(+),
On 2018/09/27 20:27, Jan Kara wrote:
> Hi,
>
> On Wed 26-09-18 00:26:49, Tetsuo Handa wrote:
>> syzbot is reporting circular locking dependency between bdev->bd_mutex
>> and lo->lo_ctl_mutex [1] which is caused by calling blkdev_reread_part()
>> with lock held. We need to drop lo->lo_ctl_mutex in
Hi,
On Wed 26-09-18 00:26:49, Tetsuo Handa wrote:
> syzbot is reporting circular locking dependency between bdev->bd_mutex
> and lo->lo_ctl_mutex [1] which is caused by calling blkdev_reread_part()
> with lock held. We need to drop lo->lo_ctl_mutex in order to fix it.
>
> This patch fixes it by c
59 matches
Mail list logo