On Thu, Mar 01, 2018 at 10:54:17AM +0530, Kashyap Desai wrote:
> > -Original Message-
> > From: Laurence Oberman [mailto:lober...@redhat.com]
> > Sent: Wednesday, February 28, 2018 9:52 PM
> > To: Ming Lei; Kashyap Desai
> > Cc: Jens Axboe; linux-block@vger.kernel.org; Christoph Hellwig; Mi
> -Original Message-
> From: Laurence Oberman [mailto:lober...@redhat.com]
> Sent: Wednesday, February 28, 2018 9:52 PM
> To: Ming Lei; Kashyap Desai
> Cc: Jens Axboe; linux-block@vger.kernel.org; Christoph Hellwig; Mike
> Snitzer;
> linux-s...@vger.kernel.org; Hannes Reinecke; Arun Easi; O
On Wed, 2018-02-28 at 16:39 -0700, Logan Gunthorpe wrote:
> Hi Everyone,
So Oliver (CC) was having issues getting any of that to work for us.
The problem is that according to him (I didn't double check the latest
patches) you effectively hotplug the PCIe memory into the system when
creating str
On Thu, 2018-03-01 at 14:54 +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2018-02-28 at 16:39 -0700, Logan Gunthorpe wrote:
> > Hi Everyone,
>
>
> So Oliver (CC) was having issues getting any of that to work for us.
>
> The problem is that according to him (I didn't double check the latest
> p
On Wed, Feb 28, 2018 at 09:35:29AM -0800, Bart Van Assche wrote:
> From: Damien Le Moal
>
> In case of a failed write request (all retries failed) and when using
> libata, the SCSI error handler calls scsi_finish_command(). In the
> case of blk-mq this means that scsi_mq_done() does not get calle
On Fri, 2018-02-16 at 08:39 +0100, Paolo Valente wrote:
> after enabling the listing options in your list, and a few other
> related options, such as iblock support, I get this:
>
> $ sudo ./run_tests -c -d -r 10 -t 02-mq -e bfq
> Unloaded the ib_srpt kernel module
> Unloaded the rdma_rxe kernel modu
Hi Christoph,
thanks for your quick reply.
On 2018/3/1 1:48 AM, Christoph Hellwig wrote:
> Hmm. I'd rather just kill off bio_check_eod and move the check
> to blk_partition_remap so that we only have to check once.
>
I think the check should be done twice if the bi_partno is not zero,
one for th
Looks fine,
and we should pick this up for 4.16 independent of the rest, which
I might need a little more review time for.
Reviewed-by: Christoph Hellwig
QUEUE_FLAG_PCI_P2P is introduced meaning a driver's request queue
supports targeting P2P memory.
REQ_PCI_P2P is introduced to indicate a particular bio request is
directed to/from PCI P2P memory. A request with this flag is not
accepted unless the corresponding queues have the QUEUE_FLAG_PCI_P2P
f
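
The acceptance rule described in this patch can be sketched in plain C as follows. This is only an illustration under stated assumptions, not the kernel implementation: the `toy_*` types, the bit positions and the `toy_bio_acceptable()` helper are hypothetical names made up for the example.

```c
#include <stdbool.h>

/* Hypothetical bit values; the real flag definitions live in the kernel. */
#define TOY_QUEUE_FLAG_PCI_P2P (1UL << 0)  /* queue can target P2P memory */
#define TOY_REQ_PCI_P2P        (1UL << 0)  /* bio is directed to/from P2P memory */

/* Hypothetical stand-ins for a request queue and a bio. */
struct toy_queue_caps {
	unsigned long queue_flags;
};

struct toy_bio_req {
	unsigned long opf;
};

/*
 * A bio carrying the P2P request flag is only acceptable when the
 * queue advertises P2P support; every other bio passes unchanged.
 */
static bool toy_bio_acceptable(const struct toy_bio_req *bio,
			       const struct toy_queue_caps *q)
{
	if (!(bio->opf & TOY_REQ_PCI_P2P))
		return true;
	return (q->queue_flags & TOY_QUEUE_FLAG_PCI_P2P) != 0;
}
```

With a queue whose `queue_flags` is 0, a bio with `TOY_REQ_PCI_P2P` set would be rejected, while the same bio would pass on a queue that has `TOY_QUEUE_FLAG_PCI_P2P` set.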
We create a configfs attribute in each nvme-fabrics target port to
enable p2p memory use. When enabled, the port will only then use the
p2p memory if a p2p memory device can be found which is behind the
same switch as the RDMA port and all the block devices in use. If
the user enabled it and no devi
Some PCI devices may have memory mapped in a BAR space that's
intended for use in Peer-to-Peer transactions. In order to enable
such transactions the memory must be registered with ZONE_DEVICE pages
so it can be used by DMA interfaces in existing drivers.
A kernel interface is provided so that oth
The DMA address used when mapping PCI P2P memory must be the PCI bus
address. Thus, introduce pci_p2pmem_[un]map_sg() to map the correct
addresses when using P2P memory.
For this, we assume that an SGL passed to these functions contains all
p2p memory or no p2p memory.
Signed-off-by: Logan Gunthor
For peer-to-peer transactions to work the downstream ports in each
switch must not have the ACS flags set. At this time there is no way
to dynamically change the flags and update the corresponding IOMMU
groups so this is done at enumeration time before the groups are
assigned.
This effectively
Hi Everyone,
Here's v2 of our series to introduce P2P based copy offload to NVMe
fabrics. This version has been rebased onto v4.16-rc3 which already
includes Christoph's devpagemap work the previous version was based
off as well as a couple of the cleanup patches that were in v1.
Additionally, we
Attributes display the total amount of P2P memory, the amount available
and whether it is published or not.
Signed-off-by: Logan Gunthorpe
---
Documentation/ABI/testing/sysfs-bus-pci | 25 +
drivers/pci/p2pdma.c| 50 +
2 files c
In order to use PCI P2P memory, the pci_p2pmem_[un]map_sg() functions must be
called to map the correct DMA address. To do this, we add a flags
variable and the RDMA_RW_CTX_FLAG_PCI_P2P flag. When the flag is
specified use the appropriate map function.
Signed-off-by: Logan Gunthorpe
---
drivers/infin
Introduce a quirk to use CMB-like memory on older devices that have
an exposed BAR but do not advertise support for using CMBLOC and
CMBSIZE.
We'd like to use some of these older cards to test P2P memory.
Signed-off-by: Logan Gunthorpe
---
drivers/nvme/host/nvme.h | 7 +++
drivers/nvme/hos
For P2P requests we must use the pci_p2pmem_[un]map_sg() functions
instead of the dma_map_sg functions.
With that, we can then indicate PCI_P2P support in the request queue.
For this, we create an NVME_F_PCI_P2P flag which tells the core to
set QUEUE_FLAG_PCI_P2P in the request queue.
Signed-off-
Register the CMB buffer as p2pmem and use the appropriate allocation
functions to create and destroy the IO SQ.
If the CMB supports WDS and RDS, publish it for use as p2p memory
by other devices.
Signed-off-by: Logan Gunthorpe
---
drivers/nvme/host/pci.c | 75 +++
LGTM
On Wed, Feb 28, 2018 at 11:28 AM, Bart Van Assche
wrote:
> Use the blk_queue_flag_{set,clear}() functions instead of open-coding
> these.
>
> Signed-off-by: Bart Van Assche
> Cc: Michael Lyle
> Cc: Kent Overstreet
> Cc: Christoph Hellwig
> Cc: Hannes Reinecke
> Cc: Johannes Thumshirn
>
Use the blk_queue_flag_*() functions instead of open-coding these.
Signed-off-by: Bart Van Assche
Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: Johannes Thumshirn
Cc: Ming Lei
---
drivers/block/mtip32xx/mtip32xx.c | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/dr
Hello Jens,
As you probably know there is considerable confusion in the block layer core
and in block drivers about how to protect queue flag changes against
concurrent modifications. Some code protects these changes with the queue
lock, other code uses atomic operations and some code does not pro
Since the queue flags may be changed concurrently from multiple
contexts after a queue becomes visible in sysfs, make these changes
safe by protecting these with the queue lock.
Signed-off-by: Bart Van Assche
Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: Johannes Thumshirn
Cc: Ming Lei
---
b
Move the definition of queue_flag_clear_unlocked() up and move the
definition of queue_in_flight() down such that all queue flag
manipulation function definitions become contiguous.
This patch does not change any functionality.
Signed-off-by: Bart Van Assche
Cc: Christoph Hellwig
Cc: Hannes Rei
Except for changing the atomic queue flag manipulations that are
protected by the queue lock into non-atomic manipulations, this
patch does not change any functionality.
Signed-off-by: Bart Van Assche
Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: Johannes Thumshirn
Cc: Ming Lei
---
block/blk
Introduce functions that modify the queue flags and that protect
these modifications with the request queue lock. Except for moving
one wake_up_all() call from inside to outside a critical section,
this patch does not change any functionality.
Signed-off-by: Bart Van Assche
Cc: Christoph Hellwig
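
As a rough userspace illustration of the idea in this patch (not the kernel code itself), the sketch below guards every change to a flags word with a lock. The `toy_queue` type, the `toy_queue_flag_*()` helpers and the toy spinlock built on C11 `atomic_flag` are all hypothetical stand-ins; the helpers described above take the request queue lock instead.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a request queue: a flags word plus a lock. */
struct toy_queue {
	atomic_flag lock;      /* toy spinlock, stands in for the queue lock */
	unsigned long flags;   /* plain (non-atomic) flags word */
};

#define TOY_QUEUE_INIT { ATOMIC_FLAG_INIT, 0UL }

static void toy_queue_lock(struct toy_queue *q)
{
	/* Spin until the previous value was "clear", i.e. we took the lock. */
	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		;
}

static void toy_queue_unlock(struct toy_queue *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Set one flag bit; the read-modify-write happens under the lock. */
static void toy_queue_flag_set(unsigned int flag, struct toy_queue *q)
{
	toy_queue_lock(q);
	q->flags |= 1UL << flag;
	toy_queue_unlock(q);
}

/* Clear one flag bit; again under the lock. */
static void toy_queue_flag_clear(unsigned int flag, struct toy_queue *q)
{
	toy_queue_lock(q);
	q->flags &= ~(1UL << flag);
	toy_queue_unlock(q);
}

/* Read one flag bit without taking the lock (a single word load). */
static bool toy_queue_flag_test(unsigned int flag, const struct toy_queue *q)
{
	return (q->flags >> flag) & 1UL;
}
```

Because the flags word is only modified inside the locked region, two concurrent callers cannot lose each other's updates, which is the property the non-atomic, lock-protected manipulations rely on.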
Since it is not safe to use queue_flag_(set|clear)_unlocked()
without holding the queue lock after the sysfs entries for a
queue have been created, complain if this happens.
Signed-off-by: Bart Van Assche
Cc: Mike Snitzer
Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: Johannes Thumshirn
Cc: Mi
Use blk_queue_flag_set() instead of open-coding this function.
Signed-off-by: Bart Van Assche
Cc: Nicholas A. Bellinger
Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: Johannes Thumshirn
Cc: Ming Lei
---
drivers/target/loopback/tcm_loop.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
This patch helps to avoid that new code gets introduced in block drivers
that manipulates queue flags without holding the queue lock when that
lock should be held.
Signed-off-by: Bart Van Assche
Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: Johannes Thumshirn
Cc: Ming Lei
---
block/blk.h
Use the blk_queue_flag_{set,clear}() functions instead of open-coding
these.
Signed-off-by: Bart Van Assche
Cc: Michael Lyle
Cc: Kent Overstreet
Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: Johannes Thumshirn
Cc: Ming Lei
---
drivers/md/bcache/super.c | 6 +++---
1 file changed, 3 inserti
This patch has been generated as follows:
for verb in set_unlocked clear_unlocked set clear; do
replace-in-files queue_flag_${verb} blk_queue_flag_${verb%_unlocked} \
$(git grep -lw queue_flag_${verb} drivers block/bsg*)
done
Except for protecting all queue flag changes with the queue lock
Use blk_queue_flag_set() instead of open-coding this function.
Signed-off-by: Bart Van Assche
Cc: Martin K. Petersen
Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: Johannes Thumshirn
Cc: Ming Lei
---
drivers/scsi/iscsi_tcp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a
On Wed, Feb 28 2018, Bart Van Assche wrote:
> On Wed, 2018-02-28 at 11:19 -0700, Jens Axboe wrote:
> > Didn't Ming ack the first three?
>
> Hello Jens,
>
> This morning I did what I usually do before I repost a patch series, namely to
> look at the replies to individual patches for reviewed-by ta
On Wed, 2018-02-28 at 11:19 -0700, Jens Axboe wrote:
> Didn't Ming ack the first three?
Hello Jens,
This morning I did what I usually do before I repost a patch series, namely to
look at the replies to individual patches for reviewed-by tags. That's how I
overlooked the following (see also
https
On 2/28/18 11:15 AM, Bart Van Assche wrote:
> Hello Jens,
>
> Recently Joseph Qi identified races between the block cgroup code and request
> queue initialization and cleanup. This patch series addresses these races.
>
> This patch series is structured as follows:
> - Patches 3..6 fix the aforement
Hello Jens,
Recently Joseph Qi identified races between the block cgroup code and request
queue initialization and cleanup. This patch series addresses these races.
This patch series is structured as follows:
- Patches 3..6 fix the aforementioned races.
- Patches 1..3 ensure that all maintained blo
This patch does not change any functionality.
Signed-off-by: Bart Van Assche
Reviewed-by: Joseph Qi
Cc: Christoph Hellwig
Cc: Philipp Reisner
Cc: Ulf Hansson
Cc: Kees Cook
---
block/blk-core.c | 7 ---
block/blk-mq.c| 2 +-
drivers/block/null_blk.c | 3
Initialize the request queue lock earlier such that the following
race can no longer occur:
blk_init_queue_node() blkcg_print_blkgs()
blk_alloc_queue_node (1)
q->queue_lock = &q->__queue_lock (2)
blkcg_init_queue(q) (3)
spin_lock_irq(blkg->
Avoid that the following race can occur:
blk_cleanup_queue() blkcg_print_blkgs()
spin_lock_irq(lock) (1) spin_lock_irq(blkg->q->queue_lock) (2,5)
q->queue_lock = &q->__queue_lock (3)
spin_unlock_irq(lock) (4)
spin_unlock_irq(blkg-
Remove the disk, partition and bdi sysfs attributes before cleaning up
the request queue associated with the disk.
Signed-off-by: Bart Van Assche
Reviewed-by: Johannes Thumshirn
Reviewed-by: Joseph Qi
Cc: Minchan Kim
Cc: Nitin Gupta
Cc: Sergey Senozhatsky
---
drivers/block/zram/zram_drv.c |
Remove the disk, partition and bdi sysfs attributes before cleaning up
the request queue associated with the disk.
Signed-off-by: Bart Van Assche
Reviewed-by: Johannes Thumshirn
Reviewed-by: Joseph Qi
Cc: Josef Bacik
Cc: Shaohua Li
Cc: Omar Sandoval
Cc: Hannes Reinecke
Cc: Ming Lei
---
dr
Remove the disk, partition and bdi sysfs attributes before cleaning up
the request queue associated with the disk.
Signed-off-by: Bart Van Assche
Reviewed-by: Johannes Thumshirn
Reviewed-by: Joseph Qi
Cc: Shaohua Li
---
drivers/md/md.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(
On 28/02/18 19:38, Boaz Harrosh wrote:
<>
> Based on v4.15 code feel free to add to your patchset
> where needed
>
Sorry, this is what happens when you work on so many Linux versions at
the same time. This one is based on v4.15
>
> Don't leak the request if blk_rq_append_bio fails
>
> Sign-of-by
Hmm. I'd rather just kill off bio_check_eod and move the check
to blk_partition_remap so that we only have to check once.
What do you think of this version? Probably needs to be split into
one or two prep patches and the real change.
diff --git a/block/blk-core.c b/block/blk-core.c
index 3ba432
On 27/02/18 16:44, Jiri Palecek wrote:
<>
>> These are BIDI commands that travel as a couple of chained requests. They are
>> sent as BLOCK_PC command and complete or fail as one whole unit. The system
>> is not
>> allowed (And does not know how) to split them or complete them partially.
>> This i
From: Damien Le Moal
In case of a failed write request (all retries failed) and when using
libata, the SCSI error handler calls scsi_finish_command(). In the
case of blk-mq this means that scsi_mq_done() does not get called,
that blk_mq_complete_request() does not get called and also that the
mq-
Assign missing mccap value on 2.0 path
Signed-off-by: Javier González
---
drivers/nvme/host/lightnvm.c | 4 +++-
include/linux/lightnvm.h | 8 +---
2 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index e276ace28c6
Currently, the device geometry is stored redundantly in the nvm_id and
nvm_geo structures at a device level. Moreover, when instantiating
targets on a specific number of LUNs, these structures are replicated
and manually modified to fit the instance channel and LUN partitioning.
Instead, create a
# Changes since V4
- Rebase on top of Matias' for-4.17/core
- Fix pblk's write buffer size when using mw_cuints
- Remove chunk information from pblk's sysfs. We intend to clean up
sysfs, as it is messy as it is now, and use trace points instead. So,
avoid an extra refactoring in the near f
Separate the version between major and minor on the generic geometry and
represent it through sysfs in the 2.0 path. The 1.2 path only shows the
major version to preserve the existing user space interface.
Signed-off-by: Javier González
---
drivers/lightnvm/core.c | 4 ++--
drivers/nvme/ho
Normalize nomenclature for naming channels, luns, chunks, planes and
sectors as well as derivations in order to improve readability.
Signed-off-by: Javier González
---
drivers/lightnvm/core.c | 89 +--
drivers/lightnvm/pblk-core.c | 4 +-
drivers/l
On address conversions, use the generic device, instead of the target
device. This allows conversions to be used outside of the target's realm.
Signed-off-by: Javier González
---
drivers/lightnvm/core.c | 4 ++--
include/linux/lightnvm.h | 8
2 files changed, 6 insertions(+), 6 deletions(-
Complete the generic geometry structure with the maxoc and maxocpu
fields, present in the 2.0 spec. Also, expose them through sysfs.
Signed-off-by: Javier González
---
drivers/nvme/host/lightnvm.c | 17 +
include/linux/lightnvm.h | 2 ++
2 files changed, 19 insertions(+)
dif
Add support for 2.0 address format. Also, align address bits for 1.2 and
2.0 to be able to operate on channel and luns without requiring a format
conversion. Use a generic address format for this purpose.
Signed-off-by: Javier González
---
drivers/lightnvm/core.c | 20 -
include/linux/
The 2.0 spec provides a report chunk log page that can be retrieved
using the standard nvme get log page. This replaces the dedicated
get/put bad block table in 1.2.
This patch implements the helper functions to allow targets to retrieve the
chunk metadata using get log page. It makes nvme_get_log_ex
Implement 2.0 support in pblk. This includes the address formatting and
mapping paths, as well as the sysfs entries for them.
Signed-off-by: Javier González
---
drivers/lightnvm/pblk-init.c | 57 ++--
drivers/lightnvm/pblk-sysfs.c | 36 ++--
drivers/lightnvm/pblk.h | 198 +++
In preparation of pblk supporting 2.0, implement the get log report
chunk in pblk. Also, define the chunk states as given in the 2.0 spec.
Signed-off-by: Javier González
---
drivers/lightnvm/pblk-core.c | 139 +++
drivers/lightnvm/pblk-init.c | 223 +++
Use the generic address format on common address manipulations.
Signed-off-by: Javier González
---
drivers/lightnvm/pblk-core.c | 10 +-
drivers/lightnvm/pblk-map.c | 4 ++--
drivers/lightnvm/pblk-sysfs.c | 4 ++--
drivers/lightnvm/pblk.h | 4 ++--
4 files changed, 11 inserti
In preparation for 2.0 support in pblk, rename variables referring to
the address format to addrf and reserve ppaf for the 1.2 path.
Signed-off-by: Javier González
---
drivers/lightnvm/pblk-init.c | 8
drivers/lightnvm/pblk-sysfs.c | 4 ++--
drivers/lightnvm/pblk.h | 16 +++
Refactor init and exit sequences to improve readability. Along the way, fix
bad free ordering on the init error path.
Signed-off-by: Javier González
---
drivers/lightnvm/pblk-init.c | 533 +--
1 file changed, 266 insertions(+), 267 deletions(-)
diff --git a/
Looks fine,
Reviewed-by: Christoph Hellwig
Looks fine,
Reviewed-by: Christoph Hellwig
# Changes since V1:
- Rebase on top of latest 2.0 changes
Javier González (1):
lightnvm: pblk: remove unused variable
drivers/lightnvm/pblk-core.c | 3 ---
1 file changed, 3 deletions(-)
--
2.7.4
Remove unused variable after a previous cleanup (a8112b631adb)
Signed-off-by: Javier González
---
drivers/lightnvm/pblk-core.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index cd663855ee88..b85cdedb5f48 100644
--- a/drivers/l
On Wed, Feb 28, 2018 at 08:28:25AM -0700, Jens Axboe wrote:
> On 2/28/18 1:51 AM, Omar Sandoval wrote:
> > On Tue, Feb 27, 2018 at 03:34:53PM -0700, Jens Axboe wrote:
> >> Similarly to the support we have for testing/faking timeouts for
> >> null_blk, this adds support for triggering a requeue cond
On Wed, 2018-02-28 at 23:21 +0800, Ming Lei wrote:
> On Wed, Feb 28, 2018 at 08:28:48PM +0530, Kashyap Desai wrote:
> > Ming -
> >
> > Quick testing on my setup - Performance slightly degraded (4-5%
> > drop) for
> > megaraid_sas driver with this patch. (From 1610K IOPS it goes to
> > 1544K)
> > I
On Wed, Feb 28, 2018 at 09:15:37AM -0700, Jens Axboe wrote:
> On 2/28/18 9:14 AM, Omar Sandoval wrote:
> > On Wed, Feb 28, 2018 at 08:28:25AM -0700, Jens Axboe wrote:
> >> On 2/28/18 1:51 AM, Omar Sandoval wrote:
> >>> On Tue, Feb 27, 2018 at 03:34:53PM -0700, Jens Axboe wrote:
> Similarly to
On 2/28/18 9:14 AM, Omar Sandoval wrote:
> On Wed, Feb 28, 2018 at 08:28:25AM -0700, Jens Axboe wrote:
>> On 2/28/18 1:51 AM, Omar Sandoval wrote:
>>> On Tue, Feb 27, 2018 at 03:34:53PM -0700, Jens Axboe wrote:
Similarly to the support we have for testing/faking timeouts for
null_blk, thi
On Tue, Feb 27, 2018 at 04:56:41PM -0800, Omar Sandoval wrote:
> From: Omar Sandoval
>
> Two fixlets inspired by Tejun's patch
> (https://patchwork.kernel.org/patch/10226749/). Patch 2 is what we
> discussed on that patch, patch 1 is a small preparation.
>
> Omar Sandoval (2):
> block: clear c
Create a shortened version to use in the generic geometry.
Signed-off-by: Javier González
---
drivers/nvme/host/lightnvm.c | 6 ++
include/linux/lightnvm.h | 8
2 files changed, 14 insertions(+)
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index a600e7
At this point, only 1.2 spec is supported, thus check for it. Also,
since device-side L2P is only supported in the 1.2 spec, make sure to
only check its value under 1.2.
Signed-off-by: Javier González
---
drivers/lightnvm/pblk-init.c | 10 --
1 file changed, 8 insertions(+), 2 deletions(
On 2/27/18 5:32 PM, Bart Van Assche wrote:
> Hello Jens,
>
> While analyzing the mq-deadline behavior for ZBC drives together with Damien
> we noticed the following:
> - That the request queue attribute methods are not contiguous in
> blk-mq-debugfs.c.
> - That the information about which zones
On 2/27/18 5:56 PM, Omar Sandoval wrote:
> From: Omar Sandoval
>
> Two fixlets inspired by Tejun's patch
> (https://patchwork.kernel.org/patch/10226749/). Patch 2 is what we
> discussed on that patch, patch 1 is a small preparation.
Unless I hear complaints, I'm going to queue these up for 4.17
On 2/27/18 4:37 PM, Bart Van Assche wrote:
> On Thu, 2018-02-22 at 17:08 -0800, Bart Van Assche wrote:
>> Recently Joseph Qi identified races between the block cgroup code and request
>> queue initialization and cleanup. This patch series addresses these races.
>> Please
>> consider these patches fo
On 2/28/18 1:51 AM, Omar Sandoval wrote:
> On Tue, Feb 27, 2018 at 03:34:53PM -0700, Jens Axboe wrote:
>> Similarly to the support we have for testing/faking timeouts for
>> null_blk, this adds support for triggering a requeue condition.
>> Considering the issues around restart we've been seeing, t
On Wed, Feb 28, 2018 at 08:28:48PM +0530, Kashyap Desai wrote:
> Ming -
>
> Quick testing on my setup - Performance slightly degraded (4-5% drop) for
> megaraid_sas driver with this patch. (From 1610K IOPS it goes to 1544K)
> I confirm that after applying this patch, we have #queue = #numa node.
>
Ming -
Quick testing on my setup - Performance slightly degraded (4-5% drop) for
megaraid_sas driver with this patch. (From 1610K IOPS it goes to 1544K)
I confirm that after applying this patch, we have #queue = #numa node.
ls -l
/sys/devices/pci:80/:80:02.0/:83:00.0/host10/target10:2
If a file system is mounted on the nbd during a disconnect, resetting
the size to 0 might change the block size and destroy the buffer_head
mappings. This will cause an infinite loop when the file system looks for
the buffer_heads for flushing.
Only set the file size to 0 if we are the only opene
On Tue, Feb 27, 2018 at 03:34:53PM -0700, Jens Axboe wrote:
> Similarly to the support we have for testing/faking timeouts for
> null_blk, this adds support for triggering a requeue condition.
> Considering the issues around restart we've been seeing, this should be
> a useful addition to the testi
79 matches