In preparation for fixing the broken definition of S_DAX in the
CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, convert all the remaining
IS_DAX() usages to use explicit tests for FSDAX.
Cc: Matthew Wilcox
Cc: Ross Zwisler
Cc:
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
Review
Make sure S_DAX is defined in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y
case. Otherwise vma_is_dax() may incorrectly return false in the
Device-DAX case.
Cc: Alexander Viro
Cc: linux-fsde...@vger.kernel.org
Cc: Christoph Hellwig
Cc:
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
In preparation for fixing the broken definition of S_DAX in the
CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, convert all IS_DAX() usages to
use explicit tests for the DEVDAX and FSDAX sub-cases of DAX
functionality.
Cc: Matthew Wilcox
Cc: Ross Zwisler
Cc:
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
Filesystem-DAX is incompatible with 'longterm' page pinning. Without
page cache indirection a DAX mapping maps filesystem blocks directly.
This means that the filesystem must not modify a file's block map while
any page in a mapping is pinned. In order to prevent the situation of
userspace holding
In preparation for fixing the broken definition of S_DAX in the
CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, convert all IS_DAX() usages to
use explicit tests for FSDAX since DAX is ambiguous.
Cc: "Theodore Ts'o"
Cc: Andreas Dilger
Cc: Alexander Viro
Cc: Matthew Wilcox
Cc: Ross Zwisler
Cc:
Fixes
In preparation for fixing the broken definition of S_DAX in the
CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, convert all IS_DAX() usages to
use explicit tests for FSDAX since DAX is ambiguous.
Cc: "Darrick J. Wong"
Cc: linux-...@vger.kernel.org
Cc: Matthew Wilcox
Cc: Ross Zwisler
Cc:
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
In preparation for fixing S_DAX to be defined in the CONFIG_FS_DAX=n +
CONFIG_DEV_DAX=y case, move the definition of these routines outside of
the "#ifdef CONFIG_FS_DAX" guard. This is also a coding-style fix to
move all ifdef handling to header files rather than in the source. The
compiler will st
In preparation for fixing the broken definition of S_DAX in the
CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, convert all IS_DAX() usages to
use explicit tests for FSDAX since DAX is ambiguous.
Cc: Matthew Wilcox
Cc: Ross Zwisler
Cc:
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
dax_sem_{up,down}_write_sem() allow the ext2 dax semaphore to be
compiled out in the CONFIG_FS_DAX=n case. However there are still some
open coded uses of the semaphore. Add dax_sem_{up_read,down_read}() and
dax_sem_assert_held() helpers. Use them to convert all open-coded usages
of the semaphore t
Changes since v4 [1]:
* Fix the changelog of "dax: introduce IS_DEVDAX() and IS_FSDAX()" to
better clarify the need for new helpers (Jan)
* Replace dax_sem_is_locked() with dax_sem_assert_held() (Jan)
* Use file_inode() in vma_is_dax() (Jan)
* Resend the full series to linux-xfs@ (Dave)
* Collect
Gerd reports that ->i_mode may contain other bits besides S_IFCHR. Use
S_ISCHR() instead. Otherwise, get_user_pages_longterm() may fail on
device-dax instances when those are meant to be explicitly allowed.
Fixes: 2bb6d2837083 ("mm: introduce get_user_pages_longterm")
Cc:
Reported-by: Gerd Rausch
The current IS_DAX() helper that checks if a file is in DAX mode serves
two purposes. It is a control flow branch condition for DAX vs
non-DAX paths and it is a mechanism to perform dead code elimination. The
dead code elimination is required in the CONFIG_FS_DAX=n case since
there are symbols in f
The current powerpc definition of vma_mmu_pagesize() open codes looking
up the page size via hstate. It is identical to the generic
vma_kernel_pagesize() implementation.
Now, vma_kernel_pagesize() is growing support for determining the
page size of Device-DAX vmas in addition to the existing Huget
Changes since v2:
* Split the fix of the definition vma_mmu_pagesize() on powerpc to its
own patch.
[1]: https://lists.01.org/pipermail/linux-nvdimm/2018-February/014101.html
---
Andrew,
Similar to commit 31383c6865a5 "mm, hugetlbfs: introduce ->split() to
vm_operations_struct" here is anothe
When device-dax is operating in huge-page mode we want it to behave like
hugetlbfs and report the MMU page mapping size that is being enforced by
the vma. Similar to commit 31383c6865a5 "mm, hugetlbfs: introduce
->split() to vm_operations_struct" it would be messy to teach
vma_mmu_pagesize() about
Given that device-dax is making similar page mapping size guarantees as
hugetlbfs, emit the size in smaps and any other kernel path that
requests the mapping size of a vma.
Reported-by: Jane Chu
Signed-off-by: Dan Williams
---
drivers/dax/device.c | 10 ++
1 file changed, 10 insertion
On 01/03/18 05:36 PM, Dan Williams wrote:
On Thu, Mar 1, 2018 at 4:15 PM, Logan Gunthorpe wrote:
On 01/03/18 10:44 AM, Bjorn Helgaas wrote:
I think these two statements are out of order, since the attributes
dereference pdev->p2pdma. And it looks like you set "error"
unnecessarily, since
On Thu, Mar 1, 2018 at 4:15 PM, Logan Gunthorpe wrote:
>
>
> On 01/03/18 10:44 AM, Bjorn Helgaas wrote:
>>
>> I think these two statements are out of order, since the attributes
>> dereference pdev->p2pdma. And it looks like you set "error"
>> unnecessarily, since you return immediately looking a
On 01/03/18 10:44 AM, Bjorn Helgaas wrote:
I think these two statements are out of order, since the attributes
dereference pdev->p2pdma. And it looks like you set "error"
unnecessarily, since you return immediately looking at it.
Per the previous series, sysfs_create_group is must_check for
On 01/03/18 04:57 PM, Stephen Bates wrote:
We don't want to lump these all together without knowing which region you're
allocating from, right?
In all seriousness I do agree with you on these Keith in the long term. We
would consider adding property flags for the memory as it is added to t
On 01/03/18 04:15 PM, Bjorn Helgaas wrote:
The question is what the relevant switch is. We call pci_enable_acs()
on every PCI device, including Root Ports. It looks like this relies
on get_upstream_bridge_port() to filter out some things. I don't
think get_upstream_bridge_port() is doing the
> We don't want to lump these all together without knowing which region you're
> allocating from, right?
In all seriousness I do agree with you on these Keith in the long term. We
would consider adding property flags for the memory as it is added to the p2p
core and then the allocator could evo
On 01/03/18 04:26 PM, Benjamin Herrenschmidt wrote:
The big problem is not the vmemmap, it's the linear mapping.
Ah, yes, ok.
Logan
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
> There's a meaningful difference between writing to an NVMe CMB vs PMR
When the PMR spec becomes public we can discuss how best to integrate it into
the P2P framework (if at all) ;-).
Stephen
On 01/03/18 04:49 PM, Keith Busch wrote:
On Thu, Mar 01, 2018 at 11:00:51PM +, Stephen Bates wrote:
P2P is about offloading the memory and PCI subsystem of the host CPU
and this is achieved no matter which p2p_dev is used.
Even within a device, memory attributes for its various regions
On Thu, Mar 01, 2018 at 11:00:51PM +, Stephen Bates wrote:
>
> P2P is about offloading the memory and PCI subsystem of the host CPU
> and this is achieved no matter which p2p_dev is used.
Even within a device, memory attributes for its various regions may not be
the same. There's a meaningfu
On Thu, Mar 01, 2018 at 11:14:46PM +, Stephen Bates wrote:
> > I'm pretty sure the spec disallows routing-to-self so doing a P2P
> > transaction in that sense isn't going to work unless the device
> > specifically supports it and intercepts the traffic before it gets to
> > the port.
>
> T
> No, locality matters. If you have a bunch of NICs and bunch of drives
> and the allocator chooses to put all P2P memory on a single drive your
> performance will suck horribly even if all the traffic is offloaded.
Sagi brought this up earlier in his comments about the _find_ function.
On 01/03/18 04:20 PM, Jason Gunthorpe wrote:
On Thu, Mar 01, 2018 at 11:00:51PM +, Stephen Bates wrote:
No, locality matters. If you have a bunch of NICs and bunch of drives
and the allocator chooses to put all P2P memory on a single drive your
performance will suck horribly even if all
On Thu, 2018-03-01 at 16:19 -0700, Logan Gunthorpe wrote:
(Switching back to my non-IBM address ...)
> On 01/03/18 04:00 PM, Benjamin Herrenschmidt wrote:
> > We use only 52 in practice but yes.
> >
> > > That's 64PB. If you need
> > > a sparse vmemmap for the entire space it will take 16T
On Thu, 2018-03-01 at 16:19 -0700, Logan Gunthorpe wrote:
>
> On 01/03/18 04:00 PM, Benjamin Herrenschmidt wrote:
> > We use only 52 in practice but yes.
> >
> > > That's 64PB. If you need
> > > a sparse vmemmap for the entire space it will take 16TB which leaves you
> > > with 63.98PB of a
On 01/03/18 04:00 PM, Benjamin Herrenschmidt wrote:
We use only 52 in practice but yes.
That's 64PB. If you need
a sparse vmemmap for the entire space it will take 16TB which leaves you
with 63.98PB of address space left. (Similar calculations for other
numbers of address bits.)
We on
On Thu, Mar 01, 2018 at 06:54:01PM +, Stephen Bates wrote:
> Thanks for the detailed review Bjorn!
>
> >> +Enabling this option will also disable ACS on all ports behind
> >> +any PCIe switch. This effectively puts all devices behind any
> >> +switch into the same IOMMU group.
>
> I'm pretty sure the spec disallows routing-to-self so doing a P2P
> transaction in that sense isn't going to work unless the device
> specifically supports it and intercepts the traffic before it gets to
> the port.
This is correct. Unless the device intercepts the TLP before it hits the
roo
I don't think this is correct. A Root Port defines a hierarchy domain
(I'm looking at PCIe r4.0, sec 1.3.1). The capability to route
peer-to-peer transactions *between* hierarchy domains is optional. I
think this means a Root Complex is not required to route transactions
from one Root Port t
On Thu, Mar 01, 2018 at 11:55:51AM -0700, Logan Gunthorpe wrote:
> Hi Bjorn,
>
> Thanks for the review. I'll correct all the nits for the next version.
>
> On 01/03/18 10:37 AM, Bjorn Helgaas wrote:
> > On Wed, Feb 28, 2018 at 04:39:57PM -0700, Logan Gunthorpe wrote:
> > > Some PCI devices may ha
>> We'd prefer to have a generic way to get p2pmem instead of restricting
>> ourselves to only using CMBs. We did work in the past where the P2P memory
>> was part of an IB adapter and not the NVMe card. So this won't work if it's
>> an NVMe only interface.
> It just seems like it is makin
On Thu, 2018-03-01 at 14:57 -0700, Logan Gunthorpe wrote:
>
> On 01/03/18 02:45 PM, Logan Gunthorpe wrote:
> > It handles it fine for many situations. But when you try to map
> > something that is at the end of the physical address space then the
> > sparse-vmemmap needs virtual address space th
On 01/03/18 03:45 PM, Jason Gunthorpe wrote:
I can appreciate you might have some special use case for that, but it
absolutely should require special configuration and not just magically
happen.
Well if a driver doesn't want someone doing p2p transfers with the memory
it shouldn't publish it t
On Thu, 2018-03-01 at 14:31 -0800, Linus Torvalds wrote:
> On Thu, Mar 1, 2018 at 2:06 PM, Benjamin Herrenschmidt
> wrote:
> >
> > Could be that x86 has the smarts to do the right thing, still trying to
> > untangle the code :-)
>
> Afaik, x86 will not cache PCI unless the system is misconfigur
On Thu, Mar 1, 2018 at 2:06 PM, Benjamin Herrenschmidt wrote:
>
> Could be that x86 has the smarts to do the right thing, still trying to
> untangle the code :-)
Afaik, x86 will not cache PCI unless the system is misconfigured, and
even then it's more likely to just raise a machine check exceptio
On Thu, 2018-03-01 at 13:53 -0700, Jason Gunthorpe wrote:
> On Fri, Mar 02, 2018 at 07:40:15AM +1100, Benjamin Herrenschmidt wrote:
> > Also we need to be able to hard block MEMREMAP_WB mappings of non-RAM
> > on ppc64 (maybe via an arch hook as it might depend on the processor
> > family). Server
On 01/03/18 02:45 PM, Logan Gunthorpe wrote:
It handles it fine for many situations. But when you try to map
something that is at the end of the physical address space then the
sparse-vmemmap needs virtual address space that's the size of the
physical address space divided by PAGE_SIZE which
On 01/03/18 02:37 PM, Dan Williams wrote:
Ah ok, I'd need to look at the details. I had been assuming that
sparse-vmemmap could handle such a situation, but that could indeed be
a broken assumption.
It handles it fine for many situations. But when you try to map
something that is at the end
On 01/03/18 02:35 PM, Jerome Glisse wrote:
Note that there are use cases for P2P where IOMMU isolation matters and
the traffic through the root complex isn't seen as an issue.
Well, we can worry about that once we have a solution to the problem of
knowing whether a root complex supports P2P at all.
> The intention of HMM is to be useful for all device memory that wish
> to have struct page for various reasons.
Hi Jerome and thanks for your input! Understood. We have looked at HMM in the
past and long term I definitely would like to consider how we can add P2P
functionality to HMM for both
On Thu, Mar 1, 2018 at 12:34 PM, Benjamin Herrenschmidt
wrote:
> On Thu, 2018-03-01 at 11:21 -0800, Dan Williams wrote:
>> On Wed, Feb 28, 2018 at 7:56 PM, Benjamin Herrenschmidt
>> wrote:
>> > On Thu, 2018-03-01 at 14:54 +1100, Benjamin Herrenschmidt wrote:
>> > > On Wed, 2018-02-28 at 16:39 -07
On Thu, Mar 01, 2018 at 09:32:20PM +, Stephen Bates wrote:
> > your kernel provider needs to decide whether they favor device assignment
> > or p2p
>
> Thanks Alex! The hardware requirements for P2P (switch, high performance EPs)
> are such that we really only expect CONFIG_P2P_DMA to be en
> your kernel provider needs to decide whether they favor device assignment or
> p2p
Thanks Alex! The hardware requirements for P2P (switch, high performance EPs)
are such that we really only expect CONFIG_P2P_DMA to be enabled in specific
instances and in those instances the users have made a
On 01/03/18 02:21 PM, Alex Williamson wrote:
This is still a pretty terrible solution though, your kernel provider
needs to decide whether they favor device assignment or p2p, because we
can't do both, unless there's a patch I haven't seen yet that allows
boot time rather than compile time conf
On Thu, Mar 01, 2018 at 02:15:01PM -0700, Logan Gunthorpe wrote:
>
>
> On 01/03/18 02:10 PM, Jerome Glisse wrote:
> > It seems people misunderstand HMM :( you do not have to use all of
> > its features. If all you care about is having struct page then just
> > use that for instance in your case
On 01/03/18 02:18 PM, Jerome Glisse wrote:
This is pretty easy to do with HMM:
unsigned long hmm_page_to_phys_pfn(struct page *page)
This is not useful unless you want to go through all the kernel paths we
are using and replace page_to_phys() and friends with something else
that calls an HMM
On Thu, 1 Mar 2018 18:54:01 +
"Stephen Bates" wrote:
> Thanks for the detailed review Bjorn!
>
> >>
> >> +Enabling this option will also disable ACS on all ports behind
> >> +any PCIe switch. This effectively puts all devices behind any
> >> +switch into the same IOMMU group.
On Thu, Mar 01, 2018 at 02:11:34PM -0700, Logan Gunthorpe wrote:
>
>
> On 01/03/18 02:03 PM, Benjamin Herrenschmidt wrote:
> > However, what happens if anything calls page_address() on them ? Some
> > DMA ops do that for example, or some devices might ...
>
> Although we could probably work arou
On 01/03/18 02:10 PM, Jerome Glisse wrote:
It seems people misunderstand HMM :( you do not have to use all of
its features. If all you care about is having struct page then just
use that for instance in your case only use those following 3 functions:
hmm_devmem_add() or hmm_devmem_add_resour
On 01/03/18 02:03 PM, Benjamin Herrenschmidt wrote:
However, what happens if anything calls page_address() on them ? Some
DMA ops do that for example, or some devices might ...
Although we could probably work around it with some pain, we rely on
page_address() and virt_to_phys(), etc to work
On Thu, Mar 01, 2018 at 02:03:26PM -0700, Logan Gunthorpe wrote:
>
>
> On 01/03/18 01:55 PM, Jerome Glisse wrote:
> > Well this again a new user of struct page for device memory just for
> > one usecase. I wanted HMM to be more versatile so that it could be use
> > for this kind of thing too. I g
On Thu, 2018-03-01 at 11:21 -0800, Dan Williams wrote:
>
>
> The devm_memremap_pages() infrastructure allows placing the memmap in
> "System-RAM" even if the hotplugged range is in PCI space. So, even if
> it is an issue on some configurations, it's just a simple adjustment
> to where the memmap
On 01/03/18 01:55 PM, Jerome Glisse wrote:
Well this again a new user of struct page for device memory just for
one usecase. I wanted HMM to be more versatile so that it could be use
for this kind of thing too. I guess the message didn't go through. I
will take some cycles tomorrow to look into
On 01/03/18 01:53 PM, Jason Gunthorpe wrote:
On Fri, Mar 02, 2018 at 07:40:15AM +1100, Benjamin Herrenschmidt wrote:
Also we need to be able to hard block MEMREMAP_WB mappings of non-RAM
on ppc64 (maybe via an arch hook as it might depend on the processor
family). Server powerpc cannot do cach
On Fri, Mar 02, 2018 at 07:29:55AM +1100, Benjamin Herrenschmidt wrote:
> On Thu, 2018-03-01 at 11:04 -0700, Logan Gunthorpe wrote:
> >
> > On 28/02/18 08:56 PM, Benjamin Herrenschmidt wrote:
> > > On Thu, 2018-03-01 at 14:54 +1100, Benjamin Herrenschmidt wrote:
> > > > The problem is that acccord
On 01/03/18 01:29 PM, Benjamin Herrenschmidt wrote:
Oliver can you look into this ? You said the memory was effectively
hotplug'ed into the system when creating the struct pages. That would
mean to me that it's a) mapped (which for us is cachable, maybe x86 has
tricks to avoid that) and b) pote
On Fri, 2018-03-02 at 07:34 +1100, Benjamin Herrenschmidt wrote:
>
> But what happens with that PCI memory ? Is it effectively turned into
> normal memory (ie, usable for normal allocations, potentially used to
> populate user pages etc...) or is it kept aside ?
(What I mean is is it added to the
On Thu, 2018-03-01 at 11:21 -0800, Dan Williams wrote:
> On Wed, Feb 28, 2018 at 7:56 PM, Benjamin Herrenschmidt
> wrote:
> > On Thu, 2018-03-01 at 14:54 +1100, Benjamin Herrenschmidt wrote:
> > > On Wed, 2018-02-28 at 16:39 -0700, Logan Gunthorpe wrote:
> > > > Hi Everyone,
> > >
> > >
> > > So
On Thu, 2018-03-01 at 18:09 +, Stephen Bates wrote:
> > > So Oliver (CC) was having issues getting any of that to work for us.
> > >
> > > The problem is that according to him (I didn't double check the latest
> > > patches) you effectively hotplug the PCIe memory into the system when
> > >
On Thu, 2018-03-01 at 11:04 -0700, Logan Gunthorpe wrote:
>
> On 28/02/18 08:56 PM, Benjamin Herrenschmidt wrote:
> > On Thu, 2018-03-01 at 14:54 +1100, Benjamin Herrenschmidt wrote:
> > > The problem is that according to him (I didn't double check the latest
> > > patches) you effectively hotplu
On 01/03/18 10:49 AM, Bjorn Helgaas wrote:
+int pci_p2pdma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
+ enum dma_data_direction dir)
Same question as before about why the mixture of "pci_*" interfaces
that take "struct device *" parameters.
In this cas
On 01/03/18 03:31 AM, Sagi Grimberg wrote:
* We also reject using devices that employ 'dma_virt_ops' which should
fairly simply handle Jason's concerns that this work might break with
the HFI, QIB and rxe drivers that use the virtual ops to implement
their own special DMA operations.
On 01/03/18 12:21 PM, Dan Williams wrote:
Note: I think the above means it won't work behind a switch on x86
either, will it ?
The devm_memremap_pages() infrastructure allows placing the memmap in
"System-RAM" even if the hotplugged range is in PCI space. So, even if
it is an issue on some co
On 01/03/18 11:42 AM, Jason Gunthorpe wrote:
On Thu, Mar 01, 2018 at 08:35:55PM +0200, Sagi Grimberg wrote:
This is also why I don't entirely understand why this series has a
generic allocator for p2p mem, it makes little sense to me.
Why wouldn't the nmve driver just claim the entire CMB of
On Wed, Feb 28, 2018 at 7:56 PM, Benjamin Herrenschmidt
wrote:
> On Thu, 2018-03-01 at 14:54 +1100, Benjamin Herrenschmidt wrote:
>> On Wed, 2018-02-28 at 16:39 -0700, Logan Gunthorpe wrote:
>> > Hi Everyone,
>>
>>
>> So Oliver (CC) was having issues getting any of that to work for us.
>>
>> The p
On 01/03/18 11:02 AM, Bjorn Helgaas wrote:
void pci_enable_acs(struct pci_dev *dev)
{
+ if (pci_p2pdma_disable_acs(dev))
+ return;
This doesn't read naturally to me. I do see that when
CONFIG_PCI_P2PDMA is not set, pci_p2pdma_disable_acs() does nothing
and returns 0,
Wouldn't it all be simpler if the p2p_dev resolution would be private
to the namespace?
So, is it that all the namespaces in a subsystem must comply to
using p2p? Seems a little bit harsh if it's not absolutely needed. Would
be nice to export a subsystems between two ports (on two HCAs, acros
> I agree, I don't think this series should target anything other than
> using p2p memory located in one of the devices expected to participate
> in the p2p transaction for a first pass..
I disagree. There is definitely interest in using a NVMe CMB as a bounce buffer
and in deploying systems
Hi Bjorn,
Thanks for the review. I'll correct all the nits for the next version.
On 01/03/18 10:37 AM, Bjorn Helgaas wrote:
On Wed, Feb 28, 2018 at 04:39:57PM -0700, Logan Gunthorpe wrote:
Some PCI devices may have memory mapped in a BAR space that's
intended for use in Peer-to-Peer transactio
Thanks for the detailed review Bjorn!
>>
>> + Enabling this option will also disable ACS on all ports behind
>> + any PCIe switch. This effectively puts all devices behind any
>> + switch into the same IOMMU group.
>
> Does this really mean "all devices behind the same Root Port
On 01/03/18 04:03 AM, Sagi Grimberg wrote:
Can you describe what would be the plan to have it when these devices
do come along? I'd say that p2p_dev needs to become a nvmet_ns reference
and not from nvmet_ctrl. Then, when cmb capable devices come along, the
ns can prefer to use its own cmb inst
>> So Oliver (CC) was having issues getting any of that to work for us.
>>
>> The problem is that according to him (I didn't double check the latest
>> patches) you effectively hotplug the PCIe memory into the system when
>> creating struct pages.
>>
>> This cannot possibly work for us. First we
On 28/02/18 08:56 PM, Benjamin Herrenschmidt wrote:
On Thu, 2018-03-01 at 14:54 +1100, Benjamin Herrenschmidt wrote:
The problem is that according to him (I didn't double check the latest
patches) you effectively hotplug the PCIe memory into the system when
creating struct pages.
This cannot
On Wed, Feb 28, 2018 at 04:40:00PM -0700, Logan Gunthorpe wrote:
> For peer-to-peer transactions to work the downstream ports in each
> switch must not have the ACS flags set. At this time there is no way
> to dynamically change the flags and update the corresponding IOMMU
> groups so this is done
On Wed, Feb 28, 2018 at 04:39:59PM -0700, Logan Gunthorpe wrote:
> The DMA address used when mapping PCI P2P memory must be the PCI bus
> address. Thus, introduce pci_p2pmem_[un]map_sg() to map the correct
> addresses when using P2P memory.
>
> For this, we assume that an SGL passed to these funct
On Wed, Feb 28, 2018 at 04:39:58PM -0700, Logan Gunthorpe wrote:
> Attributes display the total amount of P2P memory, the amount available
> and whether it is published or not.
Can you add enough text here to make the body of the changelog
complete in itself? That might mean just repeating the su
On 01/03/18 04:03 AM, Sagi Grimberg wrote:
Can you describe what would be the plan to have it when these devices
do come along? I'd say that p2p_dev needs to become a nvmet_ns reference
and not from nvmet_ctrl. Then, when cmb capable devices come along, the
ns can prefer to use its own cmb inst
s/peer to peer/peer-to-peer/ to match text below and in spec.
On Wed, Feb 28, 2018 at 04:39:57PM -0700, Logan Gunthorpe wrote:
> Some PCI devices may have memory mapped in a BAR space that's
> intended for use in Peer-to-Peer transactions. In order to enable
> such transactions the memory must be
Hey Sagi,
Thanks for the review!
On 01/03/18 03:32 AM, Sagi Grimberg wrote:
int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8
port_num,
struct scatterlist *sg, u32 sg_cnt, u32 sg_offset,
- u64 remote_addr, u32 rkey, enum dma_data_direction dir)
+ u64
> > Ideally, we'd want to use an NVME CMB buffer as p2p memory. This would
> > save an extra PCI transfer as the NVME card could just take the data
> > out of it's own memory. However, at this time, cards with CMB buffers
> > don't seem to be available.
> Can you describe what would be the plan to
> Any plans adding the capability to nvme-rdma? Should be
> straight-forward... In theory, the use-case would be rdma backend
> fabric behind. Shouldn't be hard to test either...
Nice idea Sagi. Yes we have been starting to look at that. Though again we
would probably want to impose the "attached
Looks fine,
Reviewed-by: Sagi Grimberg
For P2P requests we must use the pci_p2pmem_[un]map_sg() functions
instead of the dma_map_sg functions.
With that, we can then indicate PCI_P2P support in the request queue.
For this, we create an NVME_F_PCI_P2P flag which tells the core to
set QUEUE_FLAG_PCI_P2P in the request queue.
This lo
Looks fine,
Reviewed-by: Sagi Grimberg
We create a configfs attribute in each nvme-fabrics target port to
enable p2p memory use. When enabled, the port will only then use the
p2p memory if a p2p memory device can be found which is behind the
same switch as the RDMA port and all the block devices in use. If
the user enabled it and no d
On 03/01/2018 01:40 AM, Logan Gunthorpe wrote:
In order to use PCI P2P memory pci_p2pmem_[un]map_sg() functions must be
called to map the correct DMA address. To do this, we add a flags
variable and the RDMA_RW_CTX_FLAG_PCI_P2P flag. When the flag is
specified use the appropriate map function.
Hi Everyone,
Hi Logan,
Here's v2 of our series to introduce P2P based copy offload to NVMe
fabrics. This version has been rebased onto v4.16-rc3 which already
includes Christoph's devpagemap work the previous version was based
off as well as a couple of the cleanup patches that were in v1.