Re: [RFC PATCH v2 3/3] vfio-pci: Allow to mmap MSI-X table if EEH is supported

2016-01-04 Thread Benjamin Herrenschmidt
On Mon, 2016-01-04 at 14:07 -0700, Alex Williamson wrote: > On Thu, 2015-12-31 at 16:50 +0800, Yongji Xie wrote: > > The current vfio-pci implementation disallows mmap of the MSI-X > > table in case the user gets to touch it directly. > > > > However, the EEH mechanism can ensure that a given PCI device >

Re: [RFC PATCH 3/3] vfio-pci: Allow to mmap MSI-X table if EEH is supported

2015-12-17 Thread Benjamin Herrenschmidt
On Thu, 2015-12-17 at 14:41 -0700, Alex Williamson wrote: > > So I think it is safe to mmap/passthrough MSI-X table on PPC64 > > platform. > > And I'm not sure whether other architectures can ensure these two  > > points.  > > There is another consideration, which is the API exposed to the user.

Re: [PATCH v3 0/3] virtio DMA API core stuff

2015-11-19 Thread Benjamin Herrenschmidt
On Thu, 2015-11-19 at 23:38 +0000, David Woodhouse wrote: > > I understand that POWER and other platforms don't currently have a > clean way to indicate that certain devices don't have translation. And I > understand that we may end up with a *quirk* which ensures that the DMA > API does the right

Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-10 Thread Benjamin Herrenschmidt
On Tue, 2015-11-10 at 11:27 +0100, Joerg Roedel wrote: > > You have the same problem when real PCIe devices appear that speak > virtio. I think the only real (still not very nice) solution is to add a > quirk to powerpc platform code that sets noop dma-ops for the existing > virtio

Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-10 Thread Benjamin Herrenschmidt
On Tue, 2015-11-10 at 14:43 +0200, Michael S. Tsirkin wrote: > But not virtio-pci I think - that's broken for that usecase since we use > weaker barriers than required for real IO, as these have measurable > overhead.  We could have a feature "is a real PCI device", > that's completely

Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-10 Thread Benjamin Herrenschmidt
On Tue, 2015-11-10 at 10:54 -0800, Andy Lutomirski wrote: >  > Does that work on powerpc on existing kernels? > > Anyway, here's another crazy idea: make the quirk assume that the > IOMMU is bypassed if and only if the weak barriers bit is set on > systems that are missing the new DT binding.

Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-10 Thread Benjamin Herrenschmidt
On Mon, 2015-11-09 at 21:35 -0800, Andy Lutomirski wrote: > > We could do it the other way around: on powerpc, if a PCI device is in > that range and doesn't have the "bypass" property at all, then it's > assumed to bypass the IOMMU.  This means that everything that > currently works continues

Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-10 Thread Benjamin Herrenschmidt
On Tue, 2015-11-10 at 15:44 -0800, Andy Lutomirski wrote: > > > What about partition <-> partition virtio such as what we could do on > > PAPR systems. That would have the weak barrier bit. > > > > Is it partition <-> partition, bypassing IOMMU? No. > I think I'd settle for just something that

Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-10 Thread Benjamin Herrenschmidt
On Tue, 2015-11-10 at 10:45 +0100, Knut Omang wrote: > Can something be done by means of PCIe capabilities? > ATS (Address Translation Support) seems like a natural choice? Euh no... ATS is something else completely Cheers, Ben.

Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-10 Thread Benjamin Herrenschmidt
On Tue, 2015-11-10 at 20:46 -0800, Andy Lutomirski wrote: > Me neither.  At least it wouldn't be a regression, but it's still > crappy. > > I think that arm is fine, at least.  I was unable to find an arm QEMU > config that has any problems with my patches. Ok, give me a few days for my headache

Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-09 Thread Benjamin Herrenschmidt
On Mon, 2015-11-09 at 18:18 -0800, Andy Lutomirski wrote: > > /* Qumranet donated their vendor ID for devices 0x1000 thru 0x10FF. > */ > static const struct pci_device_id virtio_pci_id_table[] = { >     { PCI_DEVICE(0x1af4, PCI_ANY_ID) }, >     { 0 } > }; > > Can we match on that range?

Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-09 Thread Benjamin Herrenschmidt
On Mon, 2015-11-09 at 18:18 -0800, Andy Lutomirski wrote: > > Which leaves the special case of Xen, where even preexisting devices > don't bypass the IOMMU.  Can we keep this specific to powerpc and > sparc?  On x86, this problem is basically nonexistent, since the IOMMU > is properly

Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-09 Thread Benjamin Herrenschmidt
On Mon, 2015-11-09 at 16:46 -0800, Andy Lutomirski wrote: > The problem here is that in some of the problematic cases the virtio > driver may not even be loaded.  If someone runs an L1 guest with an > IOMMU-bypassing virtio device and assigns it to L2 using vfio, then > *boom* L1 crashes.  (Same

Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-09 Thread Benjamin Herrenschmidt
So ... I've finally tried to sort that out for powerpc and I can't find a way to make that work that isn't a complete pile of stinking shit. I'm very tempted to go back to my original idea: virtio itself should indicate its "bypassing ability" via the virtio config space or some other bit (like

Re: [PATCH v1 2/2] dma-mapping-common: add DMA attribute - DMA_ATTR_IOMMU_BYPASS

2015-11-03 Thread Benjamin Herrenschmidt
On Tue, 2015-11-03 at 14:11 +0100, Christoph Hellwig wrote: > > xHCI for example, vs. something like 10G ethernet... but yes I agree it > > sucks. I don't like that sort of policy anywhere in drivers. On the > > other hand the platform doesn't have much information to make that sort > > of

Re: [PATCH v1 2/2] dma-mapping-common: add DMA attribute - DMA_ATTR_IOMMU_BYPASS

2015-11-02 Thread Benjamin Herrenschmidt
On Mon, 2015-11-02 at 14:07 +0200, Shamir Rabinovitch wrote: > On Mon, Nov 02, 2015 at 09:00:34PM +1100, Benjamin Herrenschmidt > wrote: > > > > Choosing on a per-mapping basis *in the back end* might still make > > some > > In my case, choosing mapping based

Re: [PATCH v1 2/2] dma-mapping-common: add DMA attribute - DMA_ATTR_IOMMU_BYPASS

2015-11-02 Thread Benjamin Herrenschmidt
On Mon, 2015-11-02 at 22:48 +0000, David Woodhouse wrote: > > In the Intel case, the mapping setup is entirely per-device (except for > crap devices and devices behind a PCIe-PCI bridge, etc.). > > So you can happily have a passthrough mapping for *one* device, without > making that same mapping

Re: [PATCH v1 2/2] dma-mapping-common: add DMA attribute - DMA_ATTR_IOMMU_BYPASS

2015-11-02 Thread Benjamin Herrenschmidt
On Mon, 2015-11-02 at 09:23 +0200, Shamir Rabinovitch wrote: > To summarize - > > 1. The whole point of the IOMMU pass through was to get bigger address space > and faster map/unmap operations for performance critical hardware > 2. SPARC IOMMU in particular has the ability to DVMA which

Re: [PATCH v1 2/2] dma-mapping-common: add DMA attribute - DMA_ATTR_IOMMU_BYPASS

2015-11-02 Thread Benjamin Herrenschmidt
On Mon, 2015-11-02 at 22:45 +0100, Arnd Bergmann wrote: > > Then I would argue for naming this differently. Make it an optional > > hint "DMA_ATTR_HIGH_PERF" or something like that. Whether this is > > achieved via using a bypass or other means in the backend is not the > > business of the driver. >

Re: [PATCH v1 2/2] dma-mapping-common: add DMA attribute - DMA_ATTR_IOMMU_BYPASS

2015-11-01 Thread Benjamin Herrenschmidt
On Sun, 2015-11-01 at 09:45 +0200, Shamir Rabinovitch wrote: > Not sure this use case is possible for Infiniband where the application holds > the data buffers and there is no way to force the application to reuse the > buffer as suggested. > > This is why I think there will be no easy way to bypass the

Re: [PATCH v1 2/2] dma-mapping-common: add DMA attribute - DMA_ATTR_IOMMU_BYPASS

2015-10-30 Thread Benjamin Herrenschmidt
On Fri, 2015-10-30 at 11:32 +0100, Arnd Bergmann wrote: > On Thursday 29 October 2015 10:10:46 Benjamin Herrenschmidt wrote: > > > > > Maybe we should at least coordinate IOMMU 'paranoid/fast' modes > > > across > > > architectures, and then the D

Re: [PATCH v1 2/2] dma-mapping-common: add DMA attribute - DMA_ATTR_IOMMU_BYPASS

2015-10-29 Thread Benjamin Herrenschmidt
On Thu, 2015-10-29 at 11:31 -0700, Andy Lutomirski wrote: > On Oct 28, 2015 6:11 PM, "Benjamin Herrenschmidt" > <b...@kernel.crashing.org> wrote: > > > > On Thu, 2015-10-29 at 09:42 +0900, David Woodhouse wrote: > > > On Thu, 2015-10-29 a

Re: [PATCH v3 0/3] virtio DMA API core stuff

2015-10-28 Thread Benjamin Herrenschmidt
On Wed, 2015-10-28 at 16:40 +0900, Christian Borntraeger wrote: > We have discussed that at kernel summit. I will try to implement a dummy > dma_ops for > s390 that does 1:1 mapping and Ben will look into doing some quirk to handle > "old" > code in addition to also make it possible to mark

Re: [PATCH v1 2/2] dma-mapping-common: add DMA attribute - DMA_ATTR_IOMMU_BYPASS

2015-10-28 Thread Benjamin Herrenschmidt
On Thu, 2015-10-29 at 09:42 +0900, David Woodhouse wrote: > On Thu, 2015-10-29 at 09:32 +0900, Benjamin Herrenschmidt wrote: > > > On Power, I generally have 2 IOMMU windows for a device, one at the > > bottom is remapped, and is generally used for 32-bit devices and the >

Re: [PATCH v1 2/2] dma-mapping-common: add DMA attribute - DMA_ATTR_IOMMU_BYPASS

2015-10-28 Thread Benjamin Herrenschmidt
On Wed, 2015-10-28 at 22:31 +0900, David Woodhouse wrote: > We have an option in the Intel IOMMU for pass-through mode too, which > basically *is* a total bypass. In practice, what's the difference > between that and a "simple translation that does not require any > [translation]"? We set up a

Re: [Qemu-ppc] KVM memory slots limit on powerpc

2015-09-04 Thread Benjamin Herrenschmidt
On Fri, 2015-09-04 at 12:28 +0200, Thomas Huth wrote: > > Maybe some rcu protected scheme that doubles the amount of memslots > > for > > each overrun? Yes, that would be good and even reduce the footprint > > for > > systems with only a small number of memslots. > > Seems like Alex Williamson

Re: [PATCH] KVM: ppc: Fix size of the PSPB register

2015-09-01 Thread Benjamin Herrenschmidt
On Wed, 2015-09-02 at 08:24 +1000, Paul Mackerras wrote: > On Tue, Sep 01, 2015 at 11:41:18PM +0200, Thomas Huth wrote: > > The size of the Problem State Priority Boost Register is only > > 32 bits, so let's change the type of the corresponding variable > > accordingly to avoid future trouble. >

Re: [PATCH] KVM: ppc: Fix size of the PSPB register

2015-09-01 Thread Benjamin Herrenschmidt
On Wed, 2015-09-02 at 08:45 +1000, Paul Mackerras wrote: > On Wed, Sep 02, 2015 at 08:25:05AM +1000, Benjamin Herrenschmidt > wrote: > > On Tue, 2015-09-01 at 23:41 +0200, Thomas Huth wrote: > > > The size of the Problem State Priority Boost Register is only > > > 32

Re: [PATCH] KVM: ppc: Fix size of the PSPB register

2015-09-01 Thread Benjamin Herrenschmidt
On Tue, 2015-09-01 at 23:41 +0200, Thomas Huth wrote: > The size of the Problem State Priority Boost Register is only > 32 bits, so let's change the type of the corresponding variable > accordingly to avoid future trouble. It's not future trouble, it's broken today for LE and this should fix it

Re: BUG: sleeping function called from ras_epow_interrupt context

2015-07-14 Thread Benjamin Herrenschmidt
On Tue, 2015-07-14 at 20:43 +0200, Thomas Huth wrote: Any suggestions how to fix this? Simply revert 587f83e8dd50d? Use mdelay() instead of msleep() in rtas_busy_delay()? Something more fancy? A proper fix would be more fancy, the get_sensor should happen in a kernel thread instead. Cheers,

Re: [PATCH 06/25] powerpc: Use bool function return values of true/false not 1/0

2015-03-30 Thread Benjamin Herrenschmidt
On Mon, 2015-03-30 at 16:46 -0700, Joe Perches wrote: Use the normal return values for bool functions Acked-by: Benjamin Herrenschmidt b...@kernel.crashing.org Should we merge it or will you ? Cheers, Ben. Signed-off-by: Joe Perches j...@perches.com --- arch/powerpc/include/asm/dcr

Re: [PATCH 2/2] KVM: PPC: Remove page table walk helpers

2015-03-29 Thread Benjamin Herrenschmidt
On Mon, 2015-03-30 at 10:39 +0530, Aneesh Kumar K.V wrote: This patch removes helpers which we had used only once in the code. Limiting page table walk variants helps in ensuring that we won't end up with code walking page table with wrong assumptions. Signed-off-by: Aneesh Kumar K.V

Re: [PATCH v5 25/29] powerpc/powernv/ioda: Define and implement DMA table/window management callbacks

2015-03-11 Thread Benjamin Herrenschmidt
On Wed, 2015-03-11 at 19:54 +1100, Alexey Kardashevskiy wrote: +/* Page size flags for ibm,query-pe-dma-window */ +#define DDW_PGSIZE_4K 0x01 +#define DDW_PGSIZE_64K 0x02 +#define DDW_PGSIZE_16M 0x04 +#define DDW_PGSIZE_32M 0x08 +#define

Re: [PATCH v5 03/29] vfio: powerpc/spapr: Check that TCE page size is equal to it_page_size

2015-03-10 Thread Benjamin Herrenschmidt
On Tue, 2015-03-10 at 17:03 -0600, Alex Williamson wrote: return (PAGE_SHIFT + compound_order(compound_head(page) >= page_shift); This won't be bool though. Yes, it will. Don't you have your parenthesis in the wrong place, Alex ? :-) This will (I'll do this) shift = PAGE_SHIFT

Re: [PATCH RFC 00/11] qemu: towards virtio-1 host support

2014-10-22 Thread Benjamin Herrenschmidt
On Wed, 2014-10-22 at 16:17 +0200, Jan Kiszka wrote: I thought about this again, and I'm not sure anymore if we can use ACPI to black-list the incompatible virtio devices. Reason: hotplug. To my understanding, the ACPI DRHD tables won't change during runtime when a device shows up or

Re: [PATCH v3] powerpc/kvm: support to handle sw breakpoint

2014-08-11 Thread Benjamin Herrenschmidt
On Mon, 2014-08-11 at 09:26 +0200, Alexander Graf wrote: diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c index da86d9b..d95014e 100644 --- a/arch/powerpc/kvm/emulate.c +++ b/arch/powerpc/kvm/emulate.c This should be book3s_emulate.c. Any reason we can't make that

Re: [PATCH v4 1/5] powerpc/eeh: Export eeh_iommu_group_to_pe()

2014-08-07 Thread Benjamin Herrenschmidt
On Thu, 2014-08-07 at 12:47 +1000, Gavin Shan wrote: The function is used by VFIO driver, which might be built as a dynamic module. So it should be exported. Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com Acked-by: Benjamin Herrenschmidt b...@kernel.crashing.org Alex, are you taking

Re: [PATCH v4 2/5] powerpc/eeh: Add warning message in eeh_dev_open()

2014-08-07 Thread Benjamin Herrenschmidt
On Thu, 2014-08-07 at 12:47 +1000, Gavin Shan wrote: The patch adds one warning message in eeh_dev_open() in case the PCI device can't be marked as passed through. Suggested-by: Alexey Kardashevskiy a...@ozlabs.ru Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com --- Acked-by: Benjamin

Re: [PATCH v2 4/4] vfio_pci: spapr: Enable VFIO if EEH is not supported

2014-08-05 Thread Benjamin Herrenschmidt
On Tue, 2014-08-05 at 21:44 -0600, Alex Williamson wrote: ret = vfio_spapr_pci_eeh_open(vdev->pdev); - if (ret) { - vfio_pci_disable(vdev); - goto error; - } + if (ret) +

Re: [PULL 16/63] PPC: Add asm helpers for BE 32bit load/store

2014-08-01 Thread Benjamin Herrenschmidt
-by: Alexander Graf ag...@suse.de Acked-by: Benjamin Herrenschmidt b...@kernel.crashing.org --- arch/powerpc/include/asm/asm-compat.h | 4 1 file changed, 4 insertions(+) diff --git a/arch/powerpc/include/asm/asm-compat.h b/arch/powerpc/include/asm/asm-compat.h index 4b237aa..21be8ae 100644

Re: [PATCH] powerpc: kvm: make the setup of hpte under the protection of KVMPPC_RMAP_LOCK_BIT

2014-07-29 Thread Benjamin Herrenschmidt
is that it uses kvm_unmap_rmapp() which will also lock the HPTE (try_lock_hpte) and so shouldn't have a race vs the above code. Or do you see a race I don't ? Cheers, Ben. Thx. Fan On Mon, Jul 28, 2014 at 2:42 PM, Benjamin Herrenschmidt b...@kernel.crashing.org wrote: On Mon, 2014-07-28 at 14:09

Re: [PATCH] powerpc: kvm: make the setup of hpte under the protection of KVMPPC_RMAP_LOCK_BIT

2014-07-28 Thread Benjamin Herrenschmidt
On Mon, 2014-07-28 at 14:09 +0800, Liu Ping Fan wrote: In current code, the setup of hpte is at risk of a race with mmu_notifier_invalidate, i.e. we may set up a hpte with an invalid pfn. Resolve this issue by syncing the two actions with KVMPPC_RMAP_LOCK_BIT. Please describe the race you think

Re: [PATCH v1 2/3] powerpc/powernv: Support PCI error injection

2014-07-21 Thread Benjamin Herrenschmidt
On Mon, 2014-07-21 at 16:06 +0800, Mike Qiu wrote: I don't like this. I much prefer have dedicated error injection files in their respective locations, something for PCI under the corresponding PCI bridge etc... So PowerNV error injection will be designed rely on debugfs been

Re: [PATCH v1 2/3] powerpc/powernv: Support PCI error injection

2014-07-21 Thread Benjamin Herrenschmidt
On Tue, 2014-07-22 at 11:10 +0800, Mike Qiu wrote: On 07/22/2014 06:49 AM, Benjamin Herrenschmidt wrote: On Mon, 2014-07-21 at 16:06 +0800, Mike Qiu wrote: I don't like this. I much prefer have dedicated error injection files in their respective locations, something for PCI under

Re: [PATCH 0/6] Use virtual page class key protection mechanism for speeding up guest page fault

2014-06-29 Thread Benjamin Herrenschmidt
On Sun, 2014-06-29 at 16:47 +0530, Aneesh Kumar K.V wrote: To achieve the above we use virtual page class protection mechanism for covering (2) and (3). For both of the above cases we mark the hpte valid, but associate the page with virtual page class index 30 and 31. The authority mask register

Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-24 Thread Benjamin Herrenschmidt
On Tue, 2014-06-24 at 12:41 +0200, Alexander Graf wrote: Is there actually any difference in generated code with this patch applied and without? I would hope that iowrite..() is inlined and cancels out the cpu_to_le..() calls that are also inlined? No, the former uses byteswapping asm, the

Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-24 Thread Benjamin Herrenschmidt
On Wed, 2014-06-25 at 00:33 +1000, Alexey Kardashevskiy wrote: I do not understand why @val is considered LE here and need to be converted to CPU. Really. I truly believe it should be cpu_to_le32(). No. Both are slightly wrong semantically but le32_to_cpu() is less wrong :-) iowrite32

Re: [PATCH v1 2/3] powerpc/powernv: Support PCI error injection

2014-06-24 Thread Benjamin Herrenschmidt
Is it reasonable to do error injection with CONFIG_IOMMU_API ? That means if use default config(CONFIG_IOMMU_API = n), we can not do error injection to pci devices? Well we can't pass them through either so ... In any case, this is not a priority. First we need to implement a solid error

Re: [PATCH v1 2/3] powerpc/powernv: Support PCI error injection

2014-06-24 Thread Benjamin Herrenschmidt
On Tue, 2014-06-24 at 14:57 +0800, Mike Qiu wrote: Is that mean *host* side error injection should base on CONFIG_IOMMU_API ? If it is just host side(no guest, no pass through), can't we do error inject? Maybe I misunderstand :) Ah no, make different patches, we don't want to use IOMMU

Re: [PATCH v1 2/3] powerpc/powernv: Support PCI error injection

2014-06-24 Thread Benjamin Herrenschmidt
On Wed, 2014-06-25 at 11:05 +0800, Mike Qiu wrote: Here maybe /sys/kernel/debug/powerpc/errinjct is better, because it will supply PCI_domain_nr in parameters, so no need supply errinjct for each PCI domain. Another reason is error inject not only for PCI(in future), so better not in

Re: [PATCH v1 1/3] powerpc/powernv: Sync header with firmware

2014-06-23 Thread Benjamin Herrenschmidt
On Mon, 2014-06-23 at 12:14 +1000, Gavin Shan wrote: The patch synchronizes firmware header file (opal.h) for PCI error injection The FW API you expose is not PCI specific. I haven't seen the corresponding FW patches yet but I'm not a fan of that single call that collates unrelated things. I

Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-20 Thread Benjamin Herrenschmidt
ask me for examples, I can't really remember). If we do need to define an alias (which I'd like to avoid) it should be something like vfio_iowrite32. Thanks, Cheers, Ben. Alex === any better? Suggested-by: Benjamin Herrenschmidt b...@kernel.crashing.org

Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-20 Thread Benjamin Herrenschmidt
On Sat, 2014-06-21 at 00:14 +1000, Alexey Kardashevskiy wrote: We can still use __raw_writel&co, would that be ok? No unless you understand precisely what kind of memory barriers each platform require for these. Cheers, Ben.

Re: [PATCH] powerpc/kvm: support to handle sw breakpoint

2014-06-17 Thread Benjamin Herrenschmidt
On Tue, 2014-06-17 at 10:54 +0200, Alexander Graf wrote: Also, why don't we use twi always or something else that actually is defined as illegal instruction? I would like to see this shared with book3s_32 PR. twi will be directed to the guest on HV no ? We want a real illegal because those

Re: [PATCH] powerpc/kvm: support to handle sw breakpoint

2014-06-17 Thread Benjamin Herrenschmidt
On Tue, 2014-06-17 at 11:25 +0200, Alexander Graf wrote: On 17.06.14 11:22, Benjamin Herrenschmidt wrote: On Tue, 2014-06-17 at 10:54 +0200, Alexander Graf wrote: Also, why don't we use twi always or something else that actually is defined as illegal instruction? I would like to see

Re: [PATCH 3/3] PPC: KVM: Add support for 64bit TCE windows

2014-06-05 Thread Benjamin Herrenschmidt
On Thu, 2014-06-05 at 17:25 +1000, Alexey Kardashevskiy wrote: +This creates a virtual TCE (translation control entry) table, which +is an IOMMU for PAPR-style virtual I/O. It is used to translate +logical addresses used in virtual I/O into guest physical addresses, +and provides a

Re: [PATCH 3/3] PPC: KVM: Add support for 64bit TCE windows

2014-06-05 Thread Benjamin Herrenschmidt
On Thu, 2014-06-05 at 19:26 +1000, Alexey Kardashevskiy wrote: No trees yet. For 64GB window we need (64<<30)/(16<<20)*8 = 32K TCE table. Do we really need trees? The above is assuming hugetlbfs backed guests. These are the least of my worry indeed. But we need to deal with 4k and 64k guests.
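
The arithmetic in the message above can be checked with a minimal sketch. It assumes a flat (single-level) table with one 8-byte TCE per IOMMU page, which is what the "*8" in the quoted expression suggests; the helper name tce_table_bytes is hypothetical and is not taken from the patch under discussion.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not from the patch: size in bytes of a flat TCE
 * table covering window_bytes of DMA address space, assuming one 8-byte
 * entry per IOMMU page of (1 << page_shift) bytes.
 */
static uint64_t tce_table_bytes(uint64_t window_bytes, unsigned int page_shift)
{
	uint64_t entries = window_bytes >> page_shift;	/* pages in the window */

	return entries * sizeof(uint64_t);		/* 8 bytes per entry */
}
```

Under those assumptions, a 64GB window needs 32K of table with 16M pages (page_shift 24), 8M with 64k pages (page_shift 16), and 128M with 4k pages (page_shift 12), which may be why the 4k and 64k cases are the ones singled out as a concern.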

Re: [PATCH 3/3] PPC: KVM: Add support for 64bit TCE windows

2014-06-05 Thread Benjamin Herrenschmidt
On Thu, 2014-06-05 at 13:56 +0200, Alexander Graf wrote: What if we ask user space to give us a pointer to user space allocated memory along with the TCE registration? We would still ask user space to only use the returned fd for TCE modifications, but would have some nicely swappable

Re: [PATCH v8 2/3] powerpc/eeh: EEH support for VFIO PCI device

2014-06-05 Thread Benjamin Herrenschmidt
On Thu, 2014-06-05 at 16:36 +1000, Gavin Shan wrote: +#define EEH_OPT_GET_PE_ADDR 0 /* Get PE addr */ +#define EEH_OPT_GET_PE_MODE 1 /* Get PE mode */ I assume that's just some leftover from the previous patches :-) Don't respin just yet, let's see what other comments come

Re: [PATCH 4/4] powerpc/eeh: Avoid event on passed PE

2014-06-03 Thread Benjamin Herrenschmidt
On Tue, 2014-06-03 at 09:45 +0200, Alexander Graf wrote: For EEH it could as well be a dumb eventfd - really just a side channel that can tell user space that something happened asynchronously :). Which the host kernel may have no way to detect without actively poking at the device (fences in

Re: powerpc/pseries: Use new defines when calling h_set_mode

2014-05-29 Thread Benjamin Herrenschmidt
On Thu, 2014-05-29 at 23:27 +0200, Alexander Graf wrote: On 29.05.14 09:45, Michael Neuling wrote: +/* Values for 2nd argument to H_SET_MODE */ +#define H_SET_MODE_RESOURCE_SET_CIABR 1 +#define H_SET_MODE_RESOURCE_SET_DAWR 2 +#define H_SET_MODE_RESOURCE_ADDR_TRANS_MODE 3

Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-28 Thread Benjamin Herrenschmidt
On Wed, 2014-05-28 at 22:49 +1000, Gavin Shan wrote: I will remove those address related macros in next revision because it's user-level business, not related to host kernel any more. If the user is QEMU + guest, we need the address to identify the PE though PHB BUID could be used as same

Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-28 Thread Benjamin Herrenschmidt
On Thu, 2014-05-29 at 10:05 +1000, Gavin Shan wrote: The log stuff is TBD and I'll figure it out later. About to what are the errors, there are a lot. Most of them are related to hardware level, for example unstable PCI link. Usually, those error bits defined in AER fatal error state

Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-27 Thread Benjamin Herrenschmidt
On Tue, 2014-05-27 at 12:15 -0600, Alex Williamson wrote: +/* + * Reset is the major step to recover problematic PE. The following + * command helps on that. + */ +struct vfio_eeh_pe_reset { + __u32 argsz; + __u32 flags; + __u32 option; +#define

Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-27 Thread Benjamin Herrenschmidt
On Tue, 2014-05-27 at 14:37 -0600, Alex Williamson wrote: The usual way is the driver asks for one or the other, this plumbs back into the guest EEH code which itself plumbs into the PCIe error recovery framework in Linux. So magic? Yes. The driver is expected to more or less know what

Re: [PATCH v6 2/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-22 Thread Benjamin Herrenschmidt
On Fri, 2014-05-23 at 14:37 +1000, Gavin Shan wrote: There's no notification, the user needs to observe the return value and poll? Should we be enabling an eventfd to notify the user of the state change? Yes. The user needs to monitor the return value. we should have one notification,

Re: [PATCH 3/4] drivers/vfio: New IOCTL command VFIO_EEH_INFO

2014-05-21 Thread Benjamin Herrenschmidt
On Wed, 2014-05-21 at 08:23 +0200, Alexander Graf wrote: Note to Alex: This definitely kills the notifier idea for now though, at least as a first class citizen of the design. We can add it as an optional optimization on top later. I don't think it does. The notifier would just get

Re: [PATCH v5 3/4] drivers/vfio: EEH support for VFIO PCI device

2014-05-21 Thread Benjamin Herrenschmidt
On Wed, 2014-05-21 at 15:07 +0200, Alexander Graf wrote: +#ifdef CONFIG_VFIO_PCI_EEH +int eeh_vfio_open(struct pci_dev *pdev) Why vfio? Also that config option will not be set if vfio is compiled as a module. +{ + struct eeh_dev *edev; + + /* No PCI device ? */ + if

Re: [PATCH 4/4] powerpc/eeh: Avoid event on passed PE

2014-05-20 Thread Benjamin Herrenschmidt
On Tue, 2014-05-20 at 21:56 +1000, Gavin Shan wrote: .../... I think what you want is an irqfd that the in-kernel eeh code notifies when it sees a failure. When such an fd exists, the kernel skips its own error handling. Yeah, it's a good idea and something for me to improve in phase

Re: [PATCH 4/4] powerpc/eeh: Avoid event on passed PE

2014-05-20 Thread Benjamin Herrenschmidt
On Tue, 2014-05-20 at 15:49 +0200, Alexander Graf wrote: Instead of if (passed_flag) return; you would do if (trigger_irqfd) { trigger_irqfd(); return; } which would be a much nicer, generic interface. But that's not how PAPR works. Cheers, Ben.

Re: [PATCH 4/4] powerpc/eeh: Avoid event on passed PE

2014-05-20 Thread Benjamin Herrenschmidt
On Tue, 2014-05-20 at 15:49 +0200, Alexander Graf wrote: So how about we just implement this whole thing properly as irqfd? Whether QEMU can actually do anything with the interrupt is a different question - we can leave it be for now. But we could model all the code with the assumption that

Re: [PATCH 3/4] drivers/vfio: New IOCTL command VFIO_EEH_INFO

2014-05-20 Thread Benjamin Herrenschmidt
On Tue, 2014-05-20 at 14:25 +0200, Alexander Graf wrote: - Move eeh-vfio.c to drivers/vfio/pci/ - From eeh-vfio.c, dereference arch/powerpc/kernel/eeh.c::eeh_ops, which is arch/powerpc/platforms/powernv/eeh-powernv.c::powernv_eeh_ops. Call Hrm, I think it'd be nicer to just export

Re: [PATCH 3/4] drivers/vfio: New IOCTL command VFIO_EEH_INFO

2014-05-20 Thread Benjamin Herrenschmidt
On Tue, 2014-05-20 at 22:39 +1000, Gavin Shan wrote: Yeah. How about this? :-) - Move eeh-vfio.c to drivers/vfio/pci/ - From eeh-vfio.c, dereference arch/powerpc/kernel/eeh.c::eeh_ops, which is arch/powerpc/platforms/powernv/eeh-powernv.c::powernv_eeh_ops. Call Hrm, I think it'd be

Re: [PATCH 2/8] powerpc/eeh: Info to trace passed devices

2014-05-19 Thread Benjamin Herrenschmidt
On Mon, 2014-05-19 at 14:46 +0200, Alexander Graf wrote: I don't see the point of VFIO knowing about guest addresses. They are not unique across a system and the whole idea that a VFIO device has to be owned by a guest is also pretty dubious. I suppose what you really care about here is

Re: [PATCH 6/8] powerpc: Extend syscall ppc_rtas()

2014-05-19 Thread Benjamin Herrenschmidt
On Mon, 2014-05-19 at 14:55 +0200, Alexander Graf wrote: On 14.05.14 06:12, Gavin Shan wrote: Originally, syscall ppc_rtas() can be used to invoke RTAS call from user space. Utility errinjct is using it to inject various errors to the system for testing purpose. The patch intends to extend

Re: [PATCH 8/8] powerpc/powernv: Error injection infrastructure

2014-05-19 Thread Benjamin Herrenschmidt
On Mon, 2014-05-19 at 15:04 +0200, Alexander Graf wrote: On 14.05.14 06:12, Gavin Shan wrote: The patch intends to implement the error injection infrastructure for PowerNV platform. The predetermined handlers will be called according to the type of injected error (e.g.

Re: [PATCH 3/8] drivers/vfio: New IOCTL command VFIO_EEH_INFO

2014-05-19 Thread Benjamin Herrenschmidt
for those PCI devices, + * which have been passed through from host to guest via VFIO. So this + * file is naturally part of VFIO implementation on PowerNV platform. + * + * Copyright Benjamin Herrenschmidt Gavin Shan, IBM Corporation 2014. + * + * This program is free software; you

Re: [PATCH RFC 00/22] EEH Support for VFIO PCI devices on PowerKVM guest

2014-05-06 Thread Benjamin Herrenschmidt
On Tue, 2014-05-06 at 08:56 +0200, Alexander Graf wrote: For the error injection, I guess I have to put the logic token management into QEMU and error injection request will be handled by QEMU and then routed to host kernel via additional syscall as we did for pSeries. Yes, start off

Re: [PATCH] KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table.

2014-05-06 Thread Benjamin Herrenschmidt
On Tue, 2014-05-06 at 09:05 +0200, Alexander Graf wrote: On 06.05.14 02:06, Benjamin Herrenschmidt wrote: On Mon, 2014-05-05 at 17:16 +0200, Alexander Graf wrote: Isn't this a greater problem? We should start swapping before we hit the point where non movable kernel allocation fails

Re: [RFC PATCH] KVM: PPC: BOOK3S: HV: THP support for guest

2014-05-06 Thread Benjamin Herrenschmidt
On Tue, 2014-05-06 at 11:12 +0200, Alexander Graf wrote: So if I understand this patch correctly, it simply introduces logic to handle page sizes other than 4k, 64k, 16M by analyzing the actual page size field in the HPTE. Mind to explain why exactly that enables us to use THP? What

Re: [RFC PATCH] KVM: PPC: BOOK3S: HV: THP support for guest

2014-05-06 Thread Benjamin Herrenschmidt
On Tue, 2014-05-06 at 21:38 +0530, Aneesh Kumar K.V wrote: I updated the commit message as below. Let me know if this is ok. KVM: PPC: BOOK3S: HV: THP support for guest This has nothing to do with THP. THP support in guest depend on KVM advertising MPSS feature. We already have

Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr

2014-05-05 Thread Benjamin Herrenschmidt
On Mon, 2014-05-05 at 19:56 +0530, Aneesh Kumar K.V wrote: Paul mentioned that BOOK3S always had DAR value set on alignment interrupt. And the patch is to enable/collect correct DAR value when running with Little Endian PR guest. Now to limit the impact and to enable Little Endian PR guest,

Re: [PATCH] KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table.

2014-05-05 Thread Benjamin Herrenschmidt
On Mon, 2014-05-05 at 17:16 +0200, Alexander Graf wrote: Isn't this a greater problem? We should start swapping before we hit the point where non movable kernel allocation fails, no? Possibly but the fact remains, this can be avoided by making sure that if we create a CMA reserve for KVM, then
