[added to the 3.18 stable tree] timers/tick/broadcast-hrtimer: Fix suspicious RCU usage in idle loop

2015-04-25 Thread Sasha Levin
From: Preeti U Murthy This patch has been added to the 3.18 stable tree. If you have any objections, please let us know. === [ Upstream commit a127d2bcf1fbc8c8e0b5cf0dab54f7d3ff50ce47 ] The hrtimer mode of broadcast queues hrtimers in the idle entry path so as to wake up CPUs in dee

Re: [PATCH] ehea: Fix memory hook reference counting crashes

2015-04-25 Thread David Miller
From: Michael Ellerman Date: Fri, 24 Apr 2015 15:52:32 +1000 > The recent commit to only register the EHEA memory hotplug hooks on > adapter probe has a few problems. > > Firstly the reference counting is wrong for multiple adapters, in that > the hooks are registered multiple times. Secondly th

[PATCH kernel v9 23/32] powerpc/powernv/ioda: Define and implement DMA table/window management callbacks

2015-04-25 Thread Alexey Kardashevskiy
This extends iommu_table_group_ops by a set of callbacks to support dynamic DMA window management. create_table() creates a TCE table with specific parameters. It receives the iommu_table_group so it knows the NUMA node ID and can allocate TCE table memory closer to the PHB. The exact format of allocated multi

[PATCH kernel v9 22/32] powerpc/powernv: Implement multilevel TCE tables

2015-04-25 Thread Alexey Kardashevskiy
TCE tables might get too big with 4K IOMMU pages and DDW enabled on huge guests (hundreds of GB of RAM), so the kernel might be unable to allocate a contiguous chunk of physical memory to store the TCE table. To address this, the POWER8 CPU (actually, IODA2) supports multi-level TCE tables, up to 5
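
A rough size calculation shows the problem. This is a sketch under stated assumptions, not the powernv code: `tce_table_bytes` is a hypothetical helper, every TCE and directory entry is assumed to be 8 bytes, and each level is assumed to be built from equal nodes of (1 << bits) entries; the real IODA2 layout may differ.

```c
#include <stdint.h>

/* Sketch only: assumes 8-byte TCEs/directory entries and a hypothetical
 * layout where every level consists of nodes of (1 << bits) entries. */
#define TCE_ENTRY_BYTES 8ULL

/* Total bytes for a table covering window_size bytes of DMA space with
 * IOMMU pages of (1 << page_shift) bytes, or 0 if `levels` levels of
 * (1 << bits)-entry nodes cannot be reduced to a single root node. */
uint64_t tce_table_bytes(uint64_t window_size, unsigned page_shift,
                         unsigned levels, unsigned bits)
{
    uint64_t per_node = 1ULL << bits;
    uint64_t entries = window_size >> page_shift; /* leaf TCEs needed */
    uint64_t total = 0;

    for (unsigned lvl = 0; lvl < levels; lvl++) {
        /* Round up: a partly used node is still allocated whole. */
        uint64_t nodes = (entries + per_node - 1) / per_node;
        total += nodes * per_node * TCE_ENTRY_BYTES;
        entries = nodes; /* the level above needs one entry per node */
    }
    return entries == 1 ? total : 0;
}
```

Under these assumptions, a 256 GiB window with 4K IOMMU pages needs one 512 MiB contiguous block as a flat table (`levels = 1, bits = 26`), whereas two levels of 8192-entry (64 KiB) nodes need 512 MiB plus 64 KiB in total but never allocate more than 64 KiB at once.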

[PATCH kernel v9 30/32] vfio: powerpc/spapr: Use 32bit DMA window properties from table_group

2015-04-25 Thread Alexey Kardashevskiy
A table group might not have a table, but it always has the default 32bit window parameters, so use these. No change in behavior is expected. Signed-off-by: Alexey Kardashevskiy --- Changes: v9: * new in the series - to make the next patch simpler --- drivers/vfio/vfio_iommu_spapr_tce.c | 19

[PATCH kernel v9 25/32] vfio: powerpc/spapr: powerpc/powernv/ioda2: Rework ownership

2015-04-25 Thread Alexey Kardashevskiy
Previously, the IOMMU user (VFIO) would take control over the IOMMU table belonging to a specific IOMMU group. This approach did not allow sharing tables between IOMMU groups attached to the same container. This introduces a new IOMMU ownership flavour in which the user can not just control the existing IO

[PATCH kernel v9 24/32] powerpc/powernv/ioda2: Use new helpers to do proper cleanup on PE release

2015-04-25 Thread Alexey Kardashevskiy
The existing code programmed TVT#0 with some address and then immediately released that memory. This makes use of pnv_pci_ioda2_unset_window() and pnv_pci_ioda2_set_bypass(), which do the correct resource release and TVT update. Signed-off-by: Alexey Kardashevskiy --- arch/powerpc/platforms/powernv/

[PATCH kernel v9 26/32] powerpc/iommu: Add userspace view of TCE table

2015-04-25 Thread Alexey Kardashevskiy
In order to support memory pre-registration, we need a way to track the use of every registered memory region and only allow unregistration if a region is no longer in use. So we need a way to tell which region the just-cleared TCE came from. This adds a userspace view of the TCE table into i

[PATCH kernel v9 18/32] powerpc/iommu/powernv: Release replaced TCE

2015-04-25 Thread Alexey Kardashevskiy
At the moment, writing a new TCE value to the IOMMU table fails with EBUSY if there is a valid entry already. However, the PAPR specification allows the guest to write a new TCE value without clearing it first. Another problem this patch is addressing is the use of pool locks for external IOMMU users such a
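
Overwriting a valid TCE and then releasing whatever it replaced maps naturally onto an atomic exchange, which also avoids taking pool locks. A minimal sketch, assuming a zero TCE means "empty": it uses C11 `atomic_exchange` as a stand-in for the kernel's `xchg()`, and `tce_xchg_entry` and `release_fn` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical callback that drops the reference on the page an old TCE
 * pointed to (in the kernel this would be something like put_page()). */
typedef void (*release_fn)(uint64_t old_tce);

/* Install new_tce and return the previous value. Instead of failing with
 * EBUSY when the entry is valid, the entry is simply replaced and the old
 * value is handed to release_old. No table lock is taken: the exchange
 * itself is atomic. Assumption: 0 means the entry was empty. */
uint64_t tce_xchg_entry(_Atomic uint64_t *entry, uint64_t new_tce,
                        release_fn release_old)
{
    uint64_t old = atomic_exchange(entry, new_tce);

    if (old != 0 && release_old)
        release_old(old);
    return old;
}
```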

[PATCH kernel v9 28/32] powerpc/mmu: Add userspace-to-physical addresses translation cache

2015-04-25 Thread Alexey Kardashevskiy
We are adding support for DMA memory pre-registration to be used in conjunction with VFIO. The idea is that the userspace which is going to run a guest may want to pre-register a user space memory region so it all gets pinned once and never goes away. Once this is done, a hypervisor will not have to
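
One way to picture such a cache: each pre-registered region keeps an array of host physical addresses, one per pinned system page, so translating a userspace address is a range check plus an array lookup. The structure and function names below are hypothetical, a 64K system page size is assumed, and region start addresses are assumed to be page-aligned.

```c
#include <stddef.h>
#include <stdint.h>

#define SYS_PAGE_SHIFT 16 /* assumption: 64K system pages */
#define SYS_PAGE_SIZE (1ULL << SYS_PAGE_SHIFT)

/* Hypothetical pre-registered region: `entries` pinned pages starting at
 * the page-aligned userspace address `ua`; hpas[i] is the host physical
 * address of page i. */
struct mem_region {
    uint64_t ua;
    uint64_t entries;
    const uint64_t *hpas;
};

/* Translate a userspace address to a host physical address.
 * Returns 0 on success, -1 if ua is not inside any registered region. */
int ua_to_hpa(const struct mem_region *regions, size_t nregions,
              uint64_t ua, uint64_t *hpa)
{
    for (size_t i = 0; i < nregions; i++) {
        const struct mem_region *r = &regions[i];
        uint64_t end = r->ua + (r->entries << SYS_PAGE_SHIFT);

        if (ua >= r->ua && ua < end) {
            uint64_t idx = (ua - r->ua) >> SYS_PAGE_SHIFT;
            /* Page frame from the cache, offset from the original address. */
            *hpa = r->hpas[idx] | (ua & (SYS_PAGE_SIZE - 1));
            return 0;
        }
    }
    return -1;
}
```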

[PATCH kernel v9 14/32] powerpc/iommu: Fix IOMMU ownership control functions

2015-04-25 Thread Alexey Kardashevskiy
This adds missing locks in iommu_take_ownership()/iommu_release_ownership(). This marks all pages busy in iommu_table::it_map in order to catch errors if there is an attempt to use this table while ownership over it is taken. This only clears TCE content if there is no page marked busy in it_map
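
A minimal sketch of the bitmap rule described above: taking ownership is refused if any entry is still in use, otherwise every bit is set so the in-kernel allocator cannot hand out entries while the table is owned. The struct and function names are hypothetical simplifications; locking and the clearing of TCE content are omitted.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical, simplified table: only the allocation bitmap matters here. */
struct tce_table {
    size_t size;        /* number of TCEs */
    unsigned long *map; /* one bit per TCE, 1 = busy */
};

static bool any_bit_set(const struct tce_table *t)
{
    for (size_t i = 0; i < t->size; i++)
        if (t->map[i / BITS_PER_WORD] & (1UL << (i % BITS_PER_WORD)))
            return true;
    return false;
}

/* Refuse with -EBUSY if the kernel still has mappings; otherwise mark every
 * entry busy so any later in-kernel use of the table is caught. */
int table_take_ownership(struct tce_table *t)
{
    if (any_bit_set(t))
        return -EBUSY;
    for (size_t i = 0; i < t->size; i++)
        t->map[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
    return 0;
}

/* Make every entry available to the kernel again. */
void table_release_ownership(struct tce_table *t)
{
    for (size_t i = 0; i < t->size; i++)
        t->map[i / BITS_PER_WORD] &= ~(1UL << (i % BITS_PER_WORD));
}
```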

[PATCH kernel v9 12/32] powerpc/spapr: vfio: Switch from iommu_table to new iommu_table_group

2015-04-25 Thread Alexey Kardashevskiy
Modern IBM POWERPC systems support multiple (currently two) TCE tables per IOMMU group (a.k.a. PE). This adds an iommu_table_group container for TCE tables. Right now just one table is supported. For P5IOC2 and IODA, iommu_table_group is embedded into PE struct (pnv_ioda_pe and pnv_phb) and does no

[PATCH kernel v9 10/32] powerpc/powernv: Do not set "read" flag if direction==DMA_NONE

2015-04-25 Thread Alexey Kardashevskiy
Normally a bitmap from the iommu_table is used to track which TCE entries are in use. Since we are going to use iommu_table without its locks and do xchg() instead, it becomes essential not to put bits which are not implied in the direction flag as the old TCE value (more precisely - the permission bit
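
A sketch of a direction-to-permission mapping in which only the bits implied by the direction are set, so DMA_NONE yields no permission bits at all. The bit values, the enum and `dir_to_tce_perm` are hypothetical stand-ins; the enum only mirrors the names of the kernel's `enum dma_data_direction` so the example is self-contained.

```c
/* Hypothetical permission bits; real TCE bit values are not given here. */
#define TCE_PERM_READ  0x1UL /* device may read memory */
#define TCE_PERM_WRITE 0x2UL /* device may write memory */

/* Hypothetical stand-in mirroring the names of enum dma_data_direction. */
enum dma_dir { DIR_BIDIRECTIONAL, DIR_TO_DEVICE, DIR_FROM_DEVICE, DIR_NONE };

unsigned long dir_to_tce_perm(enum dma_dir dir)
{
    switch (dir) {
    case DIR_BIDIRECTIONAL:
        return TCE_PERM_READ | TCE_PERM_WRITE;
    case DIR_TO_DEVICE:   /* device reads from memory */
        return TCE_PERM_READ;
    case DIR_FROM_DEVICE: /* device writes to memory */
        return TCE_PERM_WRITE;
    default:              /* DIR_NONE: no access, not even read */
        return 0;
    }
}
```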

[PATCH kernel v9 11/32] powerpc/iommu: Move tce_xxx callbacks from ppc_md to iommu_table

2015-04-25 Thread Alexey Kardashevskiy
This adds an iommu_table_ops struct and puts a pointer to it into the iommu_table struct. This moves tce_build/tce_free/tce_get/tce_flush callbacks from ppc_md to the new struct, where they really belong. This adds the requirement for @it_ops to be initialized before calling iommu_init_table() to m

[PATCH kernel v9 13/32] vfio: powerpc/spapr/iommu/powernv/ioda2: Rework IOMMU ownership control

2015-04-25 Thread Alexey Kardashevskiy
This adds tce_iommu_take_ownership() and tce_iommu_release_ownership(), which call iommu_take_ownership()/iommu_release_ownership() in a loop for every table in the group. As there is just one table now, no change in behaviour is expected. At the moment the iommu_table struct has a set_bypass() which enabl

[PATCH kernel v9 09/32] vfio: powerpc/spapr: Rework groups attaching

2015-04-25 Thread Alexey Kardashevskiy
This is to make extended ownership and multiple groups support patches simpler for review. This should cause no behavioural change. Signed-off-by: Alexey Kardashevskiy [aw: for the vfio related changes] Acked-by: Alex Williamson Reviewed-by: David Gibson --- drivers/vfio/vfio_iommu_spapr_tce.

[PATCH kernel v9 29/32] vfio: powerpc/spapr: Register memory and define IOMMU v2

2015-04-25 Thread Alexey Kardashevskiy
The existing implementation accounts the whole DMA window in the locked_vm counter. This is going to be worse with multiple containers and huge DMA windows. Also, real-time accounting would require additional tracking of accounted pages due to the page size difference - IOMMU uses 4K pages and syst

[PATCH kernel v9 32/32] vfio: powerpc/spapr: Support Dynamic DMA windows

2015-04-25 Thread Alexey Kardashevskiy
This adds create/remove window ioctls to create and remove DMA windows. sPAPR defines a Dynamic DMA windows capability which allows para-virtualized guests to create additional DMA windows on a PCI bus. Existing Linux kernels use this new window to map the entire guest memory and switch to the

[PATCH kernel v9 27/32] powerpc/iommu/ioda2: Add get_table_size() to calculate the size of future table

2015-04-25 Thread Alexey Kardashevskiy
This adds a way for the IOMMU user to know how much memory a new table will use so it can be accounted against the locked_vm limit before allocation happens. This stores the allocated table size in pnv_pci_create_table() so the locked_vm counter can be updated correctly when a table is being disposed. This d

[PATCH kernel v9 06/32] vfio: powerpc/spapr: Move locked_vm accounting to helpers

2015-04-25 Thread Alexey Kardashevskiy
This moves locked pages accounting to helpers. Later they will be reused for Dynamic DMA windows (DDW). This reworks debug messages to show the current value and the limit. This stores the locked pages number in the container so the iommu table pointer won't be needed when unlocking. This does n
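
A minimal sketch of such helpers under simplifying assumptions: `struct mm_acct` is a hypothetical stand-in for the process's locked_vm counter and its limit (in pages), and `struct container` records what it accounted so unlocking needs no iommu table. None of these names come from the patch.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the process-wide counter and its limit. */
struct mm_acct {
    uint64_t locked_vm;  /* pages currently accounted */
    uint64_t lock_limit; /* maximum lockable pages */
};

/* Hypothetical container: remembers how many pages it accounted. */
struct container {
    struct mm_acct *mm;
    uint64_t locked_pages;
};

/* Account npages; fail with -ENOMEM if that would exceed the limit. */
int try_increment_locked_vm(struct container *c, uint64_t npages)
{
    struct mm_acct *mm = c->mm;

    if (mm->locked_vm + npages > mm->lock_limit)
        return -ENOMEM;
    mm->locked_vm += npages;
    c->locked_pages += npages;
    return 0;
}

/* Undo exactly what this container accounted; no table pointer needed. */
void decrement_locked_vm(struct container *c)
{
    c->mm->locked_vm -= c->locked_pages;
    c->locked_pages = 0;
}
```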

[PATCH kernel v9 15/32] powerpc/powernv/ioda/ioda2: Rework TCE invalidation in tce_build()/tce_free()

2015-04-25 Thread Alexey Kardashevskiy
The pnv_pci_ioda_tce_invalidate() helper invalidates the TCE cache. It is supposed to be called on IODA1/2 and not on p5ioc2. It receives the start and end host addresses of the TCE table. IODA2 actually needs PCI addresses to invalidate the cache. Those can be calculated from host addresses but since

[PATCH kernel v9 08/32] vfio: powerpc/spapr: Moving pinning/unpinning to helpers

2015-04-25 Thread Alexey Kardashevskiy
This is a pretty mechanical patch to make the next patches simpler. The new tce_iommu_unuse_page() helper does put_page() now, but it might skip that once the memory registration patch is applied. While we are here, this removes unnecessary checks for a value returned by pfn_to_page() as it cannot possibly retu

[PATCH kernel v9 05/32] vfio: powerpc/spapr: Use it_page_size

2015-04-25 Thread Alexey Kardashevskiy
This makes use of the it_page_size from the iommu_table struct as page size can differ. This replaces the missing IOMMU_PAGE_SHIFT macro in commented-out debug code, as the recently introduced IOMMU_PAGE_XXX macros do not include IOMMU_PAGE_SHIFT. Signed-off-by: Alexey Kardashevskiy Reviewed-by: David Gibson

[PATCH kernel v9 17/32] powerpc/powernv: Implement accessor to TCE entry

2015-04-25 Thread Alexey Kardashevskiy
This replaces direct accesses to the TCE table with a helper which returns a TCE entry address. This makes no difference now but will when multi-level TCE tables are introduced. No change in behavior is expected. Signed-off-by: Alexey Kardashevskiy --- Changes: v9: * new patch in the series to
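
With a flat table such an accessor is just `&table[idx]`; routing every access through it means it can later walk several levels instead. A hypothetical sketch of that walk, assuming every non-leaf level is an array of pointers to the next level, the leaf level holds 64-bit TCEs, and each level is indexed by `bits` bits of the TCE index. `tce_entry_ptr` is an illustrative name, not the powernv helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical multi-level layout: non-leaf levels are arrays of pointers,
 * the leaf level is an array of 64-bit TCEs; each level uses `bits` bits
 * of idx, most significant bits first. Returns NULL if an intermediate
 * level has not been allocated. */
uint64_t *tce_entry_ptr(void *root, uint64_t idx, unsigned levels,
                        unsigned bits)
{
    uint64_t mask = (1ULL << bits) - 1;
    void *tbl = root;

    for (unsigned lvl = 0; lvl + 1 < levels; lvl++) {
        unsigned shift = bits * (levels - 1 - lvl);
        void **dir = tbl;

        tbl = dir[(idx >> shift) & mask];
        if (!tbl)
            return NULL;
    }
    /* Leaf level: the low bits select the TCE itself. */
    return &((uint64_t *)tbl)[idx & mask];
}
```

With `levels = 1` the loop is skipped and the helper degenerates to the flat `&table[idx]` case.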

[PATCH kernel v9 31/32] vfio: powerpc/spapr: Support multiple groups in one container if possible

2015-04-25 Thread Alexey Kardashevskiy
At the moment only one group per container is supported. POWER8 CPUs have a more flexible design and allow having 2 TCE tables per IOMMU group, so we can relax this limitation and support multiple groups per container. This adds TCE table descriptors to a container and uses iommu_table_group_ops to

[PATCH kernel v9 16/32] powerpc/powernv/ioda: Move TCE kill register address to PE

2015-04-25 Thread Alexey Kardashevskiy
At the moment the DMA setup code looks for the "ibm,opal-tce-kill" property which contains the TCE kill register address. Writes to this register invalidate the TCE cache on an IODA/IODA2 hub. This moves the register address from iommu_table to pnv_ioda_pe as later there will be 2 tables per PE and it w

[PATCH kernel v9 21/32] powerpc/powernv/ioda2: Introduce pnv_pci_ioda2_set_window

2015-04-25 Thread Alexey Kardashevskiy
This is a part of moving DMA window programming to an iommu_ops callback. pnv_pci_ioda2_set_window() takes an iommu_table_group as its first parameter (not pnv_ioda_pe), as it is going to be used as a callback for VFIO DDW code. This adds pnv_pci_ioda2_tvt_invalidate() to invalidate TVT as it is a go

[PATCH kernel v9 20/32] powerpc/powernv/ioda2: Introduce pnv_pci_create_table/pnv_pci_free_table

2015-04-25 Thread Alexey Kardashevskiy
This is a part of moving TCE table allocation into an iommu_ops callback to support multiple IOMMU groups per one VFIO container. This moves a table creation window to the file with common powernv-pci helpers as it does not do anything IODA2-specific. This adds pnv_pci_free_table() helper to rele

[PATCH kernel v9 07/32] vfio: powerpc/spapr: Disable DMA mappings on disabled container

2015-04-25 Thread Alexey Kardashevskiy
At the moment DMA map/unmap requests are handled irrespective of the container's state. This allows the user space to pin memory which it might not be allowed to pin. This adds checks to MAP/UNMAP that the container is enabled, otherwise -EPERM is returned. Signed-off-by: Alexey Kardashevskiy [a

[PATCH kernel v9 19/32] powerpc/powernv/ioda2: Rework iommu_table creation

2015-04-25 Thread Alexey Kardashevskiy
This moves iommu_table creation to the beginning to make following changes easier to review. This starts using table parameters from the iommu_table struct. This should cause no behavioural change. Signed-off-by: Alexey Kardashevskiy --- Changes: v9: * updated commit log and did minor cleanup --

[PATCH kernel v9 00/32] powerpc/iommu/vfio: Enable Dynamic DMA windows

2015-04-25 Thread Alexey Kardashevskiy
This enables an sPAPR-defined feature called Dynamic DMA windows (DDW). Each Partitionable Endpoint (IOMMU group) has an address range on a PCI bus where devices are allowed to do DMA. These ranges are called DMA windows. By default, there is a single DMA window, 1 or 2GB big, mapped at zero on a PC

[PATCH kernel v9 03/32] vfio: powerpc/spapr: Move page pinning from arch code to VFIO IOMMU driver

2015-04-25 Thread Alexey Kardashevskiy
This moves page pinning (get_user_pages_fast()/put_page()) code out of the platform IOMMU code and puts it into the VFIO IOMMU driver, where it belongs, as the platform code does not deal with page pinning. This makes iommu_take_ownership()/iommu_release_ownership() deal with the IOMMU table bitmap onl

[PATCH kernel v9 01/32] powerpc/iommu: Split iommu_free_table into 2 helpers

2015-04-25 Thread Alexey Kardashevskiy
The iommu_free_table helper releases the memory it is using (the TCE table and @it_map) and releases the iommu_table struct as well. We might not want the very last step as we store iommu_table in parent structures. Signed-off-by: Alexey Kardashevskiy --- arch/powerpc/include/asm/iommu.h | 1 + arch/

[PATCH kernel v9 02/32] Revert "powerpc/powernv: Allocate struct pnv_ioda_pe iommu_table dynamically"

2015-04-25 Thread Alexey Kardashevskiy
This reverts commit 9e8d4a19ab66ec9e132d405357b9108a4f26efd3 as tce32_table has exactly the same lifetime as the whole PE. This makes use of a new iommu_reset_table() helper instead. Signed-off-by: Alexey Kardashevskiy --- arch/powerpc/include/asm/iommu.h | 3 --- arch/powerpc/platfo

[PATCH kernel v9 04/32] vfio: powerpc/spapr: Check that IOMMU page is fully contained by system page

2015-04-25 Thread Alexey Kardashevskiy
This checks that the TCE table page size is not bigger than the size of the page we just pinned and whose physical address we are going to put into the table. Otherwise the hardware gets unwanted access to physical memory between the end of the actual page and the end of the aligned up TCE page. Since compoun
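
A minimal sketch of the check and of what it prevents, with hypothetical names: the pinned page's size must be at least the IOMMU page size, otherwise the device could reach the memory between the end of the pinned page and the end of the IOMMU page. For a huge page, the pinned size is assumed to be that of the whole huge page.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if one IOMMU page of (1 << it_page_shift) bytes fits inside the
 * pinned page of pinned_page_size bytes. */
bool iommu_page_fits(uint64_t pinned_page_size, unsigned it_page_shift)
{
    return pinned_page_size >= (1ULL << it_page_shift);
}

/* Bytes past the end of the pinned page that a TCE would expose to the
 * device if the check above were skipped (0 when the page fits). */
uint64_t iommu_page_overhang(uint64_t pinned_page_size, unsigned it_page_shift)
{
    uint64_t iommu_page = 1ULL << it_page_shift;

    return iommu_page > pinned_page_size ? iommu_page - pinned_page_size : 0;
}
```

For example, mapping a 64K IOMMU page over a pinned 4K system page would expose the following 60K of physical memory, so such a mapping must be rejected.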

Re: [PATCH 2/2] pci: Use Qemu created PCI device nodes

2015-04-25 Thread Benjamin Herrenschmidt
On Sat, 2015-04-25 at 17:31 +1000, Alexey Kardashevskiy wrote: > We need BAR setup in 2 cases: when SLOF needs to boot from a PCI device > (and SLOF can do BAR setup) and when we do PCI hotplug - and BARs are set > by the guest, otherwise we hit races here (Michael Roth can tell more). So > as f

Re: [PATCH 2/2] pci: Use Qemu created PCI device nodes

2015-04-25 Thread Alexey Kardashevskiy
On 04/25/2015 05:30 AM, Thomas Huth wrote: Hi Nikunj, On Wed, 22 Apr 2015 16:27:20 +0530 Nikunj A Dadhania wrote: PCI Enumeration has been part of SLOF. Now with hotplug code addition in Qemu, it makes more sense to have this code a one place, i.e. Qemu. s/Qemu/QEMU/ and s/code a one pla