From: Preeti U Murthy
This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.
===
[ Upstream commit a127d2bcf1fbc8c8e0b5cf0dab54f7d3ff50ce47 ]
The hrtimer mode of broadcast queues hrtimers in the idle entry
path so as to wakeup cpus in dee
From: Michael Ellerman
Date: Fri, 24 Apr 2015 15:52:32 +1000
> The recent commit to only register the EHEA memory hotplug hooks on
> adapter probe has a few problems.
>
> Firstly the reference counting is wrong for multiple adapters, in that
> the hooks are registered multiple times. Secondly th
This extends iommu_table_group_ops by a set of callbacks to support
dynamic DMA windows management.
create_table() creates a TCE table with specific parameters.
it receives iommu_table_group to know nodeid in order to allocate
TCE table memory closer to the PHB. The exact format of allocated
multi
TCE tables might get too big in case of 4K IOMMU pages and DDW enabled
on huge guests (hundreds of GB of RAM) so the kernel might be unable to
allocate contiguous chunk of physical memory to store the TCE table.
To address this, POWER8 CPU (actually, IODA2) supports multi-level TCE tables,
up to 5
A table group might not have a table but it always has the default 32bit
window parameters so use these.
No change in behavior is expected.
Signed-off-by: Alexey Kardashevskiy
---
Changes:
v9:
* new in the series - to make the next patch simpler
---
drivers/vfio/vfio_iommu_spapr_tce.c | 19
Before the IOMMU user (VFIO) would take control over the IOMMU table
belonging to a specific IOMMU group. This approach did not allow sharing
tables between IOMMU groups attached to the same container.
This introduces a new IOMMU ownership flavour when the user can not
just control the existing IO
The existing code programmed TVT#0 with some address and then
immediately released that memory.
This makes use of pnv_pci_ioda2_unset_window() and
pnv_pci_ioda2_set_bypass() which do correct resource release and
TVT update.
Signed-off-by: Alexey Kardashevskiy
---
arch/powerpc/platforms/powernv/
In order to support memory pre-registration, we need a way to track
the use of every registered memory region and only allow unregistration
if a region is not in use anymore. So we need a way to tell from what
region the just cleared TCE was from.
This adds a userspace view of the TCE table into i
At the moment writing new TCE value to the IOMMU table fails with EBUSY
if there is a valid entry already. However PAPR specification allows
the guest to write new TCE value without clearing it first.
Another problem this patch is addressing is the use of pool locks for
external IOMMU users such a
We are adding support for DMA memory pre-registration to be used in
conjunction with VFIO. The idea is that the userspace which is going to
run a guest may want to pre-register a user space memory region so
it all gets pinned once and never goes away. Having this done,
a hypervisor will not have to
This adds missing locks in iommu_take_ownership()/
iommu_release_ownership().
This marks all pages busy in iommu_table::it_map in order to catch
errors if there is an attempt to use this table while ownership over it
is taken.
This only clears TCE content if there is no page marked busy in it_map
Modern IBM POWERPC systems support multiple (currently two) TCE tables
per IOMMU group (a.k.a. PE). This adds a iommu_table_group container
for TCE tables. Right now just one table is supported.
For P5IOC2 and IODA, iommu_table_group is embedded into PE struct
(pnv_ioda_pe and pnv_phb) and does no
Normally a bitmap from the iommu_table is used to track what TCE entry
is in use. Since we are going to use iommu_table without its locks and
do xchg() instead, it becomes essential not to put bits which are not
implied in the direction flag as the old TCE value (more precisely -
the permission bit
This adds a iommu_table_ops struct and puts pointer to it into
the iommu_table struct. This moves tce_build/tce_free/tce_get/tce_flush
callbacks from ppc_md to the new struct where they really belong to.
This adds the requirement for @it_ops to be initialized before calling
iommu_init_table() to m
This adds tce_iommu_take_ownership() and tce_iommu_release_ownership
which call in a loop iommu_take_ownership()/iommu_release_ownership()
for every table on the group. As there is just one now, no change in
behaviour is expected.
At the moment the iommu_table struct has a set_bypass() which enabl
This is to make extended ownership and multiple groups support patches
simpler for review.
This should cause no behavioural change.
Signed-off-by: Alexey Kardashevskiy
[aw: for the vfio related changes]
Acked-by: Alex Williamson
Reviewed-by: David Gibson
---
drivers/vfio/vfio_iommu_spapr_tce.
The existing implementation accounts the whole DMA window in
the locked_vm counter. This is going to be worse with multiple
containers and huge DMA windows. Also, real-time accounting would requite
additional tracking of accounted pages due to the page size difference -
IOMMU uses 4K pages and syst
This adds create/remove window ioctls to create and remove DMA windows.
sPAPR defines a Dynamic DMA windows capability which allows
para-virtualized guests to create additional DMA windows on a PCI bus.
The existing linux kernels use this new window to map the entire guest
memory and switch to the
This adds a way for the IOMMU user to know how much a new table will
use so it can be accounted in the locked_vm limit before allocation
happens.
This stores the allocated table size in pnv_pci_create_table()
so the locked_vm counter can be updated correctly when a table is
being disposed.
This d
There moves locked pages accounting to helpers.
Later they will be reused for Dynamic DMA windows (DDW).
This reworks debug messages to show the current value and the limit.
This stores the locked pages number in the container so when unlocking
the iommu table pointer won't be needed. This does n
The pnv_pci_ioda_tce_invalidate() helper invalidates TCE cache. It is
supposed to be called on IODA1/2 and not called on p5ioc2. It receives
start and end host addresses of TCE table.
IODA2 actually needs PCI addresses to invalidate the cache. Those
can be calculated from host addresses but since
This is a pretty mechanical patch to make next patches simpler.
New tce_iommu_unuse_page() helper does put_page() now but it might skip
that after the memory registering patch applied.
As we are here, this removes unnecessary checks for a value returned
by pfn_to_page() as it cannot possibly retu
This makes use of the it_page_size from the iommu_table struct
as page size can differ.
This replaces missing IOMMU_PAGE_SHIFT macro in commented debug code
as recently introduced IOMMU_PAGE_XXX macros do not include
IOMMU_PAGE_SHIFT.
Signed-off-by: Alexey Kardashevskiy
Reviewed-by: David Gibson
This replaces direct accesses to TCE table with a helper which
returns an TCE entry address. This does not make difference now but will
when multi-level TCE tables get introduces.
No change in behavior is expected.
Signed-off-by: Alexey Kardashevskiy
---
Changes:
v9:
* new patch in the series to
At the moment only one group per container is supported.
POWER8 CPUs have more flexible design and allows naving 2 TCE tables per
IOMMU group so we can relax this limitation and support multiple groups
per container.
This adds TCE table descriptors to a container and uses iommu_table_group_ops
to
At the moment the DMA setup code looks for the "ibm,opal-tce-kill" property
which contains the TCE kill register address. Writes to this register
invalidates TCE cache on IODA/IODA2 hub.
This moves the register address from iommu_table to pnv_ioda_pe as
later there will be 2 tables per PE and it w
This is a part of moving DMA window programming to an iommu_ops
callback. pnv_pci_ioda2_set_window() takes an iommu_table_group as
a first parameter (not pnv_ioda_pe) as it is going to be used as
a callback for VFIO DDW code.
This adds pnv_pci_ioda2_tvt_invalidate() to invalidate TVT as it is
a go
This is a part of moving TCE table allocation into an iommu_ops
callback to support multiple IOMMU groups per one VFIO container.
This moves a table creation window to the file with common powernv-pci
helpers as it does not do anything IODA2-specific.
This adds pnv_pci_free_table() helper to rele
At the moment DMA map/unmap requests are handled irrespective to
the container's state. This allows the user space to pin memory which
it might not be allowed to pin.
This adds checks to MAP/UNMAP that the container is enabled, otherwise
-EPERM is returned.
Signed-off-by: Alexey Kardashevskiy
[a
This moves iommu_table creation to the beginning to make following changes
easier to review. This starts using table parameters from the iommu_table
struct.
This should cause no behavioural change.
Signed-off-by: Alexey Kardashevskiy
---
Changes:
v9:
* updated commit log and did minor cleanup
--
This enables sPAPR defined feature called Dynamic DMA windows (DDW).
Each Partitionable Endpoint (IOMMU group) has an address range on a PCI bus
where devices are allowed to do DMA. These ranges are called DMA windows.
By default, there is a single DMA window, 1 or 2GB big, mapped at zero
on a PC
This moves page pinning (get_user_pages_fast()/put_page()) code out of
the platform IOMMU code and puts it to VFIO IOMMU driver where it belongs
to as the platform code does not deal with page pinning.
This makes iommu_take_ownership()/iommu_release_ownership() deal with
the IOMMU table bitmap onl
The iommu_free_table helper release memory it is using (the TCE table and
@it_map) and release the iommu_table struct as well. We might not want
the very last step as we store iommu_table in parent structures.
Signed-off-by: Alexey Kardashevskiy
---
arch/powerpc/include/asm/iommu.h | 1 +
arch/
This reverts commit 9e8d4a19ab66ec9e132d405357b9108a4f26efd3 as
tce32_table has exactly the same life time as the whole PE.
This makes use of a new iommu_reset_table() helper instead.
Signed-off-by: Alexey Kardashevskiy
---
arch/powerpc/include/asm/iommu.h | 3 ---
arch/powerpc/platfo
This checks that the TCE table page size is not bigger that the size of
a page we just pinned and going to put its physical address to the table.
Otherwise the hardware gets unwanted access to physical memory between
the end of the actual page and the end of the aligned up TCE page.
Since compoun
On Sat, 2015-04-25 at 17:31 +1000, Alexey Kardashevskiy wrote:
> We need BAR setup in 2 cases: when SLOF needs to boot from a PCI device
> (and SLOF can do BAR setup) and when we do PCI hotplug - and BARs are set
> by the guest, otherwise we hit races here (Michael Roth can tell more). So
> as f
On 04/25/2015 05:30 AM, Thomas Huth wrote:
Hi Nikunj,
On Wed, 22 Apr 2015 16:27:20 +0530
Nikunj A Dadhania wrote:
PCI Enumeration has been part of SLOF. Now with hotplug code addition
in Qemu, it makes more sense to have this code a one place, i.e. Qemu.
s/Qemu/QEMU/ and s/code a one pla
37 matches
Mail list logo