Re: [Xen-devel] [Draft C] Xen on ARM vITS Handling

2015-06-03 Thread Ian Campbell
On Wed, 2015-06-03 at 12:55 +0530, Vijay Kilari wrote:
> On Mon, Jun 1, 2015 at 5:54 PM, Julien Grall  wrote:
> > On 01/06/15 13:11, Ian Campbell wrote:
> >>>> ### Device ID (`ID`)
> >>>>
> >>>> This parameter is used by commands which manage a specific device and
> >>>> the interrupts associated with that device. Checking if a device is
> >>>> present and retrieving the data structure must be fast.
> >>>>
> >>>> The device identifiers may not be assigned contiguously and the maximum
> >>>> number is very high (2^32).
> >>>>
> >>>> XXX In the context of virtualised device ids this may not be the case,
> >>>> e.g. we can arrange for (mostly) contiguous device ids and we know the
> >>>> bound is significantly lower than 2^32
> >>>>
> >>>> Possible efficient data structures would be:
> >>>>
> >>>> 1. List: The lookup/deletion is in O(n) and the insertion will depend
> >>>>    if the device should be sorted following their identifier. The
> >>>>    memory overhead is 18 bytes per element.
> >>>> 2. Red-black tree: All the operations are O(log(n)). The memory
> >>>>    overhead is 24 bytes per element.
>
> How about using radix-tree instead of RB-tree?
>
> >>>> A Red-black tree seems the more suitable for having fast deviceID
> >>>> validation even though the memory overhead is a bit higher compare to
> >>>> the list.
> >>>
> >>> When PHYSDEVOP_pci_device_add is called, memory for its_device structure
> >>> and other needed structure for this device is allocated added to RB-tree
> >>> with all necessary information
> >>
> >> Sounds like a reasonable time to do it. I added something based on your
> >> words.
> >
> > Hmmm... The RB-tree suggested is per domain not the host and indexed
> > with the vDevID.
> >
> > This is the only way to know quickly if the domain is able to use the
> > device and retrieving a device. Indeed, the vDevID won't be equal to the
> > pDevID as the vBDF will be different to the pBDF.
> 
> Yes, vBDF is converted to pBDF to match DevID
> 
> >
> > PHYSDEVOP_pci_device_add is to ask Xen managing the PCI device. At that
> > time we don't know to which domain the device will be passthrough.
> 
> PHYSDEVOP_pci_device_add will only add its_device to global radix tree list.
> 
> When MAPD is received, its_device is removed from global list and added
> to per domain list. When domain releases the device, its_device is added back
> to global list. is it ok?

I suspect we might need two list (or tree) entries for each its_device,
one for the pDevice mapping and one for the vDevice mapping. We may even
want a third for vCollection membership, I'm not sure.

Either way I don't think it'll be a big deal, the need or not for each
of those will fall out in the wash from the rest of the design, I think.
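
To illustrate what "two entries per its_device" could look like, here is a
minimal C sketch. It is only an illustration under assumptions: the field
names (`pdevid`, `vdevid`, `host_link`, `dom_link`), the helper names and the
plain linked lists are all hypothetical, and a real implementation would
presumably hang the same object off two trees rather than two lists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical intrusive link. A tree node (R-B or radix) would take its
 * place in a real implementation; a list keeps the sketch short. */
struct link { struct link *next; };

struct its_device {
    uint32_t pdevid;        /* physical DeviceID: key of the host-wide index */
    uint32_t vdevid;        /* virtual DeviceID: key of the per-domain index */
    struct link host_link;  /* entry in the host-wide (pDevice) index */
    struct link dom_link;   /* entry in the owning domain's (vDevice) index */
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void link_add(struct link **head, struct link *l)
{
    l->next = *head;
    *head = l;
}

/* Look a device up by its physical DeviceID in the host-wide index. */
static struct its_device *find_by_pdevid(struct link *head, uint32_t id)
{
    for (; head; head = head->next) {
        struct its_device *d = container_of(head, struct its_device, host_link);
        if (d->pdevid == id)
            return d;
    }
    return NULL;
}

/* Look a device up by its virtual DeviceID in one domain's index. */
static struct its_device *find_by_vdevid(struct link *head, uint32_t id)
{
    for (; head; head = head->next) {
        struct its_device *d = container_of(head, struct its_device, dom_link);
        if (d->vdevid == id)
            return d;
    }
    return NULL;
}
```

The point is only that a single allocation carries both links, so the same
object can be found from either ID space. A third link for vCollection
membership would follow the same pattern if it turns out to be needed.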

Based on the amount of discussion on draftC and the fact that we are
still finding new areas of complexity I'm going to take a step back and
try something simpler and see if I can come up with something which we
can get done for 4.6. I'll try and get a new draft reflecting that out
ASAP.

(I have my edits from the feedback on draftC so far in git, so if it
doesn't work we can always take up this one again...)



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Draft C] Xen on ARM vITS Handling

2015-06-03 Thread Vijay Kilari
On Mon, Jun 1, 2015 at 5:54 PM, Julien Grall  wrote:
> On 01/06/15 13:11, Ian Campbell wrote:
>>>> ### Device ID (`ID`)
>>>>
>>>> This parameter is used by commands which manage a specific device and
>>>> the interrupts associated with that device. Checking if a device is
>>>> present and retrieving the data structure must be fast.
>>>>
>>>> The device identifiers may not be assigned contiguously and the maximum
>>>> number is very high (2^32).
>>>>
>>>> XXX In the context of virtualised device ids this may not be the case,
>>>> e.g. we can arrange for (mostly) contiguous device ids and we know the
>>>> bound is significantly lower than 2^32
>>>>
>>>> Possible efficient data structures would be:
>>>>
>>>> 1. List: The lookup/deletion is in O(n) and the insertion will depend
>>>>    if the device should be sorted following their identifier. The
>>>>    memory overhead is 18 bytes per element.
>>>> 2. Red-black tree: All the operations are O(log(n)). The memory
>>>>    overhead is 24 bytes per element.

How about using radix-tree instead of RB-tree?

>>>> A Red-black tree seems the more suitable for having fast deviceID
>>>> validation even though the memory overhead is a bit higher compare to
>>>> the list.
>>>
>>> When PHYSDEVOP_pci_device_add is called, memory for its_device structure
>>> and other needed structure for this device is allocated added to RB-tree
>>> with all necessary information
>>
>> Sounds like a reasonable time to do it. I added something based on your
>> words.
>
> Hmmm... The RB-tree suggested is per domain not the host and indexed
> with the vDevID.
>
> This is the only way to know quickly if the domain is able to use the
> device and retrieving a device. Indeed, the vDevID won't be equal to the
> pDevID as the vBDF will be different to the pBDF.

Yes, vBDF is converted to pBDF to match DevID

>
> PHYSDEVOP_pci_device_add is to ask Xen managing the PCI device. At that
> time we don't know to which domain the device will be passthrough.

PHYSDEVOP_pci_device_add will only add the its_device to the global radix
tree list.

When MAPD is received, the its_device is removed from the global list and
added to the per-domain list. When the domain releases the device, the
its_device is added back to the global list. Is that OK?



Re: [Xen-devel] [Draft C] Xen on ARM vITS Handling

2015-06-02 Thread Ian Campbell
On Tue, 2015-06-02 at 11:46 +0100, Julien Grall wrote:
> Hi Ian,
> 
> On 01/06/15 14:36, Ian Campbell wrote:
> > On Fri, 2015-05-29 at 15:06 +0100, Julien Grall wrote:
> >> Hi Vijay,
> >>
> >> On 27/05/15 17:44, Vijay Kilari wrote:
> >>>> ## Command Translation
> >>>>
> >>>> Of the existing GICv3 ITS commands, `MAPC`, `MAPD`, `MAPVI`/`MAPI` are
> >>>> potentially time consuming commands as these commands creates entry in
> >>>> the Xen ITS structures, which are used to validate other ITS commands.
> >>>>
> >>>> `INVALL` and `SYNC` are global and potentially disruptive to other
> >>>> guests and so need consideration.
> >>>>
> >>>> All other ITS command like `MOVI`, `DISCARD`, `INV`, `INT`, `CLEAR`
> >>>> just validate and generate physical command.
> >>>>
> >>>> ### `MAPC` command translation
> >>>>
> >>>> Format: `MAPC vCID, vTA`
> 
> >>>-  The GITS_TYPER.PAtype is emulated as 0. Hence vTA always represents
> >>>   the vcpu number. Hence vTA is validated against physical Collection
> >>>   IDs by querying the ITS driver and the corresponding Physical
> >>>   Collection ID is retrieved.
> >>>-  Each vITS will have cid_map (struct cid_mapping) which holds mapping of
> >>
> >> Why do you speak about each vITS? The emulation is only related to one
> >> vITS and not shared...
> >
> > And each vITS will have a cid_map, which is used. This seems like a
> > reasonable way to express this concept in the context.
> 
> This is rather strange when everything in the command emulation is per-vits.

I'm afraid you are going to have to say more explicitly what you find
strange here.

> > Perhaps there is a need to include discussion of some of the secondary
> > data structures alongside the definition of `cits_cq`. In which case we
> > could talk about "its associated `cid_map`" and things.
> >
> >>>   Virtual Collection ID(vCID), Virtual Target address(vTA) and
> >>>   Physical Collection ID (pCID).
> >>>   If vCID entry already exists in cid_map, then that particular
> >>>   mapping is updated with the new pCID and vTA else new entry is
> >>>   made in cid_map
> >>
> >> When you move a collection, you also have to make sure that all the
> >> interrupts associated to it will be delivered to the new target.
> >>
> >> I'm not sure what you are suggesting for that...
> >
> > This is going to be rather painful I fear.
> >
> >>>-  MAPC pCID, pTA physical ITS command is generated
> >>
> >> We should not send any MAPC command to the physical ITS. The collection
> >> is already mapped during Xen boot and the guest should not be able to
> >> move the physical collection (they are shared between all the guests and
> >> Xen).
> >
> > This needs discussion in the background section, to describe the
> > physical setup which the virtual stuff can make assumption of.
> 
> I don't think this is a background section. The physical number of
> collection is limited (the mandatory number of collections is nr_cpus +
> 1). Those collection will likely be shared between Xen and the different
> guests.

Right, and this needs to be explained in the document as an assumption
upon which other things can draw, so that the document is (so far as
possible) a coherent whole...

> If we let the guest moving the physical collection we will also move all
> the interrupts which is wrong.

... and therefore things like this would become apparent.

> >>>> - `MAPC pCID, pTA` physical ITS command is generated
> >>>>
> >>>> ### `MAPD` Command translation
> >>>>
> >>>> Format: `MAPD device, Valid, ITT IPA, ITT Size`
> >>>>
> >>>> `MAPD` is sent with `Valid` bit set if device needs to be added and reset
> >>>> when device is removed.
> >>>>
> >>>> If `Valid` bit is set:
> >>>>
> >>>> - Allocate memory for `its_device` struct
> >>>> - Validate ITT IPA & ITT size and update its_device struct
> >>>> - Find number of vectors(nrvecs) for this device by querying PCI
> >>>>   helper function
> >>>> - Allocate nrvecs number of LPI XXX nrvecs is a function of `ITT Size`?
> >>>> - Allocate memory for `struct vlpi_map` for this device. This
> >>>>   `vlpi_map` holds mapping of Virtual LPI to Physical LPI and ID.
> >>>> - Find physical ITS node with which this device is associated
> >>>> - Call `p2m_lookup` on ITT IPA addr and get physical ITT address
> >>>> - Validate ITT Size
> >>>> - Generate/format physical ITS command: `MAPD, ITT PA, ITT Size`
> >>>>
> >>>> Here the overhead is with memory allocation for `its_device` and
> >>>> `vlpi_map`
> >>>>
> >>>> XXX Suggestion was to preallocate some of those at device passthrough
> >>>> setup time?
> >>>
> >>> If Validation bit is set:
> >>>- Query its_device tree and get its_device structure for this device.
> >>>- (XXX: If pci device is hidden from dom0, does this device is added
> >>>with PHYSDEVOP_pci_device_add hypercall?)
> >>>- If device does not exists return
> >>>- If device exists in RB-tree then
> >>>   - Validate ITT IPA & ITT size and update its

Re: [Xen-devel] [Draft C] Xen on ARM vITS Handling

2015-06-02 Thread Julien Grall

Hi Ian,

On 01/06/15 14:36, Ian Campbell wrote:
> On Fri, 2015-05-29 at 15:06 +0100, Julien Grall wrote:
>> Hi Vijay,
>>
>> On 27/05/15 17:44, Vijay Kilari wrote:
>>>> ## Command Translation
>>>>
>>>> Of the existing GICv3 ITS commands, `MAPC`, `MAPD`, `MAPVI`/`MAPI` are
>>>> potentially time consuming commands as these commands creates entry in
>>>> the Xen ITS structures, which are used to validate other ITS commands.
>>>>
>>>> `INVALL` and `SYNC` are global and potentially disruptive to other
>>>> guests and so need consideration.
>>>>
>>>> All other ITS command like `MOVI`, `DISCARD`, `INV`, `INT`, `CLEAR`
>>>> just validate and generate physical command.
>>>>
>>>> ### `MAPC` command translation
>>>>
>>>> Format: `MAPC vCID, vTA`


>>>    -  The GITS_TYPER.PAtype is emulated as 0. Hence vTA always represents
>>>       the vcpu number. Hence vTA is validated against physical Collection
>>>       IDs by querying the ITS driver and the corresponding Physical
>>>       Collection ID is retrieved.
>>>    -  Each vITS will have cid_map (struct cid_mapping) which holds mapping of
>>
>> Why do you speak about each vITS? The emulation is only related to one
>> vITS and not shared...
>
> And each vITS will have a cid_map, which is used. This seems like a
> reasonable way to express this concept in the context.

This is rather strange when everything in the command emulation is per-vits.

> Perhaps there is a need to include discussion of some of the secondary
> data structures alongside the definition of `cits_cq`. In which case we
> could talk about "its associated `cid_map`" and things.
>
>>>       Virtual Collection ID(vCID), Virtual Target address(vTA) and
>>>       Physical Collection ID (pCID).
>>>       If vCID entry already exists in cid_map, then that particular
>>>       mapping is updated with the new pCID and vTA else new entry is
>>>       made in cid_map
>>
>> When you move a collection, you also have to make sure that all the
>> interrupts associated to it will be delivered to the new target.
>>
>> I'm not sure what you are suggesting for that...
>
> This is going to be rather painful I fear.
>
>>>    -  MAPC pCID, pTA physical ITS command is generated
>>
>> We should not send any MAPC command to the physical ITS. The collection
>> is already mapped during Xen boot and the guest should not be able to
>> move the physical collection (they are shared between all the guests and
>> Xen).
>
> This needs discussion in the background section, to describe the
> physical setup which the virtual stuff can make assumption of.

I don't think this is a background section. The number of physical
collections is limited (the mandatory number of collections is nr_cpus +
1). Those collections will likely be shared between Xen and the different
guests.

If we let the guest move the physical collection we will also move all
the interrupts, which is wrong.




>>>    Here there is no overhead, the cid_map entries are preallocated
>>>    with size of nr_cpus in the platform.
>>
>> As said the number of collection should be at least nr_cpus + 1.
>
> FWIW I read this as "with size appropriate for nr_cpus", which leaves
> the +1 as implicit. I added the +1 nevertheless.

I wanted to make that clear. His implementation was only considering
nr_cpus collections.

>>>> - `MAPC pCID, pTA` physical ITS command is generated

>>>> ### `MAPD` Command translation
>>>>
>>>> Format: `MAPD device, Valid, ITT IPA, ITT Size`
>>>>
>>>> `MAPD` is sent with `Valid` bit set if device needs to be added and reset
>>>> when device is removed.
>>>>
>>>> If `Valid` bit is set:
>>>>
>>>> - Allocate memory for `its_device` struct
>>>> - Validate ITT IPA & ITT size and update its_device struct
>>>> - Find number of vectors(nrvecs) for this device by querying PCI
>>>>   helper function
>>>> - Allocate nrvecs number of LPI XXX nrvecs is a function of `ITT Size`?
>>>> - Allocate memory for `struct vlpi_map` for this device. This
>>>>   `vlpi_map` holds mapping of Virtual LPI to Physical LPI and ID.
>>>> - Find physical ITS node with which this device is associated
>>>> - Call `p2m_lookup` on ITT IPA addr and get physical ITT address
>>>> - Validate ITT Size
>>>> - Generate/format physical ITS command: `MAPD, ITT PA, ITT Size`
>>>>
>>>> Here the overhead is with memory allocation for `its_device` and `vlpi_map`
>>>>
>>>> XXX Suggestion was to preallocate some of those at device passthrough
>>>> setup time?
>>>
>>> If Validation bit is set:
>>>    - Query its_device tree and get its_device structure for this device.
>>>    - (XXX: If pci device is hidden from dom0, does this device is added
>>>      with PHYSDEVOP_pci_device_add hypercall?)
>>>    - If device does not exists return
>>>    - If device exists in RB-tree then
>>>       - Validate ITT IPA & ITT size and update its_device struct


>> To validate the ITT size you need to know the number of interrupt ID.
>
> Please could you get into the habit of making concrete suggestions for
> changes to the text. I've no idea what change I should make based on
> this observation. If not concrete suggestions please try and make the
> implications of what you are saying clear.

The size of the ITT is based on the number of interrupts supported by the
device.

The only way to validate the size is to get the number of interrupts
beforehand, i.e.

- Find the number of MS

Re: [Xen-devel] [Draft C] Xen on ARM vITS Handling

2015-06-02 Thread Ian Campbell
On Mon, 2015-06-01 at 16:29 +0100, Julien Grall wrote:
> On 01/06/15 14:12, Ian Campbell wrote:
> > On Fri, 2015-05-29 at 14:40 +0100, Julien Grall wrote:
> >> Hi Ian,
> 
> Hi Ian,
> 
> >> NIT: You used my Linaro email which I think is de-activated now :).
> > 
> > I keep finding new address books with that address  in them!
> > 
> >>> ## ITS Translation Table
> >>>
> >>> Message signalled interrupts are translated into an LPI via an ITS
> >>> translation table which must be configured for each device which can
> >>> generate an MSI.
> >>
> >> I'm not sure what is the ITS Table Table. Did you mean Interrupt
> >> Translation Table?
> > 
> > I don't think I wrote Table Table anywhere.
> 
> Sorry I meant "ITS translation table"
> 
> > I'm referring to the tables which are established by e.g. the MAPD
> > command and friends, e.g. the thing shown in "4.9.12 Notional ITS Table
> > Structure".
> 
> On previous paragraph you are referring particularly to "Interrupt
> Translation Table". This is the only table that is configured per device.

I'm afraid I'm still not getting your point. Please quote the exact text
which you think is wrong and if possible suggest an alternative.

> [..]
> 
> >>> XXX there are other aspects to virtualising the ITS (LPI collection
> >>> management, assignment of LPI ranges to guests, device
> >>> management). However these are not currently considered here. XXX
> >>> Should they be/do they need to be?
> >>
> >> I think we began to cover these aspect with the section "command 
> >> emulation".
> > 
> > Some aspects, yes. I went with:
> > 
> > There are other aspects to virtualising the ITS (LPI collection
> > management, assignment of LPI ranges to guests, device
> > management). However these are only considered here to the extent
> > needed for describing the vITS emulation.
> > 
> >>> XXX In the context of virtualised device ids this may not be the case,
> >>> e.g. we can arrange for (mostly) contiguous device ids and we know the
> >>> bound is significantly lower than 2^32
> >>
> >> Well, the deviceID is computed from the BDF and some DMA alias. As the
> >> algorithm can't be tweaked, it's very likely that we will have
> >> non-contiguous Device ID. See pci_for_each_dma_alias in Linux
> >> (drivers/pci/search.c).
> > 
> > The implication here is that deviceID is fixed in hardware and is used
> > by driver domain software in contexts where we do not get the
> > opportunity to translate is that right? What contexts are those?
> 
> No, the driver domain software will always use a virtual DeviceID (based
> on the vBDF and other things). The problem I wanted to raise is how to
> translate back the vDeviceID to a physical deviceID/BDF.

Right, so this goes back to my original point, which is that if we
completely control the translation from vDeviceID to pDeviceID/BDF then
the vDeviceId space need not be sparse and need not utilise the entire
2^32 space, at least for domU uses.
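
As a rough illustration of why a dense vDeviceID space would help, here is a
minimal C sketch of a per-domain table where vDeviceIDs are handed out
sequentially and translation is a bounds check plus an array index. All names
(`vdev_table`, `vdev_assign`, `vdev_to_pdev`), the `VDEVID_MAX` bound and the
use of `UINT32_MAX` as an "invalid" marker are assumptions made for the
example, not anything from the draft.

```c
#include <assert.h>
#include <stdint.h>

#define VDEVID_MAX     32u         /* hypothetical per-domain bound */
#define PDEVID_INVALID UINT32_MAX  /* assumed never to be a real pDeviceID */

struct vdev_table {
    uint32_t nr;                   /* number of vDeviceIDs handed out so far */
    uint32_t pdevid[VDEVID_MAX];   /* pdevid[v] is the pDeviceID behind v */
};

/* Hand out the next free vDeviceID for a physical device.
 * Returns the new vDeviceID, or -1 if the table is full. */
static int vdev_assign(struct vdev_table *t, uint32_t pdevid)
{
    if (t->nr >= VDEVID_MAX)
        return -1;
    t->pdevid[t->nr] = pdevid;
    return (int)t->nr++;
}

/* Translate vDeviceID to pDeviceID in constant time. */
static uint32_t vdev_to_pdev(const struct vdev_table *t, uint32_t vdevid)
{
    return vdevid < t->nr ? t->pdevid[vdevid] : PDEVID_INVALID;
}
```

This only works where the vDeviceID is under our control, which is the domU
case discussed above. It does not address dom0, where vBDF == pBDF.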

> > Note that the BDF is also something which we could in principal
> > virtualise (we already do for domU). Perhaps that is infeasible for dom0
> > though?
> 
> For DOM0 the virtual BDF is equal to the physical BDF. So the both
> deviceID (physical and virtual) will be the same.
> 
> We may decide to do vBDF == pBDF for guest too in order to simplify the
> code.

It seems to me that choosing vBDF such that the vDeviceId space is to
our liking would be a good idea.

> > That gives me two thoughts.
> > 
> > The first is that although device identifiers are not necessarily
> > contiguous, they are generally at least grouped and not allocated at
> > random through the 2^32 options. For example a PCI Host bridge typically
> > has a range of device ids associated with it and each device has a
> > device id derived from that.
> 
> Usually it's one per (device, function).

Yes, but my point is that they are generally grouped by bus. The bus is
assigned a (contiguous) range and individual (device,function)=> device
id mappings are based on a formula applied to the base address.

i.e. for a given PCI bus the device ids are in the range 1000..1000+N,
not N random numbers selected from the 2^32 space.
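
To make the "grouped by bus" idea concrete, here is a minimal C sketch of a
two-step lookup: find the span (one per bus) that contains the device id,
then index into that span's own table. The structure and function names
(`devid_span`, `span_lookup`) are hypothetical, and the sorted array with a
binary search is just a stand-in for whatever tree would hold the spans.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One contiguous range of device ids, e.g. everything behind one PCI bus. */
struct devid_span {
    uint32_t base;   /* first device id in the span */
    uint32_t count;  /* number of device ids in the span */
    void **devs;     /* table indexed by (devid - base); NULL if absent */
};

/* Look up a device id. `spans` must be sorted by base and non-overlapping.
 * Returns the per-device pointer, or NULL if no device has that id. */
static void *span_lookup(const struct devid_span *spans, size_t n,
                         uint32_t devid)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const struct devid_span *s = &spans[mid];

        if (devid < s->base)
            hi = mid;                         /* id lies below this span */
        else if (devid - s->base >= s->count)
            lo = mid + 1;                     /* id lies above this span */
        else
            return s->devs[devid - s->base];  /* id falls inside the span */
    }
    return NULL;
}
```

The search structure then scales with the number of buses rather than the
number of devices, which is the saving being suggested here.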

> 
> > 
> > I'm not sure if we can leverage that into a more useful data structure
> > than an R-B tree, or for example to arrange for the R-B to allow for the
> > translation of a device within a span into the parent span and from
> > there do the lookup. Specifically when looking up a device ID
> > corresponding to a PCI device we could arrange to find the PCI host
> > bridge and find the actual device from there. This would keep the RB
> > tree much smaller and therefore perhaps quicker? Of course that depends
> > on what the lookup from PCI host bridge to a device looked like.
> 
> I'm not sure why you are speaking about PCI host bridge. AFAIK, the
> guest doesn't have a physical host bridge.

It has a virtual one provided by the pciif/pcifront+back thing. Any PCI
bus is behind some sort of 

Re: [Xen-devel] [Draft C] Xen on ARM vITS Handling

2015-06-01 Thread Julien Grall
On 01/06/15 14:12, Ian Campbell wrote:
> On Fri, 2015-05-29 at 14:40 +0100, Julien Grall wrote:
>> Hi Ian,

Hi Ian,

>> NIT: You used my Linaro email which I think is de-activated now :).
> 
> I keep finding new address books with that address  in them!
> 
>>> ## ITS Translation Table
>>>
>>> Message signalled interrupts are translated into an LPI via an ITS
>>> translation table which must be configured for each device which can
>>> generate an MSI.
>>
>> I'm not sure what is the ITS Table Table. Did you mean Interrupt
>> Translation Table?
> 
> I don't think I wrote Table Table anywhere.

Sorry I meant "ITS translation table"

> I'm referring to the tables which are established by e.g. the MAPD
> command and friends, e.g. the thing shown in "4.9.12 Notional ITS Table
> Structure".

In the previous paragraph you are referring specifically to the
"Interrupt Translation Table". This is the only table that is configured
per device.

[..]

>>> XXX there are other aspects to virtualising the ITS (LPI collection
>>> management, assignment of LPI ranges to guests, device
>>> management). However these are not currently considered here. XXX
>>> Should they be/do they need to be?
>>
>> I think we began to cover these aspect with the section "command emulation".
> 
> Some aspects, yes. I went with:
> 
> There are other aspects to virtualising the ITS (LPI collection
> management, assignment of LPI ranges to guests, device
> management). However these are only considered here to the extent
> needed for describing the vITS emulation.
> 
>>> XXX In the context of virtualised device ids this may not be the case,
>>> e.g. we can arrange for (mostly) contiguous device ids and we know the
>>> bound is significantly lower than 2^32
>>
>> Well, the deviceID is computed from the BDF and some DMA alias. As the
>> algorithm can't be tweaked, it's very likely that we will have
>> non-contiguous Device ID. See pci_for_each_dma_alias in Linux
>> (drivers/pci/search.c).
> 
> The implication here is that deviceID is fixed in hardware and is used
> by driver domain software in contexts where we do not get the
> opportunity to translate is that right? What contexts are those?

No, the driver domain software will always use a virtual DeviceID (based
on the vBDF and other things). The problem I wanted to raise is how to
translate back the vDeviceID to a physical deviceID/BDF.

> Note that the BDF is also something which we could in principal
> virtualise (we already do for domU). Perhaps that is infeasible for dom0
> though?

For DOM0 the virtual BDF is equal to the physical BDF, so both
deviceIDs (physical and virtual) will be the same.

We may decide to do vBDF == pBDF for guest too in order to simplify the
code.

> That gives me two thoughts.
> 
> The first is that although device identifiers are not necessarily
> contiguous, they are generally at least grouped and not allocated at
> random through the 2^32 options. For example a PCI Host bridge typically
> has a range of device ids associated with it and each device has a
> device id derived from that.

Usually it's one per (device, function).

> 
> I'm not sure if we can leverage that into a more useful data structure
> than an R-B tree, or for example to arrange for the R-B to allow for the
> translation of a device within a span into the parent span and from
> there do the lookup. Specifically when looking up a device ID
> corresponding to a PCI device we could arrange to find the PCI host
> bridge and find the actual device from there. This would keep the RB
> tree much smaller and therefore perhaps quicker? Of course that depends
> on what the lookup from PCI host bridge to a device looked like.

I'm not sure why you are speaking about PCI host bridge. AFAIK, the
guest doesn't have a physical host bridge.

That said, this is an optimization that we can think about later. The
R-B tree will already be fast enough for a first implementation. My main
point was about the translation vDeviceID => pDeviceID.

> The second is that perhaps we can do something simpler for the domU
> case, if we were willing to tolerate it being different from dom0.
> 
>>> Possible efficient data structures would be:
>>>
>>> 1. List: The lookup/deletion is in O(n) and the insertion will depend
>>> if the device should be sorted following their identifier. The
>>> memory overhead is 18 bytes per element.
>>> 2. Red-black tree: All the operations are O(log(n)). The memory
>>> overhead is 24 bytes per element.
>>>
>>> A Red-black tree seems the more suitable for having fast deviceID
>>> validation even though the memory overhead is a bit higher compare to
>>> the list.
>>>
>>> ### Event ID (`vID`)
>>>
>>> This is the per-device Interrupt identifier (i.e. the MSI index). It
>>> is configured by the device driver software.
>>>
>>> It is not necessary to translate a `vID`, however they may need to be
>>> represented in various data structures given to the pITS.
>>>
>>> XXX i

Re: [Xen-devel] [Draft C] Xen on ARM vITS Handling

2015-06-01 Thread Ian Campbell
On Mon, 2015-06-01 at 13:24 +0100, Julien Grall wrote:
> On 01/06/15 13:11, Ian Campbell wrote:
> >>> ### Device ID (`ID`)
> >>>
> >>> This parameter is used by commands which manage a specific device and
> >>> the interrupts associated with that device. Checking if a device is
> >>> present and retrieving the data structure must be fast.
> >>>
> >>> The device identifiers may not be assigned contiguously and the maximum
> >>> number is very high (2^32).
> >>>
> >>> XXX In the context of virtualised device ids this may not be the case,
> >>> e.g. we can arrange for (mostly) contiguous device ids and we know the
> >>> bound is significantly lower than 2^32
> >>>
> >>> Possible efficient data structures would be:
> >>>
> >>> 1. List: The lookup/deletion is in O(n) and the insertion will depend
> >>>    if the device should be sorted following their identifier. The
> >>>    memory overhead is 18 bytes per element.
> >>> 2. Red-black tree: All the operations are O(log(n)). The memory
> >>>    overhead is 24 bytes per element.
> >>>
> >>> A Red-black tree seems the more suitable for having fast deviceID
> >>> validation even though the memory overhead is a bit higher compare to
> >>> the list.
> >>
> >> When PHYSDEVOP_pci_device_add is called, memory for its_device structure
> >> and other needed structure for this device is allocated added to RB-tree
> >> with all necessary information
> > 
> > Sounds like a reasonable time to do it. I added something based on your
> > words.
> 
> Hmmm... The RB-tree suggested is per domain not the host and indexed
> with the vDevID.

I added "The `ID` is per domain and therefore the datastructure should
be too." before "Possible efficient..."

> This is the only way to know quickly if the domain is able to use the
> device and retrieving a device. Indeed, the vDevID won't be equal to the
> pDevID as the vBDF will be different to the pBDF.
> 
> PHYSDEVOP_pci_device_add is to ask Xen managing the PCI device. At that
> time we don't know to which domain the device will be passthrough.

Yes, I suppose we can allocate at PHYSDEVOP_pci_device_add time, but
linking it into the R-B tree will have to happen at assignment time.

This section now ends:

When `PHYSDEVOP_pci_device_add` is called, memory for its_device
structure and other needed structure for this device is allocated.

When `XEN_DOMCTL_assign_device` is called the device will be added to
the per domain RB-tree with all necessary information.
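
The split between allocation and linking could be sketched roughly as
follows. This is a minimal illustration under assumptions, not proposed Xen
code: the function names (`its_device_alloc`, `its_device_assign`,
`its_device_find`) and the `domain_its` structure are hypothetical, and an
unbalanced binary search tree stands in for the R-B tree to keep it short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct its_device {
    uint32_t pdevid;                  /* known at PHYSDEVOP_pci_device_add */
    uint32_t vdevid;                  /* only meaningful once assigned */
    int assigned;
    struct its_device *left, *right;  /* stand-in for an R-B tree node */
};

struct domain_its {
    struct its_device *root;          /* per-domain tree keyed by vdevid */
};

/* Step 1, at PHYSDEVOP_pci_device_add time: allocate only. The owning
 * domain is not known yet, so nothing is linked into any tree. */
static struct its_device *its_device_alloc(uint32_t pdevid)
{
    struct its_device *d = calloc(1, sizeof(*d));

    if (d)
        d->pdevid = pdevid;
    return d;
}

/* Step 2, at XEN_DOMCTL_assign_device time: link into the domain's tree.
 * Returns 0 on success, -1 if the device is already assigned or the
 * vdevid is already in use in this domain. */
static int its_device_assign(struct domain_its *dom, struct its_device *d,
                             uint32_t vdevid)
{
    struct its_device **p = &dom->root;

    if (d->assigned)
        return -1;
    while (*p) {
        if (vdevid == (*p)->vdevid)
            return -1;
        p = vdevid < (*p)->vdevid ? &(*p)->left : &(*p)->right;
    }
    d->vdevid = vdevid;
    d->assigned = 1;
    *p = d;
    return 0;
}

/* Lookup used when validating a guest command against a vDeviceID. */
static struct its_device *its_device_find(const struct domain_its *dom,
                                          uint32_t vdevid)
{
    struct its_device *d = dom->root;

    while (d && d->vdevid != vdevid)
        d = vdevid < d->vdevid ? d->left : d->right;
    return d;
}
```

With this split, the command emulation path never has to allocate: a lookup
miss simply means the domain does not own a device with that vDeviceID.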

Ian.




Re: [Xen-devel] [Draft C] Xen on ARM vITS Handling

2015-06-01 Thread Ian Campbell
On Fri, 2015-05-29 at 15:06 +0100, Julien Grall wrote:
> Hi Vijay,
> 
> On 27/05/15 17:44, Vijay Kilari wrote:
> >> ## Command Translation
> >>
> >> Of the existing GICv3 ITS commands, `MAPC`, `MAPD`, `MAPVI`/`MAPI` are
> >> potentially time consuming commands as these commands creates entry in
> >> the Xen ITS structures, which are used to validate other ITS commands.
> >>
> >> `INVALL` and `SYNC` are global and potentially disruptive to other
> >> guests and so need consideration.
> >>
> >> All other ITS command like `MOVI`, `DISCARD`, `INV`, `INT`, `CLEAR`
> >> just validate and generate physical command.
> >>
> >> ### `MAPC` command translation
> >>
> >> Format: `MAPC vCID, vTA`
> >>
> >-  The GITS_TYPER.PAtype is emulated as 0. Hence vTA always represents
> >   the vcpu number. Hence vTA is validated against physical Collection
> >   IDs by querying the ITS driver and the corresponding Physical
> >   Collection ID is retrieved.
> >-  Each vITS will have cid_map (struct cid_mapping) which holds mapping of
> 
> Why do you speak about each vITS? The emulation is only related to one
> vITS and not shared...

And each vITS will have a cid_map, which is used. This seems like a
reasonable way to express this concept in the context.

Perhaps there is a need to include discussion of some of the secondary
data structures alongside the definition of `cits_cq`. In which case we
could talk about "its associated `cid_map`" and things.

> >   Virtual Collection ID(vCID), Virtual Target address(vTA) and
> >   Physical Collection ID (pCID).
> >   If vCID entry already exists in cid_map, then that particular
> >   mapping is updated with the new pCID and vTA else new entry is
> >   made in cid_map
> 
> When you move a collection, you also have to make sure that all the
> interrupts associated to it will be delivered to the new target.
> 
> I'm not sure what you are suggesting for that...

This is going to be rather painful I fear.

> >-  MAPC pCID, pTA physical ITS command is generated
> 
> We should not send any MAPC command to the physical ITS. The collection
> is already mapped during Xen boot and the guest should not be able to
> move the physical collection (they are shared between all the guests and
> Xen).

This needs discussion in the background section, to describe the
physical setup which the virtual stuff can make assumption of.

> >   Here there is no overhead, the cid_map entries are preallocated
> >   with size of nr_cpus in the platform.
> 
> As said the number of collection should be at least nr_cpus + 1.

FWIW I read this as "with size appropriate for nr_cpus", which leaves
the +1 as implicit. I added the +1 nevertheless.
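
For illustration, the preallocated `cid_map` with the explicit +1 might look
something like the C sketch below. The field layout and the helper names
(`cid_map_init`, `cid_map_set`) are assumptions for the example. Only the
triple vCID/vTA/pCID and the update-or-insert behaviour come from the text
quoted above, and the sketch deliberately does not issue any physical `MAPC`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct cid_mapping {
    int valid;
    uint16_t vcid;  /* virtual collection ID */
    uint32_t vta;   /* virtual target address (a vcpu number if PAtype == 0) */
    uint16_t pcid;  /* physical collection ID */
};

struct cid_map {
    size_t nr;              /* number of preallocated entries */
    struct cid_mapping *e;
};

/* Preallocate nr_cpus + 1 entries so that MAPC emulation never allocates. */
static int cid_map_init(struct cid_map *m, unsigned int nr_cpus)
{
    m->nr = (size_t)nr_cpus + 1;
    m->e = calloc(m->nr, sizeof(*m->e));
    return m->e ? 0 : -1;
}

/* Update the mapping for vcid if it exists, otherwise use the first free
 * entry. Returns 0 on success, -1 if every entry is in use. */
static int cid_map_set(struct cid_map *m, uint16_t vcid, uint32_t vta,
                       uint16_t pcid)
{
    struct cid_mapping *slot = NULL;

    for (size_t i = 0; i < m->nr; i++) {
        if (m->e[i].valid && m->e[i].vcid == vcid) {
            slot = &m->e[i];       /* existing mapping: update in place */
            break;
        }
        if (!m->e[i].valid && !slot)
            slot = &m->e[i];       /* remember the first free entry */
    }
    if (!slot)
        return -1;
    *slot = (struct cid_mapping){ .valid = 1, .vcid = vcid,
                                  .vta = vta, .pcid = pcid };
    return 0;
}
```

Re-targeting an existing vCID only rewrites the entry here. Moving the
interrupts already attached to that collection is the painful part raised
earlier in the thread and is not covered by this sketch.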

> >> - `MAPC pCID, pTA` physical ITS command is generated
> >>
> >> ### `MAPD` Command translation
> >>
> >> Format: `MAPD device, Valid, ITT IPA, ITT Size`
> >>
> >> `MAPD` is sent with `Valid` bit set if device needs to be added and reset
> >> when device is removed.
> >>
> >> If `Valid` bit is set:
> >>
> >> - Allocate memory for `its_device` struct
> >> - Validate ITT IPA & ITT size and update its_device struct
> >> - Find number of vectors(nrvecs) for this device by querying PCI
> >>   helper function
> >> - Allocate nrvecs number of LPI XXX nrvecs is a function of `ITT Size`?
> >> - Allocate memory for `struct vlpi_map` for this device. This
> >>   `vlpi_map` holds mapping of Virtual LPI to Physical LPI and ID.
> >> - Find physical ITS node with which this device is associated
> >> - Call `p2m_lookup` on ITT IPA addr and get physical ITT address
> >> - Validate ITT Size
> >> - Generate/format physical ITS command: `MAPD, ITT PA, ITT Size`
> >>
> >> Here the overhead is with memory allocation for `its_device` and `vlpi_map`
> >>
> >> XXX Suggestion was to preallocate some of those at device passthrough
> >> setup time?
> > 
> > If Validation bit is set:
> >    - Query its_device tree and get its_device structure for this device.
> >    - (XXX: If pci device is hidden from dom0, is this device added
> >      with PHYSDEVOP_pci_device_add hypercall?)
> >    - If device does not exist return
> >    - If device exists in RB-tree then
> >       - Validate ITT IPA & ITT size and update its_device struct
> 
> To validate the ITT size you need to know the number of interrupt IDs.

Please could you get into the habit of making concrete suggestions for
changes to the text. I've no idea what change I should make based on
this observation. If not concrete suggestions please try and make the
implications of what you are saying clear.


> 
> >       - Check if device is already assigned to the domain, if not then
> >          - Find number of vectors(nrvecs) for this device.
> >          - Allocate nrvecs number of LPI
> >          - Fetch vlpi_map for this device (preallocated at the time of
> >            adding this device to Xen). This vlpi_map holds mapping of
> >            Virtual LPI to

Re: [Xen-devel] [Draft C] Xen on ARM vITS Handling

2015-06-01 Thread Ian Campbell
On Fri, 2015-05-29 at 14:40 +0100, Julien Grall wrote:
> Hi Ian,
> 
> NIT: You used my Linaro email which I think is de-activated now :).

I keep finding new address books with that address in them!

> > ## ITS Translation Table
> >
> > Message signalled interrupts are translated into an LPI via an ITS
> > translation table which must be configured for each device which can
> > generate an MSI.
> 
> I'm not sure what is the ITS Table Table. Did you mean Interrupt
> Translation Table?

I don't think I wrote Table Table anywhere.

I'm referring to the tables which are established by e.g. the MAPD
command and friends, e.g. the thing shown in "4.9.12 Notional ITS Table
Structure".

> > is _not_ guarenteed that a change to the LPI Configuration Table won't
> 
> s/guarenteed/guaranteed/? Or may the first use of this word was wrong?

guaranteed is correct, I can never remember it though.

> > XXX there are other aspects to virtualising the ITS (LPI collection
> > management, assignment of LPI ranges to guests, device
> > management). However these are not currently considered here. XXX
> > Should they be/do they need to be?
> 
> I think we began to cover these aspects with the section "command emulation".

Some aspects, yes. I went with:

There are other aspects to virtualising the ITS (LPI collection
management, assignment of LPI ranges to guests, device
management). However these are only considered here to the extent
needed for describing the vITS emulation.

> > XXX In the context of virtualised device ids this may not be the case,
> > e.g. we can arrange for (mostly) contiguous device ids and we know the
> > bound is significantly lower than 2^32
> 
> Well, the deviceID is computed from the BDF and some DMA alias. As the
> algorithm can't be tweaked, it's very likely that we will have
> non-contiguous Device ID. See pci_for_each_dma_alias in Linux
> (drivers/pci/search.c).

The implication here is that deviceID is fixed in hardware and is used
by driver domain software in contexts where we do not get the
opportunity to translate it, is that right? What contexts are those?

Note that the BDF is also something which we could in principle
virtualise (we already do for domU). Perhaps that is infeasible for dom0
though?

That gives me two thoughts.

The first is that although device identifiers are not necessarily
contiguous, they are generally at least grouped and not allocated at
random through the 2^32 options. For example a PCI Host bridge typically
has a range of device ids associated with it and each device has a
device id derived from that.

I'm not sure if we can leverage that into a more useful data structure
than an R-B tree, or for example to arrange for the R-B to allow for the
translation of a device within a span into the parent span and from
there do the lookup. Specifically when looking up a device ID
corresponding to a PCI device we could arrange to find the PCI host
bridge and find the actual device from there. This would keep the RB
tree much smaller and therefore perhaps quicker? Of course that depends
on what the lookup from PCI host bridge to a device looked like.

The second is that perhaps we can do something simpler for the domU
case, if we were willing to tolerate it being different from dom0.
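
To make the first thought a little more concrete, here is a minimal
sketch in Python (illustration only, not proposed Xen code) of a
two-level lookup: find the span, i.e. the host bridge, which covers a
device ID, then find the device within that span. The names
`SpanIndex`, `add_span`, `add_device` and `lookup` are made up for this
example, and it assumes the spans never overlap.

```python
import bisect


class SpanIndex:
    """Two-level lookup: covering span first, then the device inside it."""

    def __init__(self):
        self._starts = []  # sorted first device ID of every span
        self._spans = []   # parallel list of (start, end, {devid: device})

    def add_span(self, start, end):
        # Keep both lists sorted by the first device ID of the span.
        i = bisect.bisect_left(self._starts, start)
        self._starts.insert(i, start)
        self._spans.insert(i, (start, end, {}))

    def _find_span(self, devid):
        # Rightmost span whose start is <= devid, if it also covers devid.
        i = bisect.bisect_right(self._starts, devid) - 1
        if i < 0:
            return None
        start, end, devices = self._spans[i]
        return devices if start <= devid <= end else None

    def add_device(self, devid, device):
        devices = self._find_span(devid)
        if devices is None:
            raise KeyError(devid)
        devices[devid] = device

    def lookup(self, devid):
        devices = self._find_span(devid)
        return None if devices is None else devices.get(devid)
```

The outer search only scales with the number of host bridges; whether
that is actually quicker than one R-B tree still depends on what the
per-bridge lookup looks like (a plain dictionary here).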

> > Possible efficient data structures would be:
> >
> > 1. List: The lookup/deletion is in O(n) and the insertion will depend
> > if the device should be sorted following their identifier. The
> > memory overhead is 18 bytes per element.
> > 2. Red-black tree: All the operations are O(log(n)). The memory
> > overhead is 24 bytes per element.
> >
> > A Red-black tree seems the more suitable for having fast deviceID
> > validation even though the memory overhead is a bit higher compared to
> > the list.
> >
> > ### Event ID (`vID`)
> >
> > This is the per-device Interrupt identifier (i.e. the MSI index). It
> > is configured by the device driver software.
> >
> > It is not necessary to translate a `vID`, however they may need to be
> > represented in various data structures given to the pITS.
> >
> > XXX is any of this true?
> 
> 
> Right, the vID will always be equal to the pID. Although you will need
> to associate a physical LPI for every pair (vID, DevID).

I think in the terms defined by this document that is (`ID`, `vID`) =>
an LPI. Right?

Have we considered how this mapping will be tracked?
 
> > ### Interrupt Collection (`vCID`)
> >
> > This parameter is used in commands which manage collections and
> > interrupts in order to move them from one CPU to another. The ITS is
> > only mandated to implement N + 1 collections where N is the number of
> > processors on the platform (i.e. max number of VCPUs for a given
> > guest). Furthermore, the identifiers are always contiguous.
> >
> > If we decide to implement the strict minimum (i.e N + 1), an array is
> > enough and will allow operations in O(1).
> >
> > XXX Could forgo array and go straight to vcpu_info/d

Re: [Xen-devel] [Draft C] Xen on ARM vITS Handling

2015-06-01 Thread Julien Grall
On 01/06/15 13:11, Ian Campbell wrote:
>>> ### Device ID (`ID`)
>>>
>>> This parameter is used by commands which manage a specific device and
>>> the interrupts associated with that device. Checking if a device is
>>> present and retrieving the data structure must be fast.
>>>
>>> The device identifiers may not be assigned contiguously and the maximum
>>> number is very high (2^32).
>>>
>>> XXX In the context of virtualised device ids this may not be the case,
>>> e.g. we can arrange for (mostly) contiguous device ids and we know the
>>> bound is significantly lower than 2^32
>>>
>>> Possible efficient data structures would be:
>>>
>>> 1. List: The lookup/deletion is in O(n) and the insertion will depend
>>>    if the device should be sorted following their identifier. The
>>>    memory overhead is 18 bytes per element.
>>> 2. Red-black tree: All the operations are O(log(n)). The memory
>>>    overhead is 24 bytes per element.
>>>
>>> A Red-black tree seems the more suitable for having fast deviceID
>>> validation even though the memory overhead is a bit higher compared to
>>> the list.
>>
>> When PHYSDEVOP_pci_device_add is called, memory for the its_device structure
>> and other needed structures for this device is allocated and added to the
>> RB-tree with all necessary information
> 
> Sounds like a reasonable time to do it. I added something based on your
> words.

Hmmm... The RB-tree suggested is per domain not the host and indexed
with the vDevID.

This is the only way to know quickly if the domain is able to use the
device and to retrieve the device. Indeed, the vDevID won't be equal to the
pDevID as the vBDF will be different to the pBDF.

PHYSDEVOP_pci_device_add asks Xen to manage the PCI device. At that
time we don't know to which domain the device will be passed through.

Regards,

-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Draft C] Xen on ARM vITS Handling

2015-06-01 Thread Ian Campbell
On Wed, 2015-05-27 at 22:14 +0530, Vijay Kilari wrote:
> > ## pITS Scheduling
> >
> > A pITS scheduling pass is attempted:
> >
> > * On write to any virtual `CWRITER` iff that write results in there
> >   being new outstanding requests for that vits;
> 
> You mean the scheduling pass (softirq trigger) is triggered iff there are no
> ongoing requests from that vits?

Yes, this has changed with the switch to only a single outstanding
batch. I went with:

* On write to any virtual `CWRITER` iff that write results in there
  being new outstanding requests for that vits which could be consumed
  by the pits (i.e. subject to only a single batch being permitted by
  the scheduler);


Although implementation-wise it may be OK to defer that decision to the
scheduler, rather than try to figure it out in the mmio trap.

> 
> > * On read from a virtual `CREADR` iff there are commands outstanding
> >   on that vits;
> > * On receipt of an interrupt notification arising from Xen's own use
> >   of `INT`; (see discussion under Completion)
> > * On any interrupt injection arising from a guest's use of the `INT`
> >   command; (XXX perhaps, see discussion under Completion)
> >
> > This may result in lots of contention on the scheduler
> > locking. Therefore we consider that in each case all which happens is
> > triggering of a softirq which will be processed on return to guest,
> > and just once even for multiple events.
> 
> Is it required for all of these cases to trigger a scheduling pass?
> Is it not sufficient to trigger just on CWRITER if there is no ongoing
> request and on Xen's own completion INT?

I think CREADR is needed too, so the guest sees up to date info.

And injection arising from the guest's use of INT is marked as optional
here and considered later on. Whether it is needed depends on the
decision there.

> [...]
> > The second option is likely to be preferable if the issue of selecting
> > a device ID can be addressed.
> >
> > A secondary question is when these `INT` commands should be inserted
> > into the command stream:

(Nb, this is a list of options, not a list of places where it must be
done)

> >
> > * After each batch taken from a single `vits_cq`;
> 
> Is this not enough? Because the scheduling pass just sends one batch of
> commands with Xen's INT command

It is almost certainly _sufficient_, the question is more whether it is
_necessary_ or whether we can reduce the number of interrupts which are
required for correct emulation of a vits, iow can we get away with one
of the other two options.

The following text argues that only one Xen INT is needed in the stream
at any given moment.

> > ### Device ID (`ID`)
> >
> > This parameter is used by commands which manage a specific device and
> > the interrupts associated with that device. Checking if a device is
> > present and retrieving the data structure must be fast.
> >
> > The device identifiers may not be assigned contiguously and the maximum
> > number is very high (2^32).
> >
> > XXX In the context of virtualised device ids this may not be the case,
> > e.g. we can arrange for (mostly) contiguous device ids and we know the
> > bound is significantly lower than 2^32
> >
> > Possible efficient data structures would be:
> >
> > 1. List: The lookup/deletion is in O(n) and the insertion will depend
> >    if the device should be sorted following their identifier. The
> >    memory overhead is 18 bytes per element.
> > 2. Red-black tree: All the operations are O(log(n)). The memory
> >    overhead is 24 bytes per element.
> >
> > A Red-black tree seems the more suitable for having fast deviceID
> > validation even though the memory overhead is a bit higher compared to
> > the list.
> 
> When PHYSDEVOP_pci_device_add is called, memory for the its_device structure
> and other needed structures for this device is allocated and added to the
> RB-tree with all necessary information

Sounds like a reasonable time to do it. I added something based on your
words.

[...]
> > Format: `MAPC vCID, vTA`
> >
>-  The GITS_TYPER.PAtype is emulated as 0.

ITYM `GITS_TYPER.PTA`?

I've updated various introductory sections to reflect the decision to
emulate as 0.

> 
> > - `MAPC pCID, pTA` physical ITS command is generated
> >
> > ### `MAPD` Command translation
> >
> > Format: `MAPD device, Valid, ITT IPA, ITT Size`
> >
> > `MAPD` is sent with `Valid` bit set if device needs to be added and reset
> > when device is removed.
> >
> > If `Valid` bit is set:
> >
> > - Allocate memory for `its_device` struct
> > - Validate ITT IPA & ITT size and update its_device struct
> > - Find number of vectors(nrvecs) for this device by querying PCI
> >   helper function
> > - Allocate nrvecs number of LPI XXX nrvecs is a function of `ITT Size`?
> > - Allocate memory for `struct vlpi_map` for this device. This
> >   `vlpi_map` holds mapping of Virtual LPI to Physical LPI and ID.
> > - Find physical ITS node with which this device is associated
> > - Call `p2m_lookup

Re: [Xen-devel] [Draft C] Xen on ARM vITS Handling

2015-05-29 Thread Julien Grall
Hi Vijay,

On 27/05/15 17:44, Vijay Kilari wrote:
>> ## Command Translation
>>
>> Of the existing GICv3 ITS commands, `MAPC`, `MAPD`, `MAPVI`/`MAPI` are
>> potentially time consuming commands as these commands create entries in
>> the Xen ITS structures, which are used to validate other ITS commands.
>>
>> `INVALL` and `SYNC` are global and potentially disruptive to other
>> guests and so need consideration.
>>
>> All other ITS commands like `MOVI`, `DISCARD`, `INV`, `INT`, `CLEAR`
>> just validate and generate a physical command.
>>
>> ### `MAPC` command translation
>>
>> Format: `MAPC vCID, vTA`
>>
>    -  The GITS_TYPER.PAtype is emulated as 0. Hence vTA always represents
>       vcpu number. Hence vTA is validated against physical Collection
>       IDs by querying ITS driver and corresponding Physical Collection ID
>       is retrieved.
>    -  Each vITS will have cid_map (struct cid_mapping) which holds mapping of

Why do you speak about each vITS? The emulation is only related to one
vITS and not shared...

>       Virtual Collection ID(vCID), Virtual Target address(vTA) and
>       Physical Collection ID (pCID).
>       If vCID entry already exists in cid_map, then that particular
>       mapping is updated with the new pCID and vTA else new entry is
>       made in cid_map

When you move a collection, you also have to make sure that all the
interrupts associated to it will be delivered to the new target.

I'm not sure what you are suggesting for that...


>-  MAPC pCID, pTA physical ITS command is generated

We should not send any MAPC command to the physical ITS. The collection
is already mapped during Xen boot and the guest should not be able to
move the physical collection (they are shared between all the guests and
Xen).


> 
>    Here there is no overhead, the cid_map entries are preallocated
>    with size of nr_cpus in the platform.

As said, the number of collections should be at least nr_cpus + 1.

> 
>> - `MAPC pCID, pTA` physical ITS command is generated
>>
>> ### `MAPD` Command translation
>>
>> Format: `MAPD device, Valid, ITT IPA, ITT Size`
>>
>> `MAPD` is sent with `Valid` bit set if device needs to be added and reset
>> when device is removed.
>>
>> If `Valid` bit is set:
>>
>> - Allocate memory for `its_device` struct
>> - Validate ITT IPA & ITT size and update its_device struct
>> - Find number of vectors(nrvecs) for this device by querying PCI
>>   helper function
>> - Allocate nrvecs number of LPI XXX nrvecs is a function of `ITT Size`?
>> - Allocate memory for `struct vlpi_map` for this device. This
>>   `vlpi_map` holds mapping of Virtual LPI to Physical LPI and ID.
>> - Find physical ITS node with which this device is associated
>> - Call `p2m_lookup` on ITT IPA addr and get physical ITT address
>> - Validate ITT Size
>> - Generate/format physical ITS command: `MAPD, ITT PA, ITT Size`
>>
>> Here the overhead is with memory allocation for `its_device` and `vlpi_map`
>>
>> XXX Suggestion was to preallocate some of those at device passthrough
>> setup time?
> 
> If Validation bit is set:
>    - Query its_device tree and get its_device structure for this device.
>    - (XXX: If pci device is hidden from dom0, is this device added
>      with PHYSDEVOP_pci_device_add hypercall?)
>    - If device does not exist return
>    - If device exists in RB-tree then
>       - Validate ITT IPA & ITT size and update its_device struct

To validate the ITT size you need to know the number of interrupt IDs.

>       - Check if device is already assigned to the domain, if not then
>          - Find number of vectors(nrvecs) for this device.
>          - Allocate nrvecs number of LPI
>          - Fetch vlpi_map for this device (preallocated at the time of
>            adding this device to Xen). This vlpi_map holds mapping of
>            Virtual LPI to Physical LPI and ID.
>          - Call p2m_lookup on ITT IPA addr and get physical ITT address
>          - Assign this device to this domain and mark as enabled
>       - If this device already exists with the domain (Domain is
>         remapping the device)
>          - Validate ITT IPA & ITT size and update its_device struct
>          - Call p2m_lookup on ITT IPA addr and get physical ITT address
>          - Disable all the LPIs of this device by searching through
>            vlpi_map and LPI configuration table

Disabling all the LPIs associated with a device can be time consuming
because you have to unroute them and make sure that the physical ITS
has effectively disabled them before sending the MAPD command.

Given that the software would be buggy if it sent a MAPD command without
releasing all the associated interrupts, we could ignore the command if
any interrupt is still enabled.

> 
>   - Generate/format physical ITS command: MAPD, ITT PA, ITT Size
> 
>>
>> If Validation bit is not set:
>>
>> - Validate if the device exits by checking vITS device li

Re: [Xen-devel] [Draft C] Xen on ARM vITS Handling

2015-05-29 Thread Julien Grall
Hi Ian,

NIT: You used my Linaro email which I think is de-activated now :).

On 27/05/2015 13:48, Ian Campbell wrote:
> Here follows draft C based on previous feedback.
>
> Also at:
>
> http://xenbits.xen.org/people/ianc/vits/draftC.{pdf,html}
>
> I think I've captured most of the previous discussion, except where
> explicitly noted by XXX or in other replies, but please do point out
> places where I've missed something.
>
> One area where I am pretty sure I've dropped the ball is on the
> completion and update of `CREADR`. That conversation ended up
> bifurcating along the 1:N vs N:N mapping scheme lines, and I didn't
> manage to get the various proposals straight. Since we've now agreed on
> N:N hopefully we can reach a conclusion (no pun intended) on the
> completion aspect too (sorry that this probably means rehashing at least
> a subset of the previous thread).
>
> Ian.
>
> % Xen on ARM vITS Handling
> % Ian Campbell 
> % Draft C
>
> # Changelog
>
> ## Since Draft B
>
> * Details of command translation (thanks to Julien and Vijay)
> * Added background on LPI Translation and Pending tables
> * Added background on Collections
> * Settled on `N:N` scheme for vITS:pITS mapping.
> * Rejigged section nesting a bit.
> * Since we now think translation should be cheap, settle on
>   translation at scheduling time.
> * Lazy `INVALL` and `SYNC`
>
> ## Since Draft A
>
> * Added discussion of when/where command translation occurs.
> * Contention on scheduler lock, suggestion to use SOFTIRQ.
> * Handling of domain shutdown.
> * More detailed discussion of multiple vs single vits pros/cons.
>
> # Introduction
>
> ARM systems containing a GIC version 3 or later may contain one or
> more ITS logical blocks. An ITS is used to route Message Signalled
> interrupts from devices into an LPI injection on the processor.
>
> The following summarises the ITS hardware design and serves as a set
> of assumptions for the vITS software design. (XXX it is entirely
> possible I've horribly misunderstood how this stuff fits
> together). For full details of the ITS see the "GIC Architecture
> Specification".
>
> ## Device Identifiers
>
> Each device using the ITS is associated with a unique identifier.
>
> The device IDs are typically described via system firmware, e.g. the
> ACPI IORT table or via device tree.
>
> The number of device ids is variable and can be discovered via
> `GITS_TYPER.Devbits`. This field allows an ITS to have up to 2^32
> devices.
>
> ## Interrupt Collections
>
> Each interrupt is a member of an Interrupt Collection. This allows
> software to manage large numbers of physical interrupts with a small
> number of commands rather than issuing one command per interrupt.
>
> On a system with N processors, the ITS must provide at least N+1
> collections.
>
> ## Target Addresses
>
> The Target Address corresponds to a specific GIC re-distributor. The format
> of this field depends on the value of the `GITS_TYPER.PTA` bit:
>
> * 1: the base address of the re-distributor target is used
> * 0: a unique processor number is used. The mapping between the
>   processor affinity value (`MPIDR`) and the processor number is
>   discoverable via `GICR_TYPER.ProcessorNumber`.
>
> ## ITS Translation Table
>
> Message signalled interrupts are translated into an LPI via an ITS
> translation table which must be configured for each device which can
> generate an MSI.

I'm not sure what is the ITS Table Table. Did you mean Interrupt
Translation Table?

>
> The ITS translation table maps the device id of the originating devic

s/devic/device/?

> into an Interrupt Collection and then into a target address.
>
> ## ITS Configuration
>
> The ITS is configured and managed, including establishing and
> configuring the Translation Table for each device, via an in-memory ring
> shared between the CPU and the ITS controller. The ring is managed via
> the `GITS_CBASER` register and indexed by `GITS_CWRITER` and
> `GITS_CREADR` registers.
>
> A processor adds commands to the shared ring and then updates
> `GITS_CWRITER` to make them visible to the ITS controller.
>
> The ITS controller processes commands from the ring and then updates
> `GITS_CREADR` to indicate to the processor that the command has been
> processed.
>
> Commands are processed sequentially.
>
> Commands sent on the ring include operational commands:
>
> * Routing interrupts to processors;
> * Generating interrupts;
> * Clearing the pending state of interrupts;
> * Synchronising the command queue
>
> and maintenance commands:
>
> * Map device/collection/processor;
> * Map virtual interrupt;
> * Clean interrupts;
> * Discard interrupts;
>
> The field `GITS_CBASER.Size` encodes the number of 4KB pages minus 1
> making up the command queue. This field is 8 bits which means the
> maximum size is 2^8 * 4KB = 1MB. Given that each command is 32 bytes,
> there is a maximum of 32768 commands in the queue.
>
> The ITS provides no specific completion notification
> mechanism. Completion is

Re: [Xen-devel] [Draft C] Xen on ARM vITS Handling

2015-05-27 Thread Vijay Kilari
On Wed, May 27, 2015 at 5:18 PM, Ian Campbell  wrote:
> Here follows draft C based on previous feedback.
>
> Also at:
>
> http://xenbits.xen.org/people/ianc/vits/draftC.{pdf,html}
>
> I think I've captured most of the previous discussion, except where
> explicitly noted by XXX or in other replies, but please do point out
> places where I've missed something.
>
> One area where I am pretty sure I've dropped the ball is on the
> completion and update of `CREADR`. That conversation ended up
> bifurcating along the 1:N vs N:N mapping scheme lines, and I didn't
> manage to get the various proposals straight. Since we've now agreed on
> N:N hopefully we can reach a conclusion (no pun intended) on the
> completion aspect too (sorry that this probably means rehashing at least
> a subset of the previous thread).
>
> Ian.
>
> % Xen on ARM vITS Handling
> % Ian Campbell 
> % Draft C
>
> # Changelog
>
> ## Since Draft B
>
> * Details of command translation (thanks to Julien and Vijay)
> * Added background on LPI Translation and Pending tables
> * Added background on Collections
> * Settled on `N:N` scheme for vITS:pITS mapping.
> * Rejigged section nesting a bit.
> * Since we now think translation should be cheap, settle on
>   translation at scheduling time.
> * Lazy `INVALL` and `SYNC`
>
> ## Since Draft A
>
> * Added discussion of when/where command translation occurs.
> * Contention on scheduler lock, suggestion to use SOFTIRQ.
> * Handling of domain shutdown.
> * More detailed discussion of multiple vs single vits pros/cons.
>
> # Introduction
>
> ARM systems containing a GIC version 3 or later may contain one or
> more ITS logical blocks. An ITS is used to route Message Signalled
> interrupts from devices into an LPI injection on the processor.
>
> The following summarises the ITS hardware design and serves as a set
> of assumptions for the vITS software design. (XXX it is entirely
> possible I've horribly misunderstood how this stuff fits
> together). For full details of the ITS see the "GIC Architecture
> Specification".
>
> ## Device Identifiers
>
> Each device using the ITS is associated with a unique identifier.
>
> The device IDs are typically described via system firmware, e.g. the
> ACPI IORT table or via device tree.
>
> The number of device ids is variable and can be discovered via
> `GITS_TYPER.Devbits`. This field allows an ITS to have up to 2^32
> devices.
>
> ## Interrupt Collections
>
> Each interrupt is a member of an Interrupt Collection. This allows
> software to manage large numbers of physical interrupts with a small
> number of commands rather than issuing one command per interrupt.
>
> On a system with N processors, the ITS must provide at least N+1
> collections.
>
> ## Target Addresses
>
> The Target Address corresponds to a specific GIC re-distributor. The format
> of this field depends on the value of the `GITS_TYPER.PTA` bit:
>
> * 1: the base address of the re-distributor target is used
> * 0: a unique processor number is used. The mapping between the
>   processor affinity value (`MPIDR`) and the processor number is
>   discoverable via `GICR_TYPER.ProcessorNumber`.
>
> ## ITS Translation Table
>
> Message signalled interrupts are translated into an LPI via an ITS
> translation table which must be configured for each device which can
> generate an MSI.
>
> The ITS translation table maps the device id of the originating device
> into an Interrupt Collection and then into a target address.
>
> ## ITS Configuration
>
> The ITS is configured and managed, including establishing and
> configuring the Translation Table for each device, via an in-memory ring
> shared between the CPU and the ITS controller. The ring is managed via
> the `GITS_CBASER` register and indexed by `GITS_CWRITER` and
> `GITS_CREADR` registers.
>
> A processor adds commands to the shared ring and then updates
> `GITS_CWRITER` to make them visible to the ITS controller.
>
> The ITS controller processes commands from the ring and then updates
> `GITS_CREADR` to indicate to the processor that the command has been
> processed.
>
> Commands are processed sequentially.
>
> Commands sent on the ring include operational commands:
>
> * Routing interrupts to processors;
> * Generating interrupts;
> * Clearing the pending state of interrupts;
> * Synchronising the command queue
>
> and maintenance commands:
>
> * Map device/collection/processor;
> * Map virtual interrupt;
> * Clean interrupts;
> * Discard interrupts;
>
> The field `GITS_CBASER.Size` encodes the number of 4KB pages minus 1
> making up the command queue. This field is 8 bits which means the
> maximum size is 2^8 * 4KB = 1MB. Given that each command is 32 bytes,
> there is a maximum of 32768 commands in the queue.
>
> The ITS provides no specific completion notification
> mechanism. Completion is monitored by a combination of a `SYNC`
> command and either polling `GITS_CREADR` or notification via an
> interrupt generated via the `INT` command.
>
> Note that the inte

[Xen-devel] [Draft C] Xen on ARM vITS Handling

2015-05-27 Thread Ian Campbell
Here follows draft C based on previous feedback.

Also at:

http://xenbits.xen.org/people/ianc/vits/draftC.{pdf,html}

I think I've captured most of the previous discussion, except where
explicitly noted by XXX or in other replies, but please do point out
places where I've missed something.

One area where I am pretty sure I've dropped the ball is on the
completion and update of `CREADR`. That conversation ended up
bifurcating along the 1:N vs N:N mapping scheme lines, and I didn't
manage to get the various proposals straight. Since we've now agreed on
N:N hopefully we can reach a conclusion (no pun intended) on the
completion aspect too (sorry that this probably means rehashing at least
a subset of the previous thread).

Ian.

% Xen on ARM vITS Handling
% Ian Campbell 
% Draft C

# Changelog

## Since Draft B

* Details of command translation (thanks to Julien and Vijay)
* Added background on LPI Translation and Pending tables
* Added background on Collections
* Settled on `N:N` scheme for vITS:pITS mapping.
* Rejigged section nesting a bit.
* Since we now think translation should be cheap, settle on
  translation at scheduling time.
* Lazy `INVALL` and `SYNC`

## Since Draft A

* Added discussion of when/where command translation occurs.
* Contention on scheduler lock, suggestion to use SOFTIRQ.
* Handling of domain shutdown.
* More detailed discussion of multiple vs single vits pros/cons.

# Introduction

ARM systems containing a GIC version 3 or later may contain one or
more ITS logical blocks. An ITS is used to route Message Signalled
interrupts from devices into an LPI injection on the processor.

The following summarises the ITS hardware design and serves as a set
of assumptions for the vITS software design. (XXX it is entirely
possible I've horribly misunderstood how this stuff fits
together). For full details of the ITS see the "GIC Architecture
Specification".

## Device Identifiers

Each device using the ITS is associated with a unique identifier.

The device IDs are typically described via system firmware, e.g. the
ACPI IORT table or via device tree.

The number of device ids is variable and can be discovered via
`GITS_TYPER.Devbits`. This field allows an ITS to have up to 2^32
devices.
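
As a rough illustration of how the bound falls out of the register, the
sketch below decodes `Devbits` from a raw `GITS_TYPER` value. It
assumes `Devbits` is a 5-bit field at bits [17:13] holding the number
of device ID bits minus one; that layout is my reading of the GIC
specification and should be checked against it. The constant and
function names are made up for the example.

```python
# Assumed field layout, to be confirmed against the GIC specification.
GITS_TYPER_DEVBITS_SHIFT = 13
GITS_TYPER_DEVBITS_MASK = 0x1F


def max_device_ids(gits_typer):
    """Return how many distinct device IDs the ITS can address."""
    devbits = (gits_typer >> GITS_TYPER_DEVBITS_SHIFT) & GITS_TYPER_DEVBITS_MASK
    # The field stores "number of bits minus one", hence the +1.
    return 1 << (devbits + 1)
```

With all five bits set the result is 2^32, which matches the upper
bound quoted above.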

## Interrupt Collections

Each interrupt is a member of an Interrupt Collection. This allows
software to manage large numbers of physical interrupts with a small
number of commands rather than issuing one command per interrupt.

On a system with N processors, the ITS must provide at least N+1
collections.
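
Because that minimum is small and known up front, a flat table indexed
by collection ID is one simple way to track where each collection
points. The following is only a sketch: it assumes collection IDs are
contiguous and start at 0, and `CollectionTable`, `map_collection` and
`lookup` are hypothetical names rather than anything in Xen.

```python
class CollectionTable:
    """Fixed-size table of collections, indexed directly by collection ID."""

    def __init__(self, nr_cpus):
        # One collection per processor plus one spare: the N+1 minimum.
        self.targets = [None] * (nr_cpus + 1)

    def map_collection(self, cid, target):
        """Point collection `cid` at `target` (e.g. a processor number)."""
        if not 0 <= cid < len(self.targets):
            raise ValueError("collection ID out of range")
        self.targets[cid] = target

    def lookup(self, cid):
        """Return the target for `cid`, or None if unmapped or out of range."""
        if not 0 <= cid < len(self.targets):
            return None
        return self.targets[cid]
```

Both operations are O(1) and nothing needs to be allocated once the
table exists.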

## Target Addresses

The Target Address corresponds to a specific GIC re-distributor. The format
of this field depends on the value of the `GITS_TYPER.PTA` bit:

* 1: the base address of the re-distributor target is used
* 0: a unique processor number is used. The mapping between the
  processor affinity value (`MPIDR`) and the processor number is
  discoverable via `GICR_TYPER.ProcessorNumber`.

## ITS Translation Table

Message signalled interrupts are translated into an LPI via an ITS
translation table which must be configured for each device which can
generate an MSI.

The ITS translation table maps the device id of the originating device
into an Interrupt Collection and then into a target address.

## ITS Configuration

The ITS is configured and managed, including establishing and
configuring the Translation Table for each device, via an in-memory ring
shared between the CPU and the ITS controller. The ring is managed via
the `GITS_CBASER` register and indexed by `GITS_CWRITER` and
`GITS_CREADR` registers.

A processor adds commands to the shared ring and then updates
`GITS_CWRITER` to make them visible to the ITS controller.

The ITS controller processes commands from the ring and then updates
`GITS_CREADR` to indicate to the processor that the command has been
processed.

Commands are processed sequentially.
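
The producer/consumer relationship between the two index registers can
be modelled in a few lines. This is a toy model, not driver code:
`CommandRing`, `post` and `consume_one` are invented names, commands
are plain Python objects rather than 32-byte memory slots, and the
"full" test (leaving one slot free so that equal indices always mean
empty) is an assumption of the sketch.

```python
CMD_SIZE = 32  # bytes per ITS command, as stated later in this section


class CommandRing:
    """Toy model of the command ring shared by a processor and the ITS."""

    def __init__(self, size_bytes):
        self.size = size_bytes
        self.cwriter = 0  # byte offset the processor will write next
        self.creadr = 0   # byte offset the ITS will read next
        self.slots = [None] * (size_bytes // CMD_SIZE)

    def is_full(self):
        # One slot is left unused so cwriter == creadr always means empty.
        return (self.cwriter + CMD_SIZE) % self.size == self.creadr

    def post(self, cmd):
        """Processor side: add a command, then advance the write index."""
        if self.is_full():
            raise BufferError("command ring is full")
        self.slots[self.cwriter // CMD_SIZE] = cmd
        self.cwriter = (self.cwriter + CMD_SIZE) % self.size

    def consume_one(self):
        """ITS side: process the next command in order, if there is one."""
        if self.creadr == self.cwriter:
            return None
        cmd = self.slots[self.creadr // CMD_SIZE]
        self.creadr = (self.creadr + CMD_SIZE) % self.size
        return cmd
```

Once the read index has caught up with the write index, every posted
command has been consumed.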

Commands sent on the ring include operational commands:

* Routing interrupts to processors;
* Generating interrupts;
* Clearing the pending state of interrupts;
* Synchronising the command queue

and maintenance commands:

* Map device/collection/processor;
* Map virtual interrupt;
* Clean interrupts;
* Discard interrupts;

The field `GITS_CBASER.Size` encodes the number of 4KB pages minus 1
making up the command queue. This field is 8 bits which means the
maximum size is 2^8 * 4KB = 1MB. Given that each command is 32 bytes,
there is a maximum of 32768 commands in the queue.
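
The arithmetic can be checked with a short sketch. It assumes the
`Size` field holds the page count minus one, which is what lets an
8-bit field reach 2^8 pages; the function names are made up for
illustration.

```python
PAGE_SIZE = 4096  # 4KB pages
CMD_SIZE = 32     # bytes per ITS command


def queue_bytes(size_field):
    """Size of the command queue in bytes for a given 8-bit Size value."""
    # Assumed encoding: number of 4KB pages minus one.
    return (size_field + 1) * PAGE_SIZE


def queue_commands(size_field):
    """Number of command slots that fit in the queue."""
    return queue_bytes(size_field) // CMD_SIZE
```

For the largest field value, `queue_bytes(0xFF)` gives 1MB and
`queue_commands(0xFF)` gives 32768, in line with the figures above.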

The ITS provides no specific completion notification
mechanism. Completion is monitored by a combination of a `SYNC`
command and either polling `GITS_CREADR` or notification via an
interrupt generated via the `INT` command.

Note that the interrupt generation via `INT` requires an originating
device ID to be supplied (which is then translated via the ITS into an
LPI). No specific device ID is defined for this purpose and so the OS
software is expected to fabricate one.

Possible ways of inventing such a device ID are:

* Enume