Re: [PATCH 1/3] xen/vioapic: add support for the extended destination ID field

2022-01-26 Thread David Woodhouse
On Wed, 2022-01-26 at 13:47 +0100, Jan Beulich wrote:
> On 25.01.2022 16:13, Roger Pau Monné wrote:
> > On Mon, Jan 24, 2022 at 02:20:47PM +0100, Jan Beulich wrote:
> > > On 20.01.2022 16:23, Roger Pau Monne wrote:
> > > > Such field uses bits 55:48, but for the purposes the register
> > > > will be
> > > > used use bits 55:49 instead. Bit 48 is used to signal an RTE
> > > > entry is
> > > > in remappable format which is not supported by the vIO-APIC.
> > > 
> > > Neither here nor in the cover letter you point at a formal
> > > specification
> > > of this mode of operation.
> > 
> > I'm not aware of any formal specification of this mode, apart from
> > the
> > work done to introduce support in Linux and QEMU:
> > 
> > https://lore.kernel.org/all/20201009104616.1314746-1-dw...@infradead.org/
> > 
> > https://git.qemu.org/?p=qemu.git;a=commitdiff;h=c1bb5418e
> > 
> > 
> > Adding David in case there's some kind of specification somewhere
> > I'm
> > not aware of.
> > 
> > > What I'm aware of are vague indications of
> > > this mode's existence in some of Intel's chipset data sheets. Yet that
> > > leaves open, for example, whether indeed bit 48 cannot be used here.
> > 
> > Bit 48 cannot be used because it's already used to signal an RTE is in
> > remappable format. We still want to differentiate an RTE entry in
> > remappable format, as it should be possible to expose both the
> > extended ID support and an emulated IOMMU.
> 
> I think I did say so on irc already: There's not really a problem like
> this. For one I wouldn't expect an OS to use this extended ID at the
> same time as having an IOMMU to deal with the width restriction. And
> then, even if they wanted to use both at the same time, they'd simply
> need to care about the specific meaning of this bit themselves: When
> the bit is set, it would be unavoidable to have it (perhaps identity-)
> remapped by the IOMMU.

As you later said, it's too late for bikeshedding that decision. But I
stand by it regardless of the time.

Even by the time *I* made that choice, it was long since established by
Intel. You could make the same argument about their original hardware
design, that the format bit is pointless and that if an OS enables
interrupt remapping, it knows full well when it's going to use it. It
can even be configured in the IOMMU per PCI function.

There is benefit to having a very clear and unambiguous difference
between the MSI formats that isn't entirely dependent on the IOMMU
being configured correctly. And in my case there is *definitely*
benefit to following the precedent already set by Intel in the real
hardware. For me, those outweighed the marginal additional benefit of
going from 15 to 16 bits of APIC ID in the MSI.

> > > > --- a/xen/arch/x86/hvm/vioapic.c
> > > > +++ b/xen/arch/x86/hvm/vioapic.c
> > > > @@ -412,7 +412,8 @@ static void ioapic_inj_irq(
> > > >  
> > > >  static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int 
> > > > pin)
> > > >  {
> > > > -uint16_t dest = vioapic->redirtbl[pin].fields.dest_id;
> > > > +uint16_t dest = vioapic->redirtbl[pin].fields.dest_id |
> > > > +(vioapic->redirtbl[pin].fields.ext_dest_id << 8);
> > > 
> > > What if an existing guest has been writing non-zero in these bits? Can
> > > you really use them here without any further indication by the guest?
> > 
> > Those bits where reserved previously, so no OS should have used them.
> > There are hypervisors already in the field (QEMU/KVM and HyperV) using
> > this mode.
> > 
> > We could add a per-domain option to disable extended ID mode if we are
> > really worried about OSes having used those bits for some reason.
> 
> Generally I think previously ignored bits need to be handled with care.
> If there was a specification, what is being said there might serve as
> a guideline for us. Even if there was just a proper description of the
> EDID field found in recent Intel chipset spec, this might already help
> determining whether we want/need an enable (or disable). But there's
> not even a bit announcing the functionality in, say, the version
> register.

It's not very verbose, but the Extended Destination ID in the I/OAPIC
is at least mentioned in the RTE documentation in the 82806AA datasheet
https://datasheet.octopart.com/FW82806AA-SL3VZ-Intel-datasheet-13695406.pdf 

See page 47, §2.4.10 "Redirection Table High DWord".

The rest you have to kind of piece together from the later
documentation once they actually started *using* it for IRQ remapping.
I think it may also have been used on IA64?

The realisation that we didn't need to have special different code to
compose RTE entries for Compatibility Format vs. Remappable Format, and
that we could just allow the 'upstream' APIC code to compose the MSI
message and then swizzle the bits into the RTE... was rather slow to
come.
https://lore.kernel.org/all/20201024213535.443185-22-dw...@infradead.org/



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 1/3] xen/vioapic: add support for the extended destination ID field

2022-01-26 Thread Jan Beulich
On 26.01.2022 14:52, David Woodhouse wrote:
> On Tue, 2022-01-25 at 16:13 +0100, Roger Pau Monné wrote:
>> On Mon, Jan 24, 2022 at 02:20:47PM +0100, Jan Beulich wrote:
>>> On 20.01.2022 16:23, Roger Pau Monne wrote:
 Such field uses bits 55:48, but for the purposes the register will be
 used use bits 55:49 instead. Bit 48 is used to signal an RTE entry is
 in remappable format which is not supported by the vIO-APIC.
>>>
>>> Neither here nor in the cover letter you point at a formal specification
>>> of this mode of operation.
>>
>> I'm not aware of any formal specification of this mode, apart from the
>> work done to introduce support in Linux and QEMU:
>>
>> https://lore.kernel.org/all/20201009104616.1314746-1-dw...@infradead.org/
>>
>> https://git.qemu.org/?p=qemu.git;a=commitdiff;h=c1bb5418e
>>
>>
>> Adding David in case there's some kind of specification somewhere I'm
>> not aware of.
> 
> Indeed there is no formal specification that I am aware of, although
> it's vaguely possible that Microsoft wrote something up when they added
> it to Hyper-V.
> 
> https://lore.kernel.org/all/20201103011136.59108-1-de...@microsoft.com/
> 
> I had an internal doc which looks like I can clean it up a tiny bit
> and then share at http://david.woodhou.se/15-bit-msi.pdf if that helps?

Thanks, this at least puts us on common grounds.

>>> What I'm aware of are vague indications of
>>> this mode's existence in some of Intel's chipset data sheets. Yet that
>>> leaves open, for example, whether indeed bit 48 cannot be used here.
>>
>> Bit 48 cannot be used because it's already used to signal an RTE is in
>> remappable format. We still want to differentiate an RTE entry in
>> remappable format, as it should be possible to expose both the
>> extended ID support and an emulated IOMMU.
> 
> Right. I chose not to use the low bit of the existing Extended
> Destination ID because that's the one Intel used to indicate Remappable
> Format. This means we can still expose an IOMMU to guests and easily
> distinguish between Compatibility Format and Remappable Format MSIs
> just as real hardware does.

Well, with the defacto standard of using only 7 of the bits we will
have to follow suit of course.

Jan




Re: [PATCH 1/3] xen/vioapic: add support for the extended destination ID field

2022-01-26 Thread David Woodhouse
On Tue, 2022-01-25 at 16:13 +0100, Roger Pau Monné wrote:
> On Mon, Jan 24, 2022 at 02:20:47PM +0100, Jan Beulich wrote:
> > On 20.01.2022 16:23, Roger Pau Monne wrote:
> > > Such field uses bits 55:48, but for the purposes the register will be
> > > used use bits 55:49 instead. Bit 48 is used to signal an RTE entry is
> > > in remappable format which is not supported by the vIO-APIC.
> > 
> > Neither here nor in the cover letter you point at a formal specification
> > of this mode of operation.
> 
> I'm not aware of any formal specification of this mode, apart from the
> work done to introduce support in Linux and QEMU:
> 
> https://lore.kernel.org/all/20201009104616.1314746-1-dw...@infradead.org/
> 
> https://git.qemu.org/?p=qemu.git;a=commitdiff;h=c1bb5418e
> 
> 
> Adding David in case there's some kind of specification somewhere I'm
> not aware of.

Indeed there is no formal specification that I am aware of, although
it's vaguely possible that Microsoft wrote something up when they added
it to Hyper-V.

https://lore.kernel.org/all/20201103011136.59108-1-de...@microsoft.com/

I had an internal doc which looks like I can clean it up a tiny bit
and then share at http://david.woodhou.se/15-bit-msi.pdf if that helps?

> > What I'm aware of are vague indications of
> > this mode's existence in some of Intel's chipset data sheets. Yet that
> > leaves open, for example, whether indeed bit 48 cannot be used here.
> 
> Bit 48 cannot be used because it's already used to signal an RTE is in
> remappable format. We still want to differentiate an RTE entry in
> remappable format, as it should be possible to expose both the
> extended ID support and an emulated IOMMU.

Right. I chose not to use the low bit of the existing Extended
Destination ID because that's the one Intel used to indicate Remappable
Format. This means we can still expose an IOMMU to guests and easily
distinguish between Compatibility Format and Remappable Format MSIs
just as real hardware does.

> > > --- a/xen/arch/x86/hvm/vioapic.c
> > > +++ b/xen/arch/x86/hvm/vioapic.c
> > > @@ -412,7 +412,8 @@ static void ioapic_inj_irq(
> > >  
> > >  static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int 
> > > pin)
> > >  {
> > > -uint16_t dest = vioapic->redirtbl[pin].fields.dest_id;
> > > +uint16_t dest = vioapic->redirtbl[pin].fields.dest_id |
> > > +(vioapic->redirtbl[pin].fields.ext_dest_id << 8);
> > 
> > What if an existing guest has been writing non-zero in these bits? Can
> > you really use them here without any further indication by the guest?
> 
> Those bits where reserved previously, so no OS should have used them.
> There are hypervisors already in the field (QEMU/KVM and HyperV) using
> this mode.
> 
> We could add a per-domain option to disable extended ID mode if we are
> really worried about OSes having used those bits for some reason.

Note that I didn't even have to touch this part in qemu; the swizzling
of those 8 'Extended Destination ID' bits from the RTE into the
corresponding bits of the MSI message was already being done — without
which, IRQ remapping wouldn't have worked anyway.




smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 1/3] xen/vioapic: add support for the extended destination ID field

2022-01-26 Thread Jan Beulich
On 25.01.2022 16:13, Roger Pau Monné wrote:
> On Mon, Jan 24, 2022 at 02:20:47PM +0100, Jan Beulich wrote:
>> On 20.01.2022 16:23, Roger Pau Monne wrote:
>>> Such field uses bits 55:48, but for the purposes the register will be
>>> used use bits 55:49 instead. Bit 48 is used to signal an RTE entry is
>>> in remappable format which is not supported by the vIO-APIC.
>>
>> Neither here nor in the cover letter you point at a formal specification
>> of this mode of operation.
> 
> I'm not aware of any formal specification of this mode, apart from the
> work done to introduce support in Linux and QEMU:
> 
> https://lore.kernel.org/all/20201009104616.1314746-1-dw...@infradead.org/
> https://git.qemu.org/?p=qemu.git;a=commitdiff;h=c1bb5418e
> 
> Adding David in case there's some kind of specification somewhere I'm
> not aware of.
> 
>> What I'm aware of are vague indications of
>> this mode's existence in some of Intel's chipset data sheets. Yet that
>> leaves open, for example, whether indeed bit 48 cannot be used here.
> 
> Bit 48 cannot be used because it's already used to signal an RTE is in
> remappable format. We still want to differentiate an RTE entry in
> remappable format, as it should be possible to expose both the
> extended ID support and an emulated IOMMU.

I think I did say so on irc already: There's not really a problem like
this. For one I wouldn't expect an OS to use this extended ID at the
same time as having an IOMMU to deal with the width restriction. And
then, even if they wanted to use both at the same time, they'd simply
need to care about the specific meaning of this bit themselves: When
the bit is set, it would be unavoidable to have it (perhaps identity-)
remapped by the IOMMU.

>>> --- a/xen/arch/x86/hvm/vioapic.c
>>> +++ b/xen/arch/x86/hvm/vioapic.c
>>> @@ -412,7 +412,8 @@ static void ioapic_inj_irq(
>>>  
>>>  static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
>>>  {
>>> -uint16_t dest = vioapic->redirtbl[pin].fields.dest_id;
>>> +uint16_t dest = vioapic->redirtbl[pin].fields.dest_id |
>>> +(vioapic->redirtbl[pin].fields.ext_dest_id << 8);
>>
>> What if an existing guest has been writing non-zero in these bits? Can
>> you really use them here without any further indication by the guest?
> 
> Those bits where reserved previously, so no OS should have used them.
> There are hypervisors already in the field (QEMU/KVM and HyperV) using
> this mode.
> 
> We could add a per-domain option to disable extended ID mode if we are
> really worried about OSes having used those bits for some reason.

Generally I think previously ignored bits need to be handled with care.
If there was a specification, what is being said there might serve as
a guideline for us. Even if there was just a proper description of the
EDID field found in recent Intel chipset spec, this might already help
determining whether we want/need an enable (or disable). But there's
not even a bit announcing the functionality in, say, the version
register.

Jan




Re: [PATCH 1/3] xen/vioapic: add support for the extended destination ID field

2022-01-25 Thread Roger Pau Monné
On Mon, Jan 24, 2022 at 02:20:47PM +0100, Jan Beulich wrote:
> On 20.01.2022 16:23, Roger Pau Monne wrote:
> > Such field uses bits 55:48, but for the purposes the register will be
> > used use bits 55:49 instead. Bit 48 is used to signal an RTE entry is
> > in remappable format which is not supported by the vIO-APIC.
> 
> Neither here nor in the cover letter you point at a formal specification
> of this mode of operation.

I'm not aware of any formal specification of this mode, apart from the
work done to introduce support in Linux and QEMU:

https://lore.kernel.org/all/20201009104616.1314746-1-dw...@infradead.org/
https://git.qemu.org/?p=qemu.git;a=commitdiff;h=c1bb5418e

Adding David in case there's some kind of specification somewhere I'm
not aware of.

> What I'm aware of are vague indications of
> this mode's existence in some of Intel's chipset data sheets. Yet that
> leaves open, for example, whether indeed bit 48 cannot be used here.

Bit 48 cannot be used because it's already used to signal an RTE is in
remappable format. We still want to differentiate an RTE entry in
remappable format, as it should be possible to expose both the
extended ID support and an emulated IOMMU.

> > --- a/xen/arch/x86/hvm/vioapic.c
> > +++ b/xen/arch/x86/hvm/vioapic.c
> > @@ -412,7 +412,8 @@ static void ioapic_inj_irq(
> >  
> >  static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
> >  {
> > -uint16_t dest = vioapic->redirtbl[pin].fields.dest_id;
> > +uint16_t dest = vioapic->redirtbl[pin].fields.dest_id |
> > +(vioapic->redirtbl[pin].fields.ext_dest_id << 8);
> 
> What if an existing guest has been writing non-zero in these bits? Can
> you really use them here without any further indication by the guest?

Those bits where reserved previously, so no OS should have used them.
There are hypervisors already in the field (QEMU/KVM and HyperV) using
this mode.

We could add a per-domain option to disable extended ID mode if we are
really worried about OSes having used those bits for some reason.

Thanks, Roger.



Re: [PATCH 1/3] xen/vioapic: add support for the extended destination ID field

2022-01-24 Thread Jan Beulich
On 20.01.2022 16:23, Roger Pau Monne wrote:
> Such field uses bits 55:48, but for the purposes the register will be
> used use bits 55:49 instead. Bit 48 is used to signal an RTE entry is
> in remappable format which is not supported by the vIO-APIC.

Neither here nor in the cover letter you point at a formal specification
of this mode of operation. What I'm aware of are vague indications of
this mode's existence in some of Intel's chipset data sheets. Yet that
leaves open, for example, whether indeed bit 48 cannot be used here.

> --- a/xen/arch/x86/hvm/vioapic.c
> +++ b/xen/arch/x86/hvm/vioapic.c
> @@ -412,7 +412,8 @@ static void ioapic_inj_irq(
>  
>  static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
>  {
> -uint16_t dest = vioapic->redirtbl[pin].fields.dest_id;
> +uint16_t dest = vioapic->redirtbl[pin].fields.dest_id |
> +(vioapic->redirtbl[pin].fields.ext_dest_id << 8);

What if an existing guest has been writing non-zero in these bits? Can
you really use them here without any further indication by the guest?

Jan




[PATCH 1/3] xen/vioapic: add support for the extended destination ID field

2022-01-20 Thread Roger Pau Monne
Such field uses bits 55:48, but for the purposes the register will be
used use bits 55:49 instead. Bit 48 is used to signal an RTE entry is
in remappable format which is not supported by the vIO-APIC.

Use the extended destination ID to store the high bits from the
destination ID, thus expanding the size of the destination ID field to
15 bits, allowing an IO-APIC to target APIC IDs up to 32768.

Note this is already supported by QEMU/KVM and HyperV.

Signed-off-by: Roger Pau Monné 
---
 xen/arch/x86/hvm/vioapic.c | 3 ++-
 xen/include/public/arch-x86/hvm/save.h | 4 +++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 553c0f76ef..1f2305c232 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -412,7 +412,8 @@ static void ioapic_inj_irq(
 
 static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
 {
-uint16_t dest = vioapic->redirtbl[pin].fields.dest_id;
+uint16_t dest = vioapic->redirtbl[pin].fields.dest_id |
+(vioapic->redirtbl[pin].fields.ext_dest_id << 8);
 uint8_t dest_mode = vioapic->redirtbl[pin].fields.dest_mode;
 uint8_t delivery_mode = vioapic->redirtbl[pin].fields.delivery_mode;
 uint8_t vector = vioapic->redirtbl[pin].fields.vector;
diff --git a/xen/include/public/arch-x86/hvm/save.h 
b/xen/include/public/arch-x86/hvm/save.h
index 773a380bc2..14a5d94588 100644
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -376,7 +376,9 @@ union vioapic_redir_entry
 uint8_t trig_mode:1;
 uint8_t mask:1;
 uint8_t reserve:7;
-uint8_t reserved[4];
+uint8_t reserved[3];
+uint8_t :1;
+uint8_t ext_dest_id:7;
 uint8_t dest_id;
 } fields;
 };
-- 
2.34.1