Re: [Xen-devel] x86/vMSI-X emulation issue
> -----Original Message-----
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: 24 March 2016 09:47
> To: Paul Durrant
> Cc: Andrew Cooper; xen-devel
> Subject: RE: [Xen-devel] x86/vMSI-X emulation issue
>
> >>> On 24.03.16 at 10:39, <paul.durr...@citrix.com> wrote:
> > [snip]
> >
> > Ok. I never particularly liked using X86EMUL_UNHANDLEABLE to
> > invoke the forwarding behaviour though, as it's only legitimate to
> > do it on the first rep.
>
> Well, that's explicitly one of the wrong assumptions that patch
> addresses: it is perfectly fine for an individual handler to return
> this on other than the first iteration. It's only the generic
> infrastructure which doesn't currently permit this (for no apparent
> reason).

Well, I guess the reason was that something returning
X86EMUL_UNHANDLEABLE on a rep used to be used, in the case of a page
fault, to bail out of the rep cycle, page in the memory and have the
cycle be restarted. I got rid of that in favour of pre-slicing the
reps and making sure the memory was paged in before attempting the
I/O. Thus there needed to be some special way of distinguishing an
I/O that needed to be forwarded to QEMU from a page fault somewhere
in the middle.

> > I always had the feeling there had to be a nicer way of doing it.
> > Possibly just too intrusive a change at this point though.
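The pre-slicing described above - clamping the rep count up front so
that the memory a run touches can be paged in before the I/O is
attempted, rather than relying on a mid-run fault to abort and
restart the cycle - might look roughly like this (a hypothetical
helper for illustration, not the actual Xen code):

```c
#include <assert.h>

#define PAGE_SIZE 4096ul

/* Clamp a repetition count so the sliced run never crosses the
 * current page boundary. The caller can then validate/page in that
 * one page before issuing the I/O, and no iteration can fault in
 * the middle of the run. Hypothetical sketch. */
static unsigned long slice_reps(unsigned long addr, unsigned int size,
                                unsigned long reps)
{
    unsigned long space = PAGE_SIZE - (addr & (PAGE_SIZE - 1));
    unsigned long fit = space / size;

    /* A single repetition straddling the page boundary (fit == 0)
     * would need separate handling; pass it through untouched here. */
    return fit && fit < reps ? fit : reps;
}
```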
> I'm of course up for alternatives, if you're willing to work on
> such.

I'll have a look at your code but, if I have the time, I may look to
refactor things once 4.7 is out the door.

  Paul

> Yet I think backporting would become even more of a problem when
> going such an alternative route.
>
> Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [Xen-devel] x86/vMSI-X emulation issue
>>> On 24.03.16 at 10:39, <paul.durr...@citrix.com> wrote:
> [snip]
>
> Ok. I never particularly liked using X86EMUL_UNHANDLEABLE to invoke
> the forwarding behaviour though, as it's only legitimate to do it on
> the first rep.

Well, that's explicitly one of the wrong assumptions that patch
addresses: it is perfectly fine for an individual handler to return
this on other than the first iteration. It's only the generic
infrastructure which doesn't currently permit this (for no apparent
reason).

> I always had the feeling there had to be a nicer way of doing it.
> Possibly just too intrusive a change at this point though.

I'm of course up for alternatives, if you're willing to work on such.
Yet I think backporting would become even more of a problem when
going such an alternative route.

Jan
Re: [Xen-devel] x86/vMSI-X emulation issue
> -----Original Message-----
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: 24 March 2016 09:35
> To: Paul Durrant
> Cc: Andrew Cooper; xen-devel
> Subject: RE: [Xen-devel] x86/vMSI-X emulation issue
>
> > It suggests that such cache-and/or-forward models should probably
> > sit somewhere else in the flow, possibly being invoked from
> > hvm_send_ioreq(), since there should indeed be a selected ioreq
> > server for these cases.
>
> I don't really think so. As I have gone through and carried out
> what I had described above, I think I managed to address at least
> one more issue with not properly handled rep counts, and hence I
> think doing it that way is correct. I'll have to test the thing
> before I can send it out, for you to take a look.

Ok. I never particularly liked using X86EMUL_UNHANDLEABLE to invoke
the forwarding behaviour though, as it's only legitimate to do it on
the first rep. I always had the feeling there had to be a nicer way
of doing it. Possibly just too intrusive a change at this point
though.

  Paul

> Jan
Re: [Xen-devel] x86/vMSI-X emulation issue
>>> On 24.03.16 at 10:09, <paul.durr...@citrix.com> wrote:
> [snip]
>
> It suggests that such cache-and/or-forward models should probably
> sit somewhere else in the flow, possibly being invoked from
> hvm_send_ioreq(), since there should indeed be a selected ioreq
> server for these cases.

I don't really think so. As I have gone through and carried out what
I had described above, I think I managed to address at least one more
issue with not properly handled rep counts, and hence I think doing
it that way is correct. I'll have to test the thing before I can send
it out, for you to take a look.

Jan
Re: [Xen-devel] x86/vMSI-X emulation issue
> -----Original Message-----
> From: Xen-devel [mailto:xen-devel-boun...@lists.xen.org] On Behalf
> Of Jan Beulich
> Sent: 24 March 2016 07:52
> To: xen-devel
> Cc: Andrew Cooper
> Subject: Re: [Xen-devel] x86/vMSI-X emulation issue
>
> >>> On 23.03.16 at 18:05, <jbeul...@suse.com> wrote:
> > [snip]
>
> Having thought about it more over night, I think this indeed is the
> most reasonable route, not just because it's least intrusive: for
> non-buffered internally handled I/O requests, no good can come from
> forwarding full batches to qemu when the respective range checking
> function has indicated that this is an acceptable request. And in
> fact neither vHPET nor vIO-APIC code generates
> X86EMUL_UNHANDLEABLE. And vLAPIC code only apparently does so -
> I'll submit a patch to make this obvious once tested.
>
> Otoh stdvga_intercept_pio() uses X86EMUL_UNHANDLEABLE in a manner
> similar to the vMSI-X code - for internal caching and then
> forwarding to qemu. Clearly that is also broken for REP OUTS, and
> hence a similar rep count reduction is going to be needed for the
> port I/O case.

It suggests that such cache-and/or-forward models should probably sit
somewhere else in the flow, possibly being invoked from
hvm_send_ioreq(), since there should indeed be a selected ioreq
server for these cases.

  Paul

> vRTC code would misbehave too, albeit there it is quite hard to see
> what use REP INS or REP OUTS could be. Yet we can't exclude a guest
> using such, so we should make it behave correctly.
>
> For handle_pmt_io(), otoh, forwarding the full batch would be okay,
> but since there shouldn't be any writes, breaking up such batches
> wouldn't be a problem. Then again, forwarding such invalid requests
> to qemu is kind of pointless - we could as well terminate them right
> in Xen, just like we terminate requests of other than 4 byte width.
> Again, I'll submit a patch to make this obvious once tested.
>
> Jan
Re: [Xen-devel] x86/vMSI-X emulation issue
>>> On 23.03.16 at 18:05, <jbeul...@suse.com> wrote:
> [snip]
Having thought about it more over night, I think this indeed is the
most reasonable route, not just because it's least intrusive: for
non-buffered internally handled I/O requests, no good can come from
forwarding full batches to qemu when the respective range checking
function has indicated that this is an acceptable request. And in
fact neither vHPET nor vIO-APIC code generates X86EMUL_UNHANDLEABLE.
And vLAPIC code only apparently does so - I'll submit a patch to make
this obvious once tested.

Otoh stdvga_intercept_pio() uses X86EMUL_UNHANDLEABLE in a manner
similar to the vMSI-X code - for internal caching and then forwarding
to qemu. Clearly that is also broken for REP OUTS, and hence a
similar rep count reduction is going to be needed for the port I/O
case.

vRTC code would misbehave too, albeit there it is quite hard to see
what use REP INS or REP OUTS could be. Yet we can't exclude a guest
using such, so we should make it behave correctly.

For handle_pmt_io(), otoh, forwarding the full batch would be okay,
but since there shouldn't be any writes, breaking up such batches
wouldn't be a problem. Then again, forwarding such invalid requests
to qemu is kind of pointless - we could as well terminate them right
in Xen, just like we terminate requests of other than 4 byte width.
Again, I'll submit a patch to make this obvious once tested.

Jan
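What the automatic chopping of option 2 amounts to can be sketched
roughly as follows, under the assumption (argued above) that a
handler may legitimately bail on any iteration, not just the first.
All names and structures here are hypothetical simplifications, not
the actual Xen code:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified mirror of the emulator status codes discussed here. */
enum x86emul_status { X86EMUL_OKAY, X86EMUL_UNHANDLEABLE };

/* A repeated I/O request: base address, bytes per rep, rep count. */
struct ioreq {
    unsigned long addr;
    unsigned int size;
    unsigned long reps;
};

/* Handler: completes as many iterations as it can, reports the count
 * via *done, and returns UNHANDLEABLE for an iteration it wants the
 * device model to see. It may bail on ANY iteration. */
typedef enum x86emul_status (*handler_fn)(const struct ioreq *,
                                          unsigned long *done);

/* On UNHANDLEABLE, forward only a single repetition (the failing
 * one) instead of the whole remaining batch, so the internal handler
 * keeps seeing every subsequent iteration. */
static enum x86emul_status dispatch(const struct ioreq *req,
                                    handler_fn h, struct ioreq *fwd)
{
    unsigned long done = 0;
    enum x86emul_status rc = h(req, &done);

    if (rc == X86EMUL_UNHANDLEABLE) {
        fwd->addr = req->addr + done * (unsigned long)req->size;
        fwd->size = req->size;
        fwd->reps = 1; /* the crucial rep count reduction */
    }
    return rc;
}

/* Demo handler: deals with the first three iterations internally,
 * then punts (akin to a write run reaching words 0-2 of the next
 * MSI-X table entry). */
static enum x86emul_status demo_handler(const struct ioreq *req,
                                        unsigned long *done)
{
    *done = req->reps < 3 ? req->reps : 3;
    return *done < req->reps ? X86EMUL_UNHANDLEABLE : X86EMUL_OKAY;
}
```

The point of the sketch is the `fwd->reps = 1` line: the device model
sees one repetition at a time, and control returns to the internal
handler for each remaining iteration.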
[Xen-devel] x86/vMSI-X emulation issue
All,

so I've just learned that Windows (at least some versions and some of
their code paths) uses REP MOVSD to read/write the MSI-X table. The
way at least msixtbl_write() works is not compatible with this
(msixtbl_read() also seems affected, albeit to a lesser degree), and
apparently it just worked by accident until the XSA-120 and 128-131
and follow-up changes - most notably commit ad28e42bd1 ("x86/MSI:
track host and guest masking separately") - as without the call to
guest_mask_msi_irq() interrupts won't ever get unmasked.

The problem with emulating REP MOVSD is that msixtbl_write()
intentionally returns X86EMUL_UNHANDLEABLE on all writes to words 0,
1, and 2. When in the process of emulating multiple writes, we
therefore hand the entire batch of 3 or 4 writes to qemu, and the
hypervisor doesn't get to see any iteration other than the initial
one.

Now I see a couple of possible solutions, but none of them look
really neat, hence I'm seeking a second opinion (including, of
course, further alternative ideas):

1) Introduce another X86EMUL_*-like status that's not really to be
used by the emulator itself, but only by the two vMSI-X functions, to
indicate to their caller that prior to forwarding the request it
should be chopped to a single repetition.

2) Do the aforementioned chopping automatically on seeing
X86EMUL_UNHANDLEABLE, on the basis that the .check handler had
indicated that the full range was acceptable. That would at once
cover other similarly undesirable cases, like the vLAPIC code
returning this error. However, any stdvga-like emulated device would
clearly not want that to happen, and would instead prefer the entire
batch to get forwarded in one go (stdvga itself sits on a different
path). Otoh, with the devices we have currently, this would seem to
be the least intrusive solution.

3) Have emulation backends provide some kind of (static) flag
indicating which forwarding behaviour they would like.
4) Expose the full ioreq to the emulation backends, so they can
fiddle with the request to their liking.

Thanks, Jan
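For context on why the batch straddles the Xen/qemu boundary at all:
an MSI-X table entry is four 32-bit words, and per the description
above only the vector control word (which carries the mask bit that
guest_mask_msi_irq() acts on) is of interest to the hypervisor. A
condensed, hypothetical sketch of that classification - illustrative
only, not the actual msixtbl_write():

```c
#include <assert.h>

/* MSI-X table entry layout per the PCI spec: four 32-bit words. */
#define PCI_MSIX_ENTRY_LOWER_ADDR  0x0
#define PCI_MSIX_ENTRY_UPPER_ADDR  0x4
#define PCI_MSIX_ENTRY_DATA        0x8
#define PCI_MSIX_ENTRY_VECTOR_CTRL 0xc /* contains the mask bit */
#define PCI_MSIX_ENTRY_SIZE        0x10

enum x86emul_status { X86EMUL_OKAY, X86EMUL_UNHANDLEABLE };

/* Writes to words 0-2 are deliberately punted to the device model;
 * only the vector control word is handled in the hypervisor. So a
 * REP MOVSD filling a whole entry hits UNHANDLEABLE on its very
 * first dword, and with whole-batch forwarding the vector control
 * write never reaches Xen. */
static enum x86emul_status classify_write(unsigned long table_offset)
{
    return (table_offset % PCI_MSIX_ENTRY_SIZE) ==
           PCI_MSIX_ENTRY_VECTOR_CTRL
           ? X86EMUL_OKAY : X86EMUL_UNHANDLEABLE;
}
```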