Re: [Xen-devel] x86/vMSI-X emulation issue
> -----Original Message-----
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: 24 March 2016 09:47
> To: Paul Durrant
> Cc: Andrew Cooper; xen-devel
> Subject: RE: [Xen-devel] x86/vMSI-X emulation issue
>
> >>> On 24.03.16 at 10:39, <paul.durr...@citrix.com> wrote:
> > [snip]
> >
> > Ok. I never particularly liked using X86EMUL_UNHANDLEABLE to
> > invoke the forwarding behaviour though, as it's only legitimate to
> > do it on the first rep.
>
> Well, that's explicitly one of the wrong assumptions that patch
> addresses: it is perfectly fine for an individual handler to return
> this on other than the first iteration. It's only the generic
> infrastructure which doesn't currently permit this (for no apparent
> reason).

Well, I guess the reason was that something returning
X86EMUL_UNHANDLEABLE on a rep used to be used, in the case of a page
fault, to bail out of the rep cycle, page in the memory and have the
cycle be restarted. I got rid of that in favour of pre-slicing the
reps and making sure the memory was paged in before attempting the
I/O. Thus there needed to be some special way of distinguishing an
I/O that needed to be forwarded to QEMU from a page fault somewhere
in the middle.

> > I always had the feeling there had to be a nicer way of doing it.
> > Possibly just too intrusive a change at this point though.
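The pre-slicing described above - clamping the rep count up front so
that the memory a run touches can be paged in before the I/O is
attempted, rather than relying on a mid-run fault to abort and
restart the cycle - might look roughly like this (a hypothetical
helper for illustration, not the actual Xen code):

```c
#include <assert.h>

#define PAGE_SIZE 4096ul

/* Clamp a repetition count so the sliced run never crosses the
 * current page boundary. The caller can then validate/page in that
 * one page before issuing the I/O, and no iteration can fault in
 * the middle of the run. Hypothetical sketch. */
static unsigned long slice_reps(unsigned long addr, unsigned int size,
                                unsigned long reps)
{
    unsigned long space = PAGE_SIZE - (addr & (PAGE_SIZE - 1));
    unsigned long fit = space / size;

    /* A single repetition straddling the page boundary (fit == 0)
     * would need separate handling; pass it through untouched here. */
    return fit && fit < reps ? fit : reps;
}
```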
> I'm of course up for alternatives, if you're willing to work on
> such.

I'll have a look at your code but, if I have the time, I may look to
refactor things once 4.7 is out the door.

  Paul

> Yet I think backporting would become even more of a problem when
> going such an alternative route.
>
> Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [Xen-devel] x86/vMSI-X emulation issue
>>> On 24.03.16 at 10:39, <paul.durr...@citrix.com> wrote:
> [snip]
>
> Ok. I never particularly liked using X86EMUL_UNHANDLEABLE to invoke
> the forwarding behaviour though, as it's only legitimate to do it on
> the first rep.

Well, that's explicitly one of the wrong assumptions that patch
addresses: it is perfectly fine for an individual handler to return
this on other than the first iteration. It's only the generic
infrastructure which doesn't currently permit this (for no apparent
reason).

> I always had the feeling there had to be a nicer way of doing it.
> Possibly just too intrusive a change at this point though.

I'm of course up for alternatives, if you're willing to work on such.
Yet I think backporting would become even more of a problem when
going such an alternative route.

Jan
Re: [Xen-devel] x86/vMSI-X emulation issue
> -----Original Message-----
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: 24 March 2016 09:35
> To: Paul Durrant
> Cc: Andrew Cooper; xen-devel
> Subject: RE: [Xen-devel] x86/vMSI-X emulation issue
>
> > It suggests that such cache-and/or-forward models should probably
> > sit somewhere else in the flow, possibly being invoked from
> > hvm_send_ioreq(), since there should indeed be a selected ioreq
> > server for these cases.
>
> I don't really think so. As I have gone through and carried out
> what I had described above, I think I managed to address at least
> one more issue with not properly handled rep counts, and hence I
> think doing it that way is correct. I'll have to test the thing
> before I can send it out, for you to take a look.

Ok. I never particularly liked using X86EMUL_UNHANDLEABLE to invoke
the forwarding behaviour though, as it's only legitimate to do it on
the first rep. I always had the feeling there had to be a nicer way
of doing it. Possibly just too intrusive a change at this point
though.

  Paul

> Jan
Re: [Xen-devel] x86/vMSI-X emulation issue
>>> On 24.03.16 at 10:09, <paul.durr...@citrix.com> wrote:
> [snip]
>
> It suggests that such cache-and/or-forward models should probably
> sit somewhere else in the flow, possibly being invoked from
> hvm_send_ioreq(), since there should indeed be a selected ioreq
> server for these cases.

I don't really think so. As I have gone through and carried out what
I had described above, I think I managed to address at least one more
issue with not properly handled rep counts, and hence I think doing
it that way is correct. I'll have to test the thing before I can send
it out, for you to take a look.

Jan
Re: [Xen-devel] x86/vMSI-X emulation issue
> -----Original Message-----
> From: Xen-devel [mailto:xen-devel-boun...@lists.xen.org] On Behalf
> Of Jan Beulich
> Sent: 24 March 2016 07:52
> To: xen-devel
> Cc: Andrew Cooper
> Subject: Re: [Xen-devel] x86/vMSI-X emulation issue
>
> >>> On 23.03.16 at 18:05, <jbeul...@suse.com> wrote:
> > [snip]
>
> Having thought about it more over night, I think this indeed is the
> most reasonable route, not just because it's least intrusive: for
> non-buffered internally handled I/O requests, no good can come from
> forwarding full batches to qemu when the respective range checking
> function has indicated that this is an acceptable request. And in
> fact neither vHPET nor vIO-APIC code generates
> X86EMUL_UNHANDLEABLE. And vLAPIC code only apparently does so -
> I'll submit a patch to make this obvious once tested.
>
> Otoh stdvga_intercept_pio() uses X86EMUL_UNHANDLEABLE in a manner
> similar to the vMSI-X code - for internal caching and then
> forwarding to qemu. Clearly that is also broken for REP OUTS, and
> hence a similar rep count reduction is going to be needed for the
> port I/O case.

It suggests that such cache-and/or-forward models should probably sit
somewhere else in the flow, possibly being invoked from
hvm_send_ioreq(), since there should indeed be a selected ioreq
server for these cases.

  Paul

> vRTC code would misbehave too, albeit there it is quite hard to see
> what use REP INS or REP OUTS could be. Yet we can't exclude a guest
> using such, so we should make it behave correctly.
>
> For handle_pmt_io(), otoh, forwarding the full batch would be okay,
> but since there shouldn't be any writes, breaking up such batches
> wouldn't be a problem. Then again, forwarding such invalid requests
> to qemu is kind of pointless - we could as well terminate them right
> in Xen, just like we terminate requests of other than 4 byte width.
> Again, I'll submit a patch to make this obvious once tested.
>
> Jan
Re: [Xen-devel] x86/vMSI-X emulation issue
>>> On 23.03.16 at 18:05, <jbeul...@suse.com> wrote:
> [snip]
Having thought about it more over night, I think this indeed is the
most reasonable route, not just because it's least intrusive: for
non-buffered internally handled I/O requests, no good can come from
forwarding full batches to qemu when the respective range checking
function has indicated that this is an acceptable request. And in
fact neither vHPET nor vIO-APIC code generates X86EMUL_UNHANDLEABLE.
And vLAPIC code only apparently does so - I'll submit a patch to make
this obvious once tested.

Otoh stdvga_intercept_pio() uses X86EMUL_UNHANDLEABLE in a manner
similar to the vMSI-X code - for internal caching and then forwarding
to qemu. Clearly that is also broken for REP OUTS, and hence a
similar rep count reduction is going to be needed for the port I/O
case.

vRTC code would misbehave too, albeit there it is quite hard to see
what use REP INS or REP OUTS could be. Yet we can't exclude a guest
using such, so we should make it behave correctly.

For handle_pmt_io(), otoh, forwarding the full batch would be okay,
but since there shouldn't be any writes, breaking up such batches
wouldn't be a problem. Then again, forwarding such invalid requests
to qemu is kind of pointless - we could as well terminate them right
in Xen, just like we terminate requests of other than 4 byte width.
Again, I'll submit a patch to make this obvious once tested.

Jan
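What the automatic chopping of option 2 amounts to can be sketched
roughly as follows, under the assumption (argued above) that a
handler may legitimately bail on any iteration, not just the first.
All names and structures here are hypothetical simplifications, not
the actual Xen code:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified mirror of the emulator status codes discussed here. */
enum x86emul_status { X86EMUL_OKAY, X86EMUL_UNHANDLEABLE };

/* A repeated I/O request: base address, bytes per rep, rep count. */
struct ioreq {
    unsigned long addr;
    unsigned int size;
    unsigned long reps;
};

/* Handler: completes as many iterations as it can, reports the count
 * via *done, and returns UNHANDLEABLE for an iteration it wants the
 * device model to see. It may bail on ANY iteration. */
typedef enum x86emul_status (*handler_fn)(const struct ioreq *,
                                          unsigned long *done);

/* On UNHANDLEABLE, forward only a single repetition (the failing
 * one) instead of the whole remaining batch, so the internal handler
 * keeps seeing every subsequent iteration. */
static enum x86emul_status dispatch(const struct ioreq *req,
                                    handler_fn h, struct ioreq *fwd)
{
    unsigned long done = 0;
    enum x86emul_status rc = h(req, &done);

    if (rc == X86EMUL_UNHANDLEABLE) {
        fwd->addr = req->addr + done * (unsigned long)req->size;
        fwd->size = req->size;
        fwd->reps = 1; /* the crucial rep count reduction */
    }
    return rc;
}

/* Demo handler: deals with the first three iterations internally,
 * then punts (akin to a write run reaching words 0-2 of the next
 * MSI-X table entry). */
static enum x86emul_status demo_handler(const struct ioreq *req,
                                        unsigned long *done)
{
    *done = req->reps < 3 ? req->reps : 3;
    return *done < req->reps ? X86EMUL_UNHANDLEABLE : X86EMUL_OKAY;
}
```

The point of the sketch is the `fwd->reps = 1` line: the device model
sees one repetition at a time, and control returns to the internal
handler for each remaining iteration.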
[Xen-devel] x86/vMSI-X emulation issue
All,

so I've just learned that Windows (at least some versions and some of
their code paths) uses REP MOVSD to read/write the MSI-X table. The
way at least msixtbl_write() works is not compatible with this
(msixtbl_read() also seems affected, albeit to a lesser degree), and
apparently it just worked by accident until the XSA-120 and 128-131
and follow-up changes - most notably commit ad28e42bd1 ("x86/MSI:
track host and guest masking separately") - as without the call to
guest_mask_msi_irq() interrupts won't ever get unmasked.

The problem with emulating REP MOVSD is that msixtbl_write()
intentionally returns X86EMUL_UNHANDLEABLE on all writes to words 0,
1, and 2. When in the process of emulating multiple writes, we
therefore hand the entire batch of 3 or 4 writes to qemu, and the
hypervisor doesn't get to see any iteration other than the initial
one.

Now I see a couple of possible solutions, but none of them look
really neat, hence I'm seeking a second opinion (including, of
course, further alternative ideas):

1) Introduce another X86EMUL_*-like status that's not really to be
used by the emulator itself, but only by the two vMSI-X functions, to
indicate to their caller that prior to forwarding the request it
should be chopped to a single repetition.

2) Do the aforementioned chopping automatically on seeing
X86EMUL_UNHANDLEABLE, on the basis that the .check handler had
indicated that the full range was acceptable. That would at once
cover other similarly undesirable cases, like the vLAPIC code
returning this error. However, any stdvga-like emulated device would
clearly not want that to happen, and would instead prefer the entire
batch to get forwarded in one go (stdvga itself sits on a different
path). Otoh, with the devices we have currently, this would seem to
be the least intrusive solution.

3) Have emulation backends provide some kind of (static) flag
indicating which forwarding behaviour they would like.
4) Expose the full ioreq to the emulation backends, so they can
fiddle with the request to their liking.

Thanks, Jan
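For context on why the batch straddles the Xen/qemu boundary at all:
an MSI-X table entry is four 32-bit words, and per the description
above only the vector control word (which carries the mask bit that
guest_mask_msi_irq() acts on) is of interest to the hypervisor. A
condensed, hypothetical sketch of that classification - illustrative
only, not the actual msixtbl_write():

```c
#include <assert.h>

/* MSI-X table entry layout per the PCI spec: four 32-bit words. */
#define PCI_MSIX_ENTRY_LOWER_ADDR  0x0
#define PCI_MSIX_ENTRY_UPPER_ADDR  0x4
#define PCI_MSIX_ENTRY_DATA        0x8
#define PCI_MSIX_ENTRY_VECTOR_CTRL 0xc /* contains the mask bit */
#define PCI_MSIX_ENTRY_SIZE        0x10

enum x86emul_status { X86EMUL_OKAY, X86EMUL_UNHANDLEABLE };

/* Writes to words 0-2 are deliberately punted to the device model;
 * only the vector control word is handled in the hypervisor. So a
 * REP MOVSD filling a whole entry hits UNHANDLEABLE on its very
 * first dword, and with whole-batch forwarding the vector control
 * write never reaches Xen. */
static enum x86emul_status classify_write(unsigned long table_offset)
{
    return (table_offset % PCI_MSIX_ENTRY_SIZE) ==
           PCI_MSIX_ENTRY_VECTOR_CTRL
           ? X86EMUL_OKAY : X86EMUL_UNHANDLEABLE;
}
```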