On Fri, Oct 11, 2019 at 12:01:36PM +0200, Jan Beulich wrote: > On 11.10.2019 11:28, Roger Pau Monné wrote: > > On Fri, Oct 11, 2019 at 09:52:35AM +0200, Jan Beulich wrote: > >> On 10.10.2019 17:19, Roger Pau Monné wrote: > >>> On Thu, Oct 10, 2019 at 03:46:45PM +0200, Jan Beulich wrote: > >>>> On 10.10.2019 15:12, Roger Pau Monné wrote: > >>>>> On Thu, Oct 10, 2019 at 02:55:02PM +0200, Jan Beulich wrote: > >>>>>> On 10.10.2019 14:13, Roger Pau Monné wrote: > >>>>>>> On Thu, Oct 10, 2019 at 01:54:06PM +0200, Jan Beulich wrote: > >>>>>>>> On 10.10.2019 13:33, Roger Pau Monne wrote: > >>>>>>>>> When interrupt remapping is enabled as part of enabling x2APIC the > >>>>>>>> > >>>>>>>> Perhaps "unmasked" instead of "the"? > >>>>>>>> > >>>>>>>>> IO-APIC entries also need to be translated to the new format and > >>>>>>>>> added > >>>>>>>>> to the interrupt remapping table. > >>>>>>>>> > >>>>>>>>> This prevents IOMMU interrupt remapping faults when booting on > >>>>>>>>> hardware that has unmasked IO-APIC pins. > >>>>>>>> > >>>>>>>> But in the end it only papers over whatever the spurious interrupts > >>>>>>>> result form, doesn't it? Which isn't to say this isn't an > >>>>>>>> improvement. Calling out the ExtInt case here may be worthwhile as > >>>>>>>> well, as would be pointing out that this case still won't work on > >>>>>>>> AMD IOMMUs. > >>>>>>> > >>>>>>> But the fix for the ExtINT AMD issue should be done in > >>>>>>> amd_iommu_ioapic_update_ire then, so that it can properly handle > >>>>>>> ExtINT delivery mode, not to this part of the code. I will look > >>>>>>> into it, but I think it's kind of tangential to the issue here. > >>>>>> > >>>>>> I'm not talking of you working on fixing this right away. I'm merely > >>>>>> asking that you mention here (a) the ExtInt special case and (b) > >>>>>> that this special case will (continue to) not work in the AMD case. > >>>>>> > >>>>>>>>> --- a/xen/arch/x86/apic.c > >>>>>>>>> +++ b/xen/arch/x86/apic.c > >>>>>>>>> @@ -515,7 +515,7 @@ static void resume_x2apic(void) > >>>>>>>>> iommu_enable_x2apic(); > >>>>>>>>> __enable_x2apic(); > >>>>>>>>> > >>>>>>>>> - restore_IO_APIC_setup(ioapic_entries); > >>>>>>>>> + restore_IO_APIC_setup(ioapic_entries, true); > >>>>>>>>> unmask_8259A(); > >>>>>>>>> > >>>>>>>>> out: > >>>>>>>>> @@ -961,7 +961,12 @@ void __init x2apic_bsp_setup(void) > >>>>>>>>> printk("Switched to APIC driver %s\n", genapic.name); > >>>>>>>>> > >>>>>>>>> restore_out: > >>>>>>>>> - restore_IO_APIC_setup(ioapic_entries); > >>>>>>>>> + /* > >>>>>>>>> + * NB: do not use raw mode when restoring entries if the iommu > >>>>>>>>> has been > >>>>>>>>> + * enabled during the process, because the entries need to be > >>>>>>>>> translated > >>>>>>>>> + * and added to the remapping table in that case. > >>>>>>>>> + */ > >>>>>>>>> + restore_IO_APIC_setup(ioapic_entries, !x2apic_enabled); > >>>>>>>> > >>>>>>>> How is this different in the resume_x2apic() case? The IOMMU gets > >>>>>>>> enabled in the course of that as well. I.e. I'd expect you want > >>>>>>>> to pass "false" there, not "true". > >>>>>>> > >>>>>>> In the resume_x2apic case interrupt remapping should already be > >>>>>>> enabled or not, but that function is not going to enable interrupt > >>>>>>> remapping if it wasn't enabled before, hence the IO-APIC entries > >>>>>>> should already be using the interrupt remapping table and no > >>>>>>> translation is needed. > >>>>>> > >>>>>> Who / what would have enabled the IOMMU in the resume case? > >>>>> > >>>>> I don't think the question is who enables interrupt remapping in the > >>>>> resume case (which is resume_x2apic when calling iommu_enable_x2apic > >>>>> AFAICT), the point here is that on resume the entries in the IO-APIC > >>>>> will already match the state of interrupt remapping, so they shouldn't > >>>>> be translated. If interrupt remapping was off before suspend it will > >>>>> still be off after resume, and there won't be any translation needed. > >>>>> The same is true if interrupt remapping is on before suspend. > >>>> > >>>> I disagree: save_IO_APIC_setup() gets called from resume_x2apic(), > >>>> not prior to suspend. > >>> > >>> Oh, so maybe that's a misunderstanding on my side. I don't seem to be > >>> able to find a statement about the contents of the IO-APIC registers > >>> (and more specifically the entries) when getting back from > >>> suspension. Are all entries cleared and masked? > >>> > >>> Are the values previous to suspension stored? > >> > >> See ioapic_suspend() / ioapic_resume(): Looks like there's some > >> redundancy here - I don't think it makes sense for the LAPIC > >> code to fiddle with all the RTEs if subsequently (I assume; > >> didn't check) they'll all be overwritten anyway. I would seem > >> more logical to me if they'd just all get masked for IOMMU > >> enabling, deferring to ioapic_resume() for everything else. > > > > Yes, it seems like resume_x2apic shouldn't need to play with the > > entries at all TBH, just enabling interrupt remapping should be enough > > given that the entries are already save and restored by > > ioapic_{suspend/resume}. > > > > Would you agree to leave that as-is in this patch, ie: always pass > > true to restore_IO_APIC_setup in resume_x2apic? > > Properly explained in the description, that's an option. Even better > would of course be to do away with the unnecessary save/restore in > a prereq patch, at which point the question disappears as to what to > pass to the function at that point.
OK, I can create a pre-patch, but I don't think I have any hardware that does suspend/resume since it's all server-grade. Someone would have to test it and assert it actually works correctly, hence my reluctance to touch any of the suspend/resume logic. > >>> But it's simply not possible to reach the call to > >>> restore_IO_APIC_setup with x2apic_enabled == true and interrupt > >>> remapping disabled, regardless of the initial value of > >>> x2apic_enabled. > >>> > >>> All the paths that could lead to this scenario are short-circuited > >>> above with a panic. > >> > >> Hmm, true. Nevertheless it would feel better if the conditionals > >> were using what actually matters, rather than something derived. > > > > Would you be OK to using something like v1: > > > > https://lists.xenproject.org/archives/html/xen-devel/2019-10/msg00805.html > > > > See the usage of iommu_enabled in x2apic_bsp_setup. I think using > > x2apic_enabled is fine given the logic in the function, and the fact > > that x2APIC mandates interrupt remapping, but I could live with that > > extra local variable. > > Conceptually this looks better to me. I'd prefer to avoid shadowing > the global "iommu_enabled" though, and set the local variable to > "true" only one we actually enable the IOMMU. Also strictly speaking > you care about the enabling of interrupt remapping, not that of the > IOMMU (which implicitly means DMA remapping). I wonder whether the > code couldn't easily be made cope with IOMMU already being enabled, > assuming that this would lead to {iommu,intremap}_enabled to already > be true on entry. I.e. you'd request a "raw" restore unless > intremap_enabled changed from false to true. I'm not sure it's that easy. A raw restore wouldn't fully work, since Xen would have replaced the interrupt remapping table with an empty one, so Xen would have to add the entries for the unmasked IO-APIC pins to the IRT and then likely also update the IO-APIC pin entry to match the newly created IRTE. I think adding a comment at the top of the function to note that this won't work properly if there are unmasked IO-APIC pins and interrupt remapping enabled should be enough for now, as the patch doesn't make this case worse than it's current state. > >>>> I realize that io_apic_write() would suitably avoid going > >>>> the remapping path, but I think it would be more clear if the > >>>> distinction was already made properly at the call site. > >>> > >>> I'm afraid I'm slightly loss, do you mean to replace the > >>> ioapic_write_entry with an io_apic_write in restore_IO_APIC_setup? > >> > >> No, because of ... > >> > >>> That would be the same as always passing raw == false AFAICT. > >> > >> ... this. I'm asking to pass in the argument for "raw" based > >> on what you want, without relying on io_apic_write()'s > >> behavior. That's more for code clarity than actual correctness, > >> since - as said - io_apic_write() would invoke __io_apic_write() > >> anyway. > > > > Sorry, but I'm afraid I still don't fully understand what you mean > > with this. restore_IO_APIC_setup uses ioapic_write_entry which in turn > > uses __io_apic_write or io_apic_write. I could get rid of the usage of > > ioapic_write_entry and just call __io_apic_write or io_apic_write (or > > even directly call iommu_update_ire_from_apic), but I'm not sure > > what's the benefit of any of this, it's just open-coding logic that's > > done in the helpers. > > > > Would you rather mean to rename the parameter of restore_IO_APIC_setup > > from 'raw' to 'translate' or some such in order to clearly denote > > there's a transformation done before writing the entry? > > Whether it's "raw" or "translate" doesn't really matter to me. > What I'm after is for you to not pass raw=true when you really > mean raw=false (or the other way around), relying on the > intremap_enabled check down the call chain in io_apic_write(). I don't think that's the case with my proposed code (or at least not the intention), when raw=false is passed that's because the entries _must_ be translated, not because x2apic_bsp_setup is relying on the value of intremap_enabled. Let me prepare a new series with the proposed changes and we can continue from there. Thanks, Roger. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel