On 11/30/18 4:46 AM, Jan Beulich wrote:
>>>> On 29.11.18 at 23:43, <boris.ostrov...@oracle.com> wrote:
>> One other comment about this patch (which IIRC was raised by Andrew on
>> an earlier version) is that it may be worth to stop timer calibration. I
>> am pretty sure we've seen deadlocks, which is why we ended up disabling
>> it during microcode updates.
> I recall the claim, but I don't think I've ever seen proof.

I can't provide proof at this point, only a somewhat vague memory of
seeing calibration code in the stack dump.

> My point was
> and still is that if there's a problem with ucode loading using the
> stop-machine logic here, then there's a problem with the stop-machine
> logic in general, which would make other uses, perhaps most notably
> rcu_barrier(), unsafe too.

Possibly. rcu_barrier() appears to be used only in the cpu hotplug and
power management code, and I don't know whether either has been tested
under stress. In our case it would take multiple microcode updates on
relatively large (~100 cpus) systems before we'd hit the deadlock.

> Otoh from your reply it's not clear whether
> the observed issue wasn't with our present way of loading ucode,
> but then it would put under question the general correctness of
> continue_hypercall_on_cpu(), which again we use for more than just
> ucode loading.

It was with a variation of this new code, not with what's currently in
the tree.

-boris