On 11/30/18 4:46 AM, Jan Beulich wrote:
>>>> On 29.11.18 at 23:43, <boris.ostrov...@oracle.com> wrote:
>> One other comment about this patch (which IIRC was raised by Andrew on
>> an earlier version) is that it may be worth stopping timer calibration. I
>> am pretty sure we've seen deadlocks, which is why we ended up disabling
>> it during microcode updates.
> I recall the claim, but I don't think I've ever seen proof. 

I can't provide proof at this point, only a somewhat vague memory of
seeing calibration code in the stack dump.

> My point was
> and still is that if there's a problem with ucode loading using the
> stop-machine logic here, then there's a problem with the stop-machine
> logic in general, which would make other uses, perhaps most notably
> rcu_barrier(), unsafe too. 

Possibly.

rcu_barrier() appears to be used only in CPU hotplug code and power
management, and I don't know whether either has been tested under stress.

In our case it would take multiple microcode updates on relatively large
(~100 cpus) systems before we'd hit the deadlock.


> Otoh from your reply it's not clear whether
> the observed issue wasn't with our present way of loading ucode,
> but then it would put under question the general correctness of
> continue_hypercall_on_cpu(), which again we use for more than just
> ucode loading.

It was with a variation of this new code, not with what's currently in
the tree.

-boris


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
