On 11/29/18 4:56 AM, Roger Pau Monné wrote:
> On Thu, Nov 29, 2018 at 12:43:25PM +0800, Chao Gao wrote:
>> On Wed, Nov 28, 2018 at 04:22:09PM +0100, Roger Pau Monné wrote:
>>> On Wed, Nov 28, 2018 at 01:34:16PM +0800, Chao Gao wrote:
>>>
>>>> @@ -311,13 +350,45 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
>>>>      if ( ret <= 0 )
>>>>      {
>>>>          printk("No valid or newer microcode found. Update abort!\n");
>>>> -        return -EINVAL;
>>>> +        ret = -EINVAL;
>>>> +        goto put;
>>>>      }
>>>>
>>>> -    info->error = 0;
>>>> -    info->cpu = cpumask_first(&cpu_online_map);
>>>> +    atomic_set(&info->cpu_in, 0);
>>>> +    atomic_set(&info->cpu_out, 0);
>>>> +
>>>> +    /* Calculate the number of online CPU core */
>>>> +    nr_cores = 0;
>>>> +    for_each_online_cpu(cpu)
>>>> +        if ( cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu)) )
>>>> +            nr_cores++;
>>>> +
>>>> +    printk("%d cores are to update its microcode\n", nr_cores);
>>>>
>>>> -    return continue_hypercall_on_cpu(info->cpu, do_microcode_update, info);
>>>> +    /*
>>>> +     * We intend to disable interrupt for long time, which may lead to
>>>> +     * watchdog timeout.
>>>> +     */
>>>> +    watchdog_disable();
>>>> +    /*
>>>> +     * Late loading dance. Why the heavy-handed stop_machine effort?
>>>> +     *
>>>> +     * - HT siblings must be idle and not execute other code while the other
>>>> +     *   sibling is loading microcode in order to avoid any negative
>>>> +     *   interactions cause by the loading.
>>> Well, the HT siblings will be executing code, since they are in a
>>> while loop waiting for the non-siblings cores to finish updating.
>> Strictly speaking, you are right. The 'idle' I think means no other
>> workload on the cpu except microcode loading (for a HT sibling which
>> isn't chosen to do the update, means waiting for the completion of
>> the other sibling).
> Could you clarify the comment then?
>
> By workload you mean that no other microcode loading should be
> attempted from a HT sibling?
>
> Is there a set of instructions or functionality that cannot be used by
> HT siblings while performing a microcode load?

The sibling should really not execute anything. For example, when
updating from microcode that introduced MSR 0x48 to newer microcode
that also changes the behavior of MSR 0x48, the MSR (apparently)
momentarily disappears. We've seen this happen reliably, with crashes
when the sibling tries to access the MSR while the other thread is
loading the microcode.

One other comment about this patch (which IIRC was raised by Andrew on
an earlier version) is that it may be worth stopping timer calibration.
I am pretty sure we've seen deadlocks, which is why we ended up
disabling calibration during microcode updates.

-boris
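
To make the ordering under discussion concrete, here is a rough single-threaded model of the rendezvous. It is only a sketch and not the Xen code: the counter names cpu_in and cpu_out come from the quoted patch, while the state machine, LOAD_STEPS, is_primary() and simulate_late_load() are hypothetical names made up for illustration. It also assumes the first thread of each core does the load and that CPU numbers are laid out core by core. The assert in the LOADING case encodes the property being argued about: while one thread loads, its siblings are only polling a counter and touch nothing else (no MSR accesses, for instance).

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CPUS   64
#define LOAD_STEPS 3   /* hypothetical: steps one simulated load takes */

enum cpu_state { ARRIVING, WAIT_ALL_IN, LOADING, WAIT_ALL_OUT, DONE };

struct model {
    int cpu_in;    /* CPUs that have reached the rendezvous */
    int cpu_out;   /* cores whose loading thread has finished */
    int nr_cpus, nr_cores, threads_per_core;
    enum cpu_state state[MAX_CPUS];
    int load_left[MAX_CPUS];
};

/* Assumption: the first thread of each core performs the load. */
static bool is_primary(const struct model *m, int cpu)
{
    return cpu % m->threads_per_core == 0;
}

/* Advance one CPU by a single step of the rendezvous. */
static void step(struct model *m, int cpu)
{
    switch (m->state[cpu]) {
    case ARRIVING:
        m->cpu_in++;
        m->state[cpu] = WAIT_ALL_IN;
        break;
    case WAIT_ALL_IN:
        /* Nobody proceeds until every CPU has checked in. */
        if (m->cpu_in == m->nr_cpus) {
            if (is_primary(m, cpu)) {
                m->load_left[cpu] = LOAD_STEPS;
                m->state[cpu] = LOADING;
            } else {
                m->state[cpu] = WAIT_ALL_OUT;
            }
        }
        break;
    case LOADING:
        /* While this thread loads, its siblings may only be waiting. */
        for (int t = 1; t < m->threads_per_core; t++)
            assert(m->state[cpu + t] == WAIT_ALL_IN ||
                   m->state[cpu + t] == WAIT_ALL_OUT);
        if (--m->load_left[cpu] == 0) {
            m->cpu_out++;
            m->state[cpu] = WAIT_ALL_OUT;
        }
        break;
    case WAIT_ALL_OUT:
        /* Released only after every core has finished loading. */
        if (m->cpu_out == m->nr_cores)
            m->state[cpu] = DONE;
        break;
    case DONE:
        break;
    }
}

/* Returns the number of cores that loaded, or -1 on invalid input. */
int simulate_late_load(int nr_cores, int threads_per_core)
{
    if (nr_cores <= 0 || threads_per_core <= 0 ||
        nr_cores > MAX_CPUS / threads_per_core)
        return -1;

    struct model m = {
        .nr_cpus = nr_cores * threads_per_core,
        .nr_cores = nr_cores,
        .threads_per_core = threads_per_core,
    };
    bool all_done;

    /* Round-robin stepping stands in for CPUs running concurrently. */
    do {
        all_done = true;
        for (int cpu = 0; cpu < m.nr_cpus; cpu++) {
            step(&m, cpu);
            if (m.state[cpu] != DONE)
                all_done = false;
        }
    } while (!all_done);

    return m.cpu_out;
}
```

For example, simulate_late_load(2, 2) models two cores with two threads each and returns 2. A real implementation would of course need atomics, interrupts disabled, and a timeout on the wait loops; the model leaves all of that out on purpose.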