Re: [Xen-devel] [PATCH 1/2] x86/microcode: Synchronize late microcode loading

2018-04-19 Thread Chao Gao
On Mon, Apr 16, 2018 at 04:26:09AM -0600, Jan Beulich wrote:
 On 16.04.18 at 08:20,  wrote:
>> On Fri, Apr 13, 2018 at 09:49:17AM -0600, Jan Beulich wrote:
>> On 30.03.18 at 08:59,  wrote:
 +static int do_microcode_update(void *_info)
 +{
 +struct microcode_info *info = _info;
 +int error, ret = 0;
 +
 +error = __wait_for_cpus(>cpu_in, USEC_PER_SEC);
>>>
>>>Why this long a timeout here?
>> 
>> I just use the same timeout as the patch on linux kernel side. As we
>> know if the timeout is too small, updating microcode may be likely to
>> failed even if other CPUs disabled interrupt temporally.
>> 
>> If you object to such a long timeout (for Xen may need much smaller
>> time to rendezvous all CPUs compared to linux kernel because Xen doesn't
>> have device drivers which may malfunction), how about just use the
>> default timeout, 30ms, used by live patching? if it is also not
>> good enough, then we make it an option which comes from callers.
>
>Yes, 30ms is likely to be acceptable.
>
 +if ( error )
 +{
 +ret = -EBUSY;
 +return ret;
 +}
 +
 +error = microcode_update_cpu(info->buffer, info->buffer_size);
 +if ( error && !ret )
 +ret = error;
 +/*
 + * Increase the wait timeout to a safe value here since we're 
 serializing
 + * the microcode update and that could take a while on a large number 
 of
 + * CPUs. And that is fine as the *actual* timeout will be determined 
 by
 + * the last CPU finished updating and thus cut short
 + */
 +error = __wait_for_cpus(>cpu_out, USEC_PER_SEC * 
 num_online_cpus());
>>>
>>>And this one's even worse, in particular on huge systems. I'm afraid such a 
>>>long
>>>period of time in stop-machine context is going to confuse most of the 
>>>running
>>>domains (including Dom0). There's nothing inherently wrong with e.g. 
>>>processing
>>>the updates on distinct cores (and even more so on distinct sockets) in 
>>>parallel.
>> 
>> I cannot say for sure. But the original patch does want updating
>> microcode be performed one-by-one.
>
>And they don't restrict the "when" aspect in any way? I.e. all sorts of
>applications (and perhaps even KVM guests) may be running?

No any restriction on the timing I think. Ashok, could you confirm this?

>
>>>Therefore revising the locking in microcode_update_cpu() might be a necessary
>>>prereq step.
>> 
>> Do you mean changing it to a per-core or per-socket lock?
>
>Either changing to such, or introducing a second, more fine grained lock.

I will introduce a per-core lock. Then only one thread in a core will
try to update microcode as an optimization. For I am not sure whether it
is permitted to update distinct cores in parallel, I am inclined to keep
the original logic.

Thanks
Chao

>
>>>Or alternatively you may need to demand that no other running
>>>domains exist besides Dom0 (and hope the best for Dom0 itself).
>>>
>>>I also don't think there's any point invoking the operation on all HT 
>>>threads on a
>>>core, but I realize stop_machine_run() isn't flexible enough to allow such.
>> 
>> Only one thread in a core will do the actual update. Other threads only
>> check the microcode version and find the version is already
>> update-to-date, then exit the critical region. Considering the check may
>> don't need much time, I wonder whether it can significantly benefit the
>> overall time to invoking the operation on all HT threads on a core?  
>
>With better parallelism this would be less of an issue. Plus, as said - the
>infrastructure we have at present wouldn't allow anyway what I've
>described above, and hand-rolling another "flavor" of the stop-machine
>logic is quite certainly out of question. To answer your question though:
>Taking as the example a 4-threads-per-core system with sufficiently
>many cores, I'm afraid the overall time spent in handling the
>"uninteresting" threads may become measurable. But of course, if actual
>numbers said otherwise, I'd be fine.
>
>

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/2] x86/microcode: Synchronize late microcode loading

2018-04-16 Thread Jan Beulich
>>> On 16.04.18 at 08:20,  wrote:
> On Fri, Apr 13, 2018 at 09:49:17AM -0600, Jan Beulich wrote:
> On 30.03.18 at 08:59,  wrote:
>>> +static int do_microcode_update(void *_info)
>>> +{
>>> +struct microcode_info *info = _info;
>>> +int error, ret = 0;
>>> +
>>> +error = __wait_for_cpus(>cpu_in, USEC_PER_SEC);
>>
>>Why this long a timeout here?
> 
> I just use the same timeout as the patch on linux kernel side. As we
> know if the timeout is too small, updating microcode may be likely to
> failed even if other CPUs disabled interrupt temporally.
> 
> If you object to such a long timeout (for Xen may need much smaller
> time to rendezvous all CPUs compared to linux kernel because Xen doesn't
> have device drivers which may malfunction), how about just use the
> default timeout, 30ms, used by live patching? if it is also not
> good enough, then we make it an option which comes from callers.

Yes, 30ms is likely to be acceptable.

>>> +if ( error )
>>> +{
>>> +ret = -EBUSY;
>>> +return ret;
>>> +}
>>> +
>>> +error = microcode_update_cpu(info->buffer, info->buffer_size);
>>> +if ( error && !ret )
>>> +ret = error;
>>> +/*
>>> + * Increase the wait timeout to a safe value here since we're 
>>> serializing
>>> + * the microcode update and that could take a while on a large number 
>>> of
>>> + * CPUs. And that is fine as the *actual* timeout will be determined by
>>> + * the last CPU finished updating and thus cut short
>>> + */
>>> +error = __wait_for_cpus(>cpu_out, USEC_PER_SEC * 
>>> num_online_cpus());
>>
>>And this one's even worse, in particular on huge systems. I'm afraid such a 
>>long
>>period of time in stop-machine context is going to confuse most of the running
>>domains (including Dom0). There's nothing inherently wrong with e.g. 
>>processing
>>the updates on distinct cores (and even more so on distinct sockets) in 
>>parallel.
> 
> I cannot say for sure. But the original patch does want updating
> microcode be performed one-by-one.

And they don't restrict the "when" aspect in any way? I.e. all sorts of
applications (and perhaps even KVM guests) may be running?

>>Therefore revising the locking in microcode_update_cpu() might be a necessary
>>prereq step.
> 
> Do you mean changing it to a per-core or per-socket lock?

Either changing to such, or introducing a second, more fine grained lock.

>>Or alternatively you may need to demand that no other running
>>domains exist besides Dom0 (and hope the best for Dom0 itself).
>>
>>I also don't think there's any point invoking the operation on all HT threads 
>>on a
>>core, but I realize stop_machine_run() isn't flexible enough to allow such.
> 
> Only one thread in a core will do the actual update. Other threads only
> check the microcode version and find the version is already
> update-to-date, then exit the critical region. Considering the check may
> don't need much time, I wonder whether it can significantly benefit the
> overall time to invoking the operation on all HT threads on a core?  

With better parallelism this would be less of an issue. Plus, as said - the
infrastructure we have at present wouldn't allow anyway what I've
described above, and hand-rolling another "flavor" of the stop-machine
logic is quite certainly out of question. To answer your question though:
Taking as the example a 4-threads-per-core system with sufficiently
many cores, I'm afraid the overall time spent in handling the
"uninteresting" threads may become measurable. But of course, if actual
numbers said otherwise, I'd be fine.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/2] x86/microcode: Synchronize late microcode loading

2018-04-16 Thread Chao Gao
On Fri, Apr 13, 2018 at 09:49:17AM -0600, Jan Beulich wrote:
 On 30.03.18 at 08:59,  wrote:
>> @@ -281,24 +287,52 @@ static int microcode_update_cpu(const void *buf, 
>> size_t size)
>>  return err;
>>  }
>>  
>> -static long do_microcode_update(void *_info)
>> +static int __wait_for_cpus(atomic_t *cnt, int timeout)
>
>No new double-underscore prefixed functions please.
>
>>  {
>> -struct microcode_info *info = _info;
>> -int error;
>> +int cpus = num_online_cpus();
>
>unsigned int
>
>> -BUG_ON(info->cpu != smp_processor_id());
>> +atomic_inc(cnt);
>>  
>> -error = microcode_update_cpu(info->buffer, info->buffer_size);
>> -if ( error )
>> -info->error = error;
>> +while (atomic_read(cnt) != cpus)
>
>There are a number of style issues in the patch, mostly (like here) missing
>blanks.

Will fix all issues pointed out by comments above.

>
>> +{
>> +if ( timeout <= 0 )
>> +{
>> +printk("Timeout when waiting for CPUs calling in\n");
>> +return -1;
>> +}
>> +udelay(1);
>> +timeout--;
>> +}
>>  
>> -info->cpu = cpumask_next(info->cpu, _online_map);
>> -if ( info->cpu < nr_cpu_ids )
>> -return continue_hypercall_on_cpu(info->cpu, do_microcode_update, 
>> info);
>> +return 0;
>> +}
>>  
>> -error = info->error;
>> -xfree(info);
>> -return error;
>> +static int do_microcode_update(void *_info)
>> +{
>> +struct microcode_info *info = _info;
>> +int error, ret = 0;
>> +
>> +error = __wait_for_cpus(>cpu_in, USEC_PER_SEC);
>
>Why this long a timeout here?

I just use the same timeout as the patch on linux kernel side. As we
know if the timeout is too small, updating microcode may be likely to
failed even if other CPUs disabled interrupt temporally.

If you object to such a long timeout (for Xen may need much smaller
time to rendezvous all CPUs compared to linux kernel because Xen doesn't
have device drivers which may malfunction), how about just use the
default timeout, 30ms, used by live patching? if it is also not
good enough, then we make it an option which comes from callers.

>
>> +if ( error )
>> +{
>> +ret = -EBUSY;
>> +return ret;
>> +}
>> +
>> +error = microcode_update_cpu(info->buffer, info->buffer_size);
>> +if ( error && !ret )
>> +ret = error;
>> +/*
>> + * Increase the wait timeout to a safe value here since we're 
>> serializing
>> + * the microcode update and that could take a while on a large number of
>> + * CPUs. And that is fine as the *actual* timeout will be determined by
>> + * the last CPU finished updating and thus cut short
>> + */
>> +error = __wait_for_cpus(>cpu_out, USEC_PER_SEC * 
>> num_online_cpus());
>
>And this one's even worse, in particular on huge systems. I'm afraid such a 
>long
>period of time in stop-machine context is going to confuse most of the running
>domains (including Dom0). There's nothing inherently wrong with e.g. processing
>the updates on distinct cores (and even more so on distinct sockets) in 
>parallel.

I cannot say for sure. But the original patch does want updating
microcode be performed one-by-one.

>Therefore revising the locking in microcode_update_cpu() might be a necessary
>prereq step.

Do you mean changing it to a per-core or per-socket lock?


>Or alternatively you may need to demand that no other running
>domains exist besides Dom0 (and hope the best for Dom0 itself).
>
>I also don't think there's any point invoking the operation on all HT threads 
>on a
>core, but I realize stop_machine_run() isn't flexible enough to allow such.

Only one thread in a core will do the actual update. Other threads only
check the microcode version and find the version is already
update-to-date, then exit the critical region. Considering the check may
don't need much time, I wonder whether it can significantly benefit the
overall time to invoking the operation on all HT threads on a core?  

Thanks
Chao

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/2] x86/microcode: Synchronize late microcode loading

2018-04-13 Thread Jan Beulich
>>> On 30.03.18 at 08:59,  wrote:
> @@ -281,24 +287,52 @@ static int microcode_update_cpu(const void *buf, size_t 
> size)
>  return err;
>  }
>  
> -static long do_microcode_update(void *_info)
> +static int __wait_for_cpus(atomic_t *cnt, int timeout)

No new double-underscore prefixed functions please.

>  {
> -struct microcode_info *info = _info;
> -int error;
> +int cpus = num_online_cpus();

unsigned int

> -BUG_ON(info->cpu != smp_processor_id());
> +atomic_inc(cnt);
>  
> -error = microcode_update_cpu(info->buffer, info->buffer_size);
> -if ( error )
> -info->error = error;
> +while (atomic_read(cnt) != cpus)

There are a number of style issues in the patch, mostly (like here) missing
blanks.

> +{
> +if ( timeout <= 0 )
> +{
> +printk("Timeout when waiting for CPUs calling in\n");
> +return -1;
> +}
> +udelay(1);
> +timeout--;
> +}
>  
> -info->cpu = cpumask_next(info->cpu, _online_map);
> -if ( info->cpu < nr_cpu_ids )
> -return continue_hypercall_on_cpu(info->cpu, do_microcode_update, 
> info);
> +return 0;
> +}
>  
> -error = info->error;
> -xfree(info);
> -return error;
> +static int do_microcode_update(void *_info)
> +{
> +struct microcode_info *info = _info;
> +int error, ret = 0;
> +
> +error = __wait_for_cpus(>cpu_in, USEC_PER_SEC);

Why this long a timeout here?

> +if ( error )
> +{
> +ret = -EBUSY;
> +return ret;
> +}
> +
> +error = microcode_update_cpu(info->buffer, info->buffer_size);
> +if ( error && !ret )
> +ret = error;
> +/*
> + * Increase the wait timeout to a safe value here since we're serializing
> + * the microcode update and that could take a while on a large number of
> + * CPUs. And that is fine as the *actual* timeout will be determined by
> + * the last CPU finished updating and thus cut short
> + */
> +error = __wait_for_cpus(>cpu_out, USEC_PER_SEC * 
> num_online_cpus());

And this one's even worse, in particular on huge systems. I'm afraid such a long
period of time in stop-machine context is going to confuse most of the running
domains (including Dom0). There's nothing inherently wrong with e.g. processing
the updates on distinct cores (and even more so on distinct sockets) in 
parallel.
Therefore revising the locking in microcode_update_cpu() might be a necessary
prereq step. Or alternatively you may need to demand that no other running
domains exist besides Dom0 (and hope the best for Dom0 itself).

I also don't think there's any point invoking the operation on all HT threads 
on a
core, but I realize stop_machine_run() isn't flexible enough to allow such.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/2] x86/microcode: Synchronize late microcode loading

2018-04-13 Thread Jan Beulich
>>> On 13.04.18 at 07:25,  wrote:
> On Thu, Apr 12, 2018 at 09:29:34AM -0700, Raj, Ashok wrote:
>>On Fri, Mar 30, 2018 at 02:59:00PM +0800, Chao Gao wrote:
>>> From: Gao Chao 
>>> 
>>> This patch is to backport microcode improvement patches from linux
>>> kernel. Below are the original patches description:
>>> 
>>> commit a5321aec6412b20b5ad15db2d6b916c05349dbff
>>> Author: Ashok Raj 
>>> Date:   Wed Feb 28 11:28:46 2018 +0100
>>> 
>>> x86/microcode: Synchronize late microcode loading
>>> 
>>> Original idea by Ashok, completely rewritten by Borislav.
>>> 
>>> Before you read any further: the early loading method is still the
>>> preferred one and you should always do that. The following patch is
>>> improving the late loading mechanism for long running jobs and cloud use
>>> cases.
>>> 
>>> Gather all cores and serialize the microcode update on them by doing it
>>> one-by-one to make the late update process as reliable as possible and
>>> avoid potential issues caused by the microcode update.
>>> 
>>> [ Borislav: Rewrite completely. ]
>>> 
>>> Co-developed-by: Borislav Petkov 
>>> Signed-off-by: Ashok Raj 
>>> Signed-off-by: Borislav Petkov 
>>> Signed-off-by: Thomas Gleixner 
>>> Tested-by: Tom Lendacky 
>>> Tested-by: Ashok Raj 
>>
>>The tested by tags were good for linux upstream. Can you make sure
>>you add your name under tested-by for xen?
> 
> Tested-by doesn't mean they have tested this patch. I just put the
> original commits description here.

Tested-by being as meaningful in Xen as it is in Linux, retaining such tags 
(other
than authorship ones) is generally wrong, as it gives a wrong impression on
what testing the _Xen_ patch has seen.

> If Xen permits such tested-by from the sender and author, I will add one.

I think it is generally implied that the author has done some testing. In the
case here - with a ported over Linux commit by other than the original
author - I would view a T-b by the original author as meaningful though. It
should be made clear though that this is a ported Linux commit, which
generally we do by naming the Linux commit. See e.g. the history of
mwait-idle.c, where most of the commits are straight ports from Linux.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/2] x86/microcode: Synchronize late microcode loading

2018-04-12 Thread Chao Gao
On Thu, Apr 12, 2018 at 09:29:34AM -0700, Raj, Ashok wrote:
>On Fri, Mar 30, 2018 at 02:59:00PM +0800, Chao Gao wrote:
>> From: Gao Chao 
>> 
>> This patch is to backport microcode improvement patches from linux
>> kernel. Below are the original patches description:
>> 
>> commit a5321aec6412b20b5ad15db2d6b916c05349dbff
>> Author: Ashok Raj 
>> Date:   Wed Feb 28 11:28:46 2018 +0100
>> 
>>  x86/microcode: Synchronize late microcode loading
>> 
>>  Original idea by Ashok, completely rewritten by Borislav.
>> 
>>  Before you read any further: the early loading method is still the
>>  preferred one and you should always do that. The following patch is
>>  improving the late loading mechanism for long running jobs and cloud use
>>  cases.
>> 
>>  Gather all cores and serialize the microcode update on them by doing it
>>  one-by-one to make the late update process as reliable as possible and
>>  avoid potential issues caused by the microcode update.
>> 
>>  [ Borislav: Rewrite completely. ]
>> 
>>  Co-developed-by: Borislav Petkov 
>>  Signed-off-by: Ashok Raj 
>>  Signed-off-by: Borislav Petkov 
>>  Signed-off-by: Thomas Gleixner 
>>  Tested-by: Tom Lendacky 
>>  Tested-by: Ashok Raj 
>
>The tested by tags were good for linux upstream. Can you make sure
>you add your name under tested-by for xen?

Tested-by doesn't mean they have tested this patch. I just put the
original commits description here.

If Xen permits such tested-by from the sender and author, I will add one.

>
>>  Reviewed-by: Tom Lendacky 
>>  Cc: Arjan Van De Ven 
>>  Link: https://lkml.kernel.org/r/20180228102846.13447-8...@alien8.de
>> 
>> +static int do_microcode_update(void *_info)
>> +{
>> +struct microcode_info *info = _info;
>> +int error, ret = 0;
>> +
>> +error = __wait_for_cpus(>cpu_in, USEC_PER_SEC);
>> +if ( error )
>> +{
>> +ret = -EBUSY;
>> +return ret;
>> +}
>> +
>> +error = microcode_update_cpu(info->buffer, info->buffer_size);
>> +if ( error && !ret )
>> +ret = error;
>
>Isn't this redundant? can ret become non-zero in the check above?

Yes, it is redundant. I will also remove 'error' field from struct
microcode_info  because stop_machine_run already has the ability to return the
first error code. Hence, don't need to implement it again here (this is the
reason why the above fragment is weird).

Thanks
Chao

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/2] x86/microcode: Synchronize late microcode loading

2018-04-12 Thread Raj, Ashok
On Fri, Mar 30, 2018 at 02:59:00PM +0800, Chao Gao wrote:
> From: Gao Chao 
> 
> This patch is to backport microcode improvement patches from linux
> kernel. Below are the original patches description:
> 
> commit a5321aec6412b20b5ad15db2d6b916c05349dbff
> Author: Ashok Raj 
> Date:   Wed Feb 28 11:28:46 2018 +0100
> 
>   x86/microcode: Synchronize late microcode loading
> 
>   Original idea by Ashok, completely rewritten by Borislav.
> 
>   Before you read any further: the early loading method is still the
>   preferred one and you should always do that. The following patch is
>   improving the late loading mechanism for long running jobs and cloud use
>   cases.
> 
>   Gather all cores and serialize the microcode update on them by doing it
>   one-by-one to make the late update process as reliable as possible and
>   avoid potential issues caused by the microcode update.
> 
>   [ Borislav: Rewrite completely. ]
> 
>   Co-developed-by: Borislav Petkov 
>   Signed-off-by: Ashok Raj 
>   Signed-off-by: Borislav Petkov 
>   Signed-off-by: Thomas Gleixner 
>   Tested-by: Tom Lendacky 
>   Tested-by: Ashok Raj 

The tested by tags were good for linux upstream. Can you make sure
you add your name under tested-by for xen?

>   Reviewed-by: Tom Lendacky 
>   Cc: Arjan Van De Ven 
>   Link: https://lkml.kernel.org/r/20180228102846.13447-8...@alien8.de
> 
> +static int do_microcode_update(void *_info)
> +{
> +struct microcode_info *info = _info;
> +int error, ret = 0;
> +
> +error = __wait_for_cpus(>cpu_in, USEC_PER_SEC);
> +if ( error )
> +{
> +ret = -EBUSY;
> +return ret;
> +}
> +
> +error = microcode_update_cpu(info->buffer, info->buffer_size);
> +if ( error && !ret )
> +ret = error;

Isn't this redundant? can ret become non-zero in the check above?

> +/*
> + * Increase the wait timeout to a safe value here since we're serializing
> + * the microcode update and that could take a while on a large number of
> + * CPUs. And that is fine as the *actual* timeout will be determined by
> + * the last CPU finished updating and thus cut short
> + */
> +error = __wait_for_cpus(>cpu_out, USEC_PER_SEC * 
> num_online_cpus());
> +if (error)
> +panic("Timeout when finishing updating microcode");
> +info->error = ret;
> +return ret;
>  }
>  



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/2] x86/microcode: Synchronize late microcode loading

2018-04-12 Thread Jan Beulich
>>> On 12.04.18 at 08:31,  wrote:
> Ping...
> 
> Can someone help to review these two patches?

Sure, they're on my list of things to look at. Too many other things
going on.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/2] x86/microcode: Synchronize late microcode loading

2018-04-12 Thread Chao Gao
Ping...

Can someone help to review these two patches?

On Fri, Mar 30, 2018 at 02:59:00PM +0800, Chao Gao wrote:
>From: Gao Chao 
>
>This patch is to backport microcode improvement patches from linux
>kernel. Below are the original patches description:
>
>commit a5321aec6412b20b5ad15db2d6b916c05349dbff
>Author: Ashok Raj 
>Date:   Wed Feb 28 11:28:46 2018 +0100
>
>   x86/microcode: Synchronize late microcode loading
>
>   Original idea by Ashok, completely rewritten by Borislav.
>
>   Before you read any further: the early loading method is still the
>   preferred one and you should always do that. The following patch is
>   improving the late loading mechanism for long running jobs and cloud use
>   cases.
>
>   Gather all cores and serialize the microcode update on them by doing it
>   one-by-one to make the late update process as reliable as possible and
>   avoid potential issues caused by the microcode update.
>
>   [ Borislav: Rewrite completely. ]
>
>   Co-developed-by: Borislav Petkov 
>   Signed-off-by: Ashok Raj 
>   Signed-off-by: Borislav Petkov 
>   Signed-off-by: Thomas Gleixner 
>   Tested-by: Tom Lendacky 
>   Tested-by: Ashok Raj 
>   Reviewed-by: Tom Lendacky 
>   Cc: Arjan Van De Ven 
>   Link: https://lkml.kernel.org/r/20180228102846.13447-8...@alien8.de
>
>commit bb8c13d61a629276a162c1d2b1a20a815cbcfbb7
>Author: Borislav Petkov 
>Date:   Wed Mar 14 19:36:15 2018 +0100
>
>   x86/microcode: Fix CPU synchronization routine
>
>   Emanuel reported an issue with a hang during microcode update because my
>   dumb idea to use one atomic synchronization variable for both rendezvous
>   - before and after update - was simply bollocks:
>
> microcode: microcode_reload_late: late_cpus: 4
> microcode: __reload_late: cpu 2 entered
> microcode: __reload_late: cpu 1 entered
> microcode: __reload_late: cpu 3 entered
> microcode: __reload_late: cpu 0 entered
> microcode: __reload_late: cpu 1 left
> microcode: Timeout while waiting for CPUs rendezvous, remaining: 1
>
>   CPU1 above would finish, leave and the others will still spin waiting 
> for
>   it to join.
>
>   So do two synchronization atomics instead, which makes the code a lot 
> more
>   straightforward.
>
>   Also, since the update is serialized and it also takes quite some time 
> per
>   microcode engine, increase the exit timeout by the number of CPUs on the
>   system.
>
>   That's ok because the moment all CPUs are done, that timeout will be cut
>   short.
>
>   Furthermore, panic when some of the CPUs timeout when returning from a
>   microcode update: we can't allow a system with not all cores updated.
>
>   Also, as an optimization, do not do the exit sync if microcode wasn't
>   updated.
>
>   Reported-by: Emanuel Czirai 
>   Signed-off-by: Borislav Petkov 
>   Signed-off-by: Thomas Gleixner 
>   Tested-by: Emanuel Czirai 
>   Tested-by: Ashok Raj 
>   Tested-by: Tom Lendacky 
>   Link: https://lkml.kernel.org/r/20180314183615.17629-2...@alien8.de
>
>This patch is also in accord with Andrew's suggestion,
>"Rendezvous all online cpus in an IPI to apply the patch, and keep the
>processors in until all have completed the patch.", in [1].
>
>[1]:https://wiki.xenproject.org/wiki/XenParavirtOps/microcode_update#Run_time_microcode_updates
>
>Signed-off-by: Chao Gao 
>Cc: Kevin Tian 
>Cc: Jun Nakajima 
>Cc: Ashok Raj 
>Cc: Borislav Petkov 
>Cc: Thomas Gleixner 
>---
> xen/arch/x86/microcode.c | 89 +++-
> 1 file changed, 73 insertions(+), 16 deletions(-)
>
>diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
>index 4163f50..94c1ca2 100644
>--- a/xen/arch/x86/microcode.c
>+++ b/xen/arch/x86/microcode.c
>@@ -22,6 +22,7 @@
>  */
> 
> #include 
>+#include 
> #include 
> #include 
> #include 
>@@ -30,15 +31,20 @@
> #include 
> #include 
> #include 
>+#include 
> #include 
> #include 
> #include 
>+#include 
> 
>+#include 
> #include 
> #include 
> #include 
> #include 
> 
>+#define USEC_PER_SEC 100UL
>+
> static module_t __initdata ucode_mod;
> static signed int __initdata ucode_mod_idx;
> static bool_t __initdata ucode_mod_forced;
>@@ -187,7 +193,7 @@ static DEFINE_SPINLOCK(microcode_mutex);
> DEFINE_PER_CPU(struct ucode_cpu_info, ucode_cpu_info);
> 
> struct 

[Xen-devel] [PATCH 1/2] x86/microcode: Synchronize late microcode loading

2018-03-30 Thread Chao Gao
From: Gao Chao 

This patch is to backport microcode improvement patches from linux
kernel. Below are the original patches description:

commit a5321aec6412b20b5ad15db2d6b916c05349dbff
Author: Ashok Raj 
Date:   Wed Feb 28 11:28:46 2018 +0100

x86/microcode: Synchronize late microcode loading

Original idea by Ashok, completely rewritten by Borislav.

Before you read any further: the early loading method is still the
preferred one and you should always do that. The following patch is
improving the late loading mechanism for long running jobs and cloud use
cases.

Gather all cores and serialize the microcode update on them by doing it
one-by-one to make the late update process as reliable as possible and
avoid potential issues caused by the microcode update.

[ Borislav: Rewrite completely. ]

Co-developed-by: Borislav Petkov 
Signed-off-by: Ashok Raj 
Signed-off-by: Borislav Petkov 
Signed-off-by: Thomas Gleixner 
Tested-by: Tom Lendacky 
Tested-by: Ashok Raj 
Reviewed-by: Tom Lendacky 
Cc: Arjan Van De Ven 
Link: https://lkml.kernel.org/r/20180228102846.13447-8...@alien8.de

commit bb8c13d61a629276a162c1d2b1a20a815cbcfbb7
Author: Borislav Petkov 
Date:   Wed Mar 14 19:36:15 2018 +0100

x86/microcode: Fix CPU synchronization routine

Emanuel reported an issue with a hang during microcode update because my
dumb idea to use one atomic synchronization variable for both rendezvous
- before and after update - was simply bollocks:

  microcode: microcode_reload_late: late_cpus: 4
  microcode: __reload_late: cpu 2 entered
  microcode: __reload_late: cpu 1 entered
  microcode: __reload_late: cpu 3 entered
  microcode: __reload_late: cpu 0 entered
  microcode: __reload_late: cpu 1 left
  microcode: Timeout while waiting for CPUs rendezvous, remaining: 1

CPU1 above would finish, leave and the others will still spin waiting 
for
it to join.

So do two synchronization atomics instead, which makes the code a lot 
more
straightforward.

Also, since the update is serialized and it also takes quite some time 
per
microcode engine, increase the exit timeout by the number of CPUs on the
system.

That's ok because the moment all CPUs are done, that timeout will be cut
short.

Furthermore, panic when some of the CPUs timeout when returning from a
microcode update: we can't allow a system with not all cores updated.

Also, as an optimization, do not do the exit sync if microcode wasn't
updated.

Reported-by: Emanuel Czirai 
Signed-off-by: Borislav Petkov 
Signed-off-by: Thomas Gleixner 
Tested-by: Emanuel Czirai 
Tested-by: Ashok Raj 
Tested-by: Tom Lendacky 
Link: https://lkml.kernel.org/r/20180314183615.17629-2...@alien8.de

This patch is also in accord with Andrew's suggestion,
"Rendezvous all online cpus in an IPI to apply the patch, and keep the
processors in until all have completed the patch.", in [1].

[1]:https://wiki.xenproject.org/wiki/XenParavirtOps/microcode_update#Run_time_microcode_updates

Signed-off-by: Chao Gao 
Cc: Kevin Tian 
Cc: Jun Nakajima 
Cc: Ashok Raj 
Cc: Borislav Petkov 
Cc: Thomas Gleixner 
---
 xen/arch/x86/microcode.c | 89 +++-
 1 file changed, 73 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index 4163f50..94c1ca2 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -22,6 +22,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -30,15 +31,20 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
+#include 
 
+#include 
 #include 
 #include 
 #include 
 #include 
 
+#define USEC_PER_SEC 100UL
+
 static module_t __initdata ucode_mod;
 static signed int __initdata ucode_mod_idx;
 static bool_t __initdata ucode_mod_forced;
@@ -187,7 +193,7 @@ static DEFINE_SPINLOCK(microcode_mutex);
 DEFINE_PER_CPU(struct ucode_cpu_info, ucode_cpu_info);
 
 struct microcode_info {
-unsigned int cpu;
+atomic_t cpu_in, cpu_out;
 uint32_t buffer_size;
 int error;
 char buffer[1];
@@ -281,24 +287,52 @@ static int microcode_update_cpu(const void *buf, size_t