[Xen-devel] Patch "x86/smpboot: Make logical package management more robust" has been added to the 4.9-stable tree

2017-01-06 Thread gregkh

This is a note to let you know that I've just added the patch titled

x86/smpboot: Make logical package management more robust

to the 4.9-stable tree which can be found at:

http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
 x86-smpboot-make-logical-package-management-more-robust.patch
and it can be found in the queue-4.9 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let  know about it.


From 9d85eb9119f4eeeb48e87adfcd71f752655700e9 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner 
Date: Mon, 12 Dec 2016 11:04:53 +0100
Subject: x86/smpboot: Make logical package management more robust

From: Thomas Gleixner 

commit 9d85eb9119f4eeeb48e87adfcd71f752655700e9 upstream.

The logical package management has several issues:

 - The APIC ids provided by ACPI are not required to be the same as the
   initial APIC id which can be retrieved by CPUID. The APIC ids provided
   by ACPI are those which are written by the BIOS into the APIC. The
   initial id is set by hardware and can not be changed. The hardware
   provided ids contain the real hardware package information.

   Especially AMD sets the effective APIC id different from the hardware id
   as they need to reserve space for the IOAPIC ids starting at id 0.

   As a consequence those machines trigger the currently active firmware
   bug printouts in dmesg, These are obviously wrong.

 - Virtual machines have their own interesting of enumerating APICs and
   packages which are not reliably covered by the current implementation.

The sizing of the mapping array has been tweaked to be generously large to
handle systems which provide a wrong core count when HT is disabled so the
whole magic which checks for space in the physical hotplug case is not
needed anymore.

Simplify the whole machinery and do the mapping when the CPU starts and the
CPUID derived physical package information is available. This solves the
observed problems on AMD machines and works for the virtualization issues
as well.

Remove the extra call from XEN cpu bringup code as it is not longer
required.

Fixes: d49597fd3bc7 ("x86/cpu: Deal with broken firmware (VMWare/XEN)")
Reported-and-tested-by: Borislav Petkov 
Tested-by: Boris Ostrovsky 
Signed-off-by: Thomas Gleixner 
Cc: Juergen Gross 
Cc: Peter Zijlstra 
Cc: M. Vefa Bicakci 
Cc: xen-devel 
Cc: Charles (Chas) Williams 
Cc: Borislav Petkov 
Cc: Alok Kataria 
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1612121102260.3429@nanos
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman 

---
 arch/x86/kernel/apic/apic.c  |   15 
 arch/x86/kernel/cpu/common.c |   24 ++--
 arch/x86/kernel/smpboot.c|   51 ---
 arch/x86/xen/smp.c   |6 -
 4 files changed, 27 insertions(+), 69 deletions(-)

--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2159,21 +2159,6 @@ int __generic_processor_info(int apicid,
}
 
/*
-* This can happen on physical hotplug. The sanity check at boot time
-* is done from native_smp_prepare_cpus() after num_possible_cpus() is
-* established.
-*/
-   if (topology_update_package_map(apicid, cpu) < 0) {
-   int thiscpu = max + disabled_cpus;
-
-   pr_warning("APIC: Package limit reached. Processor %d/0x%x 
ignored.\n",
-  thiscpu, apicid);
-
-   disabled_cpus++;
-   return -ENOSPC;
-   }
-
-   /*
 * Validate version
 */
if (version == 0x0) {
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -979,29 +979,21 @@ static void x86_init_cache_qos(struct cp
 }
 
 /*
- * The physical to logical package id mapping is initialized from the
- * acpi/mptables information. Make sure that CPUID actually agrees with
- * that.
+ * Validate that ACPI/mptables have the same information about the
+ * effective APIC id and update the package map.
  */
-static void sanitize_package_id(struct cpuinfo_x86 *c)
+static void validate_apic_and_package_id(struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_SMP
-   unsigned int pkg, apicid, cpu = smp_processor_id();
+   unsigned int apicid, cpu = smp_processor_id();
 
apicid = apic->cpu_present_to_apicid(cpu);
-   pkg = apicid >> boot_cpu_data.x86_coreid_bits;
 
-   if (apicid != c->initial_apicid) {
-   pr_err(FW_BUG "CPU%u: APIC id mismatch. Firmware: %x CPUID: 
%x\n",
+   if (apicid != c->apicid) {
+   pr_err(FW_BUG "CPU%u: APIC id mismatch. 

Re: [Xen-devel] [PATCH] x86/smpboot: Make logical package management more robust

2016-12-10 Thread Boris Ostrovsky



On 12/10/2016 02:13 PM, Thomas Gleixner wrote:

On Sat, 10 Dec 2016, Thomas Gleixner wrote:

On Fri, 9 Dec 2016, Boris Ostrovsky wrote:

On 12/09/2016 06:02 PM, Boris Ostrovsky wrote:

On 12/09/2016 05:06 PM, Thomas Gleixner wrote:

On Thu, 8 Dec 2016, Thomas Gleixner wrote:

Boris, can you please verify if that makes the
topology_update_package_map() call which you placed into the Xen cpu
starting code obsolete ?


Will do. I did test your patch but without removing
topology_update_package_map() call. It complained about package IDs
being wrong, but that's expected until I fix Xen part.


Ignore my statement about earlier testing --- it was all on single-node
machines.

Something is broken with multi-node on Intel, but failure modes are different.
Prior to this patch build_sched_domain() reports an error and pretty soon we
crash in scheduler (don't remember off the top of my head). With patch applied
I crash mush later, when one of the drivers does kmalloc_node(..,
cpu_to_node(cpu)) and cpu_to_node() returns 1, which should never happen
("x86: Booted up 1 node, 32 CPUs" is reported, for example).


Hmm. But the cpu_to_node() association is unrelated to the logical package
management.


Just came to my mind after hitting send. We had the whole persistent cpuid
to nodeid association work merged in 4.9. So that might be related.



Yes, that's exactly the reason.

It uses _PXM to set nodeID and _PXM is exposed to dom0 (which is a 
privileged PV guest).


Re: you previous message: after I "fix" the problem above,  I see
pr_info("Max logical packages: %u\n", __max_logical_packages);
but no
pr_warn(CPU %u Converting physical %u to logical package %u\n", ...)

with or without topology_update_package_map() in 
arch/x86/xen/smp.c:cpu_bringup()



-boris



___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86/smpboot: Make logical package management more robust

2016-12-10 Thread Thomas Gleixner
On Sat, 10 Dec 2016, Thomas Gleixner wrote:
> On Fri, 9 Dec 2016, Boris Ostrovsky wrote:
> > On 12/09/2016 06:02 PM, Boris Ostrovsky wrote:
> > > On 12/09/2016 05:06 PM, Thomas Gleixner wrote:
> > > > On Thu, 8 Dec 2016, Thomas Gleixner wrote:
> > > > 
> > > > Boris, can you please verify if that makes the
> > > > topology_update_package_map() call which you placed into the Xen cpu
> > > > starting code obsolete ?
> > > 
> > > Will do. I did test your patch but without removing
> > > topology_update_package_map() call. It complained about package IDs
> > > being wrong, but that's expected until I fix Xen part.
> > 
> > Ignore my statement about earlier testing --- it was all on single-node
> > machines.
> > 
> > Something is broken with multi-node on Intel, but failure modes are 
> > different.
> > Prior to this patch build_sched_domain() reports an error and pretty soon we
> > crash in scheduler (don't remember off the top of my head). With patch 
> > applied
> > I crash mush later, when one of the drivers does kmalloc_node(..,
> > cpu_to_node(cpu)) and cpu_to_node() returns 1, which should never happen
> > ("x86: Booted up 1 node, 32 CPUs" is reported, for example).
> 
> Hmm. But the cpu_to_node() association is unrelated to the logical package
> management.

Just came to my mind after hitting send. We had the whole persistent cpuid
to nodeid association work merged in 4.9. So that might be related.

Thanks,

tglx

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86/smpboot: Make logical package management more robust

2016-12-10 Thread Thomas Gleixner
On Fri, 9 Dec 2016, Boris Ostrovsky wrote:
> On 12/09/2016 06:02 PM, Boris Ostrovsky wrote:
> > On 12/09/2016 05:06 PM, Thomas Gleixner wrote:
> > > On Thu, 8 Dec 2016, Thomas Gleixner wrote:
> > > 
> > > Boris, can you please verify if that makes the
> > > topology_update_package_map() call which you placed into the Xen cpu
> > > starting code obsolete ?
> > 
> > Will do. I did test your patch but without removing
> > topology_update_package_map() call. It complained about package IDs
> > being wrong, but that's expected until I fix Xen part.
> 
> Ignore my statement about earlier testing --- it was all on single-node
> machines.
> 
> Something is broken with multi-node on Intel, but failure modes are different.
> Prior to this patch build_sched_domain() reports an error and pretty soon we
> crash in scheduler (don't remember off the top of my head). With patch applied
> I crash mush later, when one of the drivers does kmalloc_node(..,
> cpu_to_node(cpu)) and cpu_to_node() returns 1, which should never happen
> ("x86: Booted up 1 node, 32 CPUs" is reported, for example).

Hmm. But the cpu_to_node() association is unrelated to the logical package
management.

> 2-node AMD box doesn't have these problems.
> 
> I haven't upgraded the Intel machine for about a month but this all must have
> happened in 4.9 timeframe.
> 
> So I can't answer your question since we clearly have other problems on Xen. I
> will be looking into this.

Fair enough. What you could do though with this patch applied and the extra
XEN call to topology_update_package_map() removed is to watchout for the
following messages:

  pr_info("Max logical packages: %u\n", __max_logical_packages);

and

  pr_warn(CPU %u Converting physical %u to logical package %u\n", ...)

Ideally the latter wont show.

Thanks,

tglx

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86/smpboot: Make logical package management more robust

2016-12-10 Thread Thomas Gleixner
On Fri, 9 Dec 2016, Boris Ostrovsky wrote:
> On 12/09/2016 06:00 PM, Thomas Gleixner wrote:
> > On Fri, 9 Dec 2016, Boris Ostrovsky wrote:
> > > On 12/09/2016 05:06 PM, Thomas Gleixner wrote:
> > > > On Thu, 8 Dec 2016, Thomas Gleixner wrote:
> > > > 
> > > > Boris, can you please verify if that makes the
> > > > topology_update_package_map() call which you placed into the Xen cpu
> > > > starting code obsolete ?
> > > 
> > > Will do. I did test your patch but without removing
> > > topology_update_package_map() call. It complained about package IDs
> > > being wrong, but that's expected until I fix Xen part.
> > 
> > That should not longer be the case as I changed the approach to that
> > management thing.
> 
> 
> I didn't notice this email before I sent the earlier message.
> 
> Is these anything else besides this patch that I should use? I applied it to
> Linus tree and it didn't apply cleanly (there was some fuzz and such) so I
> wonder whether I am missing something.

No. I did it against tip, but there is nothing which it depends on.

Thanks,

tglx

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86/smpboot: Make logical package management more robust

2016-12-09 Thread Boris Ostrovsky



On 12/09/2016 06:00 PM, Thomas Gleixner wrote:

On Fri, 9 Dec 2016, Boris Ostrovsky wrote:

On 12/09/2016 05:06 PM, Thomas Gleixner wrote:

On Thu, 8 Dec 2016, Thomas Gleixner wrote:

Boris, can you please verify if that makes the
topology_update_package_map() call which you placed into the Xen cpu
starting code obsolete ?


Will do. I did test your patch but without removing
topology_update_package_map() call. It complained about package IDs
being wrong, but that's expected until I fix Xen part.


That should not longer be the case as I changed the approach to that
management thing.



I didn't notice this email before I sent the earlier message.

Is these anything else besides this patch that I should use? I applied 
it to Linus tree and it didn't apply cleanly (there was some fuzz and 
such) so I wonder whether I am missing something.


-boris

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86/smpboot: Make logical package management more robust

2016-12-09 Thread Boris Ostrovsky



On 12/09/2016 06:02 PM, Boris Ostrovsky wrote:

On 12/09/2016 05:06 PM, Thomas Gleixner wrote:

On Thu, 8 Dec 2016, Thomas Gleixner wrote:

Boris, can you please verify if that makes the
topology_update_package_map() call which you placed into the Xen cpu
starting code obsolete ?


Will do. I did test your patch but without removing
topology_update_package_map() call. It complained about package IDs
being wrong, but that's expected until I fix Xen part.


Ignore my statement about earlier testing --- it was all on single-node 
machines.


Something is broken with multi-node on Intel, but failure modes are 
different. Prior to this patch build_sched_domain() reports an error and 
pretty soon we crash in scheduler (don't remember off the top of my 
head). With patch applied I crash mush later, when one of the drivers 
does kmalloc_node(.., cpu_to_node(cpu)) and cpu_to_node() returns 1, 
which should never happen ("x86: Booted up 1 node, 32 CPUs" is reported, 
for example).


2-node AMD box doesn't have these problems.

I haven't upgraded the Intel machine for about a month but this all must 
have happened in 4.9 timeframe.


So I can't answer your question since we clearly have other problems on 
Xen. I will be looking into this.


-boris

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel