Re: [Xen-devel] [PATCH] x86/smpboot: Make logical package management more robust
On 12/10/2016 02:13 PM, Thomas Gleixner wrote:
> On Sat, 10 Dec 2016, Thomas Gleixner wrote:
> > On Fri, 9 Dec 2016, Boris Ostrovsky wrote:
> > > On 12/09/2016 06:02 PM, Boris Ostrovsky wrote:
> > > > On 12/09/2016 05:06 PM, Thomas Gleixner wrote:
> > > > > On Thu, 8 Dec 2016, Thomas Gleixner wrote:
> > > > >
> > > > > Boris, can you please verify if that makes the
> > > > > topology_update_package_map() call which you placed into the Xen cpu
> > > > > starting code obsolete?
> > > >
> > > > Will do. I did test your patch but without removing
> > > > topology_update_package_map() call. It complained about package IDs
> > > > being wrong, but that's expected until I fix Xen part.
> > >
> > > Ignore my statement about earlier testing --- it was all on single-node
> > > machines.
> > >
> > > Something is broken with multi-node on Intel, but failure modes are
> > > different.
> > > Prior to this patch build_sched_domain() reports an error and pretty
> > > soon we crash in scheduler (don't remember off the top of my head).
> > > With patch applied I crash much later, when one of the drivers does
> > > kmalloc_node(.., cpu_to_node(cpu)) and cpu_to_node() returns 1, which
> > > should never happen ("x86: Booted up 1 node, 32 CPUs" is reported, for
> > > example).
> >
> > Hmm. But the cpu_to_node() association is unrelated to the logical package
> > management.
>
> Just came to my mind after hitting send. We had the whole persistent cpuid
> to nodeid association work merged in 4.9. So that might be related.

Yes, that's exactly the reason. It uses _PXM to set nodeID and _PXM is
exposed to dom0 (which is a privileged PV guest).

Re: your previous message: after I "fix" the problem above, I see

    pr_info("Max logical packages: %u\n", __max_logical_packages);

but no

    pr_warn("CPU %u Converting physical %u to logical package %u\n", ...)

with or without topology_update_package_map() in
arch/x86/xen/smp.c:cpu_bringup()

-boris

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/smpboot: Make logical package management more robust
On Sat, 10 Dec 2016, Thomas Gleixner wrote:
> On Fri, 9 Dec 2016, Boris Ostrovsky wrote:
> > On 12/09/2016 06:02 PM, Boris Ostrovsky wrote:
> > > On 12/09/2016 05:06 PM, Thomas Gleixner wrote:
> > > > On Thu, 8 Dec 2016, Thomas Gleixner wrote:
> > > >
> > > > Boris, can you please verify if that makes the
> > > > topology_update_package_map() call which you placed into the Xen cpu
> > > > starting code obsolete?
> > >
> > > Will do. I did test your patch but without removing
> > > topology_update_package_map() call. It complained about package IDs
> > > being wrong, but that's expected until I fix Xen part.
> >
> > Ignore my statement about earlier testing --- it was all on single-node
> > machines.
> >
> > Something is broken with multi-node on Intel, but failure modes are
> > different.
> > Prior to this patch build_sched_domain() reports an error and pretty soon
> > we crash in scheduler (don't remember off the top of my head). With patch
> > applied I crash much later, when one of the drivers does
> > kmalloc_node(.., cpu_to_node(cpu)) and cpu_to_node() returns 1, which
> > should never happen ("x86: Booted up 1 node, 32 CPUs" is reported, for
> > example).
>
> Hmm. But the cpu_to_node() association is unrelated to the logical package
> management.

Just came to my mind after hitting send. We had the whole persistent cpuid
to nodeid association work merged in 4.9. So that might be related.

Thanks,

	tglx
Re: [Xen-devel] [PATCH] x86/smpboot: Make logical package management more robust
On Fri, 9 Dec 2016, Boris Ostrovsky wrote:
> On 12/09/2016 06:02 PM, Boris Ostrovsky wrote:
> > On 12/09/2016 05:06 PM, Thomas Gleixner wrote:
> > > On Thu, 8 Dec 2016, Thomas Gleixner wrote:
> > >
> > > Boris, can you please verify if that makes the
> > > topology_update_package_map() call which you placed into the Xen cpu
> > > starting code obsolete?
> >
> > Will do. I did test your patch but without removing
> > topology_update_package_map() call. It complained about package IDs
> > being wrong, but that's expected until I fix Xen part.
>
> Ignore my statement about earlier testing --- it was all on single-node
> machines.
>
> Something is broken with multi-node on Intel, but failure modes are different.
> Prior to this patch build_sched_domain() reports an error and pretty soon we
> crash in scheduler (don't remember off the top of my head). With patch applied
> I crash much later, when one of the drivers does kmalloc_node(..,
> cpu_to_node(cpu)) and cpu_to_node() returns 1, which should never happen
> ("x86: Booted up 1 node, 32 CPUs" is reported, for example).

Hmm. But the cpu_to_node() association is unrelated to the logical package
management.

> 2-node AMD box doesn't have these problems.
>
> I haven't upgraded the Intel machine for about a month but this all must have
> happened in 4.9 timeframe.
>
> So I can't answer your question since we clearly have other problems on Xen. I
> will be looking into this.

Fair enough. What you could do though, with this patch applied and the extra
Xen call to topology_update_package_map() removed, is to watch out for the
following messages:

    pr_info("Max logical packages: %u\n", __max_logical_packages);

and

    pr_warn("CPU %u Converting physical %u to logical package %u\n", ...)

Ideally the latter won't show.

Thanks,

	tglx
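[Editor's sketch, not part of the original thread.] The check suggested above could be scripted roughly as follows. This is a minimal sketch that assumes the two messages reach the kernel log with the text of the format strings quoted in the message, possibly preceded by a timestamp or a prefix; the function name scan_boot_log() and the sample log lines in the usage note are hypothetical.

```python
import re

# Patterns derived from the two format strings quoted in the thread.
# search() is used so that any timestamp or log prefix is ignored.
MAX_PKG_RE = re.compile(r"Max logical packages: (\d+)")
CONVERT_RE = re.compile(
    r"CPU (\d+) Converting physical (\d+) to logical package (\d+)"
)


def scan_boot_log(text):
    """Return (max_packages, conversions) found in a kernel log dump.

    max_packages is the last reported "Max logical packages" value, or
    None if the message never appears. conversions is a list of
    (cpu, physical_package, logical_package) tuples, one per warning.
    """
    max_packages = None
    conversions = []
    for line in text.splitlines():
        match = MAX_PKG_RE.search(line)
        if match:
            max_packages = int(match.group(1))
            continue
        match = CONVERT_RE.search(line)
        if match:
            conversions.append(tuple(int(g) for g in match.groups()))
    return max_packages, conversions
```

Feeding it a saved copy of the boot log (for example the output of dmesg) should give an empty conversions list in the ideal case described above; any entries identify the CPUs for which the warning fired.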
Re: [Xen-devel] [PATCH] x86/smpboot: Make logical package management more robust
On Fri, 9 Dec 2016, Boris Ostrovsky wrote:
> On 12/09/2016 06:00 PM, Thomas Gleixner wrote:
> > On Fri, 9 Dec 2016, Boris Ostrovsky wrote:
> > > On 12/09/2016 05:06 PM, Thomas Gleixner wrote:
> > > > On Thu, 8 Dec 2016, Thomas Gleixner wrote:
> > > >
> > > > Boris, can you please verify if that makes the
> > > > topology_update_package_map() call which you placed into the Xen cpu
> > > > starting code obsolete?
> > >
> > > Will do. I did test your patch but without removing
> > > topology_update_package_map() call. It complained about package IDs
> > > being wrong, but that's expected until I fix Xen part.
> >
> > That should no longer be the case as I changed the approach to that
> > management thing.
>
> I didn't notice this email before I sent the earlier message.
>
> Is there anything else besides this patch that I should use? I applied it to
> Linus tree and it didn't apply cleanly (there was some fuzz and such) so I
> wonder whether I am missing something.

No. I did it against tip, but there is nothing which it depends on.

Thanks,

	tglx
Re: [Xen-devel] [PATCH] x86/smpboot: Make logical package management more robust
On 12/09/2016 06:00 PM, Thomas Gleixner wrote:
> On Fri, 9 Dec 2016, Boris Ostrovsky wrote:
> > On 12/09/2016 05:06 PM, Thomas Gleixner wrote:
> > > On Thu, 8 Dec 2016, Thomas Gleixner wrote:
> > >
> > > Boris, can you please verify if that makes the
> > > topology_update_package_map() call which you placed into the Xen cpu
> > > starting code obsolete?
> >
> > Will do. I did test your patch but without removing
> > topology_update_package_map() call. It complained about package IDs
> > being wrong, but that's expected until I fix Xen part.
>
> That should no longer be the case as I changed the approach to that
> management thing.

I didn't notice this email before I sent the earlier message.

Is there anything else besides this patch that I should use? I applied it to
Linus tree and it didn't apply cleanly (there was some fuzz and such) so I
wonder whether I am missing something.

-boris
Re: [Xen-devel] [PATCH] x86/smpboot: Make logical package management more robust
On 12/09/2016 06:02 PM, Boris Ostrovsky wrote:
> On 12/09/2016 05:06 PM, Thomas Gleixner wrote:
> > On Thu, 8 Dec 2016, Thomas Gleixner wrote:
> >
> > Boris, can you please verify if that makes the
> > topology_update_package_map() call which you placed into the Xen cpu
> > starting code obsolete?
>
> Will do. I did test your patch but without removing
> topology_update_package_map() call. It complained about package IDs
> being wrong, but that's expected until I fix Xen part.

Ignore my statement about earlier testing --- it was all on single-node
machines.

Something is broken with multi-node on Intel, but failure modes are different.
Prior to this patch build_sched_domain() reports an error and pretty soon we
crash in scheduler (don't remember off the top of my head). With patch applied
I crash much later, when one of the drivers does kmalloc_node(..,
cpu_to_node(cpu)) and cpu_to_node() returns 1, which should never happen
("x86: Booted up 1 node, 32 CPUs" is reported, for example).

2-node AMD box doesn't have these problems.

I haven't upgraded the Intel machine for about a month but this all must have
happened in 4.9 timeframe.

So I can't answer your question since we clearly have other problems on Xen. I
will be looking into this.

-boris