Re: [Xen-devel] [PATCH] x86/smpboot: Make logical package management more robust

2016-12-10 Thread Boris Ostrovsky



On 12/10/2016 02:13 PM, Thomas Gleixner wrote:
> On Sat, 10 Dec 2016, Thomas Gleixner wrote:
> > On Fri, 9 Dec 2016, Boris Ostrovsky wrote:
> > > On 12/09/2016 06:02 PM, Boris Ostrovsky wrote:
> > > > On 12/09/2016 05:06 PM, Thomas Gleixner wrote:
> > > > > On Thu, 8 Dec 2016, Thomas Gleixner wrote:
> > > > >
> > > > > Boris, can you please verify if that makes the
> > > > > topology_update_package_map() call which you placed into the Xen cpu
> > > > > starting code obsolete ?
> > > >
> > > > Will do. I did test your patch but without removing
> > > > topology_update_package_map() call. It complained about package IDs
> > > > being wrong, but that's expected until I fix Xen part.
> > >
> > > Ignore my statement about earlier testing --- it was all on single-node
> > > machines.
> > >
> > > Something is broken with multi-node on Intel, but failure modes are different.
> > > Prior to this patch build_sched_domain() reports an error and pretty soon we
> > > crash in scheduler (don't remember off the top of my head). With patch applied
> > > I crash much later, when one of the drivers does kmalloc_node(..,
> > > cpu_to_node(cpu)) and cpu_to_node() returns 1, which should never happen
> > > ("x86: Booted up 1 node, 32 CPUs" is reported, for example).
> >
> > Hmm. But the cpu_to_node() association is unrelated to the logical package
> > management.
>
> Just came to my mind after hitting send. We had the whole persistent cpuid
> to nodeid association work merged in 4.9. So that might be related.



Yes, that's exactly the reason.

It uses _PXM to set nodeID and _PXM is exposed to dom0 (which is a 
privileged PV guest).


Re: your previous message: after I "fix" the problem above, I see
pr_info("Max logical packages: %u\n", __max_logical_packages);
but no
pr_warn("CPU %u Converting physical %u to logical package %u\n", ...)

with or without topology_update_package_map() in 
arch/x86/xen/smp.c:cpu_bringup()



-boris



___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86/smpboot: Make logical package management more robust

2016-12-10 Thread Thomas Gleixner
On Sat, 10 Dec 2016, Thomas Gleixner wrote:
> On Fri, 9 Dec 2016, Boris Ostrovsky wrote:
> > On 12/09/2016 06:02 PM, Boris Ostrovsky wrote:
> > > On 12/09/2016 05:06 PM, Thomas Gleixner wrote:
> > > > On Thu, 8 Dec 2016, Thomas Gleixner wrote:
> > > > 
> > > > Boris, can you please verify if that makes the
> > > > topology_update_package_map() call which you placed into the Xen cpu
> > > > starting code obsolete ?
> > > 
> > > Will do. I did test your patch but without removing
> > > topology_update_package_map() call. It complained about package IDs
> > > being wrong, but that's expected until I fix Xen part.
> > 
> > Ignore my statement about earlier testing --- it was all on single-node
> > machines.
> > 
> > Something is broken with multi-node on Intel, but failure modes are different.
> > Prior to this patch build_sched_domain() reports an error and pretty soon we
> > crash in scheduler (don't remember off the top of my head). With patch applied
> > I crash much later, when one of the drivers does kmalloc_node(..,
> > cpu_to_node(cpu)) and cpu_to_node() returns 1, which should never happen
> > ("x86: Booted up 1 node, 32 CPUs" is reported, for example).
> 
> Hmm. But the cpu_to_node() association is unrelated to the logical package
> management.

Just came to my mind after hitting send. We had the whole persistent cpuid
to nodeid association work merged in 4.9. So that might be related.

Thanks,

tglx



Re: [Xen-devel] [PATCH] x86/smpboot: Make logical package management more robust

2016-12-10 Thread Thomas Gleixner
On Fri, 9 Dec 2016, Boris Ostrovsky wrote:
> On 12/09/2016 06:02 PM, Boris Ostrovsky wrote:
> > On 12/09/2016 05:06 PM, Thomas Gleixner wrote:
> > > On Thu, 8 Dec 2016, Thomas Gleixner wrote:
> > > 
> > > Boris, can you please verify if that makes the
> > > topology_update_package_map() call which you placed into the Xen cpu
> > > starting code obsolete ?
> > 
> > Will do. I did test your patch but without removing
> > topology_update_package_map() call. It complained about package IDs
> > being wrong, but that's expected until I fix Xen part.
> 
> Ignore my statement about earlier testing --- it was all on single-node
> machines.
> 
> Something is broken with multi-node on Intel, but failure modes are different.
> Prior to this patch build_sched_domain() reports an error and pretty soon we
> crash in scheduler (don't remember off the top of my head). With patch applied
> I crash much later, when one of the drivers does kmalloc_node(..,
> cpu_to_node(cpu)) and cpu_to_node() returns 1, which should never happen
> ("x86: Booted up 1 node, 32 CPUs" is reported, for example).

Hmm. But the cpu_to_node() association is unrelated to the logical package
management.

> 2-node AMD box doesn't have these problems.
> 
> I haven't upgraded the Intel machine for about a month but this all must have
> happened in 4.9 timeframe.
> 
> So I can't answer your question since we clearly have other problems on Xen. I
> will be looking into this.

Fair enough. What you could do though with this patch applied and the extra
Xen call to topology_update_package_map() removed is to watch out for the
following messages:

  pr_info("Max logical packages: %u\n", __max_logical_packages);

and

  pr_warn("CPU %u Converting physical %u to logical package %u\n", ...)

Ideally the latter won't show.
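
As the format string suggests, the second message reports a physical package
ID that differs from the logical ID assigned to it. The following is a
minimal user-space sketch of that kind of bookkeeping. It is a model for
illustration only, not the code in arch/x86/kernel/smpboot.c, and
MODEL_MAX_PKGS, pkg_map_init() and update_package_map() are hypothetical
names chosen for this example.

```c
#include <assert.h>
#include <stdio.h>

#define MODEL_MAX_PKGS 8	/* hypothetical limit for this model */

/* physical package id -> logical package id; -1 means "not seen yet" */
static int phys_to_logical[MODEL_MAX_PKGS];
static unsigned int logical_packages;

static void pkg_map_init(void)
{
	for (int i = 0; i < MODEL_MAX_PKGS; i++)
		phys_to_logical[i] = -1;
	logical_packages = 0;
}

/*
 * Return the logical package for a CPU that sits in physical package
 * 'pkg', assigning the next free logical id on first sight.
 * Returns -1 if 'pkg' is outside the range this model tracks.
 */
static int update_package_map(unsigned int pkg, unsigned int cpu)
{
	if (pkg >= MODEL_MAX_PKGS)
		return -1;

	if (phys_to_logical[pkg] < 0) {
		/* Each slot is assigned at most once, so this cannot overflow. */
		assert(logical_packages < MODEL_MAX_PKGS);
		unsigned int new = logical_packages++;

		/* Only report when the ids actually differ. */
		if (new != pkg)
			printf("CPU %u Converting physical %u to logical package %u\n",
			       cpu, pkg, new);
		phys_to_logical[pkg] = (int)new;
	}
	return phys_to_logical[pkg];
}
```

In this model, calling pkg_map_init() and then registering CPUs from
physical packages 0 and 2 assigns logical packages 0 and 1, and the
conversion line is printed only for physical package 2. A machine whose
physical package IDs are already dense never triggers it.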

Thanks,

tglx



Re: [Xen-devel] [PATCH] x86/smpboot: Make logical package management more robust

2016-12-10 Thread Thomas Gleixner
On Fri, 9 Dec 2016, Boris Ostrovsky wrote:
> On 12/09/2016 06:00 PM, Thomas Gleixner wrote:
> > On Fri, 9 Dec 2016, Boris Ostrovsky wrote:
> > > On 12/09/2016 05:06 PM, Thomas Gleixner wrote:
> > > > On Thu, 8 Dec 2016, Thomas Gleixner wrote:
> > > > 
> > > > Boris, can you please verify if that makes the
> > > > topology_update_package_map() call which you placed into the Xen cpu
> > > > starting code obsolete ?
> > > 
> > > Will do. I did test your patch but without removing
> > > topology_update_package_map() call. It complained about package IDs
> > > being wrong, but that's expected until I fix Xen part.
> > 
> > > That should no longer be the case as I changed the approach to that
> > > management thing.
> 
> 
> I didn't notice this email before I sent the earlier message.
> 
> Is there anything else besides this patch that I should use? I applied it to
> Linus tree and it didn't apply cleanly (there was some fuzz and such) so I
> wonder whether I am missing something.

No. I did it against tip, but there is nothing which it depends on.

Thanks,

tglx



Re: [Xen-devel] [PATCH] x86/smpboot: Make logical package management more robust

2016-12-09 Thread Boris Ostrovsky



On 12/09/2016 06:00 PM, Thomas Gleixner wrote:
> On Fri, 9 Dec 2016, Boris Ostrovsky wrote:
> > On 12/09/2016 05:06 PM, Thomas Gleixner wrote:
> > > On Thu, 8 Dec 2016, Thomas Gleixner wrote:
> > >
> > > Boris, can you please verify if that makes the
> > > topology_update_package_map() call which you placed into the Xen cpu
> > > starting code obsolete ?
> >
> > Will do. I did test your patch but without removing
> > topology_update_package_map() call. It complained about package IDs
> > being wrong, but that's expected until I fix Xen part.
>
> That should no longer be the case as I changed the approach to that
> management thing.



I didn't notice this email before I sent the earlier message.

Is there anything else besides this patch that I should use? I applied
it to Linus tree and it didn't apply cleanly (there was some fuzz and 
such) so I wonder whether I am missing something.


-boris



Re: [Xen-devel] [PATCH] x86/smpboot: Make logical package management more robust

2016-12-09 Thread Boris Ostrovsky



On 12/09/2016 06:02 PM, Boris Ostrovsky wrote:
> On 12/09/2016 05:06 PM, Thomas Gleixner wrote:
> > On Thu, 8 Dec 2016, Thomas Gleixner wrote:
> >
> > Boris, can you please verify if that makes the
> > topology_update_package_map() call which you placed into the Xen cpu
> > starting code obsolete ?
>
> Will do. I did test your patch but without removing
> topology_update_package_map() call. It complained about package IDs
> being wrong, but that's expected until I fix Xen part.


Ignore my statement about earlier testing --- it was all on single-node 
machines.


Something is broken with multi-node on Intel, but failure modes are 
different. Prior to this patch build_sched_domain() reports an error and 
pretty soon we crash in scheduler (don't remember off the top of my 
head). With patch applied I crash much later, when one of the drivers
does kmalloc_node(.., cpu_to_node(cpu)) and cpu_to_node() returns 1, 
which should never happen ("x86: Booted up 1 node, 32 CPUs" is reported, 
for example).
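
To make the second failure mode concrete: the per-CPU node lookup hands back
a node that was never brought online, and an allocation aimed at that node
then goes wrong. The following is a minimal user-space sketch of that
inconsistency and of a defensive check. It is a model only, not kernel code
and not a fix proposed in this thread; struct numa_model,
count_stale_node_mappings() and safe_alloc_node() are hypothetical names
made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MODEL_NR_CPUS  4	/* hypothetical sizes for this model */
#define MODEL_NR_NODES 2

struct numa_model {
	int  cpu_node[MODEL_NR_CPUS];		/* what a cpu_to_node()-style lookup returns */
	bool node_online[MODEL_NR_NODES];	/* which nodes were actually brought up */
};

static bool node_usable(const struct numa_model *m, int node)
{
	return node >= 0 && node < MODEL_NR_NODES && m->node_online[node];
}

/* Count (and print) CPUs whose recorded node is not online. */
static int count_stale_node_mappings(const struct numa_model *m)
{
	int bad = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (!node_usable(m, m->cpu_node[cpu])) {
			printf("CPU %d maps to node %d, which is not online\n",
			       cpu, m->cpu_node[cpu]);
			bad++;
		}
	}
	return bad;
}

/*
 * Pick a node that is safe to allocate from: the CPU's own node if it is
 * online, otherwise the first online node, or -1 if no node is online.
 */
static int safe_alloc_node(const struct numa_model *m, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);

	if (node_usable(m, m->cpu_node[cpu]))
		return m->cpu_node[cpu];

	for (int n = 0; n < MODEL_NR_NODES; n++)
		if (m->node_online[n])
			return n;
	return -1;
}
```

With one online node and two CPUs recorded as belonging to node 1, the model
reports two stale mappings and falls back to node 0 for them. A fallback like
this would only hide the problem; the mapping itself is what has to be
corrected.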


2-node AMD box doesn't have these problems.

I haven't upgraded the Intel machine for about a month but this all must 
have happened in 4.9 timeframe.


So I can't answer your question since we clearly have other problems on 
Xen. I will be looking into this.


-boris
