Re: [Xen-devel] [PATCH v2] xen/timers: Fix memory leak with cpu unplug/plug

2019-04-08 Thread Andrew Cooper
On 08/04/2019 14:53, Julien Grall wrote:
> Hi Andrew,
>
> On 4/8/19 1:09 PM, Andrew Cooper wrote:
>> On 08/04/2019 12:38, Julien Grall wrote:
>>> Hi,
>>>
>>> On 4/8/19 11:47 AM, Andrew Cooper wrote:
 On 08/04/2019 11:39, Julien Grall wrote:
> Hi,
>
> On 4/8/19 10:39 AM, Andrew Cooper wrote:
>> +    case CPU_RESUME_FAILED:
>> +    if ( !park_offline_cpus && system_state !=
>> SYS_STATE_suspend )
>
> This patch breaks compilation on arm32/arm64 because
> park_offline_cpus
> is not defined:
>
> timer.c: In function 'cpu_callback':
> timer.c:651:15: error: 'park_offline_cpus' undeclared (first use in
> this function)
>    if ( !park_offline_cpus && system_state !=
> SYS_STATE_suspend )
>  ^
>
> What is the purpose of park_offline_cpus?

 Sorry.  I should have waited for a full build test first.

 park_offline_cpus is a workaround for Intel's MCE behaviour, where the
 system will shut down rather than deliver an #MC if machine checking
 isn't configured on all CPUs.

 As a result, we have to start all CPUs, even beyond maxcpus= and
 set up
 machine check handling, and never ever free their stacks, even if we'd
 prefer the CPUs to be offline.
>>>
>>> I am a bit confused, why this is necessary now for the timer and not
>>> in other places of the common code?
>>>

 Are you happy with a

 #define park_offline_cpus false >
 in ARM?
>>>
>>> The name is fairly confusing if you don't know the background.
>>>
>>> But I have to admit that even with your explanation above, I still
>>> don't understand why you need to check park_offline_cpus in the timers.
>>
>> It is all to do with how/when we free per-cpu data.
>>
>> Technically speaking (with the memory leak fixed) the old arrangement
>> ought to function correctly, but the new arrangement is more efficient.
> Where would the free happen in the "less efficient" way?

I don't quite understand the question.

https://lists.xenproject.org/archives/html/xen-devel/2019-03/msg02252.html
is the v1 patch, but that has already been rejected for not using the
up-to-date notifier layout.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] xen/timers: Fix memory leak with cpu unplug/plug

2019-04-08 Thread Julien Grall

Hi Andrew,

On 4/8/19 1:09 PM, Andrew Cooper wrote:

On 08/04/2019 12:38, Julien Grall wrote:

Hi,

On 4/8/19 11:47 AM, Andrew Cooper wrote:

On 08/04/2019 11:39, Julien Grall wrote:

Hi,

On 4/8/19 10:39 AM, Andrew Cooper wrote:

+    case CPU_RESUME_FAILED:
+    if ( !park_offline_cpus && system_state !=
SYS_STATE_suspend )


This patch breaks compilation on arm32/arm64 because park_offline_cpus
is not defined:

timer.c: In function 'cpu_callback':
timer.c:651:15: error: 'park_offline_cpus' undeclared (first use in
this function)
   if ( !park_offline_cpus && system_state !=
SYS_STATE_suspend )
     ^

What is the purpose of park_offline_cpus?


Sorry.  I should have waited for a full build test first.

park_offline_cpus is a workaround for Intel's MCE behaviour, where the
system will shut down rather than deliver an #MC if machine checking
isn't configured on all CPUs.

As a result, we have to start all CPUs, even beyond maxcpus= and set up
machine check handling, and never ever free their stacks, even if we'd
prefer the CPUs to be offline.


I am a bit confused, why this is necessary now for the timer and not
in other places of the common code?



Are you happy with a

#define park_offline_cpus false >
in ARM?


The name is fairly confusing if you don't know the background.

But I have to admit that even with your explanation above, I still
don't understand why you need to check park_offline_cpus in the timers.


It is all to do with how/when we free per-cpu data.

Technically speaking (with the memory leak fixed) the old arrangement
ought to function correctly, but the new arrangement is more efficient.

Where would the free happen in the "less efficient" way?

Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] xen/timers: Fix memory leak with cpu unplug/plug

2019-04-08 Thread Andrew Cooper
On 08/04/2019 12:38, Julien Grall wrote:
> Hi,
>
> On 4/8/19 11:47 AM, Andrew Cooper wrote:
>> On 08/04/2019 11:39, Julien Grall wrote:
>>> Hi,
>>>
>>> On 4/8/19 10:39 AM, Andrew Cooper wrote:
 +    case CPU_RESUME_FAILED:
 +    if ( !park_offline_cpus && system_state !=
 SYS_STATE_suspend )
>>>
>>> This patch breaks compilation on arm32/arm64 because park_offline_cpus
>>> is not defined:
>>>
>>> timer.c: In function 'cpu_callback':
>>> timer.c:651:15: error: 'park_offline_cpus' undeclared (first use in
>>> this function)
>>>   if ( !park_offline_cpus && system_state !=
>>> SYS_STATE_suspend )
>>>     ^
>>>
>>> What is the purpose of park_offline_cpus?
>>
>> Sorry.  I should have waited for a full build test first.
>>
>> park_offline_cpus is a workaround for Intel's MCE behaviour, where the
>> system will shut down rather than deliver an #MC if machine checking
>> isn't configured on all CPUs.
>>
>> As a result, we have to start all CPUs, even beyond maxcpus= and set up
>> machine check handling, and never ever free their stacks, even if we'd
>> prefer the CPUs to be offline.
>
> I am a bit confused, why this is necessary now for the timer and not
> in other places of the common code?
>
>>
>> Are you happy with a
>>
>> #define park_offline_cpus false >
>> in ARM?
>
> The name is fairly confusing if you don't know the background.
>
> But I have to admit that even with your explanation above, I still
> don't understand why you need to check park_offline_cpus in the timers.

It is all to do with how/when we free per-cpu data.

Technically speaking (with the memory leak fixed) the old arrangement
ought to function correctly, but the new arrangement is more efficient.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] xen/timers: Fix memory leak with cpu unplug/plug

2019-04-08 Thread Julien Grall

Hi,

On 4/8/19 11:47 AM, Andrew Cooper wrote:

On 08/04/2019 11:39, Julien Grall wrote:

Hi,

On 4/8/19 10:39 AM, Andrew Cooper wrote:

+    case CPU_RESUME_FAILED:
+    if ( !park_offline_cpus && system_state != SYS_STATE_suspend )


This patch breaks compilation on arm32/arm64 because park_offline_cpus
is not defined:

timer.c: In function 'cpu_callback':
timer.c:651:15: error: 'park_offline_cpus' undeclared (first use in
this function)
  if ( !park_offline_cpus && system_state != SYS_STATE_suspend )
    ^

What is the purpose of park_offline_cpus?


Sorry.  I should have waited for a full build test first.

park_offline_cpus is a workaround for Intel's MCE behaviour, where the
system will shut down rather than deliver an #MC if machine checking
isn't configured on all CPUs.

As a result, we have to start all CPUs, even beyond maxcpus= and set up
machine check handling, and never ever free their stacks, even if we'd
prefer the CPUs to be offline.


I am a bit confused, why this is necessary now for the timer and not in 
other places of the common code?




Are you happy with a

#define park_offline_cpus false >
in ARM?


The name is fairly confusing if you don't know the background.

But I have to admit that even with your explanation above, I still don't 
understand why you need to check park_offline_cpus in the timers.


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] xen/timers: Fix memory leak with cpu unplug/plug

2019-04-08 Thread Andrew Cooper
On 08/04/2019 11:39, Julien Grall wrote:
> Hi,
>
> On 4/8/19 10:39 AM, Andrew Cooper wrote:
>> +    case CPU_RESUME_FAILED:
>> +    if ( !park_offline_cpus && system_state != SYS_STATE_suspend )
>
> This patch breaks compilation on arm32/arm64 because park_offline_cpus
> is not defined:
>
> timer.c: In function 'cpu_callback':
> timer.c:651:15: error: 'park_offline_cpus' undeclared (first use in
> this function)
>  if ( !park_offline_cpus && system_state != SYS_STATE_suspend )
>    ^
>
> What is the purpose of park_offline_cpus?

Sorry.  I should have waited for a full build test first.

park_offline_cpus is a workaround for Intel's MCE behaviour, where the
system will shut down rather than deliver an #MC if machine checking
isn't configured on all CPUs.

As a result, we have to start all CPUs, even beyond maxcpus= and set up
machine check handling, and never ever free their stacks, even if we'd
prefer the CPUs to be offline.

Are you happy with a

#define park_offline_cpus false

in ARM?

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] xen/timers: Fix memory leak with cpu unplug/plug

2019-04-08 Thread Julien Grall

Hi,

On 4/8/19 10:39 AM, Andrew Cooper wrote:

+case CPU_RESUME_FAILED:
+if ( !park_offline_cpus && system_state != SYS_STATE_suspend )


This patch breaks compilation on arm32/arm64 because park_offline_cpus 
is not defined:


timer.c: In function 'cpu_callback':
timer.c:651:15: error: 'park_offline_cpus' undeclared (first use in this 
function)

 if ( !park_offline_cpus && system_state != SYS_STATE_suspend )
   ^

What is the purpose of park_offline_cpus?

Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] xen/timers: Fix memory leak with cpu unplug/plug

2019-04-08 Thread Jan Beulich
>>> On 08.04.19 at 11:39,  wrote:
> timer_softirq_action() realloc's itself a larger timer heap whenever
> necessary, which includes bootstrapping from the empty dummy_heap.  Nothing
> ever freed this allocation.
> 
> CPU plug and unplug has the side effect of zeroing the percpu data area, which
> clears ts->heap.  This in turn causes new timers to be put on the list rather
> than the heap, and for timer_softirq_action() to bootstrap itself again.
> 
> This in practice leaks ts->heap every time a CPU is unplugged and replugged.
> 
> Implement free_percpu_timers() which includes freeing ts->heap when
> appropriate, and update the notifier callback with the recent cpu parking
> logic and free-avoidance across suspend.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Jan Beulich 



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel