Re: [Xen-devel] [PATCH v2] xen/timers: Fix memory leak with cpu unplug/plug
On 08/04/2019 14:53, Julien Grall wrote: > Hi Andrew, > > On 4/8/19 1:09 PM, Andrew Cooper wrote: >> On 08/04/2019 12:38, Julien Grall wrote: >>> Hi, >>> >>> On 4/8/19 11:47 AM, Andrew Cooper wrote: On 08/04/2019 11:39, Julien Grall wrote: > Hi, > > On 4/8/19 10:39 AM, Andrew Cooper wrote: >> + case CPU_RESUME_FAILED: >> + if ( !park_offline_cpus && system_state != >> SYS_STATE_suspend ) > > This patch breaks compilation on arm32/arm64 because > park_offline_cpus > is not defined: > > timer.c: In function 'cpu_callback': > timer.c:651:15: error: 'park_offline_cpus' undeclared (first use in > this function) > if ( !park_offline_cpus && system_state != > SYS_STATE_suspend ) > ^ > > What is the purpose of park_offline_cpus? Sorry. I should have waited for a full build test first. park_offline_cpus is a workaround for Intel's MCE behaviour, where the system will shut down rather than deliver an #MC if machine checking isn't configured on all CPUs. As a result, we have to start all CPUs, even beyond maxcpus= and set up machine check handling, and never ever free their stacks, even if we'd prefer the CPUs to be offline. >>> >>> I am a bit confused, why this is necessary now for the timer and not >>> in other places of the common code? >>> Are you happy with a #define park_offline_cpus false > in ARM? >>> >>> The name is fairly confusing if you don't know the background. >>> >>> But I have to admit that even with your explanation above, I still >>> don't understand why you need to check park_offline_cpus in the timers. >> >> It is all to do with how/when we free per-cpu data. >> >> Technically speaking (with the memory leak fixed) the old arrangement >> ought to function correctly, but the new arrangement is more efficient. > Where would the free happen in the "less efficient" way? I don't quite understand the question. https://lists.xenproject.org/archives/html/xen-devel/2019-03/msg02252.html is the v1 patch, but that has already been rejected for not using the up-to-date notifier layout. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2] xen/timers: Fix memory leak with cpu unplug/plug
Hi Andrew, On 4/8/19 1:09 PM, Andrew Cooper wrote: On 08/04/2019 12:38, Julien Grall wrote: Hi, On 4/8/19 11:47 AM, Andrew Cooper wrote: On 08/04/2019 11:39, Julien Grall wrote: Hi, On 4/8/19 10:39 AM, Andrew Cooper wrote: + case CPU_RESUME_FAILED: + if ( !park_offline_cpus && system_state != SYS_STATE_suspend ) This patch breaks compilation on arm32/arm64 because park_offline_cpus is not defined: timer.c: In function 'cpu_callback': timer.c:651:15: error: 'park_offline_cpus' undeclared (first use in this function) if ( !park_offline_cpus && system_state != SYS_STATE_suspend ) ^ What is the purpose of park_offline_cpus? Sorry. I should have waited for a full build test first. park_offline_cpus is a workaround for Intel's MCE behaviour, where the system will shut down rather than deliver an #MC if machine checking isn't configured on all CPUs. As a result, we have to start all CPUs, even beyond maxcpus= and set up machine check handling, and never ever free their stacks, even if we'd prefer the CPUs to be offline. I am a bit confused, why this is necessary now for the timer and not in other places of the common code? Are you happy with a #define park_offline_cpus false > in ARM? The name is fairly confusing if you don't know the background. But I have to admit that even with your explanation above, I still don't understand why you need to check park_offline_cpus in the timers. It is all to do with how/when we free per-cpu data. Technically speaking (with the memory leak fixed) the old arrangement ought to function correctly, but the new arrangement is more efficient. Where would the free happen in the "less efficient" way? Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2] xen/timers: Fix memory leak with cpu unplug/plug
On 08/04/2019 12:38, Julien Grall wrote: > Hi, > > On 4/8/19 11:47 AM, Andrew Cooper wrote: >> On 08/04/2019 11:39, Julien Grall wrote: >>> Hi, >>> >>> On 4/8/19 10:39 AM, Andrew Cooper wrote: + case CPU_RESUME_FAILED: + if ( !park_offline_cpus && system_state != SYS_STATE_suspend ) >>> >>> This patch breaks compilation on arm32/arm64 because park_offline_cpus >>> is not defined: >>> >>> timer.c: In function 'cpu_callback': >>> timer.c:651:15: error: 'park_offline_cpus' undeclared (first use in >>> this function) >>> if ( !park_offline_cpus && system_state != >>> SYS_STATE_suspend ) >>> ^ >>> >>> What is the purpose of park_offline_cpus? >> >> Sorry. I should have waited for a full build test first. >> >> park_offline_cpus is a workaround for Intel's MCE behaviour, where the >> system will shut down rather than deliver an #MC if machine checking >> isn't configured on all CPUs. >> >> As a result, we have to start all CPUs, even beyond maxcpus= and set up >> machine check handling, and never ever free their stacks, even if we'd >> prefer the CPUs to be offline. > > I am a bit confused, why this is necessary now for the timer and not > in other places of the common code? > >> >> Are you happy with a >> >> #define park_offline_cpus false > >> in ARM? > > The name is fairly confusing if you don't know the background. > > But I have to admit that even with your explanation above, I still > don't understand why you need to check park_offline_cpus in the timers. It is all to do with how/when we free per-cpu data. Technically speaking (with the memory leak fixed) the old arrangement ought to function correctly, but the new arrangement is more efficient. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2] xen/timers: Fix memory leak with cpu unplug/plug
Hi, On 4/8/19 11:47 AM, Andrew Cooper wrote: On 08/04/2019 11:39, Julien Grall wrote: Hi, On 4/8/19 10:39 AM, Andrew Cooper wrote: + case CPU_RESUME_FAILED: + if ( !park_offline_cpus && system_state != SYS_STATE_suspend ) This patch breaks compilation on arm32/arm64 because park_offline_cpus is not defined: timer.c: In function 'cpu_callback': timer.c:651:15: error: 'park_offline_cpus' undeclared (first use in this function) if ( !park_offline_cpus && system_state != SYS_STATE_suspend ) ^ What is the purpose of park_offline_cpus? Sorry. I should have waited for a full build test first. park_offline_cpus is a workaround for Intel's MCE behaviour, where the system will shut down rather than deliver an #MC if machine checking isn't configured on all CPUs. As a result, we have to start all CPUs, even beyond maxcpus= and set up machine check handling, and never ever free their stacks, even if we'd prefer the CPUs to be offline. I am a bit confused, why this is necessary now for the timer and not in other places of the common code? Are you happy with a #define park_offline_cpus false > in ARM? The name is fairly confusing if you don't know the background. But I have to admit that even with your explanation above, I still don't understand why you need to check park_offline_cpus in the timers. Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2] xen/timers: Fix memory leak with cpu unplug/plug
On 08/04/2019 11:39, Julien Grall wrote: > Hi, > > On 4/8/19 10:39 AM, Andrew Cooper wrote: >> + case CPU_RESUME_FAILED: >> + if ( !park_offline_cpus && system_state != SYS_STATE_suspend ) > > This patch breaks compilation on arm32/arm64 because park_offline_cpus > is not defined: > > timer.c: In function 'cpu_callback': > timer.c:651:15: error: 'park_offline_cpus' undeclared (first use in > this function) > if ( !park_offline_cpus && system_state != SYS_STATE_suspend ) > ^ > > What is the purpose of park_offline_cpus? Sorry. I should have waited for a full build test first. park_offline_cpus is a workaround for Intel's MCE behaviour, where the system will shut down rather than deliver an #MC if machine checking isn't configured on all CPUs. As a result, we have to start all CPUs, even beyond maxcpus= and set up machine check handling, and never ever free their stacks, even if we'd prefer the CPUs to be offline. Are you happy with a #define park_offline_cpus false in ARM? ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2] xen/timers: Fix memory leak with cpu unplug/plug
Hi, On 4/8/19 10:39 AM, Andrew Cooper wrote: +case CPU_RESUME_FAILED: +if ( !park_offline_cpus && system_state != SYS_STATE_suspend ) This patch breaks compilation on arm32/arm64 because park_offline_cpus is not defined: timer.c: In function 'cpu_callback': timer.c:651:15: error: 'park_offline_cpus' undeclared (first use in this function) if ( !park_offline_cpus && system_state != SYS_STATE_suspend ) ^ What is the purpose of park_offline_cpus? Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2] xen/timers: Fix memory leak with cpu unplug/plug
>>> On 08.04.19 at 11:39, wrote: > timer_softirq_action() realloc's itself a larger timer heap whenever > necessary, which includes bootstrapping from the empty dummy_heap. Nothing > ever freed this allocation. > > CPU plug and unplug has the side effect of zeroing the percpu data area, which > clears ts->heap. This in turn causes new timers to be put on the list rather > than the heap, and for timer_softirq_action() to bootstrap itself again. > > This in practice leaks ts->heap every time a CPU is unplugged and replugged. > > Implement free_percpu_timers() which includes freeing ts->heap when > appropriate, and update the notifier callback with the recent cpu parking > logic and free-avoidance across suspend. > > Signed-off-by: Andrew Cooper Reviewed-by: Jan Beulich ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel