Re: [PATCH] xen/sched: fix get_cpu_idle_time() for smt=0 suspend/resume

2021-08-18 Thread Dario Faggioli
On Wed, 2021-08-18 at 12:21 +0200, Juergen Gross wrote:
> Fix that by letting get_cpu_idle_time() deal with this case.
> 
> Fixes: 132cbe8f35632fb2 ("sched: fix get_cpu_idle_time() with core
> scheduling")
> Reported-by: Marek Marczykowski-Górecki
> 
> Signed-off-by: Juergen Gross 
> Tested-by: Marek Marczykowski-Górecki
> 
>
Mmm... This is an interesting one. In fact, this fix is not only
correct, it's also simple, effective and (I guess) easy enough to
backport.

Considering all these things together with the fact that we have an
open issue, I'm going to provide my:

Acked-by: Dario Faggioli 

(and this remains valid with Jan's proposed change done upon
committing.)

That said, in the long run...

> ---
> An alternative way to fix the issue would be to keep the
> sched_resource
> of offline cpus allocated like we already do with idle vcpus and
> units.
> This fix would be more intrusive, but it would avoid similar other
> bugs
> like this one.
> 
... it would be probably interesting to go this route, as it looks both
more consistent and future proof (I mean implement it proactively,
independently of issues... when/if we have time, of course!)

Thanks and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
---
<> (Raistlin Majere)


signature.asc
Description: This is a digitally signed message part


Re: [PATCH] xen/sched: fix get_cpu_idle_time() for smt=0 suspend/resume

2021-08-18 Thread Juergen Gross

On 18.08.21 12:35, Jan Beulich wrote:

On 18.08.2021 12:21, Juergen Gross wrote:

With smt=0 during a suspend/resume cycle of the machine the threads
which have been parked before will briefly come up again. This can
result in problems e.g. with cpufreq driver being active as this will
call into get_cpu_idle_time() for a cpu without initialized scheduler
data.

Fix that by letting get_cpu_idle_time() deal with this case.

Fixes: 132cbe8f35632fb2 ("sched: fix get_cpu_idle_time() with core scheduling")
Reported-by: Marek Marczykowski-Górecki 
Signed-off-by: Juergen Gross 
Tested-by: Marek Marczykowski-Górecki 


Reviewed-by: Jan Beulich 


--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -337,7 +337,7 @@ uint64_t get_cpu_idle_time(unsigned int cpu)
  struct vcpu_runstate_info state = { 0 };
  const struct vcpu *v = idle_vcpu[cpu];
  
-if ( cpu_online(cpu) && v )

+if ( cpu_online(cpu) && v && get_sched_res(cpu) )
  vcpu_runstate_get(v, );


My earlier question was aiming at getting rid of the (now) middle part
of the condition; I thought this may be okay to do as a secondary change
here. But perhaps you intentionally left it there, so I'm unsure whether
to suggest to make the adjustment while committing (awaiting a
maintainer ack first anyway).


Ah, okay. Yes, I think the test of v being non-NULL can be removed.


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH] xen/sched: fix get_cpu_idle_time() for smt=0 suspend/resume

2021-08-18 Thread Jan Beulich
On 18.08.2021 12:21, Juergen Gross wrote:
> With smt=0 during a suspend/resume cycle of the machine the threads
> which have been parked before will briefly come up again. This can
> result in problems e.g. with cpufreq driver being active as this will
> call into get_cpu_idle_time() for a cpu without initialized scheduler
> data.
> 
> Fix that by letting get_cpu_idle_time() deal with this case.
> 
> Fixes: 132cbe8f35632fb2 ("sched: fix get_cpu_idle_time() with core 
> scheduling")
> Reported-by: Marek Marczykowski-Górecki 
> Signed-off-by: Juergen Gross 
> Tested-by: Marek Marczykowski-Górecki 

Reviewed-by: Jan Beulich 

> --- a/xen/common/sched/core.c
> +++ b/xen/common/sched/core.c
> @@ -337,7 +337,7 @@ uint64_t get_cpu_idle_time(unsigned int cpu)
>  struct vcpu_runstate_info state = { 0 };
>  const struct vcpu *v = idle_vcpu[cpu];
>  
> -if ( cpu_online(cpu) && v )
> +if ( cpu_online(cpu) && v && get_sched_res(cpu) )
>  vcpu_runstate_get(v, );

My earlier question was aiming at getting rid of the (now) middle part
of the condition; I thought this may be okay to do as a secondary change
here. But perhaps you intentionally left it there, so I'm unsure whether
to suggest to make the adjustment while committing (awaiting a
maintainer ack first anyway).

Jan




[PATCH] xen/sched: fix get_cpu_idle_time() for smt=0 suspend/resume

2021-08-18 Thread Juergen Gross
With smt=0 during a suspend/resume cycle of the machine the threads
which have been parked before will briefly come up again. This can
result in problems e.g. with cpufreq driver being active as this will
call into get_cpu_idle_time() for a cpu without initialized scheduler
data.

Fix that by letting get_cpu_idle_time() deal with this case.

Fixes: 132cbe8f35632fb2 ("sched: fix get_cpu_idle_time() with core scheduling")
Reported-by: Marek Marczykowski-Górecki 
Signed-off-by: Juergen Gross 
Tested-by: Marek Marczykowski-Górecki 
---
An alternative way to fix the issue would be to keep the sched_resource
of offline cpus allocated like we already do with idle vcpus and units.
This fix would be more intrusive, but it would avoid similar other bugs
like this one.
---
 xen/common/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 6d34764d38..9ac1b01ca8 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -337,7 +337,7 @@ uint64_t get_cpu_idle_time(unsigned int cpu)
 struct vcpu_runstate_info state = { 0 };
 const struct vcpu *v = idle_vcpu[cpu];
 
-if ( cpu_online(cpu) && v )
+if ( cpu_online(cpu) && v && get_sched_res(cpu) )
 vcpu_runstate_get(v, );
 
 return state.time[RUNSTATE_running];
-- 
2.26.2