Re: Null scheduler and vwfi native problem
On 1/30/21 6:59 PM, Dario Faggioli wrote:
On Fri, 2021-01-29 at 09:08 +0100, Anders Törnqvist wrote:
On 1/26/21 11:31 PM, Dario Faggioli wrote:

Thanks again for letting us see these logs.

Thanks for the attention to this :-)

Any ideas for how to solve it?

So, you're up for testing patches, right? How about applying these two, and letting me know what happens? :-D

Great work guys!

Hi. Now I got the time to test the patches. It was not possible to apply them cleanly on the code version I am using, which is commit b64b8df622963accf85b227e468fe12b2d56c128 from https://source.codeaurora.org/external/imx/imx-xen. I did some editing to get them into my code. I think I should also have removed some sched_tick_suspend/sched_tick_resume calls. See the attached patches for what I have applied on the code.

Anyway, after applying the patches, including the original rcu-quiesc-patch.patch, destroying the domU seems to work. I have rebooted, destroyed/created, and used the Xen watchdog to reboot the domU about 20 times in total, and so far it has destroyed cleanly and then been able to start a new instance of the domU every time. So it looks promising, although my edited patches probably need some fixing.

They are on top of current staging. I can try to rebase on something else, if it's easier for you to test. Besides being attached, they're also available here: https://gitlab.com/xen-project/people/dfaggioli/xen/-/tree/rcu-quiet-fix

I could not test them properly on ARM, as I don't have an ARM system handy, so everything is possible really... just let me know. It should at least build fine, AFAICT from here: https://gitlab.com/xen-project/people/dfaggioli/xen/-/pipelines/249101213

Julien, back in: https://lore.kernel.org/xen-devel/315740e1-3591-0e11-923a-718e06c36...@arm.com/ you said I should hook in enter_hypervisor_head(), leave_hypervisor_tail(). Those functions are gone now and, looking at how the code changed, this is where I figured I should put the calls (see the second patch).
But feel free to educate me otherwise.

For x86 people that are listening... Do we have, in our beloved arch, equally handy places (i.e., right before leaving Xen for a guest and right after entering Xen from one), preferably in a C file, and for all guests... like it seems to be the case on ARM?

Regards

diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index d6dc4b48db..42ab9dbbd6 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -52,8 +52,8 @@ static struct rcu_ctrlblk {
     int next_pending;          /* Is the next batch already waiting? */

     spinlock_t lock __cacheline_aligned;
-    cpumask_t cpumask;        /* CPUs that need to switch in order ... */
-    cpumask_t idle_cpumask;   /* ... unless they are already idle */
+    cpumask_t cpumask;        /* CPUs that need to switch in order ... */
+    cpumask_t ignore_cpumask; /* ... unless they are already idle */
                               /* for current batch to proceed. */
 } __cacheline_aligned rcu_ctrlblk = {
     .cur = -300,
@@ -86,8 +86,8 @@ struct rcu_data {
     long            last_rs_qlen;  /* qlen during the last resched */

     /* 3) idle CPUs handling */
-    struct timer idle_timer;
-    bool idle_timer_active;
+    struct timer cb_timer;
+    bool cb_timer_active;
 };

 /*
@@ -116,22 +116,22 @@ struct rcu_data {
  * CPU that is going idle. The user can change this, via a boot time
  * parameter, but only up to 100ms.
  */
-#define IDLE_TIMER_PERIOD_MAX     MILLISECS(100)
-#define IDLE_TIMER_PERIOD_DEFAULT MILLISECS(10)
-#define IDLE_TIMER_PERIOD_MIN     MICROSECS(100)
+#define CB_TIMER_PERIOD_MAX       MILLISECS(100)
+#define CB_TIMER_PERIOD_DEFAULT   MILLISECS(10)
+#define CB_TIMER_PERIOD_MIN       MICROSECS(100)

-static s_time_t __read_mostly idle_timer_period;
+static s_time_t __read_mostly cb_timer_period;

 /*
- * Increment and decrement values for the idle timer handler. The algorithm
+ * Increment and decrement values for the callback timer handler. The algorithm
  * works as follows:
  * - if the timer actually fires, and it finds out that the grace period isn't
- *   over yet, we add IDLE_TIMER_PERIOD_INCR to the timer's period;
+ *   over yet, we add CB_TIMER_PERIOD_INCR to the timer's period;
  * - if the timer actually fires and it finds the grace period over, we
  *   subtract IDLE_TIMER_PERIOD_DECR from the timer's period.
  */
-#define IDLE_TIMER_PERIOD_INCR    MILLISECS(10)
-#define IDLE_TIMER_PERIOD_DECR    MICROSECS(100)
+#define CB_TIMER_PERIOD_INCR      MILLISECS(10)
+#define CB_TIMER_PERIOD_DECR      MICROSECS(100)

 static DEFINE_PER_CPU(struct rcu_data, rcu_data);

@@ -309,7 +309,7 @@ static void rcu_start_batch(struct rcu_ctrlblk *rcp)
          * This barrier is paired with the one in rcu_idle_enter().
          */
         smp_mb();
-        cpumask_andnot(&rcp->cpumask, &cpu_online_map, &rcp->idle_cpumask);
+        cpumask_andnot(&rcp->cpumask, &cpu_online_map, &rcp->ignore_cpumask);
     }
 }

@@ -455,7 +455,7 @@ int
Re: Null scheduler and vwfi native problem
On 03.02.21 12:20, Julien Grall wrote:

Hi Juergen,

On 03/02/2021 11:00, Jürgen Groß wrote:
On 03.02.21 10:19, Julien Grall wrote:
Hi,

On 03/02/2021 07:31, Dario Faggioli wrote:
On Tue, 2021-02-02 at 15:23 +, Julien Grall wrote:

In reality, it is probably still too early, as a pCPU can be considered quiesced until a call to rcu_lock*() (such as rcu_lock_domain()).

Well, yes, in theory, we could track down which is the first RCU read-side critical section on this path, and put the call right before that (if I understood what you mean).

Oh, that's not what I meant. This will indeed be far more complex than I originally had in mind. AFAIU, RCU uses critical sections to protect data. So "entering" could be used as "the pCPU is not quiesced" and "exiting" could be used as "the pCPU is quiesced".

The concern with my approach is that we would need to make sure that Xen correctly uses the rcu helpers. I know Juergen worked on that recently, but I don't know whether this is fully complete.

I think it is complete, but I can't be sure, of course. One bit missing (for catching some wrong uses of the helpers) is this patch: https://lists.xen.org/archives/html/xen-devel/2020-03/msg01759.html I don't remember why it hasn't been taken, but I think there was a specific reason for that.

Looking at v8, the patch is suitably reviewed by Jan. So I am a bit puzzled as to why this wasn't committed... I had to go back to v6 to find the following message: "albeit to be honest I'm not fully convinced we need to go this far." Was the implication that his reviewed-by was conditional on someone else answering the e-mail?

I have no record of that being the case. Patches 1-3 of that series were needed for getting rid of stop_machine_run() in rcu handling and to fix other problems. Patch 4 added some additional ASSERT()s to make sure no potential deadlocks due to wrong rcu usage could creep in again. Patch 5 was more of a "nice to have" addition, in order to avoid any wrong usage of rcu which should have no real negative impact on system stability. So I believe Jan, as the committer, didn't want to commit it himself, but was fine with the overall idea and implementation.

I still think it would be nice for code sanity, but I was rather busy with Xenstore and event channel security work at that time, so I didn't urge anyone to take this patch.

Juergen
Re: Null scheduler and vwfi native problem
Hi Juergen,

On 03/02/2021 11:00, Jürgen Groß wrote:
On 03.02.21 10:19, Julien Grall wrote:
Hi,

On 03/02/2021 07:31, Dario Faggioli wrote:
On Tue, 2021-02-02 at 15:23 +, Julien Grall wrote:

In reality, it is probably still too early, as a pCPU can be considered quiesced until a call to rcu_lock*() (such as rcu_lock_domain()).

Well, yes, in theory, we could track down which is the first RCU read-side critical section on this path, and put the call right before that (if I understood what you mean).

Oh, that's not what I meant. This will indeed be far more complex than I originally had in mind. AFAIU, RCU uses critical sections to protect data. So "entering" could be used as "the pCPU is not quiesced" and "exiting" could be used as "the pCPU is quiesced".

The concern with my approach is that we would need to make sure that Xen correctly uses the rcu helpers. I know Juergen worked on that recently, but I don't know whether this is fully complete.

I think it is complete, but I can't be sure, of course. One bit missing (for catching some wrong uses of the helpers) is this patch: https://lists.xen.org/archives/html/xen-devel/2020-03/msg01759.html I don't remember why it hasn't been taken, but I think there was a specific reason for that.

Looking at v8, the patch is suitably reviewed by Jan. So I am a bit puzzled as to why this wasn't committed... I had to go back to v6 to find the following message: "albeit to be honest I'm not fully convinced we need to go this far." Was the implication that his reviewed-by was conditional on someone else answering the e-mail?

Cheers,
--
Julien Grall
Re: Null scheduler and vwfi native problem
On 03.02.21 10:19, Julien Grall wrote:
Hi,

On 03/02/2021 07:31, Dario Faggioli wrote:
On Tue, 2021-02-02 at 15:23 +, Julien Grall wrote:

In reality, it is probably still too early, as a pCPU can be considered quiesced until a call to rcu_lock*() (such as rcu_lock_domain()).

Well, yes, in theory, we could track down which is the first RCU read-side critical section on this path, and put the call right before that (if I understood what you mean).

Oh, that's not what I meant. This will indeed be far more complex than I originally had in mind. AFAIU, RCU uses critical sections to protect data. So "entering" could be used as "the pCPU is not quiesced" and "exiting" could be used as "the pCPU is quiesced".

The concern with my approach is that we would need to make sure that Xen correctly uses the rcu helpers. I know Juergen worked on that recently, but I don't know whether this is fully complete.

I think it is complete, but I can't be sure, of course. One bit missing (for catching some wrong uses of the helpers) is this patch: https://lists.xen.org/archives/html/xen-devel/2020-03/msg01759.html I don't remember why it hasn't been taken, but I think there was a specific reason for that.

Juergen
Re: Null scheduler and vwfi native problem
Hi,

On 03/02/2021 07:31, Dario Faggioli wrote:
On Tue, 2021-02-02 at 15:23 +, Julien Grall wrote:

In reality, it is probably still too early, as a pCPU can be considered quiesced until a call to rcu_lock*() (such as rcu_lock_domain()).

Well, yes, in theory, we could track down which is the first RCU read-side critical section on this path, and put the call right before that (if I understood what you mean).

Oh, that's not what I meant. This will indeed be far more complex than I originally had in mind. AFAIU, RCU uses critical sections to protect data. So "entering" could be used as "the pCPU is not quiesced" and "exiting" could be used as "the pCPU is quiesced".

The concern with my approach is that we would need to make sure that Xen correctly uses the rcu helpers. I know Juergen worked on that recently, but I don't know whether this is fully complete.

Cheers,
--
Julien Grall
Re: Null scheduler and vwfi native problem
Hi again,

On Tue, 2021-02-02 at 15:23 +, Julien Grall wrote:
> (Adding Andrew, Jan, Juergen for visibility)
>
Thanks! :-)

> On 02/02/2021 15:03, Dario Faggioli wrote:
> > On Tue, 2021-02-02 at 07:59 +, Julien Grall wrote:
> > > The placement in enter_hypervisor_from_guest() doesn't matter too
> > > much, although I would consider calling it as late as possible.
> > >
> > Mmmm... Can I ask why? In fact, I would have said "as soon as
> > possible".
>
> Because those functions only access data for the current vCPU/domain.
> This is already protected by the fact that the domain is running.
>
Mmm... ok, yes, I think it makes sense.

> By leaving the "quiesce" mode later, you give an opportunity to the
> RCU to release memory earlier.
>
Yeah. What I wanted to be sure of is that we put the CPU "back in the race" :-) before any current or future use of RCU.

> In reality, it is probably still too early, as a pCPU can be
> considered quiesced until a call to rcu_lock*() (such as
> rcu_lock_domain()).
>
Well, yes, in theory, we could track down which is the first RCU read-side critical section on this path, and put the call right before that (if I understood what you mean).

To me, however, this looks indeed too complex and difficult to maintain, not only for 4.15 but in general. E.g., suppose we find such a use of RCU in function foo(), called by bar(), called by hypervisor_enter_from_guest(). If someone at some point wants to use RCU in bar(), how does she know that she should also move the call to rcu_quiet_enter() from foo() to there?

So, yes, I'll move it a little down, but still within hypervisor_enter_from_guest().

In the meantime, I had a quick chat with Juergen about x86. In fact, I had a look and was not finding a place to put the rcu_quiet_{exit,enter}() calls as convenient as you have here on ARM. I.e., two nice C functions that we traverse for all kinds of guests, for HVM and SVM, etc. Actually, I was quite skeptical about it but, you know, one can hope!

Juergen confirmed that there's no such thing, so I'll look at the various entry.S files for the proper spots.

Regards
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
Re: Null scheduler and vwfi native problem
(Adding Andrew, Jan, Juergen for visibility)

Hi Dario,

On 02/02/2021 15:03, Dario Faggioli wrote:
On Tue, 2021-02-02 at 07:59 +, Julien Grall wrote:
Hi Dario,

I have had a quick look at your patch. The RCU call in leave_hypervisor_to_guest() needs to be placed just after the last call to check_for_pcpu_work(). Otherwise, you may be preempted and keep the RCU quiet.

Ok, makes sense. I'll move it.

The placement in enter_hypervisor_from_guest() doesn't matter too much, although I would consider calling it as late as possible.

Mmmm... Can I ask why? In fact, I would have said "as soon as possible".

Because those functions only access data for the current vCPU/domain. This is already protected by the fact that the domain is running. By leaving the "quiesce" mode later, you give an opportunity to the RCU to release memory earlier.

In reality, it is probably still too early, as a pCPU can be considered quiesced until a call to rcu_lock*() (such as rcu_lock_domain()). But this would require some investigation, to check whether we effectively protect all the regions with the RCU helpers. This is likely too complicated for 4.15.

Cheers,
--
Julien Grall
Re: Null scheduler and vwfi native problem
On Tue, 2021-02-02 at 07:59 +, Julien Grall wrote:
> Hi Dario,
>
Hi!

> I have had a quick look at your patch. The RCU call in
> leave_hypervisor_to_guest() needs to be placed just after the last
> call to check_for_pcpu_work().
>
> Otherwise, you may be preempted and keep the RCU quiet.
>
Ok, makes sense. I'll move it.

> The placement in enter_hypervisor_from_guest() doesn't matter too
> much, although I would consider calling it as late as possible.
>
Mmmm... Can I ask why? In fact, I would have said "as soon as possible".

Thanks and Regards
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
Re: Null scheduler and vwfi native problem
Hi Dario,

On 30/01/2021 17:59, Dario Faggioli wrote:
On Fri, 2021-01-29 at 09:08 +0100, Anders Törnqvist wrote:
On 1/26/21 11:31 PM, Dario Faggioli wrote:

Thanks again for letting us see these logs.

Thanks for the attention to this :-)

Any ideas for how to solve it?

So, you're up for testing patches, right? How about applying these two, and letting me know what happens? :-D

They are on top of current staging. I can try to rebase on something else, if it's easier for you to test. Besides being attached, they're also available here: https://gitlab.com/xen-project/people/dfaggioli/xen/-/tree/rcu-quiet-fix

I could not test them properly on ARM, as I don't have an ARM system handy, so everything is possible really... just let me know. It should at least build fine, AFAICT from here: https://gitlab.com/xen-project/people/dfaggioli/xen/-/pipelines/249101213

Julien, back in: https://lore.kernel.org/xen-devel/315740e1-3591-0e11-923a-718e06c36...@arm.com/ you said I should hook in enter_hypervisor_head(), leave_hypervisor_tail(). Those functions are gone now and, looking at how the code changed, this is where I figured I should put the calls (see the second patch). But feel free to educate me otherwise.

enter_hypervisor_from_guest() and leave_hypervisor_to_guest() are the new functions.

I have had a quick look at your patch. The RCU call in leave_hypervisor_to_guest() needs to be placed just after the last call to check_for_pcpu_work(). Otherwise, you may be preempted and keep the RCU quiet.

The placement in enter_hypervisor_from_guest() doesn't matter too much, although I would consider calling it as late as possible.

Cheers,
--
Julien Grall
Re: Null scheduler and vwfi native problem
On 1/30/21 6:59 PM, Dario Faggioli wrote:
On Fri, 2021-01-29 at 09:08 +0100, Anders Törnqvist wrote:
On 1/26/21 11:31 PM, Dario Faggioli wrote:

Thanks again for letting us see these logs.

Thanks for the attention to this :-)

Any ideas for how to solve it?

So, you're up for testing patches, right?

Absolutely. I will apply them and be back with the results. :-)

How about applying these two, and letting me know what happens? :-D

They are on top of current staging. I can try to rebase on something else, if it's easier for you to test. Besides being attached, they're also available here: https://gitlab.com/xen-project/people/dfaggioli/xen/-/tree/rcu-quiet-fix

I could not test them properly on ARM, as I don't have an ARM system handy, so everything is possible really... just let me know. It should at least build fine, AFAICT from here: https://gitlab.com/xen-project/people/dfaggioli/xen/-/pipelines/249101213

Julien, back in: https://lore.kernel.org/xen-devel/315740e1-3591-0e11-923a-718e06c36...@arm.com/ you said I should hook in enter_hypervisor_head(), leave_hypervisor_tail(). Those functions are gone now and, looking at how the code changed, this is where I figured I should put the calls (see the second patch). But feel free to educate me otherwise.

For x86 people that are listening... Do we have, in our beloved arch, equally handy places (i.e., right before leaving Xen for a guest and right after entering Xen from one), preferably in a C file, and for all guests... like it seems to be the case on ARM?

Regards
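For anyone who wants to follow along, one way to try the rcu-quiet-fix branch mentioned above is to fetch it straight from Dario's tree (the remote name here is arbitrary; only the URL and branch name come from the thread):

```shell
# Add Dario's tree as a remote and fetch the fix branch.
git remote add dfaggioli https://gitlab.com/xen-project/people/dfaggioli/xen.git
git fetch dfaggioli rcu-quiet-fix

# Check it out locally for building/testing.
git checkout -b rcu-quiet-fix dfaggioli/rcu-quiet-fix
```

If, like Anders, you are on a different base (e.g., a vendor tree), cherry-picking or rebasing the two patches onto that base is the alternative, at the cost of the manual conflict fixing he describes.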
Re: Null scheduler and vwfi native problem
On 1/29/21 11:16 AM, Dario Faggioli wrote:
On Fri, 2021-01-29 at 09:18 +0100, Jürgen Groß wrote:
On 29.01.21 09:08, Anders Törnqvist wrote:

So using it has only downsides (and that's true in general, if you ask me, but particularly so if using NULL).

Thanks for the feedback. I removed dom0_vcpus_pin. And, as you said, it seems to be unrelated to the problem we're discussing.

Right. Don't put it back, and stay away from it, if you'll take my advice. :-)

The system still behaves the same.

Yeah, that was expected.

When dom0_vcpus_pin is removed, xl vcpu-list looks like this:

Name        ID  VCPU  CPU  State  Time(s)  Affinity (Hard / Soft)
Domain-0     0     0    0  r--       29.4  all / all
Domain-0     0     1    1  r--       28.7  all / all
Domain-0     0     2    2  r--       28.7  all / all
Domain-0     0     3    3  r--       28.6  all / all
Domain-0     0     4    4  r--       28.6  all / all
mydomu       1     0    5  r--       21.6  5 / all

Right, and it makes sense for it to look like this.

From this listing (with "all" as hard affinity for dom0) one might read it like dom0 is not pinned with hard affinity to any specific pCPUs at all, but mydomu is pinned to pCPU 5. Will dom0_max_vcpus=5 in this case guarantee that dom0 will only run on pCPUs 0-4, so that mydomu will always have pCPU 5 to itself?

No. Well, yes... if you use the NULL scheduler. Which is in use here. :-)

Basically, the NULL scheduler _always_ assigns one and only one vCPU to each pCPU. This happens at domain (well, at vCPU) creation time. And it _never_ moves a vCPU away from the pCPU to which it has assigned it. And it also _never_ changes this vCPU-->pCPU assignment/relationship, unless some special event happens (such as: either the vCPU and/or the pCPU goes offline, is removed from the cpupool, you change the affinity [as I'll explain below], etc).

This is the NULL scheduler's mission and only job, so it does that by default, _without_ any need for an affinity to be specified.

So, how can affinity be useful in the NULL scheduler? Well, it's useful if you want to control and decide to what pCPU a certain vCPU should go. So, let's make an example. Let's say you are in this situation:

Name        ID  VCPU  CPU  State  Time(s)  Affinity (Hard / Soft)
Domain-0     0     0    0  r--       29.4  all / all
Domain-0     0     1    1  r--       28.7  all / all
Domain-0     0     2    2  r--       28.7  all / all
Domain-0     0     3    3  r--       28.6  all / all
Domain-0     0     4    4  r--       28.6  all / all

I.e., you have 6 CPUs, you have only dom0, dom0 has 5 vCPUs, and you are not using dom0_vcpus_pin. The NULL scheduler has put d0v0 on pCPU 0. And d0v0 is the only vCPU that can run on pCPU 0, despite its affinities being "all"... because it's what the NULL scheduler does for you, and it's the reason why one uses it! :-) Similarly, it has put d0v1 on pCPU 1, d0v2 on pCPU 2, d0v3 on pCPU 3 and d0v4 on pCPU 4. And the "exclusivity guarantee" explained above for d0v0 and pCPU 0 applies to all these other vCPUs and pCPUs as well.

With no affinity being specified, which vCPU is assigned to which pCPU is entirely under the NULL scheduler's control. It has its heuristics inside, to try to do that in a smart way, but that's an internal/implementation detail and is not relevant here.

If you now create a domU with 1 vCPU, that vCPU will be assigned to pCPU 5.

Now, let's say that, for whatever reason, you absolutely want d0v2 to run on pCPU 5, instead of being assigned to, and run on, pCPU 2 (which is what the NULL scheduler decided to pick for it). Well, what you do is use xl, set the affinity of d0v2 to pCPU 5, and you will get something like this as a result:

Name        ID  VCPU  CPU  State  Time(s)  Affinity (Hard / Soft)
Domain-0     0     0    0  r--       29.4  all / all
Domain-0     0     1    1  r--       28.7  all / all
Domain-0     0     2    5  r--       28.7  5 / all
Domain-0     0     3    3  r--       28.6  all / all
Domain-0     0     4    4  r--       28.6  all / all

So, affinity is indeed useful, even when using NULL, if you want to diverge from the default behavior and enact a certain policy, maybe due to the nature of your workload, the characteristics of your hardware, or whatever.

It is not, however, necessary to set the affinity to:
- have a vCPU always stay on one --and always the same one too-- pCPU;
- avoid that any other vCPU would ever run on that pCPU.

That is guaranteed by the NULL scheduler itself. It just
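The affinity change Dario describes ("use xl, set the affinity of d0v2 to pCPU 5") is done with the xl vcpu-pin subcommand. A minimal sketch, using the domain names and pCPU numbers from the example above:

```shell
# Pin dom0's vCPU 2 to pCPU 5 (hard affinity); under the NULL
# scheduler this also makes it the vCPU assigned to pCPU 5.
xl vcpu-pin Domain-0 2 5

# Verify: d0v2 should now show CPU 5 and hard affinity "5".
xl vcpu-list Domain-0

# Undo it by widening the hard affinity back to all pCPUs.
xl vcpu-pin Domain-0 2 all
```

The general form is `xl vcpu-pin <domain> <vcpu|all> <hard-affinity> [<soft-affinity>]`; soft affinity is irrelevant to NULL's placement guarantee discussed here.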
Re: Null scheduler and vwfi native problem
On Fri, 2021-01-29 at 09:08 +0100, Anders Törnqvist wrote:
> On 1/26/21 11:31 PM, Dario Faggioli wrote:
> > Thanks again for letting us see these logs.
>
> Thanks for the attention to this :-)
>
> Any ideas for how to solve it?
>
So, you're up for testing patches, right? How about applying these two, and letting me know what happens? :-D

They are on top of current staging. I can try to rebase on something else, if it's easier for you to test. Besides being attached, they're also available here: https://gitlab.com/xen-project/people/dfaggioli/xen/-/tree/rcu-quiet-fix

I could not test them properly on ARM, as I don't have an ARM system handy, so everything is possible really... just let me know. It should at least build fine, AFAICT from here: https://gitlab.com/xen-project/people/dfaggioli/xen/-/pipelines/249101213

Julien, back in: https://lore.kernel.org/xen-devel/315740e1-3591-0e11-923a-718e06c36...@arm.com/ you said I should hook in enter_hypervisor_head(), leave_hypervisor_tail(). Those functions are gone now and, looking at how the code changed, this is where I figured I should put the calls (see the second patch). But feel free to educate me otherwise.

For x86 people that are listening... Do we have, in our beloved arch, equally handy places (i.e., right before leaving Xen for a guest and right after entering Xen from one), preferably in a C file, and for all guests... like it seems to be the case on ARM?

Regards
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/

commit 2c38754fa73a81e8dfab8abdfb18b9896e00
Author: Dario Faggioli
Date: Sat Jan 30 07:50:22 2021 +

    xen: rename RCU idle timer and cpumask

    Both the cpumask and the timer will be used in more generic
    circumstances, not only for CPUs that go idle. Change their
    names to reflect that.

    No functional change.
Signed-off-by: Dario Faggioli

diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index a5a27af3de..e0bf842f13 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -55,8 +55,8 @@ static struct rcu_ctrlblk {
     int next_pending;          /* Is the next batch already waiting? */

     spinlock_t lock __cacheline_aligned;
-    cpumask_t cpumask;        /* CPUs that need to switch in order ... */
-    cpumask_t idle_cpumask;   /* ... unless they are already idle */
+    cpumask_t cpumask;        /* CPUs that need to switch in order ... */
+    cpumask_t ignore_cpumask; /* ... unless they are already idle */
                               /* for current batch to proceed. */
 } __cacheline_aligned rcu_ctrlblk = {
     .cur = -300,
@@ -88,8 +88,8 @@ struct rcu_data {
     long            last_rs_qlen;  /* qlen during the last resched */

     /* 3) idle CPUs handling */
-    struct timer    idle_timer;
-    bool            idle_timer_active;
+    struct timer    cb_timer;
+    bool            cb_timer_active;

     bool            process_callbacks;
     bool            barrier_active;
@@ -121,22 +121,22 @@ struct rcu_data {
  * CPU that is going idle. The user can change this, via a boot time
  * parameter, but only up to 100ms.
  */
-#define IDLE_TIMER_PERIOD_MAX     MILLISECS(100)
-#define IDLE_TIMER_PERIOD_DEFAULT MILLISECS(10)
-#define IDLE_TIMER_PERIOD_MIN     MICROSECS(100)
+#define CB_TIMER_PERIOD_MAX       MILLISECS(100)
+#define CB_TIMER_PERIOD_DEFAULT   MILLISECS(10)
+#define CB_TIMER_PERIOD_MIN       MICROSECS(100)

-static s_time_t __read_mostly idle_timer_period;
+static s_time_t __read_mostly cb_timer_period;

 /*
- * Increment and decrement values for the idle timer handler. The algorithm
+ * Increment and decrement values for the callback timer handler. The algorithm
  * works as follows:
  * - if the timer actually fires, and it finds out that the grace period isn't
- *   over yet, we add IDLE_TIMER_PERIOD_INCR to the timer's period;
+ *   over yet, we add CB_TIMER_PERIOD_INCR to the timer's period;
  * - if the timer actually fires and it finds the grace period over, we
  *   subtract IDLE_TIMER_PERIOD_DECR from the timer's period.
  */
-#define IDLE_TIMER_PERIOD_INCR    MILLISECS(10)
-#define IDLE_TIMER_PERIOD_DECR    MICROSECS(100)
+#define CB_TIMER_PERIOD_INCR      MILLISECS(10)
+#define CB_TIMER_PERIOD_DECR      MICROSECS(100)

 static DEFINE_PER_CPU(struct rcu_data, rcu_data);

@@ -364,7 +364,7 @@ static void rcu_start_batch(struct rcu_ctrlblk *rcp)
          * This barrier is paired with the one in rcu_idle_enter().
          */
         smp_mb();
-        cpumask_andnot(&rcp->cpumask, &cpu_online_map, &rcp->idle_cpumask);
+        cpumask_andnot(&rcp->cpumask, &cpu_online_map, &rcp->ignore_cpumask);
     }
 }

@@ -523,7 +523,7 @@ int rcu_needs_cpu(int cpu)
 {
     struct rcu_data *rdp = &per_cpu(rcu_data, cpu);

-    return (rdp->curlist && !rdp->idle_timer_active) || rcu_pending(cpu);
+    return (rdp->curlist && !rdp->cb_timer_active) ||
Re: Null scheduler and vwfi native problem
On Fri, 2021-01-29 at 09:18 +0100, Jürgen Groß wrote:
> On 29.01.21 09:08, Anders Törnqvist wrote:
> > > So using it has only downsides (and that's true in general, if
> > > you ask me, but particularly so if using NULL).
> > Thanks for the feedback. I removed dom0_vcpus_pin. And, as you
> > said, it seems to be unrelated to the problem we're discussing.
Right. Don't put it back, and stay away from it, if you'll take my advice. :-)

> > The system still behaves the same.
Yeah, that was expected.

> > When dom0_vcpus_pin is removed, xl vcpu-list looks like this:
> >
> > Name        ID  VCPU  CPU  State  Time(s)  Affinity (Hard / Soft)
> > Domain-0     0     0    0  r--       29.4  all / all
> > Domain-0     0     1    1  r--       28.7  all / all
> > Domain-0     0     2    2  r--       28.7  all / all
> > Domain-0     0     3    3  r--       28.6  all / all
> > Domain-0     0     4    4  r--       28.6  all / all
> > mydomu       1     0    5  r--       21.6  5 / all
>
> > From this listing (with "all" as hard affinity for dom0) one might
> > read it like dom0 is not pinned with hard affinity to any specific
> > pCPUs at all but mydomu is pinned to pCPU 5.
> > Will dom0_max_vcpus=5 in this case guarantee that dom0 will only
> > run on pCPUs 0-4, so that mydomu will always have pCPU 5 to itself?
>
> No.
Well, yes... if you use the NULL scheduler. Which is in use here. :-)

Basically, the NULL scheduler _always_ assigns one and only one vCPU to each pCPU. This happens at domain (well, at vCPU) creation time. And it _never_ moves a vCPU away from the pCPU to which it has assigned it. And it also _never_ changes this vCPU-->pCPU assignment/relationship, unless some special event happens (such as: either the vCPU and/or the pCPU goes offline, is removed from the cpupool, you change the affinity [as I'll explain below], etc).

This is the NULL scheduler's mission and only job, so it does that by default, _without_ any need for an affinity to be specified.

So, how can affinity be useful in the NULL scheduler? Well, it's useful if you want to control and decide to what pCPU a certain vCPU should go. So, let's make an example. Let's say you are in this situation:

Name        ID  VCPU  CPU  State  Time(s)  Affinity (Hard / Soft)
Domain-0     0     0    0  r--       29.4  all / all
Domain-0     0     1    1  r--       28.7  all / all
Domain-0     0     2    2  r--       28.7  all / all
Domain-0     0     3    3  r--       28.6  all / all
Domain-0     0     4    4  r--       28.6  all / all

I.e., you have 6 CPUs, you have only dom0, dom0 has 5 vCPUs, and you are not using dom0_vcpus_pin. The NULL scheduler has put d0v0 on pCPU 0. And d0v0 is the only vCPU that can run on pCPU 0, despite its affinities being "all"... because it's what the NULL scheduler does for you, and it's the reason why one uses it! :-) Similarly, it has put d0v1 on pCPU 1, d0v2 on pCPU 2, d0v3 on pCPU 3 and d0v4 on pCPU 4. And the "exclusivity guarantee" explained above for d0v0 and pCPU 0 applies to all these other vCPUs and pCPUs as well.

With no affinity being specified, which vCPU is assigned to which pCPU is entirely under the NULL scheduler's control. It has its heuristics inside, to try to do that in a smart way, but that's an internal/implementation detail and is not relevant here.

If you now create a domU with 1 vCPU, that vCPU will be assigned to pCPU 5.

Now, let's say that, for whatever reason, you absolutely want d0v2 to run on pCPU 5, instead of being assigned to, and run on, pCPU 2 (which is what the NULL scheduler decided to pick for it). Well, what you do is use xl, set the affinity of d0v2 to pCPU 5, and you will get something like this as a result:

Name        ID  VCPU  CPU  State  Time(s)  Affinity (Hard / Soft)
Domain-0     0     0    0  r--       29.4  all / all
Domain-0     0     1    1  r--       28.7  all / all
Domain-0     0     2    5  r--       28.7  5 / all
Domain-0     0     3    3  r--       28.6  all / all
Domain-0     0     4    4  r--       28.6  all / all

So, affinity is indeed useful, even when using NULL, if you want to diverge from the default behavior and enact a certain policy, maybe due to the nature of your workload, the characteristics of your hardware, or whatever.

It is not, however, necessary to set the affinity to:
- have a vCPU always stay on one --and always the same one too-- pCPU;
- avoid
Re: Null scheduler and vwfi native problem
On 29.01.21 09:08, Anders Törnqvist wrote: On 1/26/21 11:31 PM, Dario Faggioli wrote: On Tue, 2021-01-26 at 18:03 +0100, Anders Törnqvist wrote: On 1/25/21 5:11 PM, Dario Faggioli wrote: On Fri, 2021-01-22 at 14:26 +, Julien Grall wrote: Hi Anders, On 22/01/2021 08:06, Anders Törnqvist wrote: On 1/22/21 12:35 AM, Dario Faggioli wrote: On Thu, 2021-01-21 at 19:40 +, Julien Grall wrote: - booting with "sched=null vwfi=native" but not doing the IRQ passthrough that you mentioned above "xl destroy" gives (XEN) End of domain_destroy function Then a "xl create" says nothing but the domain has not started correct. "xl list" look like this for the domain: mydomu 2 512 1 -- 0.0 This is odd. I would have expected ``xl create`` to fail if something went wrong with the domain creation. So, Anders, would it be possible to issue a: # xl debug-keys r # xl dmesg And send it to us ? Ideally, you'd do it: - with Julien's patch (the one he sent the other day, and that you have already given a try to) applied - while you are in the state above, i.e., after having tried to destroy a domain and failing - and maybe again after having tried to start a new domain Here are some logs. Great, thanks a lot! The system is booted as before with the patch and the domu config does not have the IRQs. Ok. # xl list Name ID Mem VCPUs State Time(s) Domain-0 0 3000 5 r- 820.1 mydomu 1 511 1 r- 157.0 # xl debug-keys r (XEN) sched_smt_power_savings: disabled (XEN) NOW=191793008000 (XEN) Online Cpus: 0-5 (XEN) Cpupool 0: (XEN) Cpus: 0-5 (XEN) Scheduler: null Scheduler (null) (XEN) cpus_free = (XEN) Domain info: (XEN) Domain: 0 (XEN) 1: [0.0] pcpu=0 (XEN) 2: [0.1] pcpu=1 (XEN) 3: [0.2] pcpu=2 (XEN) 4: [0.3] pcpu=3 (XEN) 5: [0.4] pcpu=4 (XEN) Domain: 1 (XEN) 6: [1.0] pcpu=5 (XEN) Waitqueue: So far, so good. All vCPUs are running on their assigned pCPU, and there is no vCPU wanting to run but not having a vCPU where to do so. 
(XEN) Command line: console=dtuart dtuart=/serial@5a06 dom0_mem=3000M dom0_max_vcpus=5 hmp-unsafe=true dom0_vcpus_pin sched=null vwfi=native Oh, just as a side note (and most likely unrelated to the problem we're discussing), you should be able to get rid of dom0_vcpus_pin. The NULL scheduler will do something similar to what that option itself does anyway. And with the benefit that, if you want, you can actually change to what pCPUs the dom0's vCPUs are pinned. While, if you use dom0_vcpus_pin, you can't. So using it has only downsides (and that's true in general, if you ask me, but particularly so if using NULL). Thanks for the feedback. I removed dom0_vcpus_pin. And, as you said, it seems to be unrelated to the problem we're discussing. The system still behaves the same. When dom0_vcpus_pin is removed, xl vcpu-list looks like this:

Name                ID  VCPU  CPU  State  Time(s)  Affinity (Hard / Soft)
Domain-0             0     0    0  r--       29.4  all / all
Domain-0             0     1    1  r--       28.7  all / all
Domain-0             0     2    2  r--       28.7  all / all
Domain-0             0     3    3  r--       28.6  all / all
Domain-0             0     4    4  r--       28.6  all / all
mydomu               1     0    5  r--       21.6  5 / all

From this listing (with "all" as hard affinity for dom0) one might read it like dom0 is not pinned with hard affinity to any specific pCPUs at all but mydomu is pinned to pCPU 5. Will dom0_max_vcpus=5 in this case guarantee that dom0 only will run on pCPUs 0-4 so that mydomu always will have pCPU 5 for itself only? No. What if I would like mydomu to be the only domain that uses pCPU 2? Set up a cpupool with that pCPU assigned to it and put your domain into that cpupool. Juergen
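Juergen's cpupool suggestion, sketched with xl (the pool name and config path here are mine, and pCPU 2 must first be freed from the default pool):

```shell
# Free pCPU 2 from the default pool, then build a new pool around it.
xl cpupool-cpu-remove Pool-0 2

# A minimal cpupool config (e.g. /etc/xen/mypool.cfg) could contain:
#   name  = "mypool"
#   sched = "null"
#   cpus  = ["2"]
xl cpupool-create /etc/xen/mypool.cfg

# Move the domain into the new pool; pCPU 2 is now exclusively its.
xl cpupool-migrate mydomu mypool

# Check which pCPUs each pool holds.
xl cpupool-list -c
```

This guarantees exclusivity at the pool level: no vCPU outside mypool can ever be scheduled on pCPU 2, regardless of affinity settings.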
Re: Null scheduler and vwfi native problem
On 1/26/21 11:31 PM, Dario Faggioli wrote: On Tue, 2021-01-26 at 18:03 +0100, Anders Törnqvist wrote: On 1/25/21 5:11 PM, Dario Faggioli wrote: On Fri, 2021-01-22 at 14:26 +, Julien Grall wrote: Hi Anders, On 22/01/2021 08:06, Anders Törnqvist wrote: On 1/22/21 12:35 AM, Dario Faggioli wrote: On Thu, 2021-01-21 at 19:40 +, Julien Grall wrote: - booting with "sched=null vwfi=native" but not doing the IRQ passthrough that you mentioned above "xl destroy" gives (XEN) End of domain_destroy function Then a "xl create" says nothing but the domain has not started correct. "xl list" look like this for the domain: mydomu 2 512 1 -- 0.0 This is odd. I would have expected ``xl create`` to fail if something went wrong with the domain creation. So, Anders, would it be possible to issue a: # xl debug-keys r # xl dmesg And send it to us ? Ideally, you'd do it: - with Julien's patch (the one he sent the other day, and that you have already given a try to) applied - while you are in the state above, i.e., after having tried to destroy a domain and failing - and maybe again after having tried to start a new domain Here are some logs. Great, thanks a lot! The system is booted as before with the patch and the domu config does not have the IRQs. Ok. # xl list Name ID Mem VCPUs State Time(s) Domain-0 0 3000 5 r- 820.1 mydomu 1 511 1 r- 157.0 # xl debug-keys r (XEN) sched_smt_power_savings: disabled (XEN) NOW=191793008000 (XEN) Online Cpus: 0-5 (XEN) Cpupool 0: (XEN) Cpus: 0-5 (XEN) Scheduler: null Scheduler (null) (XEN) cpus_free = (XEN) Domain info: (XEN) Domain: 0 (XEN) 1: [0.0] pcpu=0 (XEN) 2: [0.1] pcpu=1 (XEN) 3: [0.2] pcpu=2 (XEN) 4: [0.3] pcpu=3 (XEN) 5: [0.4] pcpu=4 (XEN) Domain: 1 (XEN) 6: [1.0] pcpu=5 (XEN) Waitqueue: So far, so good. All vCPUs are running on their assigned pCPU, and there is no vCPU wanting to run but not having a vCPU where to do so. 
(XEN) Command line: console=dtuart dtuart=/serial@5a06 dom0_mem=3000M dom0_max_vcpus=5 hmp-unsafe=true dom0_vcpus_pin sched=null vwfi=native Oh, just as a side note (and most likely unrelated to the problem we're discussing), you should be able to get rid of dom0_vcpus_pin. The NULL scheduler will do something similar to what that option itself does anyway. And with the benefit that, if you want, you can actually change to what pCPUs the dom0's vCPUs are pinned. While, if you use dom0_vcpus_pin, you can't. So using it has only downsides (and that's true in general, if you ask me, but particularly so if using NULL). Thanks for the feedback. I removed dom0_vcpus_pin. And, as you said, it seems to be unrelated to the problem we're discussing. The system still behaves the same. When dom0_vcpus_pin is removed, xl vcpu-list looks like this:

Name                ID  VCPU  CPU  State  Time(s)  Affinity (Hard / Soft)
Domain-0             0     0    0  r--       29.4  all / all
Domain-0             0     1    1  r--       28.7  all / all
Domain-0             0     2    2  r--       28.7  all / all
Domain-0             0     3    3  r--       28.6  all / all
Domain-0             0     4    4  r--       28.6  all / all
mydomu               1     0    5  r--       21.6  5 / all

From this listing (with "all" as hard affinity for dom0) one might read it like dom0 is not pinned with hard affinity to any specific pCPUs at all but mydomu is pinned to pCPU 5. Will dom0_max_vcpus=5 in this case guarantee that dom0 only will run on pCPUs 0-4 so that mydomu always will have pCPU 5 for itself only? What if I would like mydomu to be the only domain that uses pCPU 2? 
# xl destroy mydomu (XEN) End of domain_destroy function # xl list Name ID Mem VCPUs State Time(s) Domain-0 0 3000 5 r- 1057.9 # xl debug-keys r (XEN) sched_smt_power_savings: disabled (XEN) NOW=223871439875 (XEN) Online Cpus: 0-5 (XEN) Cpupool 0: (XEN) Cpus: 0-5 (XEN) Scheduler: null Scheduler (null) (XEN) cpus_free = (XEN) Domain info: (XEN) Domain: 0 (XEN) 1: [0.0] pcpu=0 (XEN) 2: [0.1] pcpu=1 (XEN) 3: [0.2] pcpu=2 (XEN) 4: [0.3] pcpu=3 (XEN) 5: [0.4] pcpu=4 (XEN) Domain: 1 (XEN) 6: [1.0] pcpu=5 Right. And from the fact that: 1) we only see the "End of domain_destroy function" line in the logs, and 2) we see that the vCPU is still listed here, we have our confirmation (like there was the need for it :-/) that domain destruction is done only partially. Yes it looks like that. # xl create mydomu.cfg Parsing config from mydomu.cfg (XEN) Power on resource 215 # xl list Name
Re: Null scheduler and vwfi native problem
On Tue, 2021-01-26 at 18:03 +0100, Anders Törnqvist wrote: > On 1/25/21 5:11 PM, Dario Faggioli wrote: > > On Fri, 2021-01-22 at 14:26 +, Julien Grall wrote: > > > Hi Anders, > > > > > > On 22/01/2021 08:06, Anders Törnqvist wrote: > > > > On 1/22/21 12:35 AM, Dario Faggioli wrote: > > > > > On Thu, 2021-01-21 at 19:40 +, Julien Grall wrote: > > > > - booting with "sched=null vwfi=native" but not doing the IRQ > > > > passthrough that you mentioned above > > > > "xl destroy" gives > > > > (XEN) End of domain_destroy function > > > > > > > > Then a "xl create" says nothing but the domain has not started > > > > correct. > > > > "xl list" look like this for the domain: > > > > mydomu 2 512 1 -- > > > > 0.0 > > > This is odd. I would have expected ``xl create`` to fail if > > > something > > > went wrong with the domain creation. > > > > > So, Anders, would it be possible to issue a: > > > > # xl debug-keys r > > # xl dmesg > > > > And send it to us ? > > > > Ideally, you'd do it: > > - with Julien's patch (the one he sent the other day, and that > > you > > have already given a try to) applied > > - while you are in the state above, i.e., after having tried to > > destroy a domain and failing > > - and maybe again after having tried to start a new domain > Here are some logs. > Great, thanks a lot! > The system is booted as before with the patch and the domu config > does > not have the IRQs. > Ok. > # xl list > Name ID Mem VCPUs State > Time(s) > Domain-0 0 3000 5 r- > 820.1 > mydomu 1 511 1 r- > 157.0 > > # xl debug-keys r > (XEN) sched_smt_power_savings: disabled > (XEN) NOW=191793008000 > (XEN) Online Cpus: 0-5 > (XEN) Cpupool 0: > (XEN) Cpus: 0-5 > (XEN) Scheduler: null Scheduler (null) > (XEN) cpus_free = > (XEN) Domain info: > (XEN) Domain: 0 > (XEN) 1: [0.0] pcpu=0 > (XEN) 2: [0.1] pcpu=1 > (XEN) 3: [0.2] pcpu=2 > (XEN) 4: [0.3] pcpu=3 > (XEN) 5: [0.4] pcpu=4 > (XEN) Domain: 1 > (XEN) 6: [1.0] pcpu=5 > (XEN) Waitqueue: > So far, so good. 
All vCPUs are running on their assigned pCPU, and there is no vCPU wanting to run but not having a vCPU where to do so. > (XEN) Command line: console=dtuart dtuart=/serial@5a06 > dom0_mem=3000M dom0_max_vcpus=5 hmp-unsafe=true dom0_vcpus_pin > sched=null vwfi=native > Oh, just as a side note (and most likely unrelated to the problem we're discussing), you should be able to get rid of dom0_vcpus_pin. The NULL scheduler will do something similar to what that option itself does anyway. And with the benefit that, if you want, you can actually change to what pCPUs the dom0's vCPU are pinned. While, if you use dom0_vcpus_pin, you can't. So it using it has only downsides (and that's true in general, if you ask me, but particularly so if using NULL). > # xl destroy mydomu > (XEN) End of domain_destroy function > > # xl list > Name ID Mem VCPUs State > Time(s) > Domain-0 0 3000 5 r- > 1057.9 > > # xl debug-keys r > (XEN) sched_smt_power_savings: disabled > (XEN) NOW=223871439875 > (XEN) Online Cpus: 0-5 > (XEN) Cpupool 0: > (XEN) Cpus: 0-5 > (XEN) Scheduler: null Scheduler (null) > (XEN) cpus_free = > (XEN) Domain info: > (XEN) Domain: 0 > (XEN) 1: [0.0] pcpu=0 > (XEN) 2: [0.1] pcpu=1 > (XEN) 3: [0.2] pcpu=2 > (XEN) 4: [0.3] pcpu=3 > (XEN) 5: [0.4] pcpu=4 > (XEN) Domain: 1 > (XEN) 6: [1.0] pcpu=5 > Right. And from the fact that: 1) we only see the "End of domain_destroy function" line in the logs, and 2) we see that the vCPU is still listed here, we have our confirmation (like there wase the need for it :-/) that domain destruction is done only partially. 
> # xl create mydomu.cfg > Parsing config from mydomu.cfg > (XEN) Power on resource 215 > > # xl list > Name ID Mem VCPUs State > Time(s) > Domain-0 0 3000 5 r- > 1152.1 > mydomu 2 512 1 -- > 0.0 > > # xl debug-keys r > (XEN) sched_smt_power_savings: disabled > (XEN) NOW=241210530250 > (XEN) Online Cpus: 0-5 > (XEN) Cpupool 0: > (XEN) Cpus: 0-5 > (XEN) Scheduler: null Scheduler (null) > (XEN) cpus_free = > (XEN) Domain info: > (XEN) Domain: 0 > (XEN) 1: [0.0] pcpu=0 > (XEN) 2: [0.1] pcpu=1 > (XEN) 3: [0.2] pcpu=2 > (XEN) 4: [0.3] pcpu=3 > (XEN) 5: [0.4] pcpu=4 > (XEN) Domain: 1 > (XEN) 6: [1.0] pcpu=5 > (XEN) Domain: 2 > (XEN) 7: [2.0] pcpu=-1 > (XEN) Waitqueue: d2v0 > Yep, so, as we were suspecting, domain 1 was not destroyed properly. Specifically, we did not get to the point where the vCPU is
Re: Null scheduler and vwfi native problem
On 1/25/21 5:11 PM, Dario Faggioli wrote: On Fri, 2021-01-22 at 14:26 +, Julien Grall wrote: Hi Anders, On 22/01/2021 08:06, Anders Törnqvist wrote: On 1/22/21 12:35 AM, Dario Faggioli wrote: On Thu, 2021-01-21 at 19:40 +, Julien Grall wrote: - booting with "sched=null vwfi=native" but not doing the IRQ passthrough that you mentioned above "xl destroy" gives (XEN) End of domain_destroy function Then a "xl create" says nothing but the domain has not started correct. "xl list" look like this for the domain: mydomu 2 512 1 -- 0.0 This is odd. I would have expected ``xl create`` to fail if something went wrong with the domain creation. So, Anders, would it be possible to issue a: # xl debug-keys r # xl dmesg And send it to us ? Ideally, you'd do it: - with Julien's patch (the one he sent the other day, and that you have already given a try to) applied - while you are in the state above, i.e., after having tried to destroy a domain and failing - and maybe again after having tried to start a new domain Here are some logs. The system is booted as before with the patch and the domu config does not have the IRQs. 
# xl list Name ID Mem VCPUs State Time(s) Domain-0 0 3000 5 r- 820.1 mydomu 1 511 1 r- 157.0 # xl debug-keys r (XEN) sched_smt_power_savings: disabled (XEN) NOW=191793008000 (XEN) Online Cpus: 0-5 (XEN) Cpupool 0: (XEN) Cpus: 0-5 (XEN) Scheduler: null Scheduler (null) (XEN) cpus_free = (XEN) Domain info: (XEN) Domain: 0 (XEN) 1: [0.0] pcpu=0 (XEN) 2: [0.1] pcpu=1 (XEN) 3: [0.2] pcpu=2 (XEN) 4: [0.3] pcpu=3 (XEN) 5: [0.4] pcpu=4 (XEN) Domain: 1 (XEN) 6: [1.0] pcpu=5 (XEN) Waitqueue: (XEN) CPUs info: (XEN) CPU[00] sibling={0}, core={0}, unit=d0v0 (XEN) run: [0.0] pcpu=0 (XEN) CPU[01] sibling={1}, core={1}, unit=d0v1 (XEN) run: [0.1] pcpu=1 (XEN) CPU[02] sibling={2}, core={2}, unit=d0v2 (XEN) run: [0.2] pcpu=2 (XEN) CPU[03] sibling={3}, core={3}, unit=d0v3 (XEN) run: [0.3] pcpu=3 (XEN) CPU[04] sibling={4}, core={4}, unit=d0v4 (XEN) run: [0.4] pcpu=4 (XEN) CPU[05] sibling={5}, core={5}, unit=d1v0 (XEN) run: [1.0] pcpu=5 # xl dmesg (XEN) Checking for initrd in /chosen (XEN) RAM: 8020 - (XEN) RAM: 00088000 - 0008 (XEN) (XEN) MODULE[0]: 8040 - 8054d848 Xen (XEN) MODULE[1]: 8300 - 83018000 Device Tree (XEN) MODULE[2]: 8800 - 89701200 Kernel (XEN) RESVD[0]: 8800 - 9000 (XEN) RESVD[1]: 8300 - 83018000 (XEN) RESVD[2]: 8400 - 85ff (XEN) RESVD[3]: 8600 - 863f (XEN) RESVD[4]: 9000 - 903f (XEN) RESVD[5]: 9040 - 91ff (XEN) RESVD[6]: 9200 - 921f (XEN) RESVD[7]: 9220 - 923f (XEN) RESVD[8]: 9240 - 943f (XEN) RESVD[9]: 9440 - 94bf (XEN) (XEN) CMDLINE[8800]:chosen console=hvc0 earlycon=xen root=/dev/mmcblk0p3 mem=3000M hostname=myhost video=HDMI-A-1:1920x1080@60 imxdrm.legacyfb_depth=32 quiet loglevel=3 logo.nologo vt.global_cursor_default=0 (XEN) (XEN) Command line: console=dtuart dtuart=/serial@5a06 dom0_mem=3000M dom0_max_vcpus=5 hmp-unsafe=true dom0_vcpus_pin sched=null vwfi=native (XEN) Domain heap initialised (XEN) Booting using Device Tree (XEN) partition id 4 (XEN) Domain name mydomu (XEN) *Initialized MU (XEN) Looking for dtuart at "/serial@5a06", options "" Xen 4.13.1-pre 
(XEN) Xen version 4.13.1-pre (anders@builder.local) (aarch64-poky-linux-gcc (GCC) 8.3.0) debug=n Fri Jan 22 17:32:33 UTC 2021 (XEN) Latest ChangeSet: Wed Feb 27 17:56:28 2019 +0800 git:b64b8df-dirty (XEN) Processor: 410fd034: "ARM Limited", variant: 0x0, part 0xd03, rev 0x4 (XEN) 64-bit Execution: (XEN) Processor Features: 0100 (XEN) Exception Levels: EL3:64+32 EL2:64+32 EL1:64+32 EL0:64+32 (XEN) Extensions: FloatingPoint AdvancedSIMD GICv3-SysReg (XEN) Debug Features: 10305106 (XEN) Auxiliary Features: (XEN) Memory Model Features: 1122 (XEN) ISA Features: 00011120 (XEN) 32-bit Execution: (XEN) Processor Features: 0131:10011011 (XEN) Instruction Sets: AArch32 A32 Thumb Thumb-2 Jazelle (XEN) Extensions: GenericTimer Security (XEN) Debug Features: 03010066 (XEN) Auxiliary Features: (XEN) Memory Model Features: 10201105 4000 0126 02102211 (XEN) ISA Features: 02101110 13112111 21232042 01112131 00011142 00011121 (XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27 Freq: 8000 KHz
Re: Null scheduler and vwfi native problem
On Fri, 2021-01-22 at 14:26 +, Julien Grall wrote: > Hi Anders, > > On 22/01/2021 08:06, Anders Törnqvist wrote: > > On 1/22/21 12:35 AM, Dario Faggioli wrote: > > > On Thu, 2021-01-21 at 19:40 +, Julien Grall wrote: > > - booting with "sched=null vwfi=native" but not doing the IRQ > > passthrough that you mentioned above > > "xl destroy" gives > > (XEN) End of domain_destroy function > > > > Then a "xl create" says nothing but the domain has not started > > correct. > > "xl list" look like this for the domain: > > mydomu 2 512 1 -- > > 0.0 > > This is odd. I would have expected ``xl create`` to fail if something > went wrong with the domain creation. > So, Anders, would it be possible to issue a: # xl debug-keys r # xl dmesg And send it to us ? Ideally, you'd do it: - with Julien's patch (the one he sent the other day, and that you have already given a try to) applied - while you are in the state above, i.e., after having tried to destroy a domain and failing - and maybe again after having tried to start a new domain > One possibility is the NULL scheduler doesn't release the pCPUs until > the domain is fully destroyed. So if there is no pCPU free, it > wouldn't > be able to schedule the new domain. > > However, I would have expected the NULL scheduler to refuse the > domain > to create if there is no pCPU available. > Yeah but, unfortunately, the scheduler does not have it easy to fail domain creation at this stage (i.e., when we realize there are no available pCPUs). That's the reason why the NULL scheduler has a waitqueue, where vCPUs that cannot be put on any pCPU are put. Of course, this is a configuration error (or a bug, like maybe in this case :-/), and we print warnings when it happens. > @Dario, @Stefano, do you know when the NULL scheduler decides to > allocate the pCPU? > On which pCPU to allocate a vCPU is decided in null_unit_insert(), called from sched_alloc_unit() and sched_init_vcpu(). 
On the other hand, a vCPU is properly removed from its pCPU, hence making the pCPU free for being assigned to some other vCPU, in unit_deassign(), called from null_unit_remove(), which in turn is called from sched_destroy_vcpu(), which is indeed called from complete_domain_destroy(). Regards -- Dario Faggioli, Ph.D http://about.me/dario.faggioli Virtualization Software Engineer SUSE Labs, SUSE https://www.suse.com/
Re: Null scheduler and vwfi native problem
On Fri, 2021-01-22 at 18:44 +0100, Anders Törnqvist wrote: > Listing vcpus looks like this when the domain is running: > > xl vcpu-list > Name ID VCPU CPU State Time(s) > Affinity (Hard / Soft) > Domain-0 0 0 0 r-- 101.7 0 / > all > Domain-0 0 1 1 r-- 101.0 1 / > all > Domain-0 0 2 2 r-- 101.0 2 / > all > Domain-0 0 3 3 r-- 100.9 3 / > all > Domain-0 0 4 4 r-- 100.9 4 / > all > mydomu 1 0 5 r-- 89.5 5 / > all > > vCPU nr 0 is also for dom0. Is that normal? > Yeah, that's the vCPU IDs numbering. Each VM/guest (including dom0) has its vCPUs and they have ID starting from 0. What counts here, to make sure that the NULL scheduler "configuration" is correct, is that each VCPU is associated to one and only one PCPU. Regards -- Dario Faggioli, Ph.D http://about.me/dario.faggioli Virtualization Software Engineer SUSE Labs, SUSE https://www.suse.com/
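Dario's criterion -- each vCPU associated to one and only one pCPU -- can be checked mechanically. A small sketch (the helper name is mine; the sample rows are the listing from this thread, and on a live host the heredoc would be replaced by real `xl vcpu-list` output):

```shell
#!/bin/sh
# Sanity check for the NULL scheduler: every vCPU should sit on its own pCPU.
# Reads `xl vcpu-list`-style rows on stdin; column 4 is the pCPU number.
check_one_vcpu_per_pcpu() {
    awk '{ if (seen[$4]++) { print "pCPU " $4 " hosts more than one vCPU"; bad = 1 } }
         END { exit bad }'
}

# Sample rows from this thread; on a live host use:
#   xl vcpu-list | tail -n +2 | check_one_vcpu_per_pcpu
cat <<'EOF' | check_one_vcpu_per_pcpu && echo "OK: one vCPU per pCPU"
Domain-0 0 0 0 r-- 101.7 0 / all
Domain-0 0 1 1 r-- 101.0 1 / all
Domain-0 0 2 2 r-- 101.0 2 / all
Domain-0 0 3 3 r-- 100.9 3 / all
Domain-0 0 4 4 r-- 100.9 4 / all
mydomu 1 0 5 r-- 89.5 5 / all
EOF
```

With the sample above, every pCPU appears once, so the check passes; a second vCPU listed on the same pCPU would make it print a warning and exit non-zero.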
Re: Null scheduler and vwfi native problem
On 1/22/21 3:26 PM, Julien Grall wrote: Hi Anders, On 22/01/2021 08:06, Anders Törnqvist wrote: On 1/22/21 12:35 AM, Dario Faggioli wrote: On Thu, 2021-01-21 at 19:40 +, Julien Grall wrote: - booting with "sched=null vwfi=native" but not doing the IRQ passthrough that you mentioned above "xl destroy" gives (XEN) End of domain_destroy function Then a "xl create" says nothing but the domain has not started correct. "xl list" look like this for the domain: mydomu 2 512 1 -- 0.0 This is odd. I would have expected ``xl create`` to fail if something went wrong with the domain creation. The list of dash, suggests that the domain is: - Not running - Not blocked (i.e cannot run) - Not paused - Not shutdown So this suggest the NULL scheduler didn't schedule the vCPU. Would it be possible to describe your setup: - How many pCPUs? There are 6 pCPUs - How many vCPUs did you give to dom0? I gave it 5 - What was the number of the vCPUs given to the previous guest? Nr 0. Listing vcpus looks like this when the domain is running: xl vcpu-list Name ID VCPU CPU State Time(s) Affinity (Hard / Soft) Domain-0 0 0 0 r-- 101.7 0 / all Domain-0 0 1 1 r-- 101.0 1 / all Domain-0 0 2 2 r-- 101.0 2 / all Domain-0 0 3 3 r-- 100.9 3 / all Domain-0 0 4 4 r-- 100.9 4 / all mydomu 1 0 5 r-- 89.5 5 / all vCPU nr 0 is also for dom0. Is that normal? One possibility is the NULL scheduler doesn't release the pCPUs until the domain is fully destroyed. So if there is no pCPU free, it wouldn't be able to schedule the new domain. However, I would have expected the NULL scheduler to refuse the domain to create if there is no pCPU available. @Dario, @Stefano, do you know when the NULL scheduler decides to allocate the pCPU? Cheers,
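For the setup described above (6 pCPUs, dom0 with 5 vCPUs, the guest's single vCPU pinned to pCPU 5), the relevant lines of the guest's xl config would be something like this sketch (the rest of mydomu.cfg is omitted, and the memory value just mirrors the listings in the thread):

```
# Fragment of an xl domain config (mydomu.cfg) -- not a complete file.
memory = 512
vcpus  = 1      # a single vCPU for the guest
cpus   = "5"    # hard-pin it to pCPU 5; under sched=null it stays there
```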
Re: Null scheduler and vwfi native problem
On 1/22/21 3:02 PM, Julien Grall wrote: Hi Dario, On 21/01/2021 23:35, Dario Faggioli wrote: On Thu, 2021-01-21 at 19:40 +, Julien Grall wrote: Hi Dario, Hi! On 21/01/2021 18:32, Dario Faggioli wrote: On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote: https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html . Right. Back then, PCI passthrough was involved, if I remember correctly. Is it the case for you as well? PCI passthrough is not yet supported on Arm :). However, the bug was reported with platform device passthrough. Yeah, well... That! Which indeed is not PCI. Sorry for the terminology mismatch. :-) Well, I'll think about it. > Starting the system without "sched=null vwfi=native" does not result in the problem. Ok, how about, if you're up for some more testing: - booting with "sched=null" but not with "vwfi=native" - booting with "sched=null vwfi=native" but not doing the IRQ passthrough that you mentioned above ? I think we can skip the testing as the bug was fully diagnostics back then. Unfortunately, I don't think a patch was ever posted. True. But an hackish debug patch was provided and, back then, it worked. OTOH, Anders seems to be reporting that such a patch did not work here. I also continue to think that we're facing the same or a very similar problem... But I'm curious why applying the patch did not help this time. And that's why I asked for more testing. I wonder if this is because your patch doesn't modify rsinterval. So even if we call force_quiescent_state(), the softirq would only be raised for the current CPU. 
I guess the following HACK could confirm the theory:

diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index a5a27af3def0..50020bc34ddf 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -250,7 +250,7 @@ static void force_quiescent_state(struct rcu_data *rdp,
 {
     cpumask_t cpumask;
     raise_softirq(RCU_SOFTIRQ);
-    if (unlikely(rdp->qlen - rdp->last_rs_qlen > rsinterval)) {
+    if (1 || unlikely(rdp->qlen - rdp->last_rs_qlen > rsinterval)) {
         rdp->last_rs_qlen = rdp->qlen;
         /*
          * Don't send IPI to itself. With irqs disabled,

Cheers,

I applied the patch above. No change. The complete_domain_destroy function is not called when I destroy the domain. /Anders
Re: Null scheduler and vwfi native problem
Hi Anders, On 22/01/2021 08:06, Anders Törnqvist wrote: On 1/22/21 12:35 AM, Dario Faggioli wrote: On Thu, 2021-01-21 at 19:40 +, Julien Grall wrote: - booting with "sched=null vwfi=native" but not doing the IRQ passthrough that you mentioned above "xl destroy" gives (XEN) End of domain_destroy function Then a "xl create" says nothing but the domain has not started correct. "xl list" look like this for the domain: mydomu 2 512 1 -- 0.0 This is odd. I would have expected ``xl create`` to fail if something went wrong with the domain creation. The list of dash, suggests that the domain is: - Not running - Not blocked (i.e cannot run) - Not paused - Not shutdown So this suggest the NULL scheduler didn't schedule the vCPU. Would it be possible to describe your setup: - How many pCPUs? - How many vCPUs did you give to dom0? - What was the number of the vCPUs given to the previous guest? One possibility is the NULL scheduler doesn't release the pCPUs until the domain is fully destroyed. So if there is no pCPU free, it wouldn't be able to schedule the new domain. However, I would have expected the NULL scheduler to refuse the domain to create if there is no pCPU available. @Dario, @Stefano, do you know when the NULL scheduler decides to allocate the pCPU? Cheers, -- Julien Grall
Re: Null scheduler and vwfi native problem
Hi Dario, On 21/01/2021 23:35, Dario Faggioli wrote: On Thu, 2021-01-21 at 19:40 +, Julien Grall wrote: Hi Dario, Hi! On 21/01/2021 18:32, Dario Faggioli wrote: On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote: https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html . Right. Back then, PCI passthrough was involved, if I remember correctly. Is it the case for you as well? PCI passthrough is not yet supported on Arm :). However, the bug was reported with platform device passthrough. Yeah, well... That! Which indeed is not PCI. Sorry for the terminology mismatch. :-) Well, I'll think about it. > Starting the system without "sched=null vwfi=native" does not result in the problem. Ok, how about, if you're up for some more testing: - booting with "sched=null" but not with "vwfi=native" - booting with "sched=null vwfi=native" but not doing the IRQ passthrough that you mentioned above ? I think we can skip the testing as the bug was fully diagnostics back then. Unfortunately, I don't think a patch was ever posted. True. But an hackish debug patch was provided and, back then, it worked. OTOH, Anders seems to be reporting that such a patch did not work here. I also continue to think that we're facing the same or a very similar problem... But I'm curious why applying the patch did not help this time. And that's why I asked for more testing. I wonder if this is because your patch doesn't modify rsinterval. So even if we call force_quiescent_state(), the softirq would only be raised for the current CPU. 
I guess the following HACK could confirm the theory:

diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index a5a27af3def0..50020bc34ddf 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -250,7 +250,7 @@ static void force_quiescent_state(struct rcu_data *rdp,
 {
     cpumask_t cpumask;
     raise_softirq(RCU_SOFTIRQ);
-    if (unlikely(rdp->qlen - rdp->last_rs_qlen > rsinterval)) {
+    if (1 || unlikely(rdp->qlen - rdp->last_rs_qlen > rsinterval)) {
         rdp->last_rs_qlen = rdp->qlen;
         /*
          * Don't send IPI to itself. With irqs disabled,

Cheers, -- Julien Grall
Re: Null scheduler and vwfi native problem
On Fri, 2021-01-22 at 09:06 +0100, Anders Törnqvist wrote: > On 1/22/21 12:35 AM, Dario Faggioli wrote: > > > - booting with "sched=null" but not with "vwfi=native" > Without "vwfi=native" it works fine to destroy and to re-create the > domain. > Both printouts comes after a destroy: > (XEN) End of domain_destroy function > (XEN) End of complete_domain_destroy function > Ok, thanks for doing these tests. The fact that not using "vwfi=native" makes things work seems to point in the direction that myself and Julien (and you as well!) were suspecting. I.e., it is the same issue as the one in the old xen-devel thread. I'm still a bit puzzled why the debug patch posted back then does not work for you... but that's not really super important. Let's try to come up with a new debug patch and, this time, a proper fix. :-) Regards -- Dario Faggioli, Ph.D http://about.me/dario.faggioli Virtualization Software Engineer SUSE Labs, SUSE https://www.suse.com/
Re: Null scheduler and vwfi native problem
On 1/21/21 7:32 PM, Dario Faggioli wrote: On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote: Hi, Hello, I see a problem with destroy and restart of a domain. Interrupts are not available when trying to restart a domain. The situation seems very similar to the thread "null scheduler bug" https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html . Right. Back then, PCI passthrough was involved, if I remember correctly. Is it the case for you as well? The target system is a iMX8-based ARM board and Xen is a 4.13.0 version built from https://source.codeaurora.org/external/imx/imx-xen.git. Mmm, perhaps it's me, but neither going at that url with a browser not trying to clone it, I do not see anything. What I'm doing wrong? Sorry. The link is https://source.codeaurora.org/external/imx/imx-xen. Xen is booted with sched=null vwfi=native. One physical CPU core is pinned to the domu. Some interrupts are passed through to the domu. Ok, I guess it is involved, since you say "some interrupts are passed through..." When destroying the domain with xl destroy etc it does not complain but then when trying to restart the domain again with a "xl create " I get: (XEN) IRQ 210 is already used by domain 1 "xl list" does not contain the domain. Repeating the "xl create" command 5-10 times eventually starts the domain without complaining about the IRQ. Inspired from the discussion in the thread above I have put printks in the xen/common/domain.c file. In the function domain_destroy I have a printk("End of domain_destroy function\n") in the end. In the function complete_domain_destroy have a printk("Begin of complete_domain_destroy function\n") in the beginning. With these printouts I get at "xl destroy": (XEN) End of domain_destroy function So it seems like the function complete_domain_destroy is not called. Ok, thanks for making these tests. It's helpful to have this information right away. 
"xl create" results in: (XEN) IRQ 210 is already used by domain 1 (XEN) End of domain_destroy function Then repeated "xl create" looks the same until after a few tries I also get: (XEN) Begin of complete_domain_destroy function After that the next "xl create" creates the domain. I have also applied the patch from https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg02469.html . This does seem to change the results. Ah... Really? That's a bit unexpected, TBH. Well, I'll think about it. Starting the system without "sched=null vwfi=native" does not result in the problem. Ok, how about, if you're up for some more testing: - booting with "sched=null" but not with "vwfi=native" - booting with "sched=null vwfi=native" but not doing the IRQ passthrough that you mentioned above ? Regards
Re: Null scheduler and vwfi native problem
Thanks for the responses. On 1/22/21 12:35 AM, Dario Faggioli wrote: On Thu, 2021-01-21 at 19:40 +, Julien Grall wrote: Hi Dario, Hi! On 21/01/2021 18:32, Dario Faggioli wrote: On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote: https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html . Right. Back then, PCI passthrough was involved, if I remember correctly. Is it the case for you as well? PCI passthrough is not yet supported on Arm :). However, the bug was reported with platform device passthrough. Yeah, well... That! Which indeed is not PCI. Sorry for the terminology mismatch. :-) Well, I'll think about it. > Starting the system without "sched=null vwfi=native" does not result in the problem. Ok, how about, if you're up for some more testing: - booting with "sched=null" but not with "vwfi=native" - booting with "sched=null vwfi=native" but not doing the IRQ passthrough that you mentioned above ? I think we can skip the testing as the bug was fully diagnostics back then. Unfortunately, I don't think a patch was ever posted. True. But an hackish debug patch was provided and, back then, it worked. OTOH, Anders seems to be reporting that such a patch did not work here. I also continue to think that we're facing the same or a very similar problem... But I'm curious why applying the patch did not help this time. And that's why I asked for more testing. I made the tests as suggested to shed some more light if needed. - booting with "sched=null" but not with "vwfi=native" Without "vwfi=native" it works fine to destroy and to re-create the domain. Both printouts comes after a destroy: (XEN) End of domain_destroy function (XEN) End of complete_domain_destroy function - booting with "sched=null vwfi=native" but not doing the IRQ passthrough that you mentioned above "xl destroy" gives (XEN) End of domain_destroy function Then a "xl create" says nothing but the domain has not started correct. 
"xl list" look like this for the domain: mydomu 2 512 1 -- 0.0 Anyway, it's true that we left the issue pending, so something like this: From Xen PoV, any pCPU executing guest context can be considered quiescent. So one way to solve the problem would be to mark the pCPU when entering to the guest. Should be done anyway. We'll then see if it actually solves this problem too, or if this is really something else. Thanks for the summary, BTW. :-) I'll try to work on a patch. Thanks, just let me know if I can do some testing to assist. Regards [1] https://lore.kernel.org/xen-devel/acbeae1c-fda1-a079-322a-786d7528e...@arm.com/
Re: Null scheduler and vwfi native problem
On Thu, 2021-01-21 at 19:40 +, Julien Grall wrote: > Hi Dario, > Hi! > On 21/01/2021 18:32, Dario Faggioli wrote: > > On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote: > > > > > > https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html > > > . > > > > > Right. Back then, PCI passthrough was involved, if I remember > > correctly. Is it the case for you as well? > > PCI passthrough is not yet supported on Arm :). However, the bug was > reported with platform device passthrough. > Yeah, well... That! Which indeed is not PCI. Sorry for the terminology mismatch. :-) > > Well, I'll think about it. > > > > Starting the system without "sched=null vwfi=native" does not > > > result > > > in > > > the problem. > > > > > Ok, how about, if you're up for some more testing: > > > > - booting with "sched=null" but not with "vwfi=native" > > - booting with "sched=null vwfi=native" but not doing the IRQ > > passthrough that you mentioned above > > > > ? > > I think we can skip the testing as the bug was fully diagnosed back > then. Unfortunately, I don't think a patch was ever posted. > True. But a hackish debug patch was provided and, back then, it worked. OTOH, Anders seems to be reporting that such a patch did not work here. I also continue to think that we're facing the same or a very similar problem... But I'm curious why applying the patch did not help this time. And that's why I asked for more testing. Anyway, it's true that we left the issue pending, so something like this: > From Xen PoV, any pCPU executing guest context can be considered > quiescent. So one way to solve the problem would be to mark the pCPU > when entering the guest. > Should be done anyway. We'll then see if it actually solves this problem too, or if this is really something else. Thanks for the summary, BTW. :-) I'll try to work on a patch. 
Regards > [1] > > https://lore.kernel.org/xen-devel/acbeae1c-fda1-a079-322a-786d7528e...@arm.com/ -- Dario Faggioli, Ph.D http://about.me/dario.faggioli Virtualization Software Engineer SUSE Labs, SUSE https://www.suse.com/
Re: Null scheduler and vwfi native problem
Hi Dario, On 21/01/2021 18:32, Dario Faggioli wrote: On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote: Hi, I see a problem with destroy and restart of a domain. Interrupts are not available when trying to restart a domain. The situation seems very similar to the thread "null scheduler bug" https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html . Right. Back then, PCI passthrough was involved, if I remember correctly. Is it the case for you as well? PCI passthrough is not yet supported on Arm :). However, the bug was reported with platform device passthrough. [...] "xl create" results in: (XEN) IRQ 210 is already used by domain 1 (XEN) End of domain_destroy function Then repeated "xl create" looks the same until after a few tries I also get: (XEN) Begin of complete_domain_destroy function After that the next "xl create" creates the domain. I have also applied the patch from https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg02469.html . This does not seem to change the results. Ah... Really? That's a bit unexpected, TBH. Well, I'll think about it. > Starting the system without "sched=null vwfi=native" does not result in the problem. Ok, how about, if you're up for some more testing: - booting with "sched=null" but not with "vwfi=native" - booting with "sched=null vwfi=native" but not doing the IRQ passthrough that you mentioned above ? I think we can skip the testing as the bug was fully diagnosed back then. Unfortunately, I don't think a patch was ever posted. The interesting bits start at [1]. Let me try to summarize here. This has nothing to do with device passthrough, but the bug is easier to spot as interrupts are only going to be released when the domain is fully destroyed (we should really release them during the relinquish period...). The last step of the domain destruction (complete_domain_destroy()) will *only* happen when all the CPUs are considered quiescent from the RCU PoV. 
As you pointed out on that thread, the RCU implementation in Xen requires the pCPU to enter the hypervisor (via hypercalls, interrupts...) from time to time. This assumption doesn't hold anymore when using "sched=null vwfi=native" because a vCPU will not exit when it is idling (vwfi=native) and there may not be any other source of interrupt on that vCPU. Therefore the quiescent state will never be reached on the pCPU running that vCPU. From Xen PoV, any pCPU executing guest context can be considered quiescent. So one way to solve the problem would be to mark the pCPU when entering the guest. Cheers, [1] https://lore.kernel.org/xen-devel/acbeae1c-fda1-a079-322a-786d7528e...@arm.com/ -- Julien Grall
Re: Null scheduler and vwfi native problem
On 21/01/2021 10:54, Anders Törnqvist wrote: Hi, Hi Anders, Thank you for reporting the bug. I am adding Stefano and Dario as IIRC they were going to work on a solution. Cheers, I see a problem with destroy and restart of a domain. Interrupts are not available when trying to restart a domain. The situation seems very similar to the thread "null scheduler bug" https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html. The target system is an iMX8-based ARM board and Xen is a 4.13.0 version built from https://source.codeaurora.org/external/imx/imx-xen.git. Xen is booted with sched=null vwfi=native. One physical CPU core is pinned to the domu. Some interrupts are passed through to the domu. When destroying the domain with xl destroy etc it does not complain but then when trying to restart the domain again with a "xl create " I get: (XEN) IRQ 210 is already used by domain 1 "xl list" does not contain the domain. Repeating the "xl create" command 5-10 times eventually starts the domain without complaining about the IRQ. Inspired from the discussion in the thread above I have put printks in the xen/common/domain.c file. In the function domain_destroy I have a printk("End of domain_destroy function\n") in the end. In the function complete_domain_destroy I have a printk("Begin of complete_domain_destroy function\n") in the beginning. With these printouts I get at "xl destroy": (XEN) End of domain_destroy function So it seems like the function complete_domain_destroy is not called. "xl create" results in: (XEN) IRQ 210 is already used by domain 1 (XEN) End of domain_destroy function Then repeated "xl create" looks the same until after a few tries I also get: (XEN) Begin of complete_domain_destroy function After that the next "xl create" creates the domain. I have also applied the patch from https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg02469.html. This does not seem to change the results. 
Starting the system without "sched=null vwfi=native" does not result in the problem. BR Anders -- Julien Grall
Re: Null scheduler and vwfi native problem
On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote: > Hi, > Hello, > I see a problem with destroy and restart of a domain. Interrupts are > not > available when trying to restart a domain. > > The situation seems very similar to the thread "null scheduler bug" > > https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html > . > Right. Back then, PCI passthrough was involved, if I remember correctly. Is it the case for you as well? > The target system is an iMX8-based ARM board and Xen is a 4.13.0 > version > built from https://source.codeaurora.org/external/imx/imx-xen.git. > Mmm, perhaps it's me, but neither going to that URL with a browser nor trying to clone it gets me anything. What am I doing wrong? > Xen is booted with sched=null vwfi=native. > One physical CPU core is pinned to the domu. > Some interrupts are passed through to the domu. > Ok, I guess it is involved, since you say "some interrupts are passed through..." > When destroying the domain with xl destroy etc it does not complain > but > then when trying to restart the domain > again with a "xl create " I get: > (XEN) IRQ 210 is already used by domain 1 > > "xl list" does not contain the domain. > > Repeating the "xl create" command 5-10 times eventually starts the > domain without complaining about the IRQ. > > Inspired from the discussion in the thread above I have put printks > in > the xen/common/domain.c file. > In the function domain_destroy I have a printk("End of domain_destroy > function\n") in the end. > In the function complete_domain_destroy have a printk("Begin of > complete_domain_destroy function\n") in the beginning. > > With these printouts I get at "xl destroy": > (XEN) End of domain_destroy function > > So it seems like the function complete_domain_destroy is not called. > Ok, thanks for making these tests. It's helpful to have this information right away. 
> "xl create" results in: > (XEN) IRQ 210 is already used by domain 1 > (XEN) End of domain_destroy function > > Then repeated "xl create" looks the same until after a few tries I > also get: > (XEN) Begin of complete_domain_destroy function > > After that the next "xl create" creates the domain. > > > I have also applied the patch from > > https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg02469.html > . > This does not seem to change the results. > Ah... Really? That's a bit unexpected, TBH. Well, I'll think about it. > Starting the system without "sched=null vwfi=native" does not result > in > the problem. > Ok, how about, if you're up for some more testing: - booting with "sched=null" but not with "vwfi=native" - booting with "sched=null vwfi=native" but not doing the IRQ passthrough that you mentioned above ? Regards -- Dario Faggioli, Ph.D http://about.me/dario.faggioli Virtualization Software Engineer SUSE Labs, SUSE https://www.suse.com/
Null scheduler and vwfi native problem
Hi, I see a problem with destroy and restart of a domain. Interrupts are not available when trying to restart a domain. The situation seems very similar to the thread "null scheduler bug" https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html. The target system is an iMX8-based ARM board and Xen is a 4.13.0 version built from https://source.codeaurora.org/external/imx/imx-xen.git. Xen is booted with sched=null vwfi=native. One physical CPU core is pinned to the domu. Some interrupts are passed through to the domu. When destroying the domain with xl destroy etc it does not complain but then when trying to restart the domain again with a "xl create " I get: (XEN) IRQ 210 is already used by domain 1 "xl list" does not contain the domain. Repeating the "xl create" command 5-10 times eventually starts the domain without complaining about the IRQ. Inspired from the discussion in the thread above I have put printks in the xen/common/domain.c file. In the function domain_destroy I have a printk("End of domain_destroy function\n") in the end. In the function complete_domain_destroy I have a printk("Begin of complete_domain_destroy function\n") in the beginning. With these printouts I get at "xl destroy": (XEN) End of domain_destroy function So it seems like the function complete_domain_destroy is not called. "xl create" results in: (XEN) IRQ 210 is already used by domain 1 (XEN) End of domain_destroy function Then repeated "xl create" looks the same until after a few tries I also get: (XEN) Begin of complete_domain_destroy function After that the next "xl create" creates the domain. I have also applied the patch from https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg02469.html. This does not seem to change the results. Starting the system without "sched=null vwfi=native" does not result in the problem. BR Anders