Re: Null scheduler and vwfi native problem
On 1/30/21 6:59 PM, Dario Faggioli wrote:
On Fri, 2021-01-29 at 09:08 +0100, Anders Törnqvist wrote:
On 1/26/21 11:31 PM, Dario Faggioli wrote:

Thanks again for letting us see these logs.

Thanks for the attention to this :-)

Any ideas for how to solve it?

So, you're up for testing patches, right? How about applying these two, and letting me know what happens? :-D

Great work guys!

Hi. Now I got the time to test the patches. It was not possible to apply them cleanly on the code version I am using, which is commit b64b8df622963accf85b227e468fe12b2d56c128 from https://source.codeaurora.org/external/imx/imx-xen. I did some editing to get them into my code. I think I should also have removed some sched_tick_suspend/sched_tick_resume calls. See the attached patches for what I have applied on the code.

Anyway, after applying the patches, including the original rcu-quiesc-patch.patch, destroying the domU seems to work. I have rebooted, destroyed/created, and used the Xen watchdog to reboot the domU about 20 times in total, and so far it has destroyed cleanly and then been able to start a new instance of the domU every time. So it looks promising, although my edited patches probably need some fixing.

They are on top of current staging. I can try to rebase on something else, if it's easier for you to test. Besides being attached, they're also available here: https://gitlab.com/xen-project/people/dfaggioli/xen/-/tree/rcu-quiet-fix

I could not test them properly on ARM, as I don't have an ARM system handy, so everything is possible really... just let me know. It should at least build fine, AFAICT from here: https://gitlab.com/xen-project/people/dfaggioli/xen/-/pipelines/249101213

Julien, back in: https://lore.kernel.org/xen-devel/315740e1-3591-0e11-923a-718e06c36...@arm.com/ you said I should hook in enter_hypervisor_head(), leave_hypervisor_tail(). Those functions are gone now and, looking at how the code changed, this is where I figured I should put the calls (see the second patch).
But feel free to educate me otherwise.

For x86 people that are listening... Do we have, in our beloved arch, equally handy places (i.e., right before leaving Xen for a guest and right after entering Xen from one), preferably in a C file, and for all guests... like it seems to be the case on ARM?

Regards

diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index d6dc4b48db..42ab9dbbd6 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -52,8 +52,8 @@ static struct rcu_ctrlblk {
     int next_pending;          /* Is the next batch already waiting? */

     spinlock_t lock __cacheline_aligned;
-    cpumask_t cpumask;        /* CPUs that need to switch in order ... */
-    cpumask_t idle_cpumask;   /* ... unless they are already idle */
+    cpumask_t cpumask;        /* CPUs that need to switch in order ... */
+    cpumask_t ignore_cpumask; /* ... unless they are already idle */
                               /* for current batch to proceed. */
 } __cacheline_aligned rcu_ctrlblk = {
     .cur = -300,
@@ -86,8 +86,8 @@ struct rcu_data {
     long            last_rs_qlen;  /* qlen during the last resched */

     /* 3) idle CPUs handling */
-    struct timer idle_timer;
-    bool idle_timer_active;
+    struct timer cb_timer;
+    bool cb_timer_active;
 };

 /*
@@ -116,22 +116,22 @@ struct rcu_data {
  * CPU that is going idle. The user can change this, via a boot time
  * parameter, but only up to 100ms.
  */
-#define IDLE_TIMER_PERIOD_MAX     MILLISECS(100)
-#define IDLE_TIMER_PERIOD_DEFAULT MILLISECS(10)
-#define IDLE_TIMER_PERIOD_MIN     MICROSECS(100)
+#define CB_TIMER_PERIOD_MAX       MILLISECS(100)
+#define CB_TIMER_PERIOD_DEFAULT   MILLISECS(10)
+#define CB_TIMER_PERIOD_MIN       MICROSECS(100)

-static s_time_t __read_mostly idle_timer_period;
+static s_time_t __read_mostly cb_timer_period;

 /*
- * Increment and decrement values for the idle timer handler. The algorithm
+ * Increment and decrement values for the callback timer handler. The algorithm
  * works as follows:
  * - if the timer actually fires, and it finds out that the grace period isn't
- *   over yet, we add IDLE_TIMER_PERIOD_INCR to the timer's period;
+ *   over yet, we add CB_TIMER_PERIOD_INCR to the timer's period;
  * - if the timer actually fires and it finds the grace period over, we
  *   subtract IDLE_TIMER_PERIOD_DECR from the timer's period.
  */
-#define IDLE_TIMER_PERIOD_INCR    MILLISECS(10)
-#define IDLE_TIMER_PERIOD_DECR    MICROSECS(100)
+#define CB_TIMER_PERIOD_INCR      MILLISECS(10)
+#define CB_TIMER_PERIOD_DECR      MICROSECS(100)

 static DEFINE_PER_CPU(struct rcu_data, rcu_data);

@@ -309,7 +309,7 @@ static void rcu_start_batch(struct rcu_ctrlblk *rcp)
          * This barrier is paired with the one in rcu_idle_enter().
          */
         smp_mb();
-        cpumask_andnot(&rcp->cpumask, &cpu_online_map, &rcp->idle_cpumask);
+        cpumask_andnot(&rcp->cpumask, &cpu_online_map, &rcp->ignore_cpumask);
     }
 }

@@ -455,7 +455,7 @@ int
Re: Null scheduler and vwfi native problem
On 03.02.21 12:20, Julien Grall wrote:

Hi Juergen,

On 03/02/2021 11:00, Jürgen Groß wrote:
On 03.02.21 10:19, Julien Grall wrote:
Hi,

On 03/02/2021 07:31, Dario Faggioli wrote:
On Tue, 2021-02-02 at 15:23 +, Julien Grall wrote:

In reality, it is probably still too early, as a pCPU can be considered quiesced until a call to rcu_lock*() (such as rcu_lock_domain()).

Well, yes, in theory, we could track down which is the first RCU read-side critical section on this path, and put the call right before that (if I understood what you mean).

Oh, that's not what I meant. This will indeed be far more complex than I originally had in mind. AFAIU, RCU uses critical sections to protect data. So "entering" could be used as "the pCPU is not quiesced" and "exiting" could be used as "the pCPU is quiesced".

The concern with my approach is that we would need to make sure that Xen correctly uses the rcu helpers. I know Juergen worked on that recently, but I don't know whether this is fully complete.

I think it is complete, but I can't be sure, of course. One bit missing (for catching some wrong uses of the helpers) is this patch: https://lists.xen.org/archives/html/xen-devel/2020-03/msg01759.html I don't remember why it hasn't been taken, but I think there was a specific reason for that.

Looking at v8, the patch is suitably reviewed by Jan. So I am a bit puzzled as to why this wasn't committed... I had to go back to v6 to find the following message: "albeit to be honest I'm not fully convinced we need to go this far." Was the implication that his reviewed-by was conditional on someone else answering the e-mail?

I have no record of that being the case. Patches 1-3 of that series were needed for getting rid of stop_machine_run() in rcu handling and to fix other problems. Patch 4 added some additional ASSERT()s to make sure no potential deadlocks due to wrong rcu usage could creep in again. Patch 5 was more of a "nice to have" addition, in order to avoid any wrong usage of rcu which should have no real negative impact on system stability. So I believe Jan, as the committer, didn't want to commit it himself, but was fine with the overall idea and implementation.

I still think it would be nice for code sanity, but I was rather busy with Xenstore and event channel security work at that time, so I didn't urge anyone to take this patch.

Juergen
Re: Null scheduler and vwfi native problem
Hi Juergen,

On 03/02/2021 11:00, Jürgen Groß wrote:
On 03.02.21 10:19, Julien Grall wrote:
Hi,

On 03/02/2021 07:31, Dario Faggioli wrote:
On Tue, 2021-02-02 at 15:23 +, Julien Grall wrote:

In reality, it is probably still too early, as a pCPU can be considered quiesced until a call to rcu_lock*() (such as rcu_lock_domain()).

Well, yes, in theory, we could track down which is the first RCU read-side critical section on this path, and put the call right before that (if I understood what you mean).

Oh, that's not what I meant. This will indeed be far more complex than I originally had in mind. AFAIU, RCU uses critical sections to protect data. So "entering" could be used as "the pCPU is not quiesced" and "exiting" could be used as "the pCPU is quiesced".

The concern with my approach is that we would need to make sure that Xen correctly uses the rcu helpers. I know Juergen worked on that recently, but I don't know whether this is fully complete.

I think it is complete, but I can't be sure, of course. One bit missing (for catching some wrong uses of the helpers) is this patch: https://lists.xen.org/archives/html/xen-devel/2020-03/msg01759.html I don't remember why it hasn't been taken, but I think there was a specific reason for that.

Looking at v8, the patch is suitably reviewed by Jan. So I am a bit puzzled as to why this wasn't committed... I had to go back to v6 to find the following message: "albeit to be honest I'm not fully convinced we need to go this far." Was the implication that his reviewed-by was conditional on someone else answering the e-mail?

Cheers,
--
Julien Grall
Re: Null scheduler and vwfi native problem
On 03.02.21 10:19, Julien Grall wrote:
Hi,

On 03/02/2021 07:31, Dario Faggioli wrote:
On Tue, 2021-02-02 at 15:23 +, Julien Grall wrote:

In reality, it is probably still too early, as a pCPU can be considered quiesced until a call to rcu_lock*() (such as rcu_lock_domain()).

Well, yes, in theory, we could track down which is the first RCU read-side critical section on this path, and put the call right before that (if I understood what you mean).

Oh, that's not what I meant. This will indeed be far more complex than I originally had in mind. AFAIU, RCU uses critical sections to protect data. So "entering" could be used as "the pCPU is not quiesced" and "exiting" could be used as "the pCPU is quiesced".

The concern with my approach is that we would need to make sure that Xen correctly uses the rcu helpers. I know Juergen worked on that recently, but I don't know whether this is fully complete.

I think it is complete, but I can't be sure, of course. One bit missing (for catching some wrong uses of the helpers) is this patch: https://lists.xen.org/archives/html/xen-devel/2020-03/msg01759.html I don't remember why it hasn't been taken, but I think there was a specific reason for that.

Juergen
Re: Null scheduler and vwfi native problem
Hi,

On 03/02/2021 07:31, Dario Faggioli wrote:
On Tue, 2021-02-02 at 15:23 +, Julien Grall wrote:

In reality, it is probably still too early, as a pCPU can be considered quiesced until a call to rcu_lock*() (such as rcu_lock_domain()).

Well, yes, in theory, we could track down which is the first RCU read-side critical section on this path, and put the call right before that (if I understood what you mean).

Oh, that's not what I meant. This will indeed be far more complex than I originally had in mind. AFAIU, RCU uses critical sections to protect data. So "entering" could be used as "the pCPU is not quiesced" and "exiting" could be used as "the pCPU is quiesced".

The concern with my approach is that we would need to make sure that Xen correctly uses the rcu helpers. I know Juergen worked on that recently, but I don't know whether this is fully complete.

Cheers,
--
Julien Grall
Re: Null scheduler and vwfi native problem
Hi again,

On Tue, 2021-02-02 at 15:23 +, Julien Grall wrote:
> (Adding Andrew, Jan, Juergen for visibility)
>
Thanks! :-)

> On 02/02/2021 15:03, Dario Faggioli wrote:
> > On Tue, 2021-02-02 at 07:59 +, Julien Grall wrote:
> > > The placement in enter_hypervisor_from_guest() doesn't matter too
> > > much, although I would consider calling it as late as possible.
> > >
> > Mmmm... Can I ask why? In fact, I would have said "as soon as
> > possible".
>
> Because those functions only access data for the current vCPU/domain.
> This is already protected by the fact that the domain is running.
>
Mmm... ok, yes, I think it makes sense.

> By leaving the "quiesce" mode later, you give an opportunity to the
> RCU to release memory earlier.
>
Yeah. What I wanted to be sure of is that we put the CPU "back in the race" :-) before any current or future use of RCU.

> In reality, it is probably still too early, as a pCPU can be
> considered quiesced until a call to rcu_lock*() (such as
> rcu_lock_domain()).
>
Well, yes, in theory, we could track down which is the first RCU read-side critical section on this path, and put the call right before that (if I understood what you mean).

To me, however, this looks indeed too complex and difficult to maintain, not only for 4.15 but in general. E.g., suppose we find such a use of RCU in function foo(), called by bar(), called by hypervisor_enter_from_guest(). If someone at some point wants to use RCU in bar(), how does she know that she should also move the call to rcu_quiet_enter() from foo() to there?

So, yes, I'll move it a little down, but still within hypervisor_enter_from_guest().

In the meantime, I had a quick chat with Juergen about x86. In fact, I had a look and was not finding a place to put the rcu_quiet_{exit,enter}() calls as convenient as you have here on ARM. I.e., two nice C functions that we traverse for all kinds of guests, for HVM and SVM, etc. Actually, I was quite skeptical about it but, you know, one can hope!

Juergen confirmed that there's no such thing, so I'll look at the various entry.S files for the proper spots.

Regards
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
Re: Null scheduler and vwfi native problem
(Adding Andrew, Jan, Juergen for visibility)

Hi Dario,

On 02/02/2021 15:03, Dario Faggioli wrote:
On Tue, 2021-02-02 at 07:59 +, Julien Grall wrote:
Hi Dario,

I have had a quick look at your patch. The RCU call in leave_hypervisor_to_guest() needs to be placed just after the last call to check_for_pcpu_work(). Otherwise, you may be preempted and keep the RCU quiet.

Ok, makes sense. I'll move it.

The placement in enter_hypervisor_from_guest() doesn't matter too much, although I would consider calling it as late as possible.

Mmmm... Can I ask why? In fact, I would have said "as soon as possible".

Because those functions only access data for the current vCPU/domain. This is already protected by the fact that the domain is running. By leaving the "quiesce" mode later, you give an opportunity to the RCU to release memory earlier.

In reality, it is probably still too early, as a pCPU can be considered quiesced until a call to rcu_lock*() (such as rcu_lock_domain()). But this would require some investigation, to check whether we effectively protect all the regions with the RCU helpers. This is likely too complicated for 4.15.

Cheers,
--
Julien Grall
Re: Null scheduler and vwfi native problem
On Tue, 2021-02-02 at 07:59 +, Julien Grall wrote:
> Hi Dario,
>
Hi!

> I have had a quick look at your patch. The RCU call in
> leave_hypervisor_to_guest() needs to be placed just after the last
> call to check_for_pcpu_work().
>
> Otherwise, you may be preempted and keep the RCU quiet.
>
Ok, makes sense. I'll move it.

> The placement in enter_hypervisor_from_guest() doesn't matter too
> much, although I would consider calling it as late as possible.
>
Mmmm... Can I ask why? In fact, I would have said "as soon as possible".

Thanks and Regards
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
Re: Null scheduler and vwfi native problem
Hi Dario,

On 30/01/2021 17:59, Dario Faggioli wrote:
On Fri, 2021-01-29 at 09:08 +0100, Anders Törnqvist wrote:
On 1/26/21 11:31 PM, Dario Faggioli wrote:

Thanks again for letting us see these logs.

Thanks for the attention to this :-)

Any ideas for how to solve it?

So, you're up for testing patches, right? How about applying these two, and letting me know what happens? :-D

They are on top of current staging. I can try to rebase on something else, if it's easier for you to test. Besides being attached, they're also available here: https://gitlab.com/xen-project/people/dfaggioli/xen/-/tree/rcu-quiet-fix

I could not test them properly on ARM, as I don't have an ARM system handy, so everything is possible really... just let me know. It should at least build fine, AFAICT from here: https://gitlab.com/xen-project/people/dfaggioli/xen/-/pipelines/249101213

Julien, back in: https://lore.kernel.org/xen-devel/315740e1-3591-0e11-923a-718e06c36...@arm.com/ you said I should hook in enter_hypervisor_head(), leave_hypervisor_tail(). Those functions are gone now and, looking at how the code changed, this is where I figured I should put the calls (see the second patch). But feel free to educate me otherwise.

enter_hypervisor_from_guest() and leave_hypervisor_to_guest() are the new functions.

I have had a quick look at your patch. The RCU call in leave_hypervisor_to_guest() needs to be placed just after the last call to check_for_pcpu_work(). Otherwise, you may be preempted and keep the RCU quiet.

The placement in enter_hypervisor_from_guest() doesn't matter too much, although I would consider calling it as late as possible.

Cheers,
--
Julien Grall
Re: Null scheduler and vwfi native problem
On 1/30/21 6:59 PM, Dario Faggioli wrote:
On Fri, 2021-01-29 at 09:08 +0100, Anders Törnqvist wrote:
On 1/26/21 11:31 PM, Dario Faggioli wrote:

Thanks again for letting us see these logs.

Thanks for the attention to this :-)

Any ideas for how to solve it?

So, you're up for testing patches, right?

Absolutely. I will apply them and be back with the results. :-)

How about applying these two, and letting me know what happens? :-D

They are on top of current staging. I can try to rebase on something else, if it's easier for you to test. Besides being attached, they're also available here: https://gitlab.com/xen-project/people/dfaggioli/xen/-/tree/rcu-quiet-fix

I could not test them properly on ARM, as I don't have an ARM system handy, so everything is possible really... just let me know. It should at least build fine, AFAICT from here: https://gitlab.com/xen-project/people/dfaggioli/xen/-/pipelines/249101213

Julien, back in: https://lore.kernel.org/xen-devel/315740e1-3591-0e11-923a-718e06c36...@arm.com/ you said I should hook in enter_hypervisor_head(), leave_hypervisor_tail(). Those functions are gone now and, looking at how the code changed, this is where I figured I should put the calls (see the second patch). But feel free to educate me otherwise.

For x86 people that are listening... Do we have, in our beloved arch, equally handy places (i.e., right before leaving Xen for a guest and right after entering Xen from one), preferably in a C file, and for all guests... like it seems to be the case on ARM?

Regards
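For anyone who wants to follow along, one way to try the rcu-quiet-fix branch mentioned above is to fetch it straight from Dario's tree (the remote name here is arbitrary; only the URL and branch name come from the thread):

```shell
# Add Dario's tree as a remote and fetch the fix branch.
git remote add dfaggioli https://gitlab.com/xen-project/people/dfaggioli/xen.git
git fetch dfaggioli rcu-quiet-fix

# Check it out locally for building/testing.
git checkout -b rcu-quiet-fix dfaggioli/rcu-quiet-fix
```

If, like Anders, you are on a different base (e.g., a vendor tree), cherry-picking or rebasing the two patches onto that base is the alternative, at the cost of the manual conflict fixing he describes.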
Re: Null scheduler and vwfi native problem
On 1/29/21 11:16 AM, Dario Faggioli wrote:
On Fri, 2021-01-29 at 09:18 +0100, Jürgen Groß wrote:
On 29.01.21 09:08, Anders Törnqvist wrote:

So using it has only downsides (and that's true in general, if you ask me, but particularly so if using NULL).

Thanks for the feedback. I removed dom0_vcpus_pin. And, as you said, it seems to be unrelated to the problem we're discussing.

Right. Don't put it back, and stay away from it, if you'll take my advice. :-)

The system still behaves the same.

Yeah, that was expected.

When dom0_vcpus_pin is removed, xl vcpu-list looks like this:

Name        ID  VCPU  CPU  State  Time(s)  Affinity (Hard / Soft)
Domain-0     0     0    0  r--       29.4  all / all
Domain-0     0     1    1  r--       28.7  all / all
Domain-0     0     2    2  r--       28.7  all / all
Domain-0     0     3    3  r--       28.6  all / all
Domain-0     0     4    4  r--       28.6  all / all
mydomu       1     0    5  r--       21.6  5 / all

Right, and it makes sense for it to look like this.

From this listing (with "all" as hard affinity for dom0) one might read it like dom0 is not pinned with hard affinity to any specific pCPUs at all, but mydomu is pinned to pCPU 5. Will dom0_max_vcpus=5 in this case guarantee that dom0 will only run on pCPUs 0-4, so that mydomu will always have pCPU 5 to itself?

No. Well, yes... if you use the NULL scheduler. Which is in use here. :-)

Basically, the NULL scheduler _always_ assigns one and only one vCPU to each pCPU. This happens at domain (well, at vCPU) creation time. And it _never_ moves a vCPU away from the pCPU to which it has assigned it. And it also _never_ changes this vCPU-->pCPU assignment/relationship, unless some special event happens (such as: either the vCPU and/or the pCPU goes offline, is removed from the cpupool, you change the affinity [as I'll explain below], etc).

This is the NULL scheduler's mission and only job, so it does that by default, _without_ any need for an affinity to be specified.

So, how can affinity be useful in the NULL scheduler? Well, it's useful if you want to control and decide to what pCPU a certain vCPU should go. So, let's make an example. Let's say you are in this situation:

Name        ID  VCPU  CPU  State  Time(s)  Affinity (Hard / Soft)
Domain-0     0     0    0  r--       29.4  all / all
Domain-0     0     1    1  r--       28.7  all / all
Domain-0     0     2    2  r--       28.7  all / all
Domain-0     0     3    3  r--       28.6  all / all
Domain-0     0     4    4  r--       28.6  all / all

I.e., you have 6 CPUs, you have only dom0, dom0 has 5 vCPUs, and you are not using dom0_vcpus_pin. The NULL scheduler has put d0v0 on pCPU 0. And d0v0 is the only vCPU that can run on pCPU 0, despite its affinities being "all"... because it's what the NULL scheduler does for you, and it's the reason why one uses it! :-) Similarly, it has put d0v1 on pCPU 1, d0v2 on pCPU 2, d0v3 on pCPU 3 and d0v4 on pCPU 4. And the "exclusivity guarantee" explained above for d0v0 and pCPU 0 applies to all these other vCPUs and pCPUs as well.

With no affinity being specified, which vCPU is assigned to which pCPU is entirely under the NULL scheduler's control. It has its heuristics inside, to try to do that in a smart way, but that's an internal/implementation detail and is not relevant here.

If you now create a domU with 1 vCPU, that vCPU will be assigned to pCPU 5.

Now, let's say that, for whatever reason, you absolutely want d0v2 to run on pCPU 5, instead of being assigned to, and run on, pCPU 2 (which is what the NULL scheduler decided to pick for it). Well, what you do is use xl, set the affinity of d0v2 to pCPU 5, and you will get something like this as a result:

Name        ID  VCPU  CPU  State  Time(s)  Affinity (Hard / Soft)
Domain-0     0     0    0  r--       29.4  all / all
Domain-0     0     1    1  r--       28.7  all / all
Domain-0     0     2    5  r--       28.7  5 / all
Domain-0     0     3    3  r--       28.6  all / all
Domain-0     0     4    4  r--       28.6  all / all

So, affinity is indeed useful, even when using NULL, if you want to diverge from the default behavior and enact a certain policy, maybe due to the nature of your workload, the characteristics of your hardware, or whatever.

It is not, however, necessary to set the affinity to:
- have a vCPU always stay on one --and always the same one too-- pCPU;
- avoid that any other vCPU would ever run on that pCPU.

That is guaranteed by the NULL scheduler itself. It just
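The affinity change Dario describes ("use xl, set the affinity of d0v2 to pCPU 5") is done with the xl vcpu-pin subcommand. A minimal sketch, using the domain names and pCPU numbers from the example above:

```shell
# Pin dom0's vCPU 2 to pCPU 5 (hard affinity); under the NULL
# scheduler this also makes it the vCPU assigned to pCPU 5.
xl vcpu-pin Domain-0 2 5

# Verify: d0v2 should now show CPU 5 and hard affinity "5".
xl vcpu-list Domain-0

# Undo it by widening the hard affinity back to all pCPUs.
xl vcpu-pin Domain-0 2 all
```

The general form is `xl vcpu-pin <domain> <vcpu|all> <hard-affinity> [<soft-affinity>]`; soft affinity is irrelevant to NULL's placement guarantee discussed here.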
Re: Null scheduler and vwfi native problem
On Fri, 2021-01-29 at 09:08 +0100, Anders Törnqvist wrote:
> On 1/26/21 11:31 PM, Dario Faggioli wrote:
> > Thanks again for letting us see these logs.
>
> Thanks for the attention to this :-)
>
> Any ideas for how to solve it?
>
So, you're up for testing patches, right? How about applying these two, and letting me know what happens? :-D

They are on top of current staging. I can try to rebase on something else, if it's easier for you to test. Besides being attached, they're also available here: https://gitlab.com/xen-project/people/dfaggioli/xen/-/tree/rcu-quiet-fix

I could not test them properly on ARM, as I don't have an ARM system handy, so everything is possible really... just let me know. It should at least build fine, AFAICT from here: https://gitlab.com/xen-project/people/dfaggioli/xen/-/pipelines/249101213

Julien, back in: https://lore.kernel.org/xen-devel/315740e1-3591-0e11-923a-718e06c36...@arm.com/ you said I should hook in enter_hypervisor_head(), leave_hypervisor_tail(). Those functions are gone now and, looking at how the code changed, this is where I figured I should put the calls (see the second patch). But feel free to educate me otherwise.

For x86 people that are listening... Do we have, in our beloved arch, equally handy places (i.e., right before leaving Xen for a guest and right after entering Xen from one), preferably in a C file, and for all guests... like it seems to be the case on ARM?

Regards
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/

commit 2c38754fa73a81e8dfab8abdfb18b9896e00
Author: Dario Faggioli
Date: Sat Jan 30 07:50:22 2021 +

    xen: rename RCU idle timer and cpumask

    Both the cpumask and the timer will be used in more generic
    circumstances, not only for CPUs that go idle. Change their
    names to reflect that.

    No functional change.
Signed-off-by: Dario Faggioli

diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index a5a27af3de..e0bf842f13 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -55,8 +55,8 @@ static struct rcu_ctrlblk {
     int next_pending;          /* Is the next batch already waiting? */

     spinlock_t lock __cacheline_aligned;
-    cpumask_t cpumask;        /* CPUs that need to switch in order ... */
-    cpumask_t idle_cpumask;   /* ... unless they are already idle */
+    cpumask_t cpumask;        /* CPUs that need to switch in order ... */
+    cpumask_t ignore_cpumask; /* ... unless they are already idle */
                               /* for current batch to proceed. */
 } __cacheline_aligned rcu_ctrlblk = {
     .cur = -300,
@@ -88,8 +88,8 @@ struct rcu_data {
     long            last_rs_qlen;  /* qlen during the last resched */

     /* 3) idle CPUs handling */
-    struct timer    idle_timer;
-    bool            idle_timer_active;
+    struct timer    cb_timer;
+    bool            cb_timer_active;

     bool            process_callbacks;
     bool            barrier_active;
@@ -121,22 +121,22 @@ struct rcu_data {
  * CPU that is going idle. The user can change this, via a boot time
  * parameter, but only up to 100ms.
  */
-#define IDLE_TIMER_PERIOD_MAX     MILLISECS(100)
-#define IDLE_TIMER_PERIOD_DEFAULT MILLISECS(10)
-#define IDLE_TIMER_PERIOD_MIN     MICROSECS(100)
+#define CB_TIMER_PERIOD_MAX       MILLISECS(100)
+#define CB_TIMER_PERIOD_DEFAULT   MILLISECS(10)
+#define CB_TIMER_PERIOD_MIN       MICROSECS(100)

-static s_time_t __read_mostly idle_timer_period;
+static s_time_t __read_mostly cb_timer_period;

 /*
- * Increment and decrement values for the idle timer handler. The algorithm
+ * Increment and decrement values for the callback timer handler. The algorithm
  * works as follows:
  * - if the timer actually fires, and it finds out that the grace period isn't
- *   over yet, we add IDLE_TIMER_PERIOD_INCR to the timer's period;
+ *   over yet, we add CB_TIMER_PERIOD_INCR to the timer's period;
  * - if the timer actually fires and it finds the grace period over, we
  *   subtract IDLE_TIMER_PERIOD_DECR from the timer's period.
  */
-#define IDLE_TIMER_PERIOD_INCR    MILLISECS(10)
-#define IDLE_TIMER_PERIOD_DECR    MICROSECS(100)
+#define CB_TIMER_PERIOD_INCR      MILLISECS(10)
+#define CB_TIMER_PERIOD_DECR      MICROSECS(100)

 static DEFINE_PER_CPU(struct rcu_data, rcu_data);

@@ -364,7 +364,7 @@ static void rcu_start_batch(struct rcu_ctrlblk *rcp)
          * This barrier is paired with the one in rcu_idle_enter().
          */
         smp_mb();
-        cpumask_andnot(&rcp->cpumask, &cpu_online_map, &rcp->idle_cpumask);
+        cpumask_andnot(&rcp->cpumask, &cpu_online_map, &rcp->ignore_cpumask);
     }
 }

@@ -523,7 +523,7 @@ int rcu_needs_cpu(int cpu)
 {
     struct rcu_data *rdp = &per_cpu(rcu_data, cpu);

-    return (rdp->curlist && !rdp->idle_timer_active) || rcu_pending(cpu);
+    return (rdp->curlist && !rdp->cb_timer_active) ||
Re: Null scheduler and vwfi native problem
On Fri, 2021-01-29 at 09:18 +0100, Jürgen Groß wrote:
> On 29.01.21 09:08, Anders Törnqvist wrote:
> > > So using it has only downsides (and that's true in general, if
> > > you ask me, but particularly so if using NULL).
> > Thanks for the feedback. I removed dom0_vcpus_pin. And, as you
> > said, it seems to be unrelated to the problem we're discussing.
Right. Don't put it back, and stay away from it, if you'll take my advice. :-)

> > The system still behaves the same.
Yeah, that was expected.

> > When dom0_vcpus_pin is removed, xl vcpu-list looks like this:
> >
> > Name        ID  VCPU  CPU  State  Time(s)  Affinity (Hard / Soft)
> > Domain-0     0     0    0  r--       29.4  all / all
> > Domain-0     0     1    1  r--       28.7  all / all
> > Domain-0     0     2    2  r--       28.7  all / all
> > Domain-0     0     3    3  r--       28.6  all / all
> > Domain-0     0     4    4  r--       28.6  all / all
> > mydomu       1     0    5  r--       21.6  5 / all
>
> > From this listing (with "all" as hard affinity for dom0) one might
> > read it like dom0 is not pinned with hard affinity to any specific
> > pCPUs at all but mydomu is pinned to pCPU 5.
> > Will dom0_max_vcpus=5 in this case guarantee that dom0 will only
> > run on pCPUs 0-4, so that mydomu will always have pCPU 5 to itself?
>
> No.
Well, yes... if you use the NULL scheduler. Which is in use here. :-)

Basically, the NULL scheduler _always_ assigns one and only one vCPU to each pCPU. This happens at domain (well, at vCPU) creation time. And it _never_ moves a vCPU away from the pCPU to which it has assigned it. And it also _never_ changes this vCPU-->pCPU assignment/relationship, unless some special event happens (such as: either the vCPU and/or the pCPU goes offline, is removed from the cpupool, you change the affinity [as I'll explain below], etc).

This is the NULL scheduler's mission and only job, so it does that by default, _without_ any need for an affinity to be specified.

So, how can affinity be useful in the NULL scheduler? Well, it's useful if you want to control and decide to what pCPU a certain vCPU should go. So, let's make an example. Let's say you are in this situation:

Name        ID  VCPU  CPU  State  Time(s)  Affinity (Hard / Soft)
Domain-0     0     0    0  r--       29.4  all / all
Domain-0     0     1    1  r--       28.7  all / all
Domain-0     0     2    2  r--       28.7  all / all
Domain-0     0     3    3  r--       28.6  all / all
Domain-0     0     4    4  r--       28.6  all / all

I.e., you have 6 CPUs, you have only dom0, dom0 has 5 vCPUs, and you are not using dom0_vcpus_pin. The NULL scheduler has put d0v0 on pCPU 0. And d0v0 is the only vCPU that can run on pCPU 0, despite its affinities being "all"... because it's what the NULL scheduler does for you, and it's the reason why one uses it! :-) Similarly, it has put d0v1 on pCPU 1, d0v2 on pCPU 2, d0v3 on pCPU 3 and d0v4 on pCPU 4. And the "exclusivity guarantee" explained above for d0v0 and pCPU 0 applies to all these other vCPUs and pCPUs as well.

With no affinity being specified, which vCPU is assigned to which pCPU is entirely under the NULL scheduler's control. It has its heuristics inside, to try to do that in a smart way, but that's an internal/implementation detail and is not relevant here.

If you now create a domU with 1 vCPU, that vCPU will be assigned to pCPU 5.

Now, let's say that, for whatever reason, you absolutely want d0v2 to run on pCPU 5, instead of being assigned to, and run on, pCPU 2 (which is what the NULL scheduler decided to pick for it). Well, what you do is use xl, set the affinity of d0v2 to pCPU 5, and you will get something like this as a result:

Name        ID  VCPU  CPU  State  Time(s)  Affinity (Hard / Soft)
Domain-0     0     0    0  r--       29.4  all / all
Domain-0     0     1    1  r--       28.7  all / all
Domain-0     0     2    5  r--       28.7  5 / all
Domain-0     0     3    3  r--       28.6  all / all
Domain-0     0     4    4  r--       28.6  all / all

So, affinity is indeed useful, even when using NULL, if you want to diverge from the default behavior and enact a certain policy, maybe due to the nature of your workload, the characteristics of your hardware, or whatever.

It is not, however, necessary to set the affinity to:
- have a vCPU always stay on one --and always the same one too-- pCPU;
- avoid
Re: Null scheduler and vwfi native problem
On 29.01.21 09:08, Anders Törnqvist wrote: On 1/26/21 11:31 PM, Dario Faggioli wrote: On Tue, 2021-01-26 at 18:03 +0100, Anders Törnqvist wrote: On 1/25/21 5:11 PM, Dario Faggioli wrote: On Fri, 2021-01-22 at 14:26 +, Julien Grall wrote: Hi Anders, On 22/01/2021 08:06, Anders Törnqvist wrote: On 1/22/21 12:35 AM, Dario Faggioli wrote: On Thu, 2021-01-21 at 19:40 +, Julien Grall wrote: - booting with "sched=null vwfi=native" but not doing the IRQ passthrough that you mentioned above "xl destroy" gives (XEN) End of domain_destroy function Then a "xl create" says nothing but the domain has not started correct. "xl list" look like this for the domain: mydomu 2 512 1 -- 0.0 This is odd. I would have expected ``xl create`` to fail if something went wrong with the domain creation. So, Anders, would it be possible to issue a: # xl debug-keys r # xl dmesg And send it to us ? Ideally, you'd do it: - with Julien's patch (the one he sent the other day, and that you have already given a try to) applied - while you are in the state above, i.e., after having tried to destroy a domain and failing - and maybe again after having tried to start a new domain Here are some logs. Great, thanks a lot! The system is booted as before with the patch and the domu config does not have the IRQs. Ok. # xl list Name ID Mem VCPUs State Time(s) Domain-0 0 3000 5 r- 820.1 mydomu 1 511 1 r- 157.0 # xl debug-keys r (XEN) sched_smt_power_savings: disabled (XEN) NOW=191793008000 (XEN) Online Cpus: 0-5 (XEN) Cpupool 0: (XEN) Cpus: 0-5 (XEN) Scheduler: null Scheduler (null) (XEN) cpus_free = (XEN) Domain info: (XEN) Domain: 0 (XEN) 1: [0.0] pcpu=0 (XEN) 2: [0.1] pcpu=1 (XEN) 3: [0.2] pcpu=2 (XEN) 4: [0.3] pcpu=3 (XEN) 5: [0.4] pcpu=4 (XEN) Domain: 1 (XEN) 6: [1.0] pcpu=5 (XEN) Waitqueue: So far, so good. All vCPUs are running on their assigned pCPU, and there is no vCPU wanting to run but not having a vCPU where to do so. 
(XEN) Command line: console=dtuart dtuart=/serial@5a06 dom0_mem=3000M dom0_max_vcpus=5 hmp-unsafe=true dom0_vcpus_pin sched=null vwfi=native Oh, just as a side note (and most likely unrelated to the problem we're discussing), you should be able to get rid of dom0_vcpus_pin. The NULL scheduler will do something similar to what that option itself does anyway. And with the benefit that, if you want, you can actually change to what pCPUs the dom0's vCPUs are pinned. While, if you use dom0_vcpus_pin, you can't. So using it has only downsides (and that's true in general, if you ask me, but particularly so if using NULL). Thanks for the feedback. I removed dom0_vcpus_pin. And, as you said, it seems to be unrelated to the problem we're discussing. The system still behaves the same. When dom0_vcpus_pin is removed, xl vcpu-list looks like this:

Name                ID  VCPU  CPU  State  Time(s)  Affinity (Hard / Soft)
Domain-0             0     0    0  r--       29.4  all / all
Domain-0             0     1    1  r--       28.7  all / all
Domain-0             0     2    2  r--       28.7  all / all
Domain-0             0     3    3  r--       28.6  all / all
Domain-0             0     4    4  r--       28.6  all / all
mydomu               1     0    5  r--       21.6  5 / all

From this listing (with "all" as hard affinity for dom0) one might read it like dom0 is not pinned with hard affinity to any specific pCPUs at all but mydomu is pinned to pCPU 5. Will dom0_max_vcpus=5 in this case guarantee that dom0 only will run on pCPUs 0-4 so that mydomu always will have pCPU 5 for itself only? No. What if I would like mydomu to be the only domain that uses pCPU 2? Set up a cpupool with that pCPU assigned to it and put your domain into that cpupool. Juergen
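Juergen's cpupool suggestion, sketched with xl (the pool name and config path here are mine, and pCPU 2 must first be freed from the default pool):

```shell
# Free pCPU 2 from the default pool, then build a new pool around it.
xl cpupool-cpu-remove Pool-0 2

# A minimal cpupool config (e.g. /etc/xen/mypool.cfg) could contain:
#   name  = "mypool"
#   sched = "null"
#   cpus  = ["2"]
xl cpupool-create /etc/xen/mypool.cfg

# Move the domain into the new pool; pCPU 2 is now exclusively its.
xl cpupool-migrate mydomu mypool

# Check which pCPUs each pool holds.
xl cpupool-list -c
```

This guarantees exclusivity at the pool level: no vCPU outside mypool can ever be scheduled on pCPU 2, regardless of affinity settings.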
Re: Null scheduler and vwfi native problem
On 1/26/21 11:31 PM, Dario Faggioli wrote: On Tue, 2021-01-26 at 18:03 +0100, Anders Törnqvist wrote: On 1/25/21 5:11 PM, Dario Faggioli wrote: On Fri, 2021-01-22 at 14:26 +, Julien Grall wrote: Hi Anders, On 22/01/2021 08:06, Anders Törnqvist wrote: On 1/22/21 12:35 AM, Dario Faggioli wrote: On Thu, 2021-01-21 at 19:40 +, Julien Grall wrote: - booting with "sched=null vwfi=native" but not doing the IRQ passthrough that you mentioned above "xl destroy" gives (XEN) End of domain_destroy function Then a "xl create" says nothing but the domain has not started correct. "xl list" look like this for the domain: mydomu 2 512 1 -- 0.0 This is odd. I would have expected ``xl create`` to fail if something went wrong with the domain creation. So, Anders, would it be possible to issue a: # xl debug-keys r # xl dmesg And send it to us ? Ideally, you'd do it: - with Julien's patch (the one he sent the other day, and that you have already given a try to) applied - while you are in the state above, i.e., after having tried to destroy a domain and failing - and maybe again after having tried to start a new domain Here are some logs. Great, thanks a lot! The system is booted as before with the patch and the domu config does not have the IRQs. Ok. # xl list Name ID Mem VCPUs State Time(s) Domain-0 0 3000 5 r- 820.1 mydomu 1 511 1 r- 157.0 # xl debug-keys r (XEN) sched_smt_power_savings: disabled (XEN) NOW=191793008000 (XEN) Online Cpus: 0-5 (XEN) Cpupool 0: (XEN) Cpus: 0-5 (XEN) Scheduler: null Scheduler (null) (XEN) cpus_free = (XEN) Domain info: (XEN) Domain: 0 (XEN) 1: [0.0] pcpu=0 (XEN) 2: [0.1] pcpu=1 (XEN) 3: [0.2] pcpu=2 (XEN) 4: [0.3] pcpu=3 (XEN) 5: [0.4] pcpu=4 (XEN) Domain: 1 (XEN) 6: [1.0] pcpu=5 (XEN) Waitqueue: So far, so good. All vCPUs are running on their assigned pCPU, and there is no vCPU wanting to run but not having a vCPU where to do so. 
(XEN) Command line: console=dtuart dtuart=/serial@5a06 dom0_mem=3000M dom0_max_vcpus=5 hmp-unsafe=true dom0_vcpus_pin sched=null vwfi=native Oh, just as a side note (and most likely unrelated to the problem we're discussing), you should be able to get rid of dom0_vcpus_pin. The NULL scheduler will do something similar to what that option itself does anyway. And with the benefit that, if you want, you can actually change to what pCPUs the dom0's vCPUs are pinned. While, if you use dom0_vcpus_pin, you can't. So using it has only downsides (and that's true in general, if you ask me, but particularly so if using NULL). Thanks for the feedback. I removed dom0_vcpus_pin. And, as you said, it seems to be unrelated to the problem we're discussing. The system still behaves the same. When dom0_vcpus_pin is removed, xl vcpu-list looks like this:

Name                ID  VCPU  CPU  State  Time(s)  Affinity (Hard / Soft)
Domain-0             0     0    0  r--       29.4  all / all
Domain-0             0     1    1  r--       28.7  all / all
Domain-0             0     2    2  r--       28.7  all / all
Domain-0             0     3    3  r--       28.6  all / all
Domain-0             0     4    4  r--       28.6  all / all
mydomu               1     0    5  r--       21.6  5 / all

From this listing (with "all" as hard affinity for dom0) one might read it like dom0 is not pinned with hard affinity to any specific pCPUs at all but mydomu is pinned to pCPU 5. Will dom0_max_vcpus=5 in this case guarantee that dom0 only will run on pCPUs 0-4 so that mydomu always will have pCPU 5 for itself only? What if I would like mydomu to be the only domain that uses pCPU 2? 
# xl destroy mydomu (XEN) End of domain_destroy function # xl list Name ID Mem VCPUs State Time(s) Domain-0 0 3000 5 r- 1057.9 # xl debug-keys r (XEN) sched_smt_power_savings: disabled (XEN) NOW=223871439875 (XEN) Online Cpus: 0-5 (XEN) Cpupool 0: (XEN) Cpus: 0-5 (XEN) Scheduler: null Scheduler (null) (XEN) cpus_free = (XEN) Domain info: (XEN) Domain: 0 (XEN) 1: [0.0] pcpu=0 (XEN) 2: [0.1] pcpu=1 (XEN) 3: [0.2] pcpu=2 (XEN) 4: [0.3] pcpu=3 (XEN) 5: [0.4] pcpu=4 (XEN) Domain: 1 (XEN) 6: [1.0] pcpu=5 Right. And from the fact that: 1) we only see the "End of domain_destroy function" line in the logs, and 2) we see that the vCPU is still listed here, we have our confirmation (like there was the need for it :-/) that domain destruction is done only partially. Yes it looks like that. # xl create mydomu.cfg Parsing config from mydomu.cfg (XEN) Power on resource 215 # xl list Name
Re: Null scheduler and vwfi native problem
On Tue, 2021-01-26 at 18:03 +0100, Anders Törnqvist wrote: > On 1/25/21 5:11 PM, Dario Faggioli wrote: > > On Fri, 2021-01-22 at 14:26 +, Julien Grall wrote: > > > Hi Anders, > > > > > > On 22/01/2021 08:06, Anders Törnqvist wrote: > > > > On 1/22/21 12:35 AM, Dario Faggioli wrote: > > > > > On Thu, 2021-01-21 at 19:40 +, Julien Grall wrote: > > > > - booting with "sched=null vwfi=native" but not doing the IRQ > > > > passthrough that you mentioned above > > > > "xl destroy" gives > > > > (XEN) End of domain_destroy function > > > > > > > > Then a "xl create" says nothing but the domain has not started > > > > correct. > > > > "xl list" look like this for the domain: > > > > mydomu 2 512 1 -- > > > > 0.0 > > > This is odd. I would have expected ``xl create`` to fail if > > > something > > > went wrong with the domain creation. > > > > > So, Anders, would it be possible to issue a: > > > > # xl debug-keys r > > # xl dmesg > > > > And send it to us ? > > > > Ideally, you'd do it: > > - with Julien's patch (the one he sent the other day, and that > > you > > have already given a try to) applied > > - while you are in the state above, i.e., after having tried to > > destroy a domain and failing > > - and maybe again after having tried to start a new domain > Here are some logs. > Great, thanks a lot! > The system is booted as before with the patch and the domu config > does > not have the IRQs. > Ok. > # xl list > Name ID Mem VCPUs State > Time(s) > Domain-0 0 3000 5 r- > 820.1 > mydomu 1 511 1 r- > 157.0 > > # xl debug-keys r > (XEN) sched_smt_power_savings: disabled > (XEN) NOW=191793008000 > (XEN) Online Cpus: 0-5 > (XEN) Cpupool 0: > (XEN) Cpus: 0-5 > (XEN) Scheduler: null Scheduler (null) > (XEN) cpus_free = > (XEN) Domain info: > (XEN) Domain: 0 > (XEN) 1: [0.0] pcpu=0 > (XEN) 2: [0.1] pcpu=1 > (XEN) 3: [0.2] pcpu=2 > (XEN) 4: [0.3] pcpu=3 > (XEN) 5: [0.4] pcpu=4 > (XEN) Domain: 1 > (XEN) 6: [1.0] pcpu=5 > (XEN) Waitqueue: > So far, so good. 
All vCPUs are running on their assigned pCPU, and there is no vCPU wanting to run but not having a vCPU where to do so. > (XEN) Command line: console=dtuart dtuart=/serial@5a06 > dom0_mem=3000M dom0_max_vcpus=5 hmp-unsafe=true dom0_vcpus_pin > sched=null vwfi=native > Oh, just as a side note (and most likely unrelated to the problem we're discussing), you should be able to get rid of dom0_vcpus_pin. The NULL scheduler will do something similar to what that option itself does anyway. And with the benefit that, if you want, you can actually change to what pCPUs the dom0's vCPU are pinned. While, if you use dom0_vcpus_pin, you can't. So it using it has only downsides (and that's true in general, if you ask me, but particularly so if using NULL). > # xl destroy mydomu > (XEN) End of domain_destroy function > > # xl list > Name ID Mem VCPUs State > Time(s) > Domain-0 0 3000 5 r- > 1057.9 > > # xl debug-keys r > (XEN) sched_smt_power_savings: disabled > (XEN) NOW=223871439875 > (XEN) Online Cpus: 0-5 > (XEN) Cpupool 0: > (XEN) Cpus: 0-5 > (XEN) Scheduler: null Scheduler (null) > (XEN) cpus_free = > (XEN) Domain info: > (XEN) Domain: 0 > (XEN) 1: [0.0] pcpu=0 > (XEN) 2: [0.1] pcpu=1 > (XEN) 3: [0.2] pcpu=2 > (XEN) 4: [0.3] pcpu=3 > (XEN) 5: [0.4] pcpu=4 > (XEN) Domain: 1 > (XEN) 6: [1.0] pcpu=5 > Right. And from the fact that: 1) we only see the "End of domain_destroy function" line in the logs, and 2) we see that the vCPU is still listed here, we have our confirmation (like there wase the need for it :-/) that domain destruction is done only partially. 
> # xl create mydomu.cfg > Parsing config from mydomu.cfg > (XEN) Power on resource 215 > > # xl list > Name ID Mem VCPUs State > Time(s) > Domain-0 0 3000 5 r- > 1152.1 > mydomu 2 512 1 -- > 0.0 > > # xl debug-keys r > (XEN) sched_smt_power_savings: disabled > (XEN) NOW=241210530250 > (XEN) Online Cpus: 0-5 > (XEN) Cpupool 0: > (XEN) Cpus: 0-5 > (XEN) Scheduler: null Scheduler (null) > (XEN) cpus_free = > (XEN) Domain info: > (XEN) Domain: 0 > (XEN) 1: [0.0] pcpu=0 > (XEN) 2: [0.1] pcpu=1 > (XEN) 3: [0.2] pcpu=2 > (XEN) 4: [0.3] pcpu=3 > (XEN) 5: [0.4] pcpu=4 > (XEN) Domain: 1 > (XEN) 6: [1.0] pcpu=5 > (XEN) Domain: 2 > (XEN) 7: [2.0] pcpu=-1 > (XEN) Waitqueue: d2v0 > Yep, so, as we were suspecting, domain 1 was not destroyed properly. Specifically, we did not get to the point where the vCPU is
Re: Null scheduler and vwfi native problem
On 1/25/21 5:11 PM, Dario Faggioli wrote: On Fri, 2021-01-22 at 14:26 +, Julien Grall wrote: Hi Anders, On 22/01/2021 08:06, Anders Törnqvist wrote: On 1/22/21 12:35 AM, Dario Faggioli wrote: On Thu, 2021-01-21 at 19:40 +, Julien Grall wrote: - booting with "sched=null vwfi=native" but not doing the IRQ passthrough that you mentioned above "xl destroy" gives (XEN) End of domain_destroy function Then a "xl create" says nothing but the domain has not started correct. "xl list" look like this for the domain: mydomu 2 512 1 -- 0.0 This is odd. I would have expected ``xl create`` to fail if something went wrong with the domain creation. So, Anders, would it be possible to issue a: # xl debug-keys r # xl dmesg And send it to us ? Ideally, you'd do it: - with Julien's patch (the one he sent the other day, and that you have already given a try to) applied - while you are in the state above, i.e., after having tried to destroy a domain and failing - and maybe again after having tried to start a new domain Here are some logs. The system is booted as before with the patch and the domu config does not have the IRQs. 
# xl list Name ID Mem VCPUs State Time(s) Domain-0 0 3000 5 r- 820.1 mydomu 1 511 1 r- 157.0 # xl debug-keys r (XEN) sched_smt_power_savings: disabled (XEN) NOW=191793008000 (XEN) Online Cpus: 0-5 (XEN) Cpupool 0: (XEN) Cpus: 0-5 (XEN) Scheduler: null Scheduler (null) (XEN) cpus_free = (XEN) Domain info: (XEN) Domain: 0 (XEN) 1: [0.0] pcpu=0 (XEN) 2: [0.1] pcpu=1 (XEN) 3: [0.2] pcpu=2 (XEN) 4: [0.3] pcpu=3 (XEN) 5: [0.4] pcpu=4 (XEN) Domain: 1 (XEN) 6: [1.0] pcpu=5 (XEN) Waitqueue: (XEN) CPUs info: (XEN) CPU[00] sibling={0}, core={0}, unit=d0v0 (XEN) run: [0.0] pcpu=0 (XEN) CPU[01] sibling={1}, core={1}, unit=d0v1 (XEN) run: [0.1] pcpu=1 (XEN) CPU[02] sibling={2}, core={2}, unit=d0v2 (XEN) run: [0.2] pcpu=2 (XEN) CPU[03] sibling={3}, core={3}, unit=d0v3 (XEN) run: [0.3] pcpu=3 (XEN) CPU[04] sibling={4}, core={4}, unit=d0v4 (XEN) run: [0.4] pcpu=4 (XEN) CPU[05] sibling={5}, core={5}, unit=d1v0 (XEN) run: [1.0] pcpu=5 # xl dmesg (XEN) Checking for initrd in /chosen (XEN) RAM: 8020 - (XEN) RAM: 00088000 - 0008 (XEN) (XEN) MODULE[0]: 8040 - 8054d848 Xen (XEN) MODULE[1]: 8300 - 83018000 Device Tree (XEN) MODULE[2]: 8800 - 89701200 Kernel (XEN) RESVD[0]: 8800 - 9000 (XEN) RESVD[1]: 8300 - 83018000 (XEN) RESVD[2]: 8400 - 85ff (XEN) RESVD[3]: 8600 - 863f (XEN) RESVD[4]: 9000 - 903f (XEN) RESVD[5]: 9040 - 91ff (XEN) RESVD[6]: 9200 - 921f (XEN) RESVD[7]: 9220 - 923f (XEN) RESVD[8]: 9240 - 943f (XEN) RESVD[9]: 9440 - 94bf (XEN) (XEN) CMDLINE[8800]:chosen console=hvc0 earlycon=xen root=/dev/mmcblk0p3 mem=3000M hostname=myhost video=HDMI-A-1:1920x1080@60 imxdrm.legacyfb_depth=32 quiet loglevel=3 logo.nologo vt.global_cursor_default=0 (XEN) (XEN) Command line: console=dtuart dtuart=/serial@5a06 dom0_mem=3000M dom0_max_vcpus=5 hmp-unsafe=true dom0_vcpus_pin sched=null vwfi=native (XEN) Domain heap initialised (XEN) Booting using Device Tree (XEN) partition id 4 (XEN) Domain name mydomu (XEN) *Initialized MU (XEN) Looking for dtuart at "/serial@5a06", options "" Xen 4.13.1-pre 
(XEN) Xen version 4.13.1-pre (anders@builder.local) (aarch64-poky-linux-gcc (GCC) 8.3.0) debug=n Fri Jan 22 17:32:33 UTC 2021 (XEN) Latest ChangeSet: Wed Feb 27 17:56:28 2019 +0800 git:b64b8df-dirty (XEN) Processor: 410fd034: "ARM Limited", variant: 0x0, part 0xd03, rev 0x4 (XEN) 64-bit Execution: (XEN) Processor Features: 0100 (XEN) Exception Levels: EL3:64+32 EL2:64+32 EL1:64+32 EL0:64+32 (XEN) Extensions: FloatingPoint AdvancedSIMD GICv3-SysReg (XEN) Debug Features: 10305106 (XEN) Auxiliary Features: (XEN) Memory Model Features: 1122 (XEN) ISA Features: 00011120 (XEN) 32-bit Execution: (XEN) Processor Features: 0131:10011011 (XEN) Instruction Sets: AArch32 A32 Thumb Thumb-2 Jazelle (XEN) Extensions: GenericTimer Security (XEN) Debug Features: 03010066 (XEN) Auxiliary Features: (XEN) Memory Model Features: 10201105 4000 0126 02102211 (XEN) ISA Features: 02101110 13112111 21232042 01112131 00011142 00011121 (XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27 Freq: 8000 KHz
Re: Null scheduler and vwfi native problem
On Fri, 2021-01-22 at 14:26 +, Julien Grall wrote: > Hi Anders, > > On 22/01/2021 08:06, Anders Törnqvist wrote: > > On 1/22/21 12:35 AM, Dario Faggioli wrote: > > > On Thu, 2021-01-21 at 19:40 +, Julien Grall wrote: > > - booting with "sched=null vwfi=native" but not doing the IRQ > > passthrough that you mentioned above > > "xl destroy" gives > > (XEN) End of domain_destroy function > > > > Then a "xl create" says nothing but the domain has not started > > correct. > > "xl list" look like this for the domain: > > mydomu 2 512 1 -- > > 0.0 > > This is odd. I would have expected ``xl create`` to fail if something > went wrong with the domain creation. > So, Anders, would it be possible to issue a: # xl debug-keys r # xl dmesg And send it to us ? Ideally, you'd do it: - with Julien's patch (the one he sent the other day, and that you have already given a try to) applied - while you are in the state above, i.e., after having tried to destroy a domain and failing - and maybe again after having tried to start a new domain > One possibility is the NULL scheduler doesn't release the pCPUs until > the domain is fully destroyed. So if there is no pCPU free, it > wouldn't > be able to schedule the new domain. > > However, I would have expected the NULL scheduler to refuse the > domain > to create if there is no pCPU available. > Yeah but, unfortunately, the scheduler does not have it easy to fail domain creation at this stage (i.e., when we realize there are no available pCPUs). That's the reason why the NULL scheduler has a waitqueue, where vCPUs that cannot be put on any pCPU are put. Of course, this is a configuration error (or a bug, like maybe in this case :-/), and we print warnings when it happens. > @Dario, @Stefano, do you know when the NULL scheduler decides to > allocate the pCPU? > On which pCPU to allocate a vCPU is decided in null_unit_insert(), called from sched_alloc_unit() and sched_init_vcpu(). 
On the other hand, a vCPU is properly removed from its pCPU, hence making the pCPU free for being assigned to some other vCPU, in unit_deassign(), called from null_unit_remove(), which in turn is called from sched_destroy_vcpu(), which is indeed called from complete_domain_destroy(). Regards -- Dario Faggioli, Ph.D http://about.me/dario.faggioli Virtualization Software Engineer SUSE Labs, SUSE https://www.suse.com/
Re: Null scheduler and vwfi native problem
On Fri, 2021-01-22 at 18:44 +0100, Anders Törnqvist wrote: > Listing vcpus looks like this when the domain is running: > > xl vcpu-list > Name ID VCPU CPU State Time(s) > Affinity (Hard / Soft) > Domain-0 0 0 0 r-- 101.7 0 / > all > Domain-0 0 1 1 r-- 101.0 1 / > all > Domain-0 0 2 2 r-- 101.0 2 / > all > Domain-0 0 3 3 r-- 100.9 3 / > all > Domain-0 0 4 4 r-- 100.9 4 / > all > mydomu 1 0 5 r-- 89.5 5 / > all > > vCPU nr 0 is also for dom0. Is that normal? > Yeah, that's the vCPU IDs numbering. Each VM/guest (including dom0) has its vCPUs and they have ID starting from 0. What counts here, to make sure that the NULL scheduler "configuration" is correct, is that each VCPU is associated to one and only one PCPU. Regards -- Dario Faggioli, Ph.D http://about.me/dario.faggioli Virtualization Software Engineer SUSE Labs, SUSE https://www.suse.com/
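Dario's criterion -- each vCPU associated to one and only one pCPU -- can be checked mechanically. A small sketch (the helper name is mine; the sample rows are the listing from this thread, and on a live host the heredoc would be replaced by real `xl vcpu-list` output):

```shell
#!/bin/sh
# Sanity check for the NULL scheduler: every vCPU should sit on its own pCPU.
# Reads `xl vcpu-list`-style rows on stdin; column 4 is the pCPU number.
check_one_vcpu_per_pcpu() {
    awk '{ if (seen[$4]++) { print "pCPU " $4 " hosts more than one vCPU"; bad = 1 } }
         END { exit bad }'
}

# Sample rows from this thread; on a live host use:
#   xl vcpu-list | tail -n +2 | check_one_vcpu_per_pcpu
cat <<'EOF' | check_one_vcpu_per_pcpu && echo "OK: one vCPU per pCPU"
Domain-0 0 0 0 r-- 101.7 0 / all
Domain-0 0 1 1 r-- 101.0 1 / all
Domain-0 0 2 2 r-- 101.0 2 / all
Domain-0 0 3 3 r-- 100.9 3 / all
Domain-0 0 4 4 r-- 100.9 4 / all
mydomu 1 0 5 r-- 89.5 5 / all
EOF
```

With the sample above, every pCPU appears once, so the check passes; a second vCPU listed on the same pCPU would make it print a warning and exit non-zero.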
Re: Null scheduler and vwfi native problem
On 1/22/21 3:26 PM, Julien Grall wrote: Hi Anders, On 22/01/2021 08:06, Anders Törnqvist wrote: On 1/22/21 12:35 AM, Dario Faggioli wrote: On Thu, 2021-01-21 at 19:40 +, Julien Grall wrote: - booting with "sched=null vwfi=native" but not doing the IRQ passthrough that you mentioned above "xl destroy" gives (XEN) End of domain_destroy function Then a "xl create" says nothing but the domain has not started correct. "xl list" look like this for the domain: mydomu 2 512 1 -- 0.0 This is odd. I would have expected ``xl create`` to fail if something went wrong with the domain creation. The list of dash, suggests that the domain is: - Not running - Not blocked (i.e cannot run) - Not paused - Not shutdown So this suggest the NULL scheduler didn't schedule the vCPU. Would it be possible to describe your setup: - How many pCPUs? There are 6 pCPUs - How many vCPUs did you give to dom0? I gave it 5 - What was the number of the vCPUs given to the previous guest? Nr 0. Listing vcpus looks like this when the domain is running: xl vcpu-list Name ID VCPU CPU State Time(s) Affinity (Hard / Soft) Domain-0 0 0 0 r-- 101.7 0 / all Domain-0 0 1 1 r-- 101.0 1 / all Domain-0 0 2 2 r-- 101.0 2 / all Domain-0 0 3 3 r-- 100.9 3 / all Domain-0 0 4 4 r-- 100.9 4 / all mydomu 1 0 5 r-- 89.5 5 / all vCPU nr 0 is also for dom0. Is that normal? One possibility is the NULL scheduler doesn't release the pCPUs until the domain is fully destroyed. So if there is no pCPU free, it wouldn't be able to schedule the new domain. However, I would have expected the NULL scheduler to refuse the domain to create if there is no pCPU available. @Dario, @Stefano, do you know when the NULL scheduler decides to allocate the pCPU? Cheers,
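For the setup described above (6 pCPUs, dom0 with 5 vCPUs, the guest's single vCPU pinned to pCPU 5), the relevant lines of the guest's xl config would be something like this sketch (the rest of mydomu.cfg is omitted, and the memory value just mirrors the listings in the thread):

```
# Fragment of an xl domain config (mydomu.cfg) -- not a complete file.
memory = 512
vcpus  = 1      # a single vCPU for the guest
cpus   = "5"    # hard-pin it to pCPU 5; under sched=null it stays there
```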
Re: Null scheduler and vwfi native problem
On 1/22/21 3:02 PM, Julien Grall wrote: Hi Dario, On 21/01/2021 23:35, Dario Faggioli wrote: On Thu, 2021-01-21 at 19:40 +, Julien Grall wrote: Hi Dario, Hi! On 21/01/2021 18:32, Dario Faggioli wrote: On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote: https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html . Right. Back then, PCI passthrough was involved, if I remember correctly. Is it the case for you as well? PCI passthrough is not yet supported on Arm :). However, the bug was reported with platform device passthrough. Yeah, well... That! Which indeed is not PCI. Sorry for the terminology mismatch. :-) Well, I'll think about it. > Starting the system without "sched=null vwfi=native" does not result in the problem. Ok, how about, if you're up for some more testing: - booting with "sched=null" but not with "vwfi=native" - booting with "sched=null vwfi=native" but not doing the IRQ passthrough that you mentioned above ? I think we can skip the testing as the bug was fully diagnostics back then. Unfortunately, I don't think a patch was ever posted. True. But an hackish debug patch was provided and, back then, it worked. OTOH, Anders seems to be reporting that such a patch did not work here. I also continue to think that we're facing the same or a very similar problem... But I'm curious why applying the patch did not help this time. And that's why I asked for more testing. I wonder if this is because your patch doesn't modify rsinterval. So even if we call force_quiescent_state(), the softirq would only be raised for the current CPU. 
I guess the following HACK could confirm the theory:

diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index a5a27af3def0..50020bc34ddf 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -250,7 +250,7 @@ static void force_quiescent_state(struct rcu_data *rdp,
 {
     cpumask_t cpumask;
     raise_softirq(RCU_SOFTIRQ);
-    if (unlikely(rdp->qlen - rdp->last_rs_qlen > rsinterval)) {
+    if (1 || unlikely(rdp->qlen - rdp->last_rs_qlen > rsinterval)) {
         rdp->last_rs_qlen = rdp->qlen;
         /*
          * Don't send IPI to itself. With irqs disabled,

Cheers,

I applied the patch above. No change. The complete_domain_destroy function is not called when I destroy the domain. /Anders
Re: Null scheduler and vwfi native problem
Hi Anders, On 22/01/2021 08:06, Anders Törnqvist wrote: On 1/22/21 12:35 AM, Dario Faggioli wrote: On Thu, 2021-01-21 at 19:40 +, Julien Grall wrote: - booting with "sched=null vwfi=native" but not doing the IRQ passthrough that you mentioned above "xl destroy" gives (XEN) End of domain_destroy function Then a "xl create" says nothing but the domain has not started correct. "xl list" look like this for the domain: mydomu 2 512 1 -- 0.0 This is odd. I would have expected ``xl create`` to fail if something went wrong with the domain creation. The list of dash, suggests that the domain is: - Not running - Not blocked (i.e cannot run) - Not paused - Not shutdown So this suggest the NULL scheduler didn't schedule the vCPU. Would it be possible to describe your setup: - How many pCPUs? - How many vCPUs did you give to dom0? - What was the number of the vCPUs given to the previous guest? One possibility is the NULL scheduler doesn't release the pCPUs until the domain is fully destroyed. So if there is no pCPU free, it wouldn't be able to schedule the new domain. However, I would have expected the NULL scheduler to refuse the domain to create if there is no pCPU available. @Dario, @Stefano, do you know when the NULL scheduler decides to allocate the pCPU? Cheers, -- Julien Grall
Re: Null scheduler and vwfi native problem
Hi Dario, On 21/01/2021 23:35, Dario Faggioli wrote: On Thu, 2021-01-21 at 19:40 +, Julien Grall wrote: Hi Dario, Hi! On 21/01/2021 18:32, Dario Faggioli wrote: On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote: https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html . Right. Back then, PCI passthrough was involved, if I remember correctly. Is it the case for you as well? PCI passthrough is not yet supported on Arm :). However, the bug was reported with platform device passthrough. Yeah, well... That! Which indeed is not PCI. Sorry for the terminology mismatch. :-) Well, I'll think about it. > Starting the system without "sched=null vwfi=native" does not result in the problem. Ok, how about, if you're up for some more testing: - booting with "sched=null" but not with "vwfi=native" - booting with "sched=null vwfi=native" but not doing the IRQ passthrough that you mentioned above ? I think we can skip the testing as the bug was fully diagnostics back then. Unfortunately, I don't think a patch was ever posted. True. But an hackish debug patch was provided and, back then, it worked. OTOH, Anders seems to be reporting that such a patch did not work here. I also continue to think that we're facing the same or a very similar problem... But I'm curious why applying the patch did not help this time. And that's why I asked for more testing. I wonder if this is because your patch doesn't modify rsinterval. So even if we call force_quiescent_state(), the softirq would only be raised for the current CPU. 
I guess the following HACK could confirm the theory:

diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index a5a27af3def0..50020bc34ddf 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -250,7 +250,7 @@ static void force_quiescent_state(struct rcu_data *rdp,
 {
     cpumask_t cpumask;
     raise_softirq(RCU_SOFTIRQ);
-    if (unlikely(rdp->qlen - rdp->last_rs_qlen > rsinterval)) {
+    if (1 || unlikely(rdp->qlen - rdp->last_rs_qlen > rsinterval)) {
         rdp->last_rs_qlen = rdp->qlen;
         /*
          * Don't send IPI to itself. With irqs disabled,

Cheers, -- Julien Grall
Re: Null scheduler and vwfi native problem
On Fri, 2021-01-22 at 09:06 +0100, Anders Törnqvist wrote: > On 1/22/21 12:35 AM, Dario Faggioli wrote: > > > - booting with "sched=null" but not with "vwfi=native" > Without "vwfi=native" it works fine to destroy and to re-create the > domain. > Both printouts comes after a destroy: > (XEN) End of domain_destroy function > (XEN) End of complete_domain_destroy function > Ok, thanks for doing these tests. The fact that not using "vwfi=native" makes things work seems to point in the direction that myself and Julien (and you as well!) were suspecting. I.e., it is the same issue as the one in the old xen-devel thread. I'm still a bit puzzled why the debug patch posted back then does not work for you... but that's not really super important. Let's try to come up with a new debug patch and, this time, a proper fix. :-) Regards -- Dario Faggioli, Ph.D http://about.me/dario.faggioli Virtualization Software Engineer SUSE Labs, SUSE https://www.suse.com/
Re: Null scheduler and vwfi native problem
On 1/21/21 7:32 PM, Dario Faggioli wrote: On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote: Hi, Hello, I see a problem with destroy and restart of a domain. Interrupts are not available when trying to restart a domain. The situation seems very similar to the thread "null scheduler bug" https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html . Right. Back then, PCI passthrough was involved, if I remember correctly. Is it the case for you as well? The target system is a iMX8-based ARM board and Xen is a 4.13.0 version built from https://source.codeaurora.org/external/imx/imx-xen.git. Mmm, perhaps it's me, but neither going at that url with a browser not trying to clone it, I do not see anything. What I'm doing wrong? Sorry. The link is https://source.codeaurora.org/external/imx/imx-xen. Xen is booted with sched=null vwfi=native. One physical CPU core is pinned to the domu. Some interrupts are passed through to the domu. Ok, I guess it is involved, since you say "some interrupts are passed through..." When destroying the domain with xl destroy etc it does not complain but then when trying to restart the domain again with a "xl create " I get: (XEN) IRQ 210 is already used by domain 1 "xl list" does not contain the domain. Repeating the "xl create" command 5-10 times eventually starts the domain without complaining about the IRQ. Inspired from the discussion in the thread above I have put printks in the xen/common/domain.c file. In the function domain_destroy I have a printk("End of domain_destroy function\n") in the end. In the function complete_domain_destroy have a printk("Begin of complete_domain_destroy function\n") in the beginning. With these printouts I get at "xl destroy": (XEN) End of domain_destroy function So it seems like the function complete_domain_destroy is not called. Ok, thanks for making these tests. It's helpful to have this information right away. 
"xl create" results in: (XEN) IRQ 210 is already used by domain 1 (XEN) End of domain_destroy function Then repeated "xl create" looks the same until after a few tries I also get: (XEN) Begin of complete_domain_destroy function After that the next "xl create" creates the domain. I have also applied the patch from https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg02469.html . This does seem to change the results. Ah... Really? That's a bit unexpected, TBH. Well, I'll think about it. Starting the system without "sched=null vwfi=native" does not result in the problem. Ok, how about, if you're up for some more testing: - booting with "sched=null" but not with "vwfi=native" - booting with "sched=null vwfi=native" but not doing the IRQ passthrough that you mentioned above ? Regards
Re: Null scheduler and vwfi native problem
Thanks for the responses. On 1/22/21 12:35 AM, Dario Faggioli wrote: On Thu, 2021-01-21 at 19:40 +, Julien Grall wrote: Hi Dario, Hi! On 21/01/2021 18:32, Dario Faggioli wrote: On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote: https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html . Right. Back then, PCI passthrough was involved, if I remember correctly. Is it the case for you as well? PCI passthrough is not yet supported on Arm :). However, the bug was reported with platform device passthrough. Yeah, well... That! Which indeed is not PCI. Sorry for the terminology mismatch. :-) Well, I'll think about it. > Starting the system without "sched=null vwfi=native" does not result in the problem. Ok, how about, if you're up for some more testing: - booting with "sched=null" but not with "vwfi=native" - booting with "sched=null vwfi=native" but not doing the IRQ passthrough that you mentioned above ? I think we can skip the testing as the bug was fully diagnostics back then. Unfortunately, I don't think a patch was ever posted. True. But an hackish debug patch was provided and, back then, it worked. OTOH, Anders seems to be reporting that such a patch did not work here. I also continue to think that we're facing the same or a very similar problem... But I'm curious why applying the patch did not help this time. And that's why I asked for more testing. I made the tests as suggested to shed some more light if needed. - booting with "sched=null" but not with "vwfi=native" Without "vwfi=native" it works fine to destroy and to re-create the domain. Both printouts comes after a destroy: (XEN) End of domain_destroy function (XEN) End of complete_domain_destroy function - booting with "sched=null vwfi=native" but not doing the IRQ passthrough that you mentioned above "xl destroy" gives (XEN) End of domain_destroy function Then a "xl create" says nothing but the domain has not started correct. 
"xl list" look like this for the domain: mydomu 2 512 1 -- 0.0 Anyway, it's true that we left the issue pending, so something like this: From Xen PoV, any pCPU executing guest context can be considered quiescent. So one way to solve the problem would be to mark the pCPU when entering to the guest. Should be done anyway. We'll then see if it actually solves this problem too, or if this is really something else. Thanks for the summary, BTW. :-) I'll try to work on a patch. Thanks, just let me know if I can do some testing to assist. Regards [1] https://lore.kernel.org/xen-devel/acbeae1c-fda1-a079-322a-786d7528e...@arm.com/
Re: Null scheduler and vwfi native problem
On Thu, 2021-01-21 at 19:40 +, Julien Grall wrote: > Hi Dario, > Hi! > On 21/01/2021 18:32, Dario Faggioli wrote: > > On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote: > > > > > > https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html > > > . > > > > > Right. Back then, PCI passthrough was involved, if I remember > > correctly. Is it the case for you as well? > > PCI passthrough is not yet supported on Arm :). However, the bug was > reported with platform device passthrough. > Yeah, well... That! Which indeed is not PCI. Sorry for the terminology mismatch. :-) > > Well, I'll think about it. > > > > Starting the system without "sched=null vwfi=native" does not > > > result > > > in > > > the problem. > > > > > Ok, how about, if you're up for some more testing: > > > > - booting with "sched=null" but not with "vwfi=native" > > - booting with "sched=null vwfi=native" but not doing the IRQ > > passthrough that you mentioned above > > > > ? > > I think we can skip the testing as the bug was fully diagnosed back > then. Unfortunately, I don't think a patch was ever posted. > True. But a hackish debug patch was provided and, back then, it worked. OTOH, Anders seems to be reporting that such a patch did not work here. I also continue to think that we're facing the same or a very similar problem... But I'm curious why applying the patch did not help this time. And that's why I asked for more testing. Anyway, it's true that we left the issue pending, so something like this: > From Xen PoV, any pCPU executing guest context can be considered > quiescent. So one way to solve the problem would be to mark the pCPU > when entering the guest. > Should be done anyway. We'll then see if it actually solves this problem too, or if this is really something else. Thanks for the summary, BTW. :-) I'll try to work on a patch. 
Regards > [1] > > https://lore.kernel.org/xen-devel/acbeae1c-fda1-a079-322a-786d7528e...@arm.com/ -- Dario Faggioli, Ph.D http://about.me/dario.faggioli Virtualization Software Engineer SUSE Labs, SUSE https://www.suse.com/
Re: Null scheduler and vwfi native problem
Hi Dario, On 21/01/2021 18:32, Dario Faggioli wrote: On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote: Hi, I see a problem with destroy and restart of a domain. Interrupts are not available when trying to restart a domain. The situation seems very similar to the thread "null scheduler bug" https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html . Right. Back then, PCI passthrough was involved, if I remember correctly. Is it the case for you as well? PCI passthrough is not yet supported on Arm :). However, the bug was reported with platform device passthrough. [...] "xl create" results in: (XEN) IRQ 210 is already used by domain 1 (XEN) End of domain_destroy function Then repeated "xl create" looks the same until after a few tries I also get: (XEN) Begin of complete_domain_destroy function After that the next "xl create" creates the domain. I have also applied the patch from https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg02469.html . This does not seem to change the results. Ah... Really? That's a bit unexpected, TBH. Well, I'll think about it. > Starting the system without "sched=null vwfi=native" does not result in the problem. Ok, how about, if you're up for some more testing: - booting with "sched=null" but not with "vwfi=native" - booting with "sched=null vwfi=native" but not doing the IRQ passthrough that you mentioned above ? I think we can skip the testing as the bug was fully diagnosed back then. Unfortunately, I don't think a patch was ever posted. The interesting bits start at [1]. Let me try to summarize here. This has nothing to do with device passthrough, but the bug is easier to spot as interrupts are only going to be released when the domain is fully destroyed (we should really release them during the relinquish period...). The last step of the domain destruction (complete_domain_destroy()) will *only* happen when all the CPUs are considered quiescent from the RCU PoV. 
As you pointed out on that thread, the RCU implementation in Xen requires the pCPU to enter the hypervisor (via hypercalls, interrupts...) from time to time. This assumption doesn't hold anymore when using "sched=null vwfi=native" because a vCPU will not exit when it is idling (vwfi=native) and there may not be any other source of interrupt on that vCPU. Therefore the quiescent state will never be reached on the pCPU running that vCPU. From Xen PoV, any pCPU executing guest context can be considered quiescent. So one way to solve the problem would be to mark the pCPU when entering the guest. Cheers, [1] https://lore.kernel.org/xen-devel/acbeae1c-fda1-a079-322a-786d7528e...@arm.com/ -- Julien Grall
Re: Null scheduler and vwfi native problem
On 21/01/2021 10:54, Anders Törnqvist wrote: Hi, Hi Anders, Thank you for reporting the bug. I am adding Stefano and Dario as IIRC they were going to work on a solution. Cheers, I see a problem with destroy and restart of a domain. Interrupts are not available when trying to restart a domain. The situation seems very similar to the thread "null scheduler bug" https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html. The target system is an iMX8-based ARM board and Xen is a 4.13.0 version built from https://source.codeaurora.org/external/imx/imx-xen.git. Xen is booted with sched=null vwfi=native. One physical CPU core is pinned to the domu. Some interrupts are passed through to the domu. When destroying the domain with xl destroy etc it does not complain but then when trying to restart the domain again with a "xl create " I get: (XEN) IRQ 210 is already used by domain 1 "xl list" does not contain the domain. Repeating the "xl create" command 5-10 times eventually starts the domain without complaining about the IRQ. Inspired from the discussion in the thread above I have put printks in the xen/common/domain.c file. In the function domain_destroy I have a printk("End of domain_destroy function\n") in the end. In the function complete_domain_destroy I have a printk("Begin of complete_domain_destroy function\n") in the beginning. With these printouts I get at "xl destroy": (XEN) End of domain_destroy function So it seems like the function complete_domain_destroy is not called. "xl create" results in: (XEN) IRQ 210 is already used by domain 1 (XEN) End of domain_destroy function Then repeated "xl create" looks the same until after a few tries I also get: (XEN) Begin of complete_domain_destroy function After that the next "xl create" creates the domain. I have also applied the patch from https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg02469.html. This does not seem to change the results. 
Starting the system without "sched=null vwfi=native" does not result in the problem. BR Anders -- Julien Grall
Re: Null scheduler and vwfi native problem
On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote: > Hi, > Hello, > I see a problem with destroy and restart of a domain. Interrupts are > not > available when trying to restart a domain. > > The situation seems very similar to the thread "null scheduler bug" > > https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html > . > Right. Back then, PCI passthrough was involved, if I remember correctly. Is it the case for you as well? > The target system is an iMX8-based ARM board and Xen is a 4.13.0 > version > built from https://source.codeaurora.org/external/imx/imx-xen.git. > Mmm, perhaps it's me, but neither going to that URL with a browser nor trying to clone it gets me anything. What am I doing wrong? > Xen is booted with sched=null vwfi=native. > One physical CPU core is pinned to the domu. > Some interrupts are passed through to the domu. > Ok, I guess it is involved, since you say "some interrupts are passed through..." > When destroying the domain with xl destroy etc it does not complain > but > then when trying to restart the domain > again with a "xl create " I get: > (XEN) IRQ 210 is already used by domain 1 > > "xl list" does not contain the domain. > > Repeating the "xl create" command 5-10 times eventually starts the > domain without complaining about the IRQ. > > Inspired from the discussion in the thread above I have put printks > in > the xen/common/domain.c file. > In the function domain_destroy I have a printk("End of domain_destroy > function\n") in the end. > In the function complete_domain_destroy have a printk("Begin of > complete_domain_destroy function\n") in the beginning. > > With these printouts I get at "xl destroy": > (XEN) End of domain_destroy function > > So it seems like the function complete_domain_destroy is not called. > Ok, thanks for making these tests. It's helpful to have this information right away. 
> "xl create" results in: > (XEN) IRQ 210 is already used by domain 1 > (XEN) End of domain_destroy function > > Then repeated "xl create" looks the same until after a few tries I > also get: > (XEN) Begin of complete_domain_destroy function > > After that the next "xl create" creates the domain. > > > I have also applied the patch from > > https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg02469.html > . > This does not seem to change the results. > Ah... Really? That's a bit unexpected, TBH. Well, I'll think about it. > Starting the system without "sched=null vwfi=native" does not result > in > the problem. > Ok, how about, if you're up for some more testing: - booting with "sched=null" but not with "vwfi=native" - booting with "sched=null vwfi=native" but not doing the IRQ passthrough that you mentioned above ? Regards -- Dario Faggioli, Ph.D http://about.me/dario.faggioli Virtualization Software Engineer SUSE Labs, SUSE https://www.suse.com/
Null scheduler and vwfi native problem
Hi, I see a problem with destroy and restart of a domain. Interrupts are not available when trying to restart a domain. The situation seems very similar to the thread "null scheduler bug" https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html. The target system is an iMX8-based ARM board and Xen is a 4.13.0 version built from https://source.codeaurora.org/external/imx/imx-xen.git. Xen is booted with sched=null vwfi=native. One physical CPU core is pinned to the domu. Some interrupts are passed through to the domu. When destroying the domain with xl destroy etc it does not complain but then when trying to restart the domain again with a "xl create " I get: (XEN) IRQ 210 is already used by domain 1 "xl list" does not contain the domain. Repeating the "xl create" command 5-10 times eventually starts the domain without complaining about the IRQ. Inspired from the discussion in the thread above I have put printks in the xen/common/domain.c file. In the function domain_destroy I have a printk("End of domain_destroy function\n") in the end. In the function complete_domain_destroy I have a printk("Begin of complete_domain_destroy function\n") in the beginning. With these printouts I get at "xl destroy": (XEN) End of domain_destroy function So it seems like the function complete_domain_destroy is not called. "xl create" results in: (XEN) IRQ 210 is already used by domain 1 (XEN) End of domain_destroy function Then repeated "xl create" looks the same until after a few tries I also get: (XEN) Begin of complete_domain_destroy function After that the next "xl create" creates the domain. I have also applied the patch from https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg02469.html. This does not seem to change the results. Starting the system without "sched=null vwfi=native" does not result in the problem. BR Anders