On 1/26/21 11:31 PM, Dario Faggioli wrote:
On Tue, 2021-01-26 at 18:03 +0100, Anders Törnqvist wrote:
On 1/25/21 5:11 PM, Dario Faggioli wrote:
On Fri, 2021-01-22 at 14:26 +0000, Julien Grall wrote:
Hi Anders,
On 22/01/2021 08:06, Anders Törnqvist wrote:
On 1/22/21 12:35 AM, Dario Faggioli wrote:
On Thu, 2021-01-21 at 19:40 +0000, Julien Grall wrote:
- booting with "sched=null vwfi=native" but not doing the IRQ
passthrough that you mentioned above
"xl destroy" gives
(XEN) End of domain_destroy function
Then a "xl create" says nothing but the domain has not started
correct.
"xl list" look like this for the domain:
mydomu 2 512 1 ------
0.0
This is odd. I would have expected ``xl create`` to fail if
something
went wrong with the domain creation.
So, Anders, would it be possible to issue a:
# xl debug-keys r
# xl dmesg
And send it to us?
Ideally, you'd do it:
- with Julien's patch (the one he sent the other day, which you
have already tried) applied
- while you are in the state above, i.e., after having tried to
destroy a domain and failing
- and maybe again after having tried to start a new domain
Here are some logs.
Great, thanks a lot!
The system is booted as before with the patch, and the domU config
does not have the IRQs.
Ok.
# xl list
Name                              ID   Mem VCPUs      State   Time(s)
Domain-0                           0  3000     5     r-----     820.1
mydomu                             1   511     1     r-----     157.0
# xl debug-keys r
(XEN) sched_smt_power_savings: disabled
(XEN) NOW=191793008000
(XEN) Online Cpus: 0-5
(XEN) Cpupool 0:
(XEN) Cpus: 0-5
(XEN) Scheduler: null Scheduler (null)
(XEN) cpus_free =
(XEN) Domain info:
(XEN) Domain: 0
(XEN) 1: [0.0] pcpu=0
(XEN) 2: [0.1] pcpu=1
(XEN) 3: [0.2] pcpu=2
(XEN) 4: [0.3] pcpu=3
(XEN) 5: [0.4] pcpu=4
(XEN) Domain: 1
(XEN) 6: [1.0] pcpu=5
(XEN) Waitqueue:
So far, so good. All vCPUs are running on their assigned pCPUs, and
there is no vCPU that wants to run but has no pCPU on which to do so.
(XEN) Command line: console=dtuart dtuart=/serial@5a060000
dom0_mem=3000M dom0_max_vcpus=5 hmp-unsafe=true dom0_vcpus_pin
sched=null vwfi=native
Oh, just as a side note (and most likely unrelated to the problem we're
discussing), you should be able to get rid of dom0_vcpus_pin.
The NULL scheduler will do something similar to what that option does
anyway, with the benefit that, if you want, you can actually change the
pCPUs to which dom0's vCPUs are pinned. If you use dom0_vcpus_pin, you
can't.
So using it has only downsides (and that's true in general, if you
ask me, but particularly so if using NULL).
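For instance (just a sketch, not something from this thread; it assumes
the domain name and CPU numbering from the listings above), you could
reproduce the effect of dom0_vcpus_pin by hand with "xl vcpu-pin", and
still be free to change it later:

# xl vcpu-pin Domain-0 0 0
# xl vcpu-pin Domain-0 1 1
# xl vcpu-pin Domain-0 2 2
# xl vcpu-pin Domain-0 3 3
# xl vcpu-pin Domain-0 4 4

And if you later decide that, say, dom0's vCPU 0 should run on pCPU 3
instead, "xl vcpu-pin Domain-0 0 3" will do that, which is exactly what
dom0_vcpus_pin would forbid.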
Thanks for the feedback.
I removed dom0_vcpus_pin. And, as you said, it seems to be unrelated to
the problem we're discussing. The system still behaves the same.
With dom0_vcpus_pin removed, "xl vcpu-list" looks like this:
Name                              ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
Domain-0                           0     0     0   r--      29.4  all / all
Domain-0                           0     1     1   r--      28.7  all / all
Domain-0                           0     2     2   r--      28.7  all / all
Domain-0                           0     3     3   r--      28.6  all / all
Domain-0                           0     4     4   r--      28.6  all / all
mydomu                             1     0     5   r--      21.6  5 / all
From this listing (with "all" as hard affinity for dom0) one might read
it as dom0 not being pinned with hard affinity to any specific pCPUs at
all, while mydomu is pinned to pCPU 5.
Will dom0_max_vcpus=5 in this case guarantee that dom0 will only run
on pCPUs 0-4, so that mydomu always has pCPU 5 to itself?
What if I would like mydomu to be the only domain that uses pCPU 2?
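Something like the following is what I have in mind (just a sketch,
assuming "xl vcpu-pin" behaves with the NULL scheduler as the "5 / all"
line above suggests):

# xl vcpu-pin Domain-0 all 0-1,3-4
# xl vcpu-pin mydomu 0 2

Or would a dedicated cpupool (xl cpupool-create plus xl cpupool-migrate)
be the recommended way to reserve a pCPU for one domain?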
# xl destroy mydomu
(XEN) End of domain_destroy function
# xl list
Name                              ID   Mem VCPUs      State   Time(s)
Domain-0                           0  3000     5     r-----    1057.9
# xl debug-keys r
(XEN) sched_smt_power_savings: disabled
(XEN) NOW=223871439875
(XEN) Online Cpus: 0-5
(XEN) Cpupool 0:
(XEN) Cpus: 0-5
(XEN) Scheduler: null Scheduler (null)
(XEN) cpus_free =
(XEN) Domain info:
(XEN) Domain: 0
(XEN) 1: [0.0] pcpu=0
(XEN) 2: [0.1] pcpu=1
(XEN) 3: [0.2] pcpu=2
(XEN) 4: [0.3] pcpu=3
(XEN) 5: [0.4] pcpu=4
(XEN) Domain: 1
(XEN) 6: [1.0] pcpu=5
Right. And from the fact that: 1) we only see the "End of
domain_destroy function" line in the logs, and 2) the vCPU is still
listed here, we have our confirmation (as if there was any need for
it :-/) that domain destruction is only done partially.
Yes it looks like that.
# xl create mydomu.cfg
Parsing config from mydomu.cfg
(XEN) Power on resource 215
# xl list
Name                              ID   Mem VCPUs      State   Time(s)
Domain-0                           0  3000     5     r-----    1152.1
mydomu                             2   512     1     ------       0.0
# xl debug-keys r
(XEN) sched_smt_power_savings: disabled
(XEN) NOW=241210530250
(XEN) Online Cpus: 0-5
(XEN) Cpupool 0:
(XEN) Cpus: 0-5
(XEN) Scheduler: null Scheduler (null)
(XEN) cpus_free =
(XEN) Domain info:
(XEN) Domain: 0
(XEN) 1: [0.0] pcpu=0
(XEN) 2: [0.1] pcpu=1
(XEN) 3: [0.2] pcpu=2
(XEN) 4: [0.3] pcpu=3
(XEN) 5: [0.4] pcpu=4
(XEN) Domain: 1
(XEN) 6: [1.0] pcpu=5
(XEN) Domain: 2
(XEN) 7: [2.0] pcpu=-1
(XEN) Waitqueue: d2v0
Yep, so, as we were suspecting, domain 1 was not destroyed properly.
Specifically, we never got to the point where the vCPU is deallocated
and the pCPU to which the NULL scheduler had assigned that vCPU is
released.
This means that the new vCPU (i.e., d2v0) has, from the point of view
of the NULL scheduler, no pCPU on which to run, and it is therefore
parked in the waitqueue.
There should be a warning about that, which I don't see... but perhaps
I'm just misremembering.
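FWIW, this parked state is easy to spot from the output we already
collected (plain shell below; the grep pattern is just a convenience,
not a new tool):

# xl debug-keys r
# xl dmesg | grep -e cpus_free -e Waitqueue

With a leaked pCPU, "cpus_free =" stays empty even after the old domain
is gone, and the new vCPU shows up on the "Waitqueue:" line (d2v0 above)
instead of getting a "pcpu=N" assignment.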
Anyway, cool, this makes things even more clear.
Thanks again for letting us see these logs.
Thanks for the attention to this :-)
Any ideas for how to solve it?