Re: [Xen-devel] Guest start issue on ARM (maybe related to Credit2) [Was: Re: [xen-unstable test] 113807: regressions - FAIL]

2017-09-28 Thread Dario Faggioli
On Thu, 2017-09-28 at 00:52 +0100, Julien Grall wrote:
> On 09/28/2017 12:51 AM, Julien Grall wrote:
> > > Things *should really not* explode (like as in Xen crashes) if
> > > that
> > > happens; actually, from a scheduler perspective, it should really
> > > not
> > > be too big of a deal (especially if the overload is transient,
> > > like I
> > > guess it should be in this case). However, it's entirely possible
> > > that
> > > some specific vCPUs failing to be scheduled for a certain amount
> > > of
> > > time, causes something _inside_ the guest to timeout, or get
> > > stuck or
> > > wedged, which may be what happens here.
> > 
> > Looking at the log I don't see any crash of Xen and it seems to
> > be responsive.
> 
> I forgot to add that I don't see any timeout on the guest console
> but can notice a slowdown (waiting for some PV device).
> 
Exactly! And in fact, I'm saying that, even if nothing breaks, maybe
there are intervals during which --due to the combination of the
overload, the non work-conserving nature and the fact that these CPUs
are slow-- Dom0 is slow in dealing with the backends, to the point that
OSSTest times out.

Then, after the "load spike", everything goes back to normal, the
system is responsive, the logs (like the runqueue dump you posted)
depict a normal semi-idle system.

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Guest start issue on ARM (maybe related to Credit2) [Was: Re: [xen-unstable test] 113807: regressions - FAIL]

2017-09-27 Thread Julien Grall



On 09/28/2017 12:51 AM, Julien Grall wrote:
> Hi Dario,
>
> On 09/26/2017 09:51 PM, Dario Faggioli wrote:
>> On Tue, 2017-09-26 at 18:28 +0100, Julien Grall wrote:
>>> On 09/26/2017 08:33 AM, Dario Faggioli wrote:
>>>> Here's the logs:
>>>> http://logs.test-lab.xenproject.org/osstest/logs/113816/test-armhf-
>>>> armhf-xl-rtds/info.html
>>>
>>> It does not seem to be similar: in the Credit2 case the kernel is
>>> stuck at very early boot.
>>> Here it seems to be running (there are grants set up).
>>
>> Yes, I agree, it's not totally similar.
>>
>>> This seems to be confirmed by the guest console log: I can see the
>>> prompt. Interestingly, when the guest job fails, it has been waiting
>>> for a long time on disk and hvc0. It does not time out, though.
>>
>> Ah, I see what you mean, I found it in the guest console log.
>>
>>> I am actually quite surprised that we start a 4 vCPUs guest on a 2
>>> pCPUs platform. The total of vCPUs is 6 (2 DOM0 + 4 DOMU). The
>>> processors are not the greatest for testing. So I was wondering
>>> whether we end up with too many vCPUs running on the platform, making
>>> the test unreliable?
>>
>> Well, doing that, with this scheduler, is certainly *not* the best
>> recipe for determinism and reliability.
>>
>> In fact, RTDS is a non-work conserving scheduler. This means that (with
>> default parameters) each vCPU gets at most 40% CPU time, even if there
>> are idle cycles.
>>
>> With 6 vCPUs, there's a total demand of 240% of CPU time, while with 2
>> pCPUs there's at most 200% available, which means we're in overload
>> (well, at least that's the case if/when all the vCPUs try to execute
>> for their guaranteed 40%).
>>
>> Things *should really not* explode (like as in Xen crashes) if that
>> happens; actually, from a scheduler perspective, it should really not
>> be too big of a deal (especially if the overload is transient, like I
>> guess it should be in this case). However, it's entirely possible that
>> some specific vCPUs failing to be scheduled for a certain amount of
>> time, causes something _inside_ the guest to timeout, or get stuck or
>> wedged, which may be what happens here.
>
> Looking at the log I don't see any crash of Xen and it seems to
> be responsive.

I forgot to add that I don't see any timeout on the guest console
but can notice a slowdown (waiting for some PV device).

> I don't know much about the scheduler and how to interpret the logs:
>
> Sep 25 22:43:21.495119 (XEN) Domain info:
> Sep 25 22:43:21.503073 (XEN)    domain: 0
> Sep 25 22:43:21.503100 (XEN) [0.0 ] cpu 0, (1000, 400), cur_b=3895333 cur_d=161112000 last_start=166505875
> Sep 25 22:43:21.511080 (XEN) onQ=0 runnable=0 flags=0 effective hard_affinity=0-1
> Sep 25 22:43:21.519082 (XEN) [0.1 ] cpu 1, (1000, 400), cur_b=3946375 cur_d=161113000 last_start=1611126446583
> Sep 25 22:43:21.527023 (XEN) onQ=0 runnable=1 flags=0 effective hard_affinity=0-1
> Sep 25 22:43:21.535063 (XEN)    domain: 5
> Sep 25 22:43:21.535089 (XEN) [5.0 ] cpu 0, (1000, 400), cur_b=3953875 cur_d=161112000 last_start=160106041
> Sep 25 22:43:21.543073 (XEN) onQ=0 runnable=0 flags=0 effective hard_affinity=0-1
> Sep 25 22:43:21.551078 (XEN) [5.1 ] cpu 1, (1000, 400), cur_b=3938167 cur_d=161114000 last_start=1611130169791
> Sep 25 22:43:21.559063 (XEN) onQ=0 runnable=0 flags=0 effective hard_affinity=0-1
> Sep 25 22:43:21.559096 (XEN) [5.2 ] cpu 1, (1000, 400), cur_b=3952500 cur_d=161114000 last_start=1611130107958
> Sep 25 22:43:21.575067 (XEN) onQ=0 runnable=0 flags=0 effective hard_affinity=0-1
> Sep 25 22:43:21.575101 (XEN) [5.3 ] cpu 0, (1000, 400), cur_b=3951875 cur_d=161112000 last_start=160154166
> Sep 25 22:43:21.583196 (XEN) onQ=0 runnable=0 flags=0 effective hard_affinity=0-1
>
> Also, it seems to fail fairly reliably, so it might be possible
> to set up a reproducer.

Cheers,



--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Guest start issue on ARM (maybe related to Credit2) [Was: Re: [xen-unstable test] 113807: regressions - FAIL]

2017-09-27 Thread Julien Grall
Hi Dario,

On 09/26/2017 09:51 PM, Dario Faggioli wrote:
> On Tue, 2017-09-26 at 18:28 +0100, Julien Grall wrote:
>> On 09/26/2017 08:33 AM, Dario Faggioli wrote:

>>> Here's the logs:
>>> http://logs.test-lab.xenproject.org/osstest/logs/113816/test-armhf-
>>> armhf-xl-rtds/info.html
>>
>> It does not seem to be similar: in the Credit2 case the kernel is
>> stuck at very early boot.
>> Here it seems to be running (there are grants set up).
>>
> Yes, I agree, it's not totally similar.
> 
>> This seems to be confirmed by the guest console log: I can see the
>> prompt. Interestingly, when the guest job fails, it has been waiting
>> for a long time on disk and hvc0. It does not time out, though.
>>
> Ah, I see what you mean, I found it in the guest console log.
> 
>> I am actually quite surprised that we start a 4 vCPUs guest on a 2
>> pCPUs platform. The total of vCPUs is 6 (2 DOM0 + 4 DOMU). The
>> processors are not the greatest for testing. So I was wondering
>> whether we end up with too many vCPUs running on the platform, making
>> the test unreliable?
>>
> Well, doing that, with this scheduler, is certainly *not* the best
> recipe for determinism and reliability.
> 
> In fact, RTDS is a non-work conserving scheduler. This means that (with
> default parameters) each vCPU gets at most 40% CPU time, even if there
> are idle cycles.
> 
> With 6 vCPUs, there's a total demand of 240% of CPU time, while with 2
> pCPUs there's at most 200% available, which means we're in overload
> (well, at least that's the case if/when all the vCPUs try to execute
> for their guaranteed 40%).
> 
> Things *should really not* explode (like as in Xen crashes) if that
> happens; actually, from a scheduler perspective, it should really not
> be too big of a deal (especially if the overload is transient, like I
> guess it should be in this case). However, it's entirely possible that
> some specific vCPUs failing to be scheduled for a certain amount of
> time, causes something _inside_ the guest to timeout, or get stuck or
> wedged, which may be what happens here.

Looking at the log I don't see any crash of Xen and it seems to
be responsive.

I don't know much about the scheduler and how to interpret the logs:

Sep 25 22:43:21.495119 (XEN) Domain info:
Sep 25 22:43:21.503073 (XEN)    domain: 0
Sep 25 22:43:21.503100 (XEN) [0.0 ] cpu 0, (1000, 400), cur_b=3895333 cur_d=161112000 last_start=166505875
Sep 25 22:43:21.511080 (XEN) onQ=0 runnable=0 flags=0 effective hard_affinity=0-1
Sep 25 22:43:21.519082 (XEN) [0.1 ] cpu 1, (1000, 400), cur_b=3946375 cur_d=161113000 last_start=1611126446583
Sep 25 22:43:21.527023 (XEN) onQ=0 runnable=1 flags=0 effective hard_affinity=0-1
Sep 25 22:43:21.535063 (XEN)    domain: 5
Sep 25 22:43:21.535089 (XEN) [5.0 ] cpu 0, (1000, 400), cur_b=3953875 cur_d=161112000 last_start=160106041
Sep 25 22:43:21.543073 (XEN) onQ=0 runnable=0 flags=0 effective hard_affinity=0-1
Sep 25 22:43:21.551078 (XEN) [5.1 ] cpu 1, (1000, 400), cur_b=3938167 cur_d=161114000 last_start=1611130169791
Sep 25 22:43:21.559063 (XEN) onQ=0 runnable=0 flags=0 effective hard_affinity=0-1
Sep 25 22:43:21.559096 (XEN) [5.2 ] cpu 1, (1000, 400), cur_b=3952500 cur_d=161114000 last_start=1611130107958
Sep 25 22:43:21.575067 (XEN) onQ=0 runnable=0 flags=0 effective hard_affinity=0-1
Sep 25 22:43:21.575101 (XEN) [5.3 ] cpu 0, (1000, 400), cur_b=3951875 cur_d=161112000 last_start=160154166
Sep 25 22:43:21.583196 (XEN) onQ=0 runnable=0 flags=0 effective hard_affinity=0-1

Also, it seems to fail fairly reliably, so it might be possible
to set up a reproducer.

Cheers,

-- 
Julien Grall



Re: [Xen-devel] Guest start issue on ARM (maybe related to Credit2) [Was: Re: [xen-unstable test] 113807: regressions - FAIL]

2017-09-26 Thread Dario Faggioli
On Tue, 2017-09-26 at 18:28 +0100, Julien Grall wrote:
> On 09/26/2017 08:33 AM, Dario Faggioli wrote:
> > > 
> > Here's the logs:
> > http://logs.test-lab.xenproject.org/osstest/logs/113816/test-armhf-
> > armhf-xl-rtds/info.html
> 
> It does not seem to be similar: in the Credit2 case the kernel is
> stuck at very early boot.
> Here it seems to be running (there are grants set up).
> 
Yes, I agree, it's not totally similar.

> This seems to be confirmed by the guest console log: I can see the
> prompt. Interestingly, when the guest job fails, it has been waiting
> for a long time on disk and hvc0. It does not time out, though.
> 
Ah, I see what you mean, I found it in the guest console log.

> I am actually quite surprised that we start a 4 vCPUs guest on a 2
> pCPUs platform. The total of vCPUs is 6 (2 DOM0 + 4 DOMU). The
> processors are not the greatest for testing. So I was wondering
> whether we end up with too many vCPUs running on the platform, making
> the test unreliable?
> 
Well, doing that, with this scheduler, is certainly *not* the best
recipe for determinism and reliability.

In fact, RTDS is a non-work conserving scheduler. This means that (with
default parameters) each vCPU gets at most 40% CPU time, even if there
are idle cycles.

With 6 vCPUs, there's a total demand of 240% of CPU time, while with 2
pCPUs there's at most 200% available, which means we're in overload
(well, at least that's the case if/when all the vCPUs try to execute
for their guaranteed 40%).
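
[Editorial note: the arithmetic can be sketched as a toy model — plain
Python, not Xen code; it assumes RTDS's default reservation of a 4ms
budget every 10ms period, i.e. the 40% per vCPU mentioned above.]

```python
# Toy model of the RTDS overload arithmetic above (not Xen code).
# Assumes the default reservation of 4ms of budget per 10ms period,
# i.e. each vCPU is guaranteed (and capped at) 40% of one pCPU.
def rtds_demand(n_vcpus, n_pcpus, budget_ms=4.0, period_ms=10.0):
    demand = n_vcpus * (budget_ms / period_ms)  # total CPU time demanded
    capacity = float(n_pcpus)                   # each pCPU supplies 100%
    return demand, capacity, demand > capacity

demand, capacity, overloaded = rtds_demand(6, 2)  # 2 Dom0 + 4 DomU vCPUs
# demand ~ 2.4 (240%), capacity = 2.0 (200%) -> overloaded
```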

Things *should really not* explode (like as in Xen crashes) if that
happens; actually, from a scheduler perspective, it should really not
be too big of a deal (especially if the overload is transient, like I
guess it should be in this case). However, it's entirely possible that
some specific vCPUs failing to be scheduled for a certain amount of
time, causes something _inside_ the guest to timeout, or get stuck or
wedged, which may be what happens here.

I'm adding Meng to Cc, to see what he thinks about this situation.

Thanks and Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



Re: [Xen-devel] Guest start issue on ARM (maybe related to Credit2) [Was: Re: [xen-unstable test] 113807: regressions - FAIL]

2017-09-26 Thread Julien Grall
Hi Dario,

On 09/26/2017 08:33 AM, Dario Faggioli wrote:
> On Mon, 2017-09-25 at 17:23 +0100, Julien Grall wrote:
>> On 09/25/2017 03:07 PM, Dario Faggioli wrote:
>>> I don't see much in the logs, TBH, but both `xl vcpu-list' and the
>>> 'r'
>>> debug key seem to suggest that vCPU 0 is running, while the other
>>> vCPUs
>>> have never run... like it was an issue with secondary (v)CPU
>>> bringup.
>>>
>>> It indeed shows up with Credit2, as if it were _specific_ to it, but
>>> I'm
>>> not 100% sure. In fact, it indeed seems to never show up here:
>>> http://logs.test-lab.xenproject.org/osstest/results/history/test-ar
>>> mhf-
>>> armhf-xl/xen-unstable
>>>
>> Most of the time guest-start/debian.repeat fails, vCPU 0 is in
>> data/prefetch abort state. My guess is a latent cache bug that
>> credit2
>> appears to expose.
>>
> So, forgive my ARM ignorance, but how do you tell that the vCPU(s)
> is(are) in that particular state?

I was looking at the guest state dumped:

Sep 24 15:10:43.275221 (XEN) *** Dumping CPU1 guest state (d3v0): ***
Sep 24 15:10:43.279352 (XEN) [ Xen-4.10-unstable  arm32  debug=y   Not tainted ]
Sep 24 15:10:43.285242 (XEN) CPU:1
Sep 24 15:10:43.286597 (XEN) PC: 000c
Sep 24 15:10:43.288743 (XEN) CPSR:   81d7 MODE:32-bit Guest ABT
Sep 24 15:10:43.292741 (XEN)  R0: 0040 R1:  R2: 48c24000 R3: 8000
Sep 24 15:10:43.298241 (XEN)  R4: 410aa758 R5: 410aacf8 R6: 0080 R7: c2c2c2c2
Sep 24 15:10:43.303850 (XEN)  R8: 4000 R9: 410fc074 R10:40b7923c R11:10101105 R12:
Sep 24 15:10:43.310457 (XEN) USR: SP:  LR:
Sep 24 15:10:43.313714 (XEN) SVC: SP: 4199fb70 LR: 40208060 SPSR:41d3
Sep 24 15:10:43.318334 (XEN) ABT: SP:  LR: 000c SPSR:81d7
Sep 24 15:10:43.322863 (XEN) UND: SP:  LR:  SPSR:
Sep 24 15:10:43.327361 (XEN) IRQ: SP:  LR:  SPSR:
Sep 24 15:10:43.331855 (XEN) FIQ: SP:  LR: c1318ae4 SPSR:
Sep 24 15:10:43.336349 (XEN) FIQ: R8:  R9:  R10: R11: R12:


"MODE:..." is the current mode of the vCPU. In that case, ABT means it
received an abort (e.g. a data/prefetch abort).

There are other modes, such as:
- USR : User mode
- SVC : Kernel mode
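
[Editorial note: the mode comes from the bottom five bits of the CPSR.
A small decoder (plain Python, encodings per the ARMv7 ARM) shows how the
dumped values map to modes; the hex in the dump appears truncated by the
archive, but the low bits survive.]

```python
# Decode the ARMv7 CPSR mode field (bits [4:0]); encodings per the ARM ARM.
ARM32_MODES = {
    0x10: "USR", 0x11: "FIQ", 0x12: "IRQ", 0x13: "SVC",
    0x16: "MON", 0x17: "ABT", 0x1A: "HYP", 0x1B: "UND", 0x1F: "SYS",
}

def cpsr_mode(cpsr):
    return ARM32_MODES.get(cpsr & 0x1F, "unknown")

# The CPSR in the dump ends in ...81d7: 0xd7 & 0x1f == 0x17, i.e. ABT,
# matching "MODE:32-bit Guest ABT"; the SVC SPSR ...41d3 decodes to SVC.
print(cpsr_mode(0x81d7))  # ABT
print(cpsr_mode(0x41d3))  # SVC
```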

> 
> I'm asking because I now wonder whether this same issue could also be
> the cause of these other failures, which we see from time to time:
> 
>flight 113816 xen-unstable real [real]
>http://logs.test-lab.xenproject.org/osstest/logs/113816/
> 
>[...]
> 
>Tests which did not succeed, but are not blocking:
> test-armhf-armhf-xl-rtds   16 guest-start/debian.repeat fail blocked in 
> 113387
> 
> Here's the logs:
> http://logs.test-lab.xenproject.org/osstest/logs/113816/test-armhf-armhf-xl-rtds/info.html

It does not seem to be similar: in the Credit2 case the kernel is stuck
at very early boot. Here it seems to be running (there are grants set
up).

This seems to be confirmed by the guest console log: I can see the
prompt. Interestingly, when the guest job fails, it has been waiting for
a long time on disk and hvc0. It does not time out, though.

I am actually quite surprised that we start a 4 vCPUs guest on a 2
pCPUs platform. The total of vCPUs is 6 (2 DOM0 + 4 DOMU). The
processors are not the greatest for testing. So I was wondering whether
we end up with too many vCPUs running on the platform, making the test
unreliable?

Cheers,

-- 
Julien Grall



Re: [Xen-devel] Guest start issue on ARM (maybe related to Credit2) [Was: Re: [xen-unstable test] 113807: regressions - FAIL]

2017-09-26 Thread Dario Faggioli
On Mon, 2017-09-25 at 17:23 +0100, Julien Grall wrote:
> On 09/25/2017 03:07 PM, Dario Faggioli wrote:
> > I don't see much in the logs, TBH, but both `xl vcpu-list' and the
> > 'r'
> > debug key seem to suggest that vCPU 0 is running, while the other
> > vCPUs
> > have never run... like it was an issue with secondary (v)CPU
> > bringup.
> > 
> > It indeed shows up with Credit2, as if it were _specific_ to it, but
> > I'm
> > not 100% sure. In fact, it indeed seems to never show up here:
> > http://logs.test-lab.xenproject.org/osstest/results/history/test-ar
> > mhf-
> > armhf-xl/xen-unstable
> > 
> Most of the time guest-start/debian.repeat fails, vCPU 0 is in 
> data/prefetch abort state. My guess is a latent cache bug that
> credit2 
> appears to expose.
> 
So, forgive my ARM ignorance, but how do you tell that the vCPU(s)
is(are) in that particular state?

I'm asking because I now wonder whether this same issue could also be
the cause of these other failures, which we see from time to time:

  flight 113816 xen-unstable real [real]
  http://logs.test-lab.xenproject.org/osstest/logs/113816/

  [...]

  Tests which did not succeed, but are not blocking:
   test-armhf-armhf-xl-rtds   16 guest-start/debian.repeat fail blocked in 
113387

Here's the logs:
http://logs.test-lab.xenproject.org/osstest/logs/113816/test-armhf-armhf-xl-rtds/info.html

Thanks and Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



Re: [Xen-devel] Guest start issue on ARM (maybe related to Credit2) [Was: Re: [xen-unstable test] 113807: regressions - FAIL]

2017-09-25 Thread Dario Faggioli
On Mon, 2017-09-25 at 17:23 +0100, Julien Grall wrote:
> On 09/25/2017 03:07 PM, Dario Faggioli wrote:
> > Hey,
> 
> Hi Dario,
> 
Hi!

> > I don't see much in the logs, TBH, but both `xl vcpu-list' and the
> > 'r'
> > debug key seem to suggest that vCPU 0 is running, while the other
> > vCPUs
> > have never run... like it was an issue with secondary (v)CPU
> > bringup.
> > 
> It definitely rings a bell, I have seen a similar trace in July and I
> have been working on a potential fix since then.
> 
> Most of the time guest-start/debian.repeat fails, vCPU 0 is in 
> data/prefetch abort state. My guess is a latent cache bug that
> credit2 
> appears to expose.
> 
> Indeed, the arm32 kernel is using set/way cache flush instructions at
> boot time. They are used to clean each level of cache, one by one, on
> each CPU.
> 
> At the moment, Xen does not trap those instructions. As you know,
> caches may not be private to a given physical processor. So if you
> happen to migrate the vCPU to another physical CPU, you may hit stale
> data.
> 
Ah, yes, I remember "hearing" you talking about this. We've also talked
about it a bit together... I just wasn't recognising it as what's biting
us here.

> I am still cleaning-up my work and hopefully can post a couple of
> series 
> soon. This is not targeting Xen 4.10 and I am not even sure it would
> fix 
> the problem here. But that's my best guess.
> 
Well, yes, now that you mention it, it indeed sounds plausible.

So, I was mainly curious about whether it was something affecting, or
directly caused by, Credit2, or something that Credit2 can help
diagnose, reproduce and fix.

Since we already have a candidate, and you're already working on the
(difficult! :-() fix, well, let's see, once you have it, whether it
actually cures the problem.

We'll jump back on it if it does not.

Thanks and regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



Re: [Xen-devel] Guest start issue on ARM (maybe related to Credit2) [Was: Re: [xen-unstable test] 113807: regressions - FAIL]

2017-09-25 Thread Julien Grall

On 09/25/2017 03:07 PM, Dario Faggioli wrote:
> Hey,

Hi Dario,

> On Mon, 2017-09-25 at 09:46 +, osstest service owner wrote:
>> flight 113807 xen-unstable real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/113807/
>
> So, triggered by this:
>
>   Tests which are failing intermittently (not blocking):
>    test-armhf-armhf-xl-credit2 16 guest-start/debian.repeat fail in
>   113791 pass in 113807
>
> I went to have a look, and discovered that it's indeed happening that,
> from time to time, we fail to create a guest, on ARM, with Credit2.
>
> Looking here:
> http://logs.test-lab.xenproject.org/osstest/results/history/test-armhf-armhf-xl-credit2/xen-unstable
>
> It seems to be happening only on the cubietrucks, but in a non-linear
> and non-deterministic fashion. E.g., 113791 failed on metzinger, which
> is fine on 113800; 113611 and 113618 failed on baroque, which is fine
> on 113638.
>
> I don't see much in the logs, TBH, but both `xl vcpu-list' and the 'r'
> debug key seem to suggest that vCPU 0 is running, while the other vCPUs
> have never run... like it was an issue with secondary (v)CPU bringup.
>
> It indeed shows up with Credit2, as if it were _specific_ to it, but I'm
> not 100% sure. In fact, it indeed seems to never show up here:
> http://logs.test-lab.xenproject.org/osstest/results/history/test-armhf-
> armhf-xl/xen-unstable
>
> but it looks like it may have shown up in 112460 (but we don't have the
> logs any longer):
> http://logs.test-lab.xenproject.org/osstest/results/history/test-armhf-
> armhf-xl-cubietruck/xen-unstable
>
> So... ARM people? Does this ring any bell? Is this something known, or
> easy to explain? What can I do for help?


It definitely rings a bell, I have seen a similar trace in July and I
have been working on a potential fix since then.

Most of the time guest-start/debian.repeat fails, vCPU 0 is in
data/prefetch abort state. My guess is a latent cache bug that Credit2
appears to expose.

Indeed, the arm32 kernel is using set/way cache flush instructions at
boot time. They are used to clean each level of cache, one by one, on
each CPU.

At the moment, Xen does not trap those instructions. As you know, caches
may not be private to a given physical processor. So if you happen to
migrate the vCPU to another physical CPU, you may hit stale data.

This means we have to trap and emulate set/way instructions. Per the ARM
ARM, and also from experience, emulating them is non-trivial.

Thankfully, people are trying to get rid of those instructions. For
instance, arm64 Linux does not use them anymore. Sadly, the arm32 Linux
maintainers do not want to remove them... They are also used by EDK2 at
the moment.

The solution is to go through the P2M and clean & invalidate every page
one by one. This process is really, really slow, given Xen on Arm always
populates the P2M at guest creation.

So I have been working for the past 2 months to add PoD support on Arm.
I have a proof of concept that boots a guest and properly handles
set/way cache instructions.

I am still cleaning up my work and hopefully can post a couple of series
soon. This is not targeting Xen 4.10, and I am not even sure it would
fix the problem here. But that's my best guess.
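
[Editorial note: a back-of-the-envelope illustration of why the
page-by-page clean & invalidate hurts with a fully-populated P2M, and
why populate-on-demand helps — toy numbers, plain Python, not Xen code.]

```python
# Toy cost model (not Xen code): emulating a set/way flush by issuing one
# clean & invalidate per *populated* P2M page scales with how much of the
# guest's memory is mapped. PoD keeps early-boot population small, so the
# emulation touches far fewer pages. Sizes and fractions are made up.
PAGE_SIZE = 4096

def flush_ops(guest_ram_bytes, populated_fraction):
    pages = guest_ram_bytes // PAGE_SIZE
    return int(pages * populated_fraction)  # one maintenance op per page

fully_populated = flush_ops(512 << 20, 1.0)   # 512MB guest, all mapped
pod_early_boot  = flush_ops(512 << 20, 0.05)  # hypothetical 5% mapped
# 131072 ops vs 6553 ops -> roughly 20x fewer with PoD at early boot
```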


Cheers,

--
Julien Grall
