Re: [Xen-devel] Xen optimization

2018-12-12 Thread Stefano Stabellini
On Wed, 12 Dec 2018, Andrii Anisov wrote:
> Hello Stefano,
> 
> On 12.12.18 19:39, Stefano Stabellini wrote:
> > Thanks for the good work, Andrii!
> > 
> > The WARM_MAX improvements for vwfi=native with your optimizations are
> > impressive.
> 
> I really hope you are not speaking about these numbers:
> 
> > > max=840 warm_max=120 min=120 avg=127
> 
> Those are TBM baremetal numbers in hyp mode.

I know, I was referring to your older results, sorry for the confusion.


> Did you try my RFC on your HW?


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-12-12 Thread Dario Faggioli
On Wed, 2018-12-12 at 19:32 +0200, Andrii Anisov wrote:
> On 12.12.18 19:10, Dario Faggioli wrote:
> > I think only bisection could shed some light on this. And it would
> > be
> > wonderful if you could do that, but I understand that it takes
> > time. :-
> > /
> Well, bisect might help. But I'm really confused why MemTotal may be
> reduced.
> 
Yeah, and although it's difficult to admit/see the reason why, I think this
looks like it is coming from something we do in Xen. And since you say
you have an old Xen version that works, I really see bisection as the
way to go...

> > Are you absolutely sure about that? That is, are you "just"
> > assuming
> > the scheduler won't move stuff, or have you put some debugging or
> > printing in place to verify that to be the case?
> Being honest, I did not check for exactly this setup. I verified it
> for 4.10.
>
Not sure I'm getting it. Are you saying that you somehow verified that on
4.10 vcpus don't move? But on 4.10 you have pinning that works, don't
you?

Or are you saying you've verified that vcpus don't move, on 4.10, even
without doing the pinning? If yes, can I ask how?

As for staging, I really can't tell, as indeed there would be no need
for them to move, but they actually could, for a number of reasons.

So, unless you, like, put printk()-s (if you can) or ASSERT()s when
v->processor changes, I wouldn't take that for granted. :-(
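
(Something along these lines, at the point where the field is updated; a
sketch, not actual Xen code, and the exact spot varies between versions:)

    /* Sketch: log every vCPU movement where v->processor is reassigned
     * (e.g. somewhere in xen/common/schedule.c). */
    if ( v->processor != new_cpu )
        printk("d%dv%d: pCPU%u -> pCPU%u\n",
               v->domain->domain_id, v->vcpu_id, v->processor, new_cpu);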

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/


signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-12-12 Thread Andrii Anisov

Hello Stefano,

On 12.12.18 19:39, Stefano Stabellini wrote:

Thanks for the good work, Andrii!

The WARM_MAX improvements for vwfi=native with your optimizations are 
impressive.


I really hope you are not speaking about these numbers:


max=840 warm_max=120 min=120 avg=127


Those are TBM baremetal numbers in hyp mode.

Did you try my RFC on your HW?

--
Sincerely,
Andrii Anisov.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-12-12 Thread Stefano Stabellini
On Wed, 12 Dec 2018, Andrii Anisov wrote:
> On 12.12.18 11:46, Andrii Anisov wrote:
> > Digging into that now.
> I got it. My u-boot starts TBM in hyp mode. But both of them miss setting
> HCR_EL2.IMO, so no interrupt exception was taken in hyp.
> OK, for my baremetal TBM in hyp, numbers are:
> 
> max=840 warm_max=120 min=120 avg=127
> 
> I guess warm_max and min are one tick of the system timer. And it seems to me
> that one tick of the system timer is the lower limit of the IRQ latency by HW
> design.

Thanks for the good work, Andrii!

The WARM_MAX improvements for vwfi=native with your optimizations are 
impressive.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-12-12 Thread Andrii Anisov

Hello Dario,

On 12.12.18 19:10, Dario Faggioli wrote:

Ah, yes... I've seen the thread. I haven't commented, as it is really,
really weird, and I don't know what to think/say.

I think only bisection could shed some light on this. And it would be
wonderful if you could do that, but I understand that it takes time. :-
/

Well, bisect might help. But I'm really confused why MemTotal may be reduced.


Are you absolutely sure about that? That is, are you "just" assuming
the scheduler won't move stuff, or have you put some debugging or
printing in place to verify that to be the case?

Being honest, I did not check for exactly this setup. I verified it
for 4.10.



I'm asking because, yes, in theory that is what one would expect. But,
as I think you know very well, although in theory there is no
difference between theory and practice, in practice, there is. :-)

I know it very well :)

--
Sincerely,
Andrii Anisov.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-12-12 Thread Dario Faggioli
On Wed, 2018-12-12 at 11:39 +0200, Andrii Anisov wrote:
> Hello Dario,
> 
Hi,

> On 11.12.18 18:56, Dario Faggioli wrote:
> > Also, what about Xen numbers, sched=null.
> Didn't check, will put it on the list.
> 
:-)

> > I don't expect much improvement, considering pinning is in-place
> > already.
> Actually, I faced a strange issue with explicit pinning of Dom0 [1].
> I haven't sorted out the cause yet. And Julien says it is not reproducible
> on his desk.
>
Ah, yes... I've seen the thread. I haven't commented, as it is really,
really weird, and I don't know what to think/say.

I think only bisection could shed some light on this. And it would be
wonderful if you could do that, but I understand that it takes time. :-
/

> But yes, with fewer vCPUs than pCPUs there is no migration of
> Dom0 vCPUs.
> 
Are you absolutely sure about that? That is, are you "just" assuming
the scheduler won't move stuff, or have you put some debugging or
printing in place to verify that to be the case? 

I'm asking because, yes, in theory that is what one would expect. But,
as I think you know very well, although in theory there is no
difference between theory and practice, in practice, there is. :-)

Regards,
Dario

> [1]https://lists.xenproject.org/archives/html/xen-devel/2018-12/msg00435.html
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/


signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-12-12 Thread Andrii Anisov


On 12.12.18 11:46, Andrii Anisov wrote:

Digging into that now.

I got it. My u-boot starts TBM in hyp mode. But both of them miss setting
HCR_EL2.IMO, so no interrupt exception was taken in hyp.
OK, for my baremetal TBM in hyp, numbers are:

max=840 warm_max=120 min=120 avg=127

I guess warm_max and min are one tick of the system timer (1 / 8.333 MHz is
120 ns, matching the 120 above). And it seems to me that one tick of the
system timer is the lower limit of the IRQ latency by HW design.
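
(For reference, the missing bit of the fix boils down to something like this
in the early hyp init; a sketch only, TBM's actual code may differ:)

    #include <stdint.h>

    #define HCR_EL2_IMO (1UL << 4)   /* route physical IRQs to EL2 */

    static void enable_el2_irq_routing(void)
    {
        uint64_t hcr;

        asm volatile("mrs %0, hcr_el2" : "=r" (hcr));
        hcr |= HCR_EL2_IMO;
        asm volatile("msr hcr_el2, %0; isb" : : "r" (hcr));
    }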

--
Sincerely,
Andrii Anisov.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-12-12 Thread Andrii Anisov


On 11.12.18 21:29, Stefano Stabellini wrote:

Yes, I think the uart driver could be sufficient, but it has only the
Xilinx uart, the pl011 and the Xen emergency console. If I recall
correctly, Renesas needs a different driver. Any platform specific
initialization would also need to be added to it.

Actually the console driver (putchar) is really trivial in TBM, and for
platform initialization I rely on what u-boot leaves behind.
But I faced a strange issue with a timer interrupt. Despite the fact that TBM
sets up the MMU, the exception handler table and VBAR, the interrupt does not
cause TBM's code to be called. But I see the interrupt fire and become active
in the GIC registers.
Digging into that now.

--
Sincerely,
Andrii Anisov.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-12-12 Thread Andrii Anisov

Hello Dario,

On 11.12.18 18:56, Dario Faggioli wrote:

Also, what about Xen numbers, sched=null.

Didn't check, will put it on the list.


I don't expect much improvement, considering pinning is in-place
already.

Actually, I faced a strange issue with explicit pinning of Dom0 [1]. I haven't
sorted out the cause yet. And Julien says it is not reproducible on his desk.
But yes, with fewer vCPUs than pCPUs there is no migration of Dom0 vCPUs.
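
(A coarse way to watch for that is to sample placement periodically; it can
miss brief migrations, so treat it as a sanity check only:)

    # the CPU column should never change for a pinned or idle setup
    watch -n1 xl vcpu-list Domain-0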

[1] https://lists.xenproject.org/archives/html/xen-devel/2018-12/msg00435.html

--
Sincerely,
Andrii Anisov.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-12-12 Thread Andrii Anisov

Hello Julien,

On 11.12.18 14:27, Julien Grall wrote:

I would like to have performance numbers per patch so we can make a decision
on whether the implementation cost is worth it for upstream.

I'll check baremetal numbers first. Then will get numbers per patch.

--
Sincerely,
Andrii Anisov.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-12-11 Thread Stefano Stabellini
On Tue, 11 Dec 2018, Julien Grall wrote:
> On 11/12/2018 18:39, Stefano Stabellini wrote:
> > On Tue, 11 Dec 2018, Julien Grall wrote:
> > > On 10/12/2018 12:23, Andrii Anisov wrote:
> > > > Hello Julien,
> > > > 
> > > > On 10.12.18 13:54, Julien Grall wrote:
> > > > > What are the numbers without Xen?
> > > > Good question. Didn't try. At least putchar should be implemented for
> > > > that.
> > > 
> > > I think we need the baremetal numbers to be able to compare properly the
> > > old
> > > and new vGIC.
> > 
> > That might prove very hard for Andrii to do because TBM is made to run
> > on Xilinx hardware and Xen VMs only. It is probably lacking necessary
> > drivers to run on other boards natively.
> 
> Really? What sort of platform specific driver do you need? Shouldn't the UART 
> be sufficient?

Yes, I think the uart driver could be sufficient, but it has only the
Xilinx uart, the pl011 and the Xen emergency console. If I recall
correctly, Renesas needs a different driver. Any platform specific
initialization would also need to be added to it.


> When you speak about interrupt latency, you need to compare to baremetal.
> Otherwise it has no meaning at all. So what is your solution?

When I used it, I ran on Xilinx hardware, that was my solution :-D

Andrii would have to port a uart driver to it. Maybe the early_printk
trivial driver could be easy enough to port.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-12-11 Thread Julien Grall



On 11/12/2018 18:39, Stefano Stabellini wrote:

On Tue, 11 Dec 2018, Julien Grall wrote:

On 10/12/2018 12:23, Andrii Anisov wrote:

Hello Julien,

On 10.12.18 13:54, Julien Grall wrote:

What are the numbers without Xen?

Good question. Didn't try. At least putchar should be implemented for that.


I think we need the baremetal numbers to be able to compare properly the old
and new vGIC.


That might prove very hard for Andrii to do because TBM is made to run
on Xilinx hardware and Xen VMs only. It is probably lacking necessary
drivers to run on other boards natively.


Really? What sort of platform specific driver do you need? Shouldn't the UART be 
sufficient?


When you speak about interrupt latency, you need to compare to baremetal.
Otherwise it has no meaning at all. So what is your solution?

Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-12-11 Thread Stefano Stabellini
On Tue, 11 Dec 2018, Julien Grall wrote:
> On 10/12/2018 12:23, Andrii Anisov wrote:
> > Hello Julien,
> > 
> > On 10.12.18 13:54, Julien Grall wrote:
> > > What are the numbers without Xen?
> > Good question. Didn't try. At least putchar should be implemented for that.
> 
> I think we need the baremetal numbers to be able to compare properly the old
> and new vGIC.

That might prove very hard for Andrii to do because TBM is made to run
on Xilinx hardware and Xen VMs only. It is probably lacking necessary
drivers to run on other boards natively.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-12-11 Thread Dario Faggioli
On Tue, 2018-12-11 at 12:27 +, Julien Grall wrote:
> On 10/12/2018 12:23, Andrii Anisov wrote:
> > On 10.12.18 13:54, Julien Grall wrote:
> > > What are the numbers without Xen?
> > Good question. Didn't try. At least putchar should be implemented
> > for that.
> 
> I think we need the baremetal numbers to be able to compare properly
> the old and 
> new vGIC.
> 
Agreed.

Also, what about Xen numbers, sched=null.

I don't expect much improvement, considering pinning is in-place
already. Still...

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/


signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-12-11 Thread Julien Grall



On 10/12/2018 12:23, Andrii Anisov wrote:

Hello Julien,

On 10.12.18 13:54, Julien Grall wrote:

What are the numbers without Xen?

Good question. Didn't try. At least putchar should be implemented for that.


I think we need the baremetal numbers to be able to compare properly the old and 
new vGIC.





Which version of Xen are you using?

This morning's staging, commit-id 58eb90a9650a8ea73533bc2b87c13b8ca7bbe35a.


This also tells you that in the trap case the vGIC is not the bigger overhead.

Indeed, not the bigger. But it is significant even in this trivial case
(receiving an interrupt twice a second).


To confirm, in your use-case you have the interrupt firing every 500ms, right?

But I am not sure what you are trying to argue here... I never said it was
insignificant, I only pointed out that the context switch/trap has a strong
impact. This means that focusing on optimizing the context switch is probably
more worthwhile at the moment than trying to micro-optimize the vGIC.


What matters in the end is the overhead of virtualization (i.e. Xen). Without
those baremetal numbers, it is quite difficult to get an idea of whether this
is significant.





This is with all your series applied but [4], correct?

Right.


Did you try to see the performance improvement patch by patch?

No. Not yet.


I would like to have performance numbers per patch so we can make a decision
on whether the implementation cost is worth it for upstream.


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-12-10 Thread Andrii Anisov

Hello Julien,

On 10.12.18 13:54, Julien Grall wrote:

What are the numbers without Xen?

Good question. Didn't try. At least putchar should be implemented for that.


Which version of Xen are you using?

This morning's staging, commit-id 58eb90a9650a8ea73533bc2b87c13b8ca7bbe35a.


This also tells you that in the trap case the vGIC is not the bigger overhead.

Indeed, not the bigger. But it is significant even in this trivial case
(receiving an interrupt twice a second).


This is with all your series applied but [4], correct?

Right.


Did you try to see the performance improvement patch by patch?

No. Not yet.

--
Sincerely,
Andrii Anisov.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-12-10 Thread Julien Grall
(sorry for the formatting)

On Mon, 10 Dec 2018, 12:00 Andrii Anisov,  wrote:

> Hello All,
>
> On 27.11.18 23:27, Stefano Stabellini wrote:
> > See the following:
> >
> > https://marc.info/?l=xen-devel&m=148668817704668
> So I ported that stuff to the current staging [1].
> Also, the corresponding TBM itself is here [2].
> Having 4 big cores on my SoC, I run Xen with the following command line:
>
>  dom0_mem=3G console=dtuart dtuart=serial0 dom0_max_vcpus=2
> bootscrub=0 loglvl=all cpufreq=none tbuf_size=8192 loglvl=all/none
> guest_loglvl=all/none
>
> The TBM's domain configuration file is as following:
>
>  seclabel='system_u:system_r:domU_t'
>  name = "DomP"
>  kernel = "/home/root/ctest-bare.bin"
>  extra = "console=hvc0 rw"
>  memory = 128
>  vcpus = 1
>  cpus = "3"
>
> This gives me a setup where Domain-0 runs solely on cores 0 and 1 and TBM
> runs exclusively on core 3, so we can rely on it showing the pure IRQ
> latency of the hypervisor.
> My board is Renesas Salvator-X with the H3 ES3.0 SoC and 8GB RAM. The generic
> timer runs at 8.333 MHz, which gives me 120ns resolution for
> measurements.
> The Xen hypervisor is built without debug and TBM does wfi in the idle loop
> for all experiments.
> With that setup IRQ latency numbers are (in ns):
>

What are the numbers without Xen? Which version of Xen are you using?



> Old vgic:
>                        AVG     MIN     MAX     WARM MAX
> credit,  vwfi=trap     7706    7560    9480    8400
> credit,  vwfi=native   2908    2880    3120    4800
> credit2, vwfi=trap     7221    7200    9240    7440
> credit2, vwfi=native   2906    2880    3120    5040
>
> New vgic:
>                        AVG     MIN     MAX     WARM MAX
> credit,  vwfi=trap     8481    8040    10200   8880
> credit,  vwfi=native   4115    3960    4800    4200
> credit2, vwfi=trap     8425    8400    9600    9000
> credit2, vwfi=native   4227    3960    5040    4680
>
> Here we can see that the new vgic underperforms the old one in a trivial
> use-case modeled with TBM.
>

The vwfi=trap does not look so bad (10%) but indeed the vwfi=native adds a
bigger overhead.
This also tells you that in the trap case the vGIC is not the bigger
overhead.

I am pretty sure that this can be optimized because we mostly focused on
reliability and specification compliance for the first draft.

So yes the old vGIC performs better but at the price of unreliability and
non-compliance.


> Old vgic with optimizations [3] (without [4], because it breaks the setup):
>                        AVG     MIN     MAX     WARM MAX
> credit,  vwfi=trap     7309    7080    8760    7680
> credit,  vwfi=native   3007    3000    4320    3120
> credit2, vwfi=trap     6877    6720    8880    7200
> credit2, vwfi=native   2680    2640    4440    2880
>

This is with all your series applied but [4], correct? Did you try to see
the performance improvement patch by patch?

Cheers



>
>
> [1] https://github.com/aanisov/xen/tree/4tbm
> [2] https://github.com/aanisov/tbm/commits/4xen
> [3]
> https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03328.html
> [4]
> https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03288.html
>
> --
> Sincerely,
> Andrii Anisov.
>
> ___
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-12-10 Thread Andrii Anisov

Hello All,

On 27.11.18 23:27, Stefano Stabellini wrote:

See the following:

https://marc.info/?l=xen-devel&m=148668817704668

So I ported that stuff to the current staging [1].
Also, the corresponding TBM itself is here [2].
Having 4 big cores on my SoC, I run Xen with the following command line:

dom0_mem=3G console=dtuart dtuart=serial0 dom0_max_vcpus=2 bootscrub=0 
loglvl=all cpufreq=none tbuf_size=8192 loglvl=all/none guest_loglvl=all/none

The TBM's domain configuration file is as following:

seclabel='system_u:system_r:domU_t'
name = "DomP"
kernel = "/home/root/ctest-bare.bin"
extra = "console=hvc0 rw"
memory = 128
vcpus = 1
cpus = "3"

This gives me a setup where Domain-0 runs solely on cores 0 and 1 and TBM runs
exclusively on core 3, so we can rely on it showing the pure IRQ latency of
the hypervisor.
My board is Renesas Salvator-X with the H3 ES3.0 SoC and 8GB RAM. The generic
timer runs at 8.333 MHz, which gives me 120ns resolution for measurements.
The Xen hypervisor is built without debug and TBM does wfi in the idle loop
for all experiments.
With that setup IRQ latency numbers are (in ns):

Old vgic:
                     AVG     MIN     MAX     WARM MAX
credit,  vwfi=trap   7706    7560    9480    8400
credit,  vwfi=native 2908    2880    3120    4800
credit2, vwfi=trap   7221    7200    9240    7440
credit2, vwfi=native 2906    2880    3120    5040

New vgic:
                     AVG     MIN     MAX     WARM MAX
credit,  vwfi=trap   8481    8040    10200   8880
credit,  vwfi=native 4115    3960    4800    4200
credit2, vwfi=trap   8425    8400    9600    9000
credit2, vwfi=native 4227    3960    5040    4680

Here we can see that the new vgic underperforms the old one in a trivial 
use-case modeled with TBM.

Old vgic with optimizations [3] (without [4], because it breaks the setup):
                     AVG     MIN     MAX     WARM MAX
credit,  vwfi=trap   7309    7080    8760    7680
credit,  vwfi=native 3007    3000    4320    3120
credit2, vwfi=trap   6877    6720    8880    7200
credit2, vwfi=native 2680    2640    4440    2880



[1] https://github.com/aanisov/xen/tree/4tbm
[2] https://github.com/aanisov/tbm/commits/4xen
[3] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03328.html
[4] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03288.html

--
Sincerely,
Andrii Anisov.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-11-29 Thread Andrii Anisov

Hello Stefano,

On 27.11.18 23:27, Stefano Stabellini wrote:

Hi Andrii,

See the following:

https://marc.info/?l=xen-devel&m=148668817704668

Thank you for the pointer. I remember this email, but missed that it also
gives details on setting up the experiment. It looks like the bare-metal app
is not SoC-specific, so I'm going to use it.



The numbers have improved now thanks to vwfi=native and other
optimizations but the mechanism to set up the experiment is the same.

I know about `vwfi=native` but it does not fit our requirements :(


--
Sincerely,
Andrii Anisov.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-11-27 Thread Stefano Stabellini
On Tue, 20 Nov 2018, Andrii Anisov wrote:
> Hello Stefano,
> 
> On 01.11.18 22:20, Stefano Stabellini wrote:
> > No, I haven't had any time. Aside from the Xen version, another
> > difference is the interrupt source. I used the physical timer for
> > testing.
> 
> Could you share your approach for interrupt latency measurement? Are you
> using any HW specifics, or is it SoC-independent?
> 
> I would like to get more evidence for the optimizations of the gic/vgic/gic-v2
> code I did for our customer (it's about the old vgic, we are still on Xen 4.10).

Hi Andrii,

See the following:

https://marc.info/?l=xen-devel&m=148668817704668

The numbers have improved now thanks to vwfi=native and other
optimizations but the mechanism to set up the experiment is the same.
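
(In outline, the mechanism there is SoC-independent: a bare-metal guest arms
the generic timer and, in its IRQ handler, subtracts the programmed deadline
from the current counter. A minimal sketch of the core read, with helper and
variable names assumed rather than taken from TBM:)

    #include <stdint.h>

    static uint64_t latency_ticks;

    static inline uint64_t read_cntpct(void)
    {
        uint64_t t;

        /* isb so the counter read is not speculated early */
        asm volatile("isb; mrs %0, cntpct_el0" : "=r" (t));
        return t;
    }

    void timer_irq_handler(void)
    {
        uint64_t cval, now = read_cntpct();

        /* the programmed deadline, i.e. when the IRQ should have fired */
        asm volatile("mrs %0, cntp_cval_el0" : "=r" (cval));
        latency_ticks = now - cval;
    }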

Cheers,

Stefano

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-11-20 Thread Andrii Anisov

Hello Stefano,

On 01.11.18 22:20, Stefano Stabellini wrote:

No, I haven't had any time. Aside from the Xen version, another
difference is the interrupt source. I used the physical timer for
testing.


Could you share your approach for interrupt latency measurement? Are
you using any HW specifics, or is it SoC-independent?


I would like to get more evidence for the optimizations of the gic/vgic/gic-v2
code I did for our customer (it's about the old vgic, we are still on Xen 4.10).


--
Sincerely,
Andrii Anisov.


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-11-07 Thread Julien Grall

Hi Dario,

On 09/10/2018 17:46, Dario Faggioli wrote:

On Tue, 2018-10-09 at 12:59 +0200, Milan Boberic wrote:

Hi,


Hi Milan,


I'm testing Xen Hypervisor 4.10 performance on UltraZed-EG board with
carrier card.
I created bare-metal application in Xilinx SDK.
In bm application I:
- start triple timer counter (ttc) which generates
interrupt every 1us
- turn on PS LED
- call function 100 times in for loop (function that sets
some values)
- turn off LED
- stop triple timer counter
- reset counter value


Ok, I'm adding Stefano, Julien, and a couple of other people interested
in RT/lowlat on Xen.


I ran this bare-metal application under Xen Hypervisor with following
settings:
 - used null scheduler (sched=null) and vwfi=native
 - the bare-metal application has one vCPU and it is pinned to pCPU1
 - the domain which is PetaLinux also has one vCPU, pinned to pCPU0;
the other pCPUs are unused.
Under Xen Hypervisor I can see 3us jitter on oscilloscope.


So, this is probably me not being familiar with Xen on Xilinx (and with
Xen on ARM as a whole), but there's a few things I'm not sure I
understand:
- you say you use sched=null _and_ pinning? That should not be
   necessary (although, it shouldn't hurt either)
- "domain which is PetaLinux", is that dom0?

IAC, if it's not terribly hard to run this kind of test, I'd say, try
without 'vwfi=native', and also with another scheduler, like Credit
(but then do make sure you use pinning).
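
(i.e., something along these lines; the values are illustrative, not taken
from the setup described above:)

    # Xen command line: default Credit scheduler, no vwfi=native
    sched=credit
    # guest config fragment: keep the vCPU hard-pinned
    vcpus = 1
    cpus = "1"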


When I ran the same bm application with JTAG from Xilinx SDK (without Xen
Hypervisor, directly on the board) there was no jitter.


Here, when you say "without Xen", do you also mean without any
baremetal OS at all?


I'm curious what causes this 3us jitter in Xen (which isn't small
jitter at all) and is there any way of decreasing it?


Right. So, I'm not sure I've understood the test scenario either. But
yeah, 3us jitter seems significant. Still, if we're comparing with
bare-hw, without even an OS at all, I think it could have been expected
for latency and jitter to be higher in the Xen case.

Anyway, I am not sure anyone has done a kind of analysis that could
help us identify accurately from where things like that come, and in
what proportions.

It would be really awesome to have something like that, so do go ahead
if you feel like it. :-)

I think tracing could help a little (although we don't have a super-
sophisticated tracing infrastructure like Linux's perf and such), but
sadly enough, that's still not available on ARM, I think. :-/


FWIW, I just posted a series [1] to add xentrace support on Arm. Hopefully we
can get it merged for Xen 4.12.


Cheers,

[1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg00563.html



Regards,
Dario



--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-11-01 Thread Julien Grall

Hi Stefano,

On 11/1/18 8:20 PM, Stefano Stabellini wrote:

On Wed, 31 Oct 2018, Julien Grall wrote:

On 10/31/18 8:35 PM, Milan Boberic wrote:

Hi,


Interesting. Could you confirm the commit you were using (or the point
release)?
Stefano's number were based on commit "fuzz: update README.afl example"
55a04feaa1f8ab6ef7d723fbb1d39c6b96ad184a which is an unreleased version
of Xen.


All Xens I used are from Xilinx git repository because I have
UltraZed-EG board which has Zynq UltraScale SoC.
Under branches you can find Xen 4.8, 4.9,  etc.
I always used latest commit: c227fe68589bdfb36b85f7b78c034a40c95b9a30
Here is link to it:
https://github.com/Xilinx/xen/tree/xilinx/stable-4.9


This branch is quite ahead of the branch Stefano used. There are 94 more
commits just for Arm-specific code.

What I am interested in is seeing whether we are able to reproduce Stefano's
numbers with the same branch, so we can have a clue whether there is a
slowdown introduced in new code.

Stefano, you mention you will look at reproducing the numbers. Do you have any
update on this?


No, I haven't had any time. Aside from the Xen version, another
difference is the interrupt source. I used the physical timer for
testing.


I would actually be surprised if the interrupt latency under virtualization
varied depending on the interrupt...


If that were the case, then measuring the latency on the physical interrupt
(unlikely to be used by a virtualized guest) would have been quite pointless.


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-11-01 Thread Stefano Stabellini
On Wed, 31 Oct 2018, Julien Grall wrote:
> On 10/31/18 8:35 PM, Milan Boberic wrote:
> > Hi,
> > 
> > > Interesting. Could you confirm the commit you were using (or the point
> > > release)?
> > > Stefano's number were based on commit "fuzz: update README.afl example"
> > > 55a04feaa1f8ab6ef7d723fbb1d39c6b96ad184a which is an unreleased version
> > > of Xen.
> > 
> > All Xens I used are from Xilinx git repository because I have
> > UltraZed-EG board which has Zynq UltraScale SoC.
> > Under branches you can find Xen 4.8, 4.9,  etc.
> > I always used latest commit: c227fe68589bdfb36b85f7b78c034a40c95b9a30
> > Here is link to it:
> > https://github.com/Xilinx/xen/tree/xilinx/stable-4.9
> 
> This branch is quite ahead of the branch Stefano used. There are 94 more
> commits just for Arm-specific code.
> 
> What I am interested in is seeing whether we are able to reproduce Stefano's
> numbers with the same branch, so we can have a clue whether there is a
> slowdown introduced in new code.
> 
> Stefano, you mention you will look at reproducing the numbers. Do you have any
> update on this?

No, I haven't had any time. Aside from the Xen version, another
difference is the interrupt source. I used the physical timer for
testing.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-31 Thread Julien Grall



On 10/31/18 8:35 PM, Milan Boberic wrote:

Hi,


Interesting. Could you confirm the commit you were using (or the point
release)?
Stefano's number were based on commit "fuzz: update README.afl example"
55a04feaa1f8ab6ef7d723fbb1d39c6b96ad184a which is an unreleased version
of Xen.


All Xens I used are from Xilinx git repository because I have
UltraZed-EG board which has Zynq UltraScale SoC.
Under branches you can find Xen 4.8, 4.9,  etc.
I always used latest commit: c227fe68589bdfb36b85f7b78c034a40c95b9a30
Here is link to it:
https://github.com/Xilinx/xen/tree/xilinx/stable-4.9


This branch is quite ahead of the branch Stefano used. There are 94
more commits just for Arm-specific code.

What I am interested in is seeing whether we are able to reproduce
Stefano's numbers with the same branch, so we can have a clue whether
there is a slowdown introduced in new code.


Stefano, you mention you will look at reproducing the numbers. Do you 
have any update on this?


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-31 Thread Milan Boberic
Hi,

> Interesting. Could you confirm the commit you were using (or the point
> release)?
> Stefano's number were based on commit "fuzz: update README.afl example"
> 55a04feaa1f8ab6ef7d723fbb1d39c6b96ad184a which is an unreleased version
> of Xen.

All Xens I used are from Xilinx git repository because I have
UltraZed-EG board which has Zynq UltraScale SoC.
Under branches you can find Xen 4.8, 4.9,  etc.
I always used latest commit: c227fe68589bdfb36b85f7b78c034a40c95b9a30
Here is link to it:
https://github.com/Xilinx/xen/tree/xilinx/stable-4.9

Best regards.

Milan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-31 Thread Julien Grall

Hi Milan,

On 10/29/18 12:29 PM, Milan Boberic wrote:

Sorry for the late reply,


Don't worry, thank you for the testing and sending the .config.




I am afraid not; .config is generated at build time. So can you
paste it here please?



".config" file is in attachment.

I also tried Xen 4.9 and I got almost the same numbers; the jitter is smaller
by 150ns, which isn't a significant change at all.


Interesting. Could you confirm the commit you were using (or the point 
release)?


Stefano's number were based on commit "fuzz: update README.afl example" 
55a04feaa1f8ab6ef7d723fbb1d39c6b96ad184a which is an unreleased version 
of Xen.


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-29 Thread Milan Boberic
Sorry for the late reply,

> I am afraid not; .config is generated at build time. So can you
> paste it here please?


".config" file is in attachment.

I also tried Xen 4.9 and I got almost the same numbers; the jitter is smaller
by 150ns, which isn't a significant change at all.

Milan
#
# Automatically generated file; DO NOT EDIT.
# Xen/arm 4.11.1-pre Configuration
#
CONFIG_64BIT=y
CONFIG_ARM_64=y
CONFIG_ARM=y
CONFIG_ARCH_DEFCONFIG="arch/arm/configs/arm64_defconfig"

#
# Architecture Features
#
CONFIG_NR_CPUS=128
# CONFIG_ACPI is not set
CONFIG_GICV3=y
# CONFIG_HAS_ITS is not set
# CONFIG_NEW_VGIC is not set
CONFIG_SBSA_VUART_CONSOLE=y

#
# ARM errata workaround via the alternative framework
#
CONFIG_ARM64_ERRATUM_827319=y
CONFIG_ARM64_ERRATUM_824069=y
CONFIG_ARM64_ERRATUM_819472=y
CONFIG_ARM64_ERRATUM_832075=y
CONFIG_ARM64_ERRATUM_834220=y
CONFIG_HARDEN_BRANCH_PREDICTOR=y
CONFIG_ARM64_HARDEN_BRANCH_PREDICTOR=y
CONFIG_ALL_PLAT=y
# CONFIG_QEMU is not set
# CONFIG_RCAR3 is not set
# CONFIG_MPSOC is not set
CONFIG_ALL64_PLAT=y
# CONFIG_ALL32_PLAT is not set
CONFIG_MPSOC_PLATFORM=y

#
# Common Features
#
CONFIG_HAS_ALTERNATIVE=y
CONFIG_HAS_DEVICE_TREE=y
# CONFIG_MEM_ACCESS is not set
CONFIG_HAS_PDX=y
CONFIG_TMEM=y
# CONFIG_XSM is not set

#
# Schedulers
#
CONFIG_SCHED_CREDIT=y
CONFIG_SCHED_CREDIT2=y
CONFIG_SCHED_RTDS=y
# CONFIG_SCHED_ARINC653 is not set
CONFIG_SCHED_NULL=y
CONFIG_SCHED_CREDIT_DEFAULT=y
# CONFIG_SCHED_CREDIT2_DEFAULT is not set
# CONFIG_SCHED_RTDS_DEFAULT is not set
# CONFIG_SCHED_NULL_DEFAULT is not set
CONFIG_SCHED_DEFAULT="credit"
# CONFIG_LIVEPATCH is not set
CONFIG_SUPPRESS_DUPLICATE_SYMBOL_WARNINGS=y
CONFIG_CMDLINE=""

#
# Device Drivers
#
CONFIG_HAS_NS16550=y
CONFIG_HAS_CADENCE_UART=y
CONFIG_HAS_MVEBU=y
CONFIG_HAS_PL011=y
CONFIG_HAS_SCIF=y
CONFIG_HAS_PASSTHROUGH=y
CONFIG_ARM_SMMU=y
CONFIG_VIDEO=y
CONFIG_HAS_ARM_HDLCD=y
CONFIG_DEFCONFIG_LIST="$ARCH_DEFCONFIG"

#
# Debugging Options
#
# CONFIG_DEBUG is not set
# CONFIG_FRAME_POINTER is not set
# CONFIG_COVERAGE is not set
# CONFIG_LOCK_PROFILE is not set
# CONFIG_PERF_COUNTERS is not set
# CONFIG_VERBOSE_DEBUG is not set
# CONFIG_DEVICE_TREE_DEBUG is not set
# CONFIG_SCRUB_DEBUG is not set
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-26 Thread Stefano Stabellini
On Fri, 26 Oct 2018, Julien Grall wrote:
> Hi Stefano,
> 
> On 10/25/18 5:15 PM, Stefano Stabellini wrote:
> > On Thu, 25 Oct 2018, Julien Grall wrote:
> > > Hi Stefano,
> > > 
> > > On 10/24/18 1:24 AM, Stefano Stabellini wrote:
> > > > On Tue, 23 Oct 2018, Milan Boberic wrote:
> > > > I don't have any other things to suggest right now. You should be able
> > > > to measure an overall 2.5us IRQ latency (if the interrupt rate is not
> > > > too high).
> > > 
> > > Is it number you measured on Xen 4.11 flavored Xilinx? Or are they coming
> > > from
> > > the blog post [1] which is based on Xen 4.9?
> > > 
> > > If the latter, then I can't rule out we may have introduce a slowdown for
> > > good
> > > or bad reason...
> > > 
> > > To rule out this possibility, I would recommend to try and reproduce the
> > > same
> > > number on Xen 4.9 and then try with Xen 4.11.
> > > 
> > > Cheers,
> > > 
> > > [1] https://blog.xenproject.org/2017/03/20/xen-on-arm-interrupt-latency/
> > 
> > I was talking about the old numbers from Xen 4.9. You are right, we
> > cannot rule out the possibility that we introduced a slowdown.
> 
> Can you try to reproduce those number with your setup on Xen 4.11?

Yes, I intend to, it is on my TODO.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-26 Thread Julien Grall

Hi Stefano,

On 10/25/18 5:15 PM, Stefano Stabellini wrote:

On Thu, 25 Oct 2018, Julien Grall wrote:

Hi Stefano,

On 10/24/18 1:24 AM, Stefano Stabellini wrote:

On Tue, 23 Oct 2018, Milan Boberic wrote:
I don't have any other things to suggest right now. You should be able
to measure an overall 2.5us IRQ latency (if the interrupt rate is not
too high).


Is it number you measured on Xen 4.11 flavored Xilinx? Or are they coming from
the blog post [1] which is based on Xen 4.9?

If the latter, then I can't rule out we may have introduce a slowdown for good
or bad reason...

To rule out this possibility, I would recommend to try and reproduce the same
number on Xen 4.9 and then try with Xen 4.11.

Cheers,

[1] https://blog.xenproject.org/2017/03/20/xen-on-arm-interrupt-latency/


I was talking about the old numbers from Xen 4.9. You are right, we
cannot rule out the possibility that we introduced a slowdown.


Can you try to reproduce those number with your setup on Xen 4.11?

Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-25 Thread Stefano Stabellini
On Thu, 25 Oct 2018, Julien Grall wrote:
> Hi Stefano,
> 
> On 10/24/18 1:24 AM, Stefano Stabellini wrote:
> > On Tue, 23 Oct 2018, Milan Boberic wrote:
> > I don't have any other things to suggest right now. You should be able
> > to measure an overall 2.5us IRQ latency (if the interrupt rate is not
> > too high).
> 
> Is it number you measured on Xen 4.11 flavored Xilinx? Or are they coming from
> the blog post [1] which is based on Xen 4.9?
> 
> If the latter, then I can't rule out we may have introduce a slowdown for good
> or bad reason...
> 
> To rule out this possibility, I would recommend to try and reproduce the same
> number on Xen 4.9 and then try with Xen 4.11.
> 
> Cheers,
> 
> [1] https://blog.xenproject.org/2017/03/20/xen-on-arm-interrupt-latency/

I was talking about the old numbers from Xen 4.9. You are right, we
cannot rule out the possibility that we introduced a slowdown.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-25 Thread Julien Grall



On 10/25/18 3:47 PM, Milan Boberic wrote:

I was asking for the Xen configuration (xen/.config), to know what you have
enabled in Xen.


Oh, sorry, because I'm building xen from git repository here is the
link to it where you can check the file you mentioned.

https://github.com/Xilinx/xen/tree/xilinx/versal/xen


I am afraid not; .config is generated at build time. So can you
paste it here please?


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-25 Thread Milan Boberic
> I was asking for the Xen configuration (xen/.config), to know what you have
> enabled in Xen.

Oh, sorry, because I'm building xen from git repository here is the
link to it where you can check the file you mentioned.

https://github.com/Xilinx/xen/tree/xilinx/versal/xen


> It might, OTOH, be wise to turn it on when investigating the system
> behavior (but that's a general remark, I don't know to what Julien was
> referring to in this specific case).

I will definitely try to enable DEBUG.

Milan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-25 Thread Julien Grall



On 10/25/18 1:36 PM, Milan Boberic wrote:

On Thu, Oct 25, 2018 at 1:30 PM Julien Grall  wrote:


Hi Milan,


Hi Julien,


Sorry if it was already asked. Can you provide your .config for your
test?


Yes, of course; the bare-metal's .cfg file is in the attachment (if
that is what you asked :) ).

I was asking for the Xen configuration (xen/.config), to know what you have
enabled in Xen.


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-25 Thread Julien Grall

Hi Dario,

On 10/25/18 2:44 PM, Dario Faggioli wrote:

On Thu, 2018-10-25 at 14:36 +0200, Milan Boberic wrote:

On Thu, Oct 25, 2018 at 1:30 PM Julien Grall 
wrote:

  Do you have DEBUG enabled?


I'm not sure where exactly I should disable it. If you check line 18
in the xl dmesg file in the attachment, it says debug=n; it's the output of
xl dmesg. I'm not sure if that is the DEBUG you are talking about.


Yes, this means debug is *not* enabled, which is the correct setup for
doing performance/latency evaluation.

It might, OTOH, be wise to turn it on when investigating the system
behavior (but that's a general remark, I don't know to what Julien was
referring to in this specific case).


To narrow down the discrepancies in the measurements, I wanted to
check whether Milan was doing the performance measurement with debug
enabled.


Now I can tick off DEBUG as a potential cause of the latency/performance issue.

Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-25 Thread Dario Faggioli
On Thu, 2018-10-25 at 14:36 +0200, Milan Boberic wrote:
> On Thu, Oct 25, 2018 at 1:30 PM Julien Grall 
> wrote:
> >  Do you have DEBUG enabled?
> 
> I'm not sure where exactly I should disable it. If you check line 18
> in the xl dmesg file in the attachment, it says debug=n; it's the output of
> xl dmesg. I'm not sure if that is the DEBUG you are talking about.
>
Yes, this means debug is *not* enabled, which is the correct setup for
doing performance/latency evaluation.

It might, OTOH, be wise to turn it on when investigating the system
behavior (but that's a general remark, I don't know to what Julien was
referring to in this specific case).

To turn it on, in a recent enough Xen, which I think is what you're
using, you can use Kconfig (e.g., `make -C xen/ menuconfig').
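
(With debug off, the generated xen/.config will contain, as in the file
posted earlier in this thread:)

    #
    # Debugging Options
    #
    # CONFIG_DEBUG is not set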

> Also if I add prints somewhere in the code, I can see them, does that
> mean that DEBUG is enabled? If yes, can you tell me where exactly
> should I disable it?
> 
It depends on the "print". If you add 'printk("bla");', it is correct
that you see "bla" in the log, even with debug=n.
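
(For comparison, and if I remember the macros right, Xen's dprintk() is the
one that compiles away when debug=n:)

    printk("bla\n");                /* printed regardless of debug=n */
    dprintk(XENLOG_INFO, "bla\n");  /* becomes a no-op in non-debug builds */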

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/


signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-25 Thread Milan Boberic
On Thu, Oct 25, 2018 at 1:30 PM Julien Grall  wrote:
>
> Hi Milan,

Hi Julien,

> Sorry if it was already asked. Can you provide your .config for your
> test?

Yes, of course; the bare-metal's .cfg file is in the attachment (if
that is what you asked :) ).

>  Do you have DEBUG enabled?

I'm not sure where exactly I should disable it. If you check line 18
in the xl dmesg file in the attachment, it says debug=n; it's the output of
xl dmesg. I'm not sure if that is the DEBUG you are talking about.
Also, if I add prints somewhere in the code I can see them; does that
mean that DEBUG is enabled? If yes, can you tell me where exactly
I should disable it?

Thanks in advance!

Milan
name = "test"
kernel = "only_timer.bin"
memory = 8
vcpus = 1
cpus = [1]
irqs = [ 48, 54, 68, 69, 70 ]
iomem = [ "0xff010,1", "0xff110,1", "0xff120,1", "0xff130,1", "0xff140,1", 
"0xff0a0,1" ](XEN) Checking for initrd in /chosen
(XEN) Initrd 02bd7000-05fffd6d
(XEN) RAM:  - 7fef
(XEN)
(XEN) MODULE[0]: 07ff4000 - 07ffc080 Device Tree
(XEN) MODULE[1]: 02bd7000 - 05fffd6d Ramdisk
(XEN) MODULE[2]: 0008 - 0318 Kernel
(XEN)  RESVD[0]: 07ff4000 - 07ffc000
(XEN)  RESVD[1]: 02bd7000 - 05fffd6d
(XEN)
(XEN) Command line: console=dtuart dtuart=serial0 dom0_mem=1024M bootscrub=0 
dom0_max_vcpus=1 dom0_vcpus_pin=true timer_slop=0 sched=null vwfi=native 
serrors=panic
(XEN) Placing Xen at 0x7fc0-0x7fe0
(XEN) Update BOOTMOD_XEN from 0600-06108d81 => 
7fc0-7fd08d81
(XEN) Domain heap initialised
(XEN) Booting using Device Tree
(XEN) Looking for dtuart at "serial0", options ""
 Xen 4.11.1-pre
(XEN) Xen version 4.11.1-pre (milan@) (aarch64-xilinx-linux-gcc (GCC) 7.2.0) 
debug=n  Wed Oct 24 10:11:47 CEST 2018
(XEN) Latest ChangeSet: Mon Sep 24 16:07:33 2018 -0700 git:8610a91abc-dirty
(XEN) Processor: 410fd034: "ARM Limited", variant: 0x0, part 0xd03, rev 0x4
(XEN) 64-bit Execution:
(XEN)   Processor Features:  
(XEN) Exception Levels: EL3:64+32 EL2:64+32 EL1:64+32 EL0:64+32
(XEN) Extensions: FloatingPoint AdvancedSIMD
(XEN)   Debug Features: 10305106 
(XEN)   Auxiliary Features:  
(XEN)   Memory Model Features: 1122 
(XEN)   ISA Features:  00011120 
(XEN) 32-bit Execution:
(XEN)   Processor Features: 0131:00011011
(XEN) Instruction Sets: AArch32 A32 Thumb Thumb-2 Jazelle
(XEN) Extensions: GenericTimer Security
(XEN)   Debug Features: 03010066
(XEN)   Auxiliary Features: 
(XEN)   Memory Model Features: 10201105 4000 0126 02102211
(XEN)  ISA Features: 02101110 13112111 21232042 01112131 00011142 00011121
(XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27 Freq: 9 KHz
(XEN) GICv2 initialization:
(XEN) gic_dist_addr=f901
(XEN) gic_cpu_addr=f902
(XEN) gic_hyp_addr=f904
(XEN) gic_vcpu_addr=f906
(XEN) gic_maintenance_irq=25
(XEN) GICv2: Adjusting CPU interface base to 0xf902f000
(XEN) GICv2: 192 lines, 4 cpus, secure (IID 0200143b).
(XEN) Using scheduler: null Scheduler (null)
(XEN) Initializing null scheduler
(XEN) WARNING: This is experimental software in development.
(XEN) Use at your own risk.
(XEN) Allocated console ring of 16 KiB.
(XEN) Bringing up CPU1
(XEN) Bringing up CPU2
(XEN) Bringing up CPU3
(XEN) Brought up 4 CPUs
(XEN) P2M: 40-bit IPA with 40-bit PA and 8-bit VMID
(XEN) P2M: 3 levels with order-1 root, VTCR 0x80023558
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) Interrupt remapping enabled
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Loading kernel from boot module @ 0008
(XEN) Loading ramdisk from boot module @ 02bd7000
(XEN) Allocating 1:1 mappings totalling 1024MB for dom0:
(XEN) BANK[0] 0x002000-0x006000 (1024MB)
(XEN) Grant table range: 0x007fc0-0x007fc4
(XEN) Allocating PPI 16 for event channel interrupt
(XEN) Loading zImage from 0008 to 2008-2318
(XEN) Loading dom0 initrd from 02bd7000 to 
0x2820-0x2b628d6d
(XEN) Loading dom0 DTB to 0x2800-0x28006e46
(XEN) Initial low memory virq threshold set at 0x4000 pages.
(XEN) Std. Loglevel: Errors and warnings
(XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to 
Xen)
(XEN) Freed 280kB init memory.
(XEN) d0v0: vGICD: unhandled word write 0x to ICACTIVER4
(XEN) d0v0: vGICD: unhandled word write 0x to ICACTIVER8
(XEN) d0v0: vGICD: unhandled word write 0x to ICACTIVER12
(XEN) d0v0: vGICD: unhandled word write 0x to ICACTIVER16
(XEN) d0v0: vGICD: unhandled word write 0x to ICACTIVER20
(XEN) 

Re: [Xen-devel] Xen optimization

2018-10-25 Thread Julien Grall

Hi Milan,

On 10/25/18 11:09 AM, Milan Boberic wrote:

Hi,

On Wed, Oct 24, 2018 at 2:24 AM Stefano Stabellini 
 wrote:
It is good that there are no physical interrupts interrupting the cpu.
serrors=panic makes the context switch faster. I guess there are not
enough context switches to make a measurable difference.


Yes, when I did:
grep ctxt /proc/2153/status
I got:
voluntary_ctxt_switches:5
nonvoluntary_ctxt_switches: 3


I don't have any other things to suggest right now. You should be able
to measure an overall 2.5us IRQ latency (if the interrupt rate is not
too high).


This bare-metal application is the most suspicious, indeed. Still
waiting for an answer on the Xilinx forum.


Sorry if it was already asked. Can you provide your .config for your 
test? Do you have DEBUG enabled?


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-25 Thread Julien Grall

Hi Stefano,

On 10/24/18 1:24 AM, Stefano Stabellini wrote:

On Tue, 23 Oct 2018, Milan Boberic wrote:
I don't have any other things to suggest right now. You should be able
to measure an overall 2.5us IRQ latency (if the interrupt rate is not
too high).


Is it number you measured on Xen 4.11 flavored Xilinx? Or are they 
coming from the blog post [1] which is based on Xen 4.9?


If the latter, then I can't rule out we may have introduce a slowdown 
for good or bad reason...


To rule out this possibility, I would recommend to try and reproduce the 
same number on Xen 4.9 and then try with Xen 4.11.


Cheers,

[1] https://blog.xenproject.org/2017/03/20/xen-on-arm-interrupt-latency/

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-25 Thread Milan Boberic
Hi,
> On Wed, Oct 24, 2018 at 2:24 AM Stefano Stabellini 
>  wrote:
> It is good that there are no physical interrupts interrupting the cpu.
> serrors=panic makes the context switch faster. I guess there are not
> enough context switches to make a measurable difference.

Yes, when I did:
grep ctxt /proc/2153/status
I got:
voluntary_ctxt_switches:5
nonvoluntary_ctxt_switches: 3

> I don't have any other things to suggest right now. You should be able
> to measure an overall 2.5us IRQ latency (if the interrupt rate is not
> too high).

This bare-metal application is the most suspicious, indeed. Still
waiting for an answer on the Xilinx forum.

>  Just to be paranoid, we might also want to check the following, again it
> shouldn't get printed:
> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> index 5a4f082..6cf6814 100644
> --- a/xen/arch/arm/vgic.c
> +++ b/xen/arch/arm/vgic.c
> @@ -532,6 +532,8 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,
>      struct pending_irq *iter, *n;
>      unsigned long flags;
> +    if ( d->domain_id != 0 && virq != 68 )
> +        printk("DEBUG virq=%d local=%d\n", virq, v == current);
>  /*
>   * For edge triggered interrupts we always ignore a "falling edge".
>   * For level triggered interrupts we shouldn't, but do anyways.
Checked it again, no prints. I hoped that I would discover some vIRQs
or pIRQs slowing things down, but no, no prints.
I might try something else instead of this bare-metal application
because this Xilinx SDK example is very suspicious.

Thank you for your time.

Milan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-23 Thread Stefano Stabellini
On Tue, 23 Oct 2018, Milan Boberic wrote:
> > Just add an && irq != 1023 to the if check.
> Added it and now when I create bare-metal guest it prints only once:
> 
> (XEN) DEBUG irq=0
> (XEN) d1v0 No valid vCPU found for vIRQ32 in the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ33 in the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ34 in the target list (0x2). Skip it
> root@uz3eg-iocc-2018-2:~# (XEN) d1v0 No valid vCPU found for vIRQ35 in
> the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ36 in the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ37 in the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ38 in the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ39 in the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ40 in the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ41 in the target list (0x2). Skip it
> 
> 
> This part always prints only once when I create this bare-metal guest
> like I mentioned in earlier replies and we said it doesn't do any
> harm:
> 
> (XEN) d1v0 No valid vCPU found for vIRQ32 in the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ33 in the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ34 in the target list (0x2). Skip it
> root@uz3eg-iocc-2018-2:~# (XEN) d1v0 No valid vCPU found for vIRQ35 in
> the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ36 in the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ37 in the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ38 in the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ39 in the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ40 in the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ41 in the target list (0x2). Skip it
> 
> Now, from this patch I get:
> 
> (XEN) DEBUG irq=0
> 
> also printed only once.
> 
> Forgot to mention in reply before this one, I added serrors=panic and
> it didn't make any change, numbers are the same.
> 
> Thanks in advance!

It is good that there are no physical interrupts interrupting the cpu.
serrors=panic makes the context switch faster. I guess there are not
enough context switches to make a measurable difference.

I don't have any other things to suggest right now. You should be able
to measure an overall 2.5us IRQ latency (if the interrupt rate is not
too high).

Just to be paranoid, we might also want to check the following, again it
shouldn't get printed:

diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 5a4f082..6cf6814 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -532,6 +532,8 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,
     struct pending_irq *iter, *n;
     unsigned long flags;
 
+    if ( d->domain_id != 0 && virq != 68 )
+        printk("DEBUG virq=%d local=%d\n", virq, v == current);
 /*
  * For edge triggered interrupts we always ignore a "falling edge".
  * For level triggered interrupts we shouldn't, but do anyways.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-23 Thread Milan Boberic
> Just add an && irq != 1023 to the if check.
Added it and now when I create bare-metal guest it prints only once:

(XEN) DEBUG irq=0
(XEN) d1v0 No valid vCPU found for vIRQ32 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ33 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ34 in the target list (0x2). Skip it
root@uz3eg-iocc-2018-2:~# (XEN) d1v0 No valid vCPU found for vIRQ35 in
the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ36 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ37 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ38 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ39 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ40 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ41 in the target list (0x2). Skip it


This part always prints only once when I create this bare-metal guest
like I mentioned in earlier replies and we said it doesn't do any
harm:

(XEN) d1v0 No valid vCPU found for vIRQ32 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ33 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ34 in the target list (0x2). Skip it
root@uz3eg-iocc-2018-2:~# (XEN) d1v0 No valid vCPU found for vIRQ35 in
the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ36 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ37 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ38 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ39 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ40 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ41 in the target list (0x2). Skip it

Now, from this patch I get:

(XEN) DEBUG irq=0

also printed only once.

Forgot to mention in reply before this one, I added serrors=panic and
it didn't make any change, numbers are the same.

Thanks in advance!

Milan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-22 Thread Stefano Stabellini
On Mon, 22 Oct 2018, Milan Boberic wrote:
> Hi,
> 
> > I think we want to fully understand how many other interrupts the
> > baremetal guest is receiving. To do that, we can modify my previous
> > patch to suppress any debug messages for virq=68. That way, we should
> > only see the other interrupts. Ideally there would be none.
> > diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> > index 5a4f082..b7a8e17 100644
> > --- a/xen/arch/arm/vgic.c
> > +++ b/xen/arch/arm/vgic.c
> > @@ -577,7 +577,11 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,
> >      /* the irq is enabled */
> >      if ( test_bit(GIC_IRQ_GUEST_ENABLED, &n->status) )
> > +    {
> >          gic_raise_guest_irq(v, virq, priority);
> > +        if ( d->domain_id != 0 && virq != 68 )
> > +            printk("DEBUG virq=%d local=%d\n", virq, v == current);
> > +    }
> >  
> >      list_for_each_entry ( iter, &v->arch.vgic.inflight_irqs, inflight )
> >      {
> 
> when I apply this patch there are no prints nor debug messages in xl
> dmesg. So bare-metal receives only interrupt 68, which is good.

Yes, good!


> > Next step would be to verify that there are no other physical interrupts
> > interrupting the vcpu execution other the irq=68. We should be able to
> > check that with the following debug patch:
> >
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index e524ad5..b34c3e4 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -381,6 +381,13 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
> >      /* Reading IRQ will ACK it */
> >      irq = gic_hw_ops->read_irq();
> > +    if ( current->domain->domain_id > 0 && irq != 68 )
> > +    {
> > +        local_irq_enable();
> > +        printk("DEBUG irq=%d\n", irq);
> > +        local_irq_disable();
> > +    }
> > +
> >      if ( likely(irq >= 16 && irq < 1020) )
> >      {
> >          local_irq_enable();
> 
> But when I apply this patch it prints forever:
> (XEN) DEBUG irq=1023
> 
> Thanks in advance!

I know why! It's because we always loop around until we read the
spurious interrupt. Just add an && irq != 1023 to the if check.
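
For reference, the amended hunk would look like this (same debug patch
as before, with the spurious ID filtered out; 1023 is what the GIC
returns when nothing is pending, so without the extra check the
handler loop prints forever):

    /* Reading IRQ will ACK it */
    irq = gic_hw_ops->read_irq();
    if ( current->domain->domain_id > 0 && irq != 68 && irq != 1023 )
    {
        local_irq_enable();
        printk("DEBUG irq=%d\n", irq);
        local_irq_disable();
    }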

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-22 Thread Milan Boberic
Hi,

> I think we want to fully understand how many other interrupts the
> baremetal guest is receiving. To do that, we can modify my previous
> patch to suppress any debug messages for virq=68. That way, we should
> only see the other interrupts. Ideally there would be none.
> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> index 5a4f082..b7a8e17 100644
> --- a/xen/arch/arm/vgic.c
> +++ b/xen/arch/arm/vgic.c
> @@ -577,7 +577,11 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,
>      /* the irq is enabled */
>      if ( test_bit(GIC_IRQ_GUEST_ENABLED, &n->status) )
> +    {
>          gic_raise_guest_irq(v, virq, priority);
> +        if ( d->domain_id != 0 && virq != 68 )
> +            printk("DEBUG virq=%d local=%d\n", virq, v == current);
> +    }
>  
>      list_for_each_entry ( iter, &v->arch.vgic.inflight_irqs, inflight )
>      {

when I apply this patch there are no prints nor debug messages in xl
dmesg. So bare-metal receives only interrupt 68, which is good.

> Next step would be to verify that there are no other physical interrupts
> interrupting the vcpu execution other the irq=68. We should be able to
> check that with the following debug patch:
>
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index e524ad5..b34c3e4 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -381,6 +381,13 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
>      /* Reading IRQ will ACK it */
>      irq = gic_hw_ops->read_irq();
> +    if ( current->domain->domain_id > 0 && irq != 68 )
> +    {
> +        local_irq_enable();
> +        printk("DEBUG irq=%d\n", irq);
> +        local_irq_disable();
> +    }
> +
>      if ( likely(irq >= 16 && irq < 1020) )
>      {
>          local_irq_enable();

But when I apply this patch it prints forever:
(XEN) DEBUG irq=1023

Thanks in advance!

Milan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-19 Thread Dario Faggioli
On Fri, 2018-10-19 at 14:02 -0700, Stefano Stabellini wrote:
> On Wed, 17 Oct 2018, Milan Boberic wrote:
> > I checked interrupt frequency with oscilloscope
> > just to be sure (toggling LED on/off when interrupts occur). So,
> > when I set:
> > - interrupts to be generated every 8 us I get jitter of 6 us
> > - interrupts to be generated every 10 us I get jitter of 3 us
> > (after
> > 2-3mins it jumps to 6 us)
> > - interrupts to be generated every 15 us jitter is the same as when
> > only bare-metal application runs on board (without Xen or any OS)
> 
> These are very interesting numbers! 
>
Indeed.

> Thanks again for running these
> experiments. I don't want to jump to conclusions but they seem to verify
> the theory that if the interrupt frequency is too high, we end up
> spending too much time handling interrupts, the system cannot cope,
> hence jitter increases.
> 
Yep, this makes a lot of sense.

> However, I would have thought that the threshold should be lower than
> 15us, given that it takes 2.5us to inject an interrupt. I have a
> couple of experiment suggestions below.
> 
FWIW, I know that numbers are always relative (hw platform, workload,
etc), and I'm happy to see that you're quite confident that we can
improve further... but these numbers seems rather good to me. :-)

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/


signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-17 Thread Milan Boberic
Hi,
>
> The device tree with everything seems to be system.dts, that was enough
> :-)  I don't need the dtsi files you used to build the final dts, I only
> need the one you use in uboot and for your guest.

 I wasn't sure, so I sent everything; sorry for bombarding you with
all those files. :-)

> It looks like you set xen,passthrough correctly in system.dts for
> timer@ff110000, serial@ff010000, and gpio@ff0a0000.

Thank you for taking a look, now we are sure that passthrough works
correctly because there is no error during guest creation and there
are no prints of "DEBUG irq slow path".

> If you are not getting any errors anymore when creating your baremetal
> guest, then yes, it should be working passthrough. I would double-check
> that everything is working as expected using the DEBUG patch for Xen I
> suggested to you in the other email. You might even want to remove the
> "if" check and always print something for every interrupt of your guest
> just to get an idea of what's going on. See the attached patch.

When I apply this patch it prints forever:
(XEN) DEBUG virq=68 local=1
which is a good thing I guess because interrupts are being generated non-stop.

> Once everything is as expected I would change the frequency of the
> timer, because 1 us is way too frequent. I think it should be at least
> 3us, more like 5us.

Okay, about this... I double-checked my bare-metal application and it
looks like interrupts weren't generated every 1 us. The shortest
interrupt period is 8 us. I checked the interrupt frequency with an
oscilloscope just to be sure (toggling an LED on/off when interrupts
occur; a sketch of that measurement loop follows the list below). So,
when I set:
- interrupts to be generated every 8 us I get jitter of 6 us
- interrupts to be generated every 10 us I get jitter of 3 us (after
2-3 mins it jumps to 6 us)
- interrupts to be generated every 15 us jitter is the same as when
only the bare-metal application runs on the board (without Xen or any OS)
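
(The measurement loop is conceptually just the following: a minimal
sketch under assumed register offsets, not the actual Xilinx SDK code;
TTC0_ISR and GPIO_DATA are illustrative names and addresses only.)

#include <stdint.h>

/* assumed MMIO addresses: take the real TTC0 and GPIO bases/offsets
 * from the ZynqMP TRM or the generated hardware description */
#define TTC0_ISR   ((volatile uint32_t *)0xff110054)
#define GPIO_DATA  ((volatile uint32_t *)0xff0a0040)

/* timer interrupt handler: ack the timer, toggle the LED pin; the
 * oscilloscope then shows jitter as variation in edge-to-edge spacing */
void ttc0_irq_handler(void)
{
    (void)*TTC0_ISR;    /* reading the interrupt status register acks it */
    *GPIO_DATA ^= 1u;   /* toggle the LED */
}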

I want to remind you that a bare-metal application that only blinks the
LED at high speed gives 1 us jitter; somehow, introducing frequent
interrupts causes this jitter, which is why I was unsure about this
timer passthrough. Taking into consideration that you measured a Xen
overhead of 1 us, I have a feeling that I'm missing something. Is there
anything else I could do to get better results besides sched=null,
vwfi=native, hard vCPU pinning (1 vCPU on 1 pCPU) and passthrough (not
sure if it affects the jitter)?
I'm forcing frequent interrupts because I'm testing to see if this
board with Xen on it could be used for real-time simulations,
real-time signal processing, etc. If I could get results like yours (1
us Xen overhead) or even better, that would be great! BTW, how did you
measure Xen's overhead?
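
(One plausible way, an assumption about methodology rather than
necessarily how the 1 us figure was obtained: timestamp ISR entry
against the programmed timer deadline using the ARM generic counter,
and track min/max/avg. next_deadline below is a hypothetical variable
set wherever the timer is armed.)

#include <stdint.h>

/* read the virtual counter; CNTFRQ_EL0 gives its frequency (about
 * 100 MHz on ZynqMP, i.e. roughly 10 ns per tick) */
static inline uint64_t read_cntvct(void)
{
    uint64_t val;
    __asm__ volatile("mrs %0, cntvct_el0" : "=r"(val));
    return val;
}

extern uint64_t next_deadline;   /* counter value the timer was armed for */

void timer_irq_handler(void)
{
    uint64_t now = read_cntvct();
    uint64_t latency = now - next_deadline;  /* ticks of IRQ latency */
    /* accumulate min/max/avg of latency here */
    (void)latency;
}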

> Keep in mind that jitter is about having
> deterministic IRQ latency, not about having extremely frequent
> interrupts.

Yes, but I want to see exactly where I will lose deterministic IRQ
latency, which is extremely important in real-time signal processing.
So, what causes this jitter: Xen limits, ARM limits, etc.? It would be
nice to know; I'll share all the results I get.

> I would also double check that you are not using any other devices or
> virtual interfaces in your baremetal app because that could negatively
> affect the numbers.

I checked the bare-metal app and I think there are no other devices
that the bm app is using.

> Linux by default uses the virtual
> timer interface ("arm,armv8-timer"); I would double check that the
> baremetal app is not doing the same -- you don't want to be using two
> timers when doing your measurements.

Hmm, I'm not sure how to check that; I could send the bare-metal app if
that helps, it's created in Xilinx SDK 2017.4.
Also, should I move to Xilinx SDK 2018.2, since I'm using PetaLinux 2018.2?
I'm also using a hardware description file for the SDK that was created
in Vivado 2017.4.
Could all this be a version-mismatch problem (I don't think so,
because the bm app works)?

Meng mentioned in some of his earlier posts:

> Even though the app. is the only one running on the CPU, the CPU may
> be used to handle other interrupts and its context (such as TLB and
> cache) might be flushed by other components. When these happen, the
> interrupt handling latency can vary a lot.

What do you think about this? I don't know how would I check this.

I also tried using default scheduler (removed sched=null and
vwfi=native) and jitter is 10 us when interrupt is generated every 10
us.

Thanks in advance!

Milan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-16 Thread Julien Grall



On 10/13/2018 05:01 PM, Milan Boberic wrote:

Hi,


Hi,




Don't interrupts _come_ from hardware and get routed to
hypervisor/os/app?

Yes they do, sorry, I reversed the order because I'm a newbie :) .


Would you mind explaining what the triple timer counter is?

The explanation is on this link, on page 342.


Which link?




This is not the official Xen repository, and it looks like patches have been
applied on top. I am afraid I am not going to be able to help here. Could you
do the same experiment with Xen 4.11?


I think I have to get Xen from Xilinx because I use a board that has
Zynq UltraScale+. Stefano sent a branch with Xen 4.11, so I built with it.


The board should be fully supported upstream. If Xilinx has more patches
on top, then you would need to seek support from them, because I don't
know what they changed in Xen.


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-16 Thread Stefano Stabellini
On Mon, 15 Oct 2018, Milan Boberic wrote:
> In attachment are device-tree files I found in my project:
> 
> device-tree.bbappend - under
> /uz3eg_iocc_2018_2/project-spec/meta-user/recipes-bsp/device-tree/
> 
> xen-overlay.dtsi , system-user.dtsi and zynqmp-qemu-arm.dts - under
> /uz3eg_iocc_2018_2/project-spec/meta-user/recipes-bsp/device-tree/files
> 
> zynqmp-qemu-multiarch-arm and zynqmp-qemu-pmu - under
> /uz3eg_iocc_2018_2/project-spec/meta-user/recipes-bsp/device-tree/files/multi-arch
> 
> pcw.dtsi , pl.dtsi , system-conf.dtsi , system-top.dts ,
> zynqmp-clk-ccf.dtsi and zynqmp.dtsi -
> under/uz3eg_iocc_2018_2/components/plnx_workspace/device-tree/device-tree/
> 
> In system-conf.dtsi file first line says:
> /*
>  * CAUTION: This file is automatically generated by PetaLinux SDK.
>  * DO NOT modify this file
>  */
> and there is no sigh of timer.
> If you could take a look at this and other files in attachment it
> would be great.

The device tree with everything seems to be system.dts, that was enough
:-)  I don't need the dtsi files you used to build the final dts, I only
need the one you use in uboot and for your guest.

In system.dts, the timers are all there:

timer@ff110000 {
compatible = "cdns,ttc";
status = "okay";
interrupt-parent = <0x4>;
interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>;
reg = <0x0 0xff110000 0x0 0x1000>;
timer-width = <0x20>;
power-domains = <0x3b>;
clocks = <0x3 0x1f>;
xen,passthrough;
};

timer@ff120000 {
compatible = "cdns,ttc";
status = "disabled";
interrupt-parent = <0x4>;
interrupts = <0x0 0x27 0x4 0x0 0x28 0x4 0x0 0x29 0x4>;
reg = <0x0 0xff120000 0x0 0x1000>;
timer-width = <0x20>;
power-domains = <0x3b>;
clocks = <0x3 0x1f>;
};

timer@ff130000 {
compatible = "cdns,ttc";
status = "disabled";
interrupt-parent = <0x4>;
interrupts = <0x0 0x2a 0x4 0x0 0x2b 0x4 0x0 0x2c 0x4>;
reg = <0x0 0xff130000 0x0 0x1000>;
timer-width = <0x20>;
power-domains = <0x3c>;
clocks = <0x3 0x1f>;
};

timer@ff140000 {
compatible = "cdns,ttc";
status = "disabled";
interrupt-parent = <0x4>;
interrupts = <0x0 0x2d 0x4 0x0 0x2e 0x4 0x0 0x2f 0x4>;
reg = <0x0 0xff140000 0x0 0x1000>;
timer-width = <0x20>;
power-domains = <0x3d>;
clocks = <0x3 0x1f>;
};

It looks like you set xen,passthrough correctly in system.dts for
timer@ff110000, serial@ff010000, and gpio@ff0a0000.



> I also tried to run the bare-metal app with these changes and it worked; I added:
> 
> &ttc0 {
> status = "okay";
> compatible = "cdns,ttc";
> interrupt-parent = <0x4>;
> interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>;
> reg = <0x0 0xff110000 0x0 0x1000>;
> timer-width = <0x20>;
> power-domains = <0x3b>;
> xen,passthrough;
> 
> };
> 
> in the xen-overlay.dtsi file; because it's an overlay it shouldn't duplicate
> the timer node, right?

As I wrote, system.dts looks correct.


> After build I ran:
>  dtc -I dtb -O dts -o system.dts system.dtb
> and checked for ttc0, it seems okay except interrupt-parent is <0x4>
> not <0x2> like in your example:

I don't know what you are referring to. In the system.dts you attached
interrupt-parent is <0x4> which is correct:

timer@ff110000 {
compatible = "cdns,ttc";
status = "okay";
interrupt-parent = <0x4>;


> timer@ff110000 {
> compatible = "cdns,ttc";
> status = "okay";
> interrupt-parent = <0x4>;
> interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>;
> reg = <0x0 0xff110000 0x0 0x1000>;
> timer-width = <0x20>;
> power-domains = <0x3b>;
> clocks = <0x3 0x1f>;
> xen,passthrough;
> };
> status was "disable" before.
> system.dts is also added in attachment.

status is "okay" in the system.dts you attached. That is important,
because status = "disabled" means the device cannot be used.


> Is this the working passthrough? Because the jitter is the same.
> 
> When a legit, working passthrough is set correctly, the jitter should be
> smaller, right?

If you are not getting any errors anymore when creating your baremetal
guest, then yes, it should be working passthrough.

Re: [Xen-devel] Xen optimization

2018-10-15 Thread Julien Grall



On 15/10/2018 14:01, Milan Boberic wrote:

On 15/10/2018 09:14, Julien Grall wrote:
Which link?


I made a hyperlink on the word "link"; it looks like it somehow got lost.
Here is the link:

https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf


HTML should be avoided on the mailing list. Most of us are using 
text-only clients.


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-15 Thread Milan Boberic
> On 15/10/2018 09:14, Julien Grall wrote:
> Which link?

I made a hyperlink on the word "link"; it looks like it somehow got lost.
Here is the link:

https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf

> The board should be fully supported upstream. If Xilinx has more patches
> on top, then you would need to seek support from them, because I don't
> know what they changed in Xen.

I think Stefano can help, thanks for the suggestion.

Cheers,
Milan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-15 Thread Julien Grall

(Resending with a different address)

On 15/10/2018 09:14, Julien Grall wrote:



On 10/13/2018 05:01 PM, Milan Boberic wrote:

Hi,


Hi,




Don't interrupts _come_ from hardware and get routed to
hypervisor/os/app?

Yes they do, sorry, I reversed the order because I'm a newbie :) .


Would you mind explaining what the triple timer counter is?

The explanation is on this link, on page 342.


Which link?



This is not the official Xen repository, and it looks like patches have
been applied on top. I am afraid I am not going to be able to help
here. Could you do the same experiment with Xen 4.11?


I think I have to get Xen from Xilinx because I use a board that has
Zynq UltraScale+. Stefano sent a branch with Xen 4.11, so I built with it.


The board should be fully supported upstream. If Xilinx has more patches
on top, then you would need to seek support from them, because I don't
know what they changed in Xen.


Cheers,



--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-14 Thread Stefano Stabellini
On Sat, 13 Oct 2018, Milan Boberic wrote:
> > This is definitely wrong. Can you please also post the full host device
> > tree with your modifications that you are using for Xen and Dom0?  You
> > should have something like:
> >
> > timer@ff110000 {
> > compatible = "cdns,ttc";
> > interrupt-parent = <0x2>;
> > interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>;
> > reg = <0x0 0xff110000 0x0 0x1000>;
> > timer-width = <0x20>;
> > power-domains = <0x3b>;
> > xen,passthrough;
> > };
> > For each of the nodes of the devices you are assigning to the DomU.
> 
> I put
> &ttc0 {
>     xen,passthrough = <0x1>;
> };
> because when I was making the bm app I was following this guide. Now I see
> it's wrong. When I copied directly:
> timer@ff110000 {
> compatible = "cdns,ttc";
> interrupt-parent = <0x2>;
> interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>;
> reg = <0x0 0xff110000 0x0 0x1000>;
> timer-width = <0x20>;
> power-domains = <0x3b>;
> xen,passthrough;
> };
> into the xen-overlay.dtsi file, it resulted in an error during the
> device-tree build. I modified it a little bit to get a successful
> build; all the device-tree files are included in the attachment. I'm not
> sure how to set this passthrough properly; if you could take a look at
> those files in the attachment I'd be more than grateful.
>
> > It's here: 
> > https://github.com/Xilinx/xen/blob/xilinx/stable-4.9/xen/arch/arm/vgic.c#L462
> Oh, about that. I sent you the wrong branch, I was using Xen 4.10. Anyway
> now I moved to Xen 4.11 like you suggested and applied your patch and
> Dario's also.
> 
> Okay, now when I want to xl create my domU (bare-metal app) I get error:
> 
> Parsing config from timer.cfg
> (XEN) IRQ 68 is already used by domain 0
> libxl: error: libxl_create.c:1354:domcreate_launch_dm: Domain 1:failed
> give domain access to irq 68: Device or resource busy
> libxl: error: libxl_domain.c:1034:libxl__destroy_domid: Domain
> 1:Non-existant domain
> libxl: error: libxl_domain.c:993:domain_destroy_callback: Domain
> 1:Unable to destroy guest
> libxl: error: libxl_domain.c:920:domain_destroy_cb: Domain
> 1:Destruction of domain failed

That means that the "xen,passthrough" addition to the host device tree went 
wrong.


> I guess my modifications of:
> timer@ff110000 {
> compatible = "cdns,ttc";
> interrupt-parent = <0x2>;
> interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>;
> reg = <0x0 0xff110000 0x0 0x1000>;
> timer-width = <0x20>;
> power-domains = <0x3b>;
> xen,passthrough;
> };
> are not correct.

Right


> I tried to change interrupts to:
>  interrupts = <0x0 0x44 0x4 0x0 0x45 0x4 0x0 0x46 0x4>;
> because if you check here on page 310 interrupts for TTC0 are 68:70.
> But that didn't work either; I still get the same error.

The interrupt numbers specified in the DTS are the real interrupt minus
32: 68-32 = 36 = 0x24. The DTS was correct.
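
The mapping is mechanical; a throwaway check in plain C (nothing
Xen-specific, just the arithmetic):

#include <stdio.h>

/* shared peripheral interrupts (SPIs) start at ID 32 on the GIC, so a
 * device tree "interrupts" cell holds the hardware IRQ number minus 32 */
int main(void)
{
    unsigned int hwirq = 68;                 /* TTC0 interrupt 0 */
    printf("DTS cell: 0x%x\n", hwirq - 32);  /* prints 0x24 */
    return 0;
}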


> I also tried to replace the xen,passthrough; line with:
> xen,passthrough = <0x1>;
> but also without success, still the same error.
> 
> Are you sure about this line:
> reg = <0x0 0xff110000 0x0 0x1000>;   ?
> Or should it be like this?
>  reg = <0x0 0xff110000 0x1000>;

Yes, that could be a problem. The format depends on the #address-cells
and #size-cells parameters. You didn't send me system-conf.dtsi, so I
don't know for sure which one of the two is right. In any case, you
should not duplicate the timer@ff110000 node in the device tree. You should
only add "xen,passthrough;" to the existing timer@ff110000 node, which
is probably in system-conf.dtsi. So, avoid adding a new timer node to
xen-overlay.dtsi, and instead modify system-conf.dtsi.


> I also included xl dmesg and dmesg in attachments (after xl create of bm app).
> 
> Thanks in advance!
> 
> Milan
> 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-13 Thread Milan Boberic
Hi,

> Don't interrupts _come_ from hardware and get routed to
> hypervisor/os/app?
Yes they do, sorry, I reversed the order because I'm a newbie :) .

> Would you mind explaining what the triple timer counter is?
The explanation is on this link, on page 342.

> This is not the official Xen repository, and it looks like patches have been
> applied on top. I am afraid I am not going to be able to help here. Could you
> do the same experiment with Xen 4.11?

I think I have to get Xen from Xilinx because I use a board that has
Zynq UltraScale+. Stefano sent a branch with Xen 4.11, so I built with it.

> This could also mean that wfi is not used by the guest, or that you never go to
> the idle vCPU.
Right.

> This is definitely wrong. Can you please also post the full host device
> tree with your modifications that you are using for Xen and Dom0?  You
> should have something like:
>
> timer@ff110000 {
> compatible = "cdns,ttc";
> interrupt-parent = <0x2>;
> interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>;
> reg = <0x0 0xff110000 0x0 0x1000>;
> timer-width = <0x20>;
> power-domains = <0x3b>;
> xen,passthrough;
> };
> For each of the nodes of the devices you are assigning to the DomU.

I put
&ttc0 {
   xen,passthrough = <0x1>;
};
because when I was making the bm app I was following this guide. Now I see
it's wrong. When I copied directly:
timer@ff110000 {
compatible = "cdns,ttc";
interrupt-parent = <0x2>;
interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>;
reg = <0x0 0xff110000 0x0 0x1000>;
timer-width = <0x20>;
power-domains = <0x3b>;
xen,passthrough;
};
into the xen-overlay.dtsi file, it resulted in an error during the
device-tree build. I modified it a little bit to get a successful
build; all the device-tree files are included in the attachment. I'm not
sure how to set this passthrough properly; if you could take a look at
those files in the attachment I'd be more than grateful.

> It's here: 
> https://github.com/Xilinx/xen/blob/xilinx/stable-4.9/xen/arch/arm/vgic.c#L462
Oh, about that. I sent you the wrong branch, I was using Xen 4.10. Anyway
now I moved to Xen 4.11 like you suggested and applied your patch and
Dario's also.

Okay, now when I want to xl create my domU (bare-metal app) I get error:

Parsing config from timer.cfg
(XEN) IRQ 68 is already used by domain 0
libxl: error: libxl_create.c:1354:domcreate_launch_dm: Domain 1:failed
give domain access to irq 68: Device or resource busy
libxl: error: libxl_domain.c:1034:libxl__destroy_domid: Domain
1:Non-existant domain
libxl: error: libxl_domain.c:993:domain_destroy_callback: Domain
1:Unable to destroy guest
libxl: error: libxl_domain.c:920:domain_destroy_cb: Domain
1:Destruction of domain failed

I guess my modifications of:
timer@ff110000 {
compatible = "cdns,ttc";
interrupt-parent = <0x2>;
interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>;
reg = <0x0 0xff110000 0x0 0x1000>;
timer-width = <0x20>;
power-domains = <0x3b>;
xen,passthrough;
};
are not correct. I tried to change interrupts to:
 interrupts = <0x0 0x44 0x4 0x0 0x45 0x4 0x0 0x46 0x4>;
because if you check here on page 310 interrupts for TTC0 are 68:70.
But that didn't work either; I still get the same error.

I also tried to replace the xen,passthrough; line with:
xen,passthrough = <0x1>;
but also without success, still the same error.

Are you sure about this line:
reg = <0x0 0xff110000 0x0 0x1000>;   ?
Or should it be like this?
 reg = <0x0 0xff110000 0x1000>;

I also included xl dmesg and dmesg in attachments (after xl create of bm app).

Thanks in advance!

Milan
FILESEXTRAPATHS_prepend := "${THISDIR}/files:"

SRC_URI += "file://system-user.dtsi"
SRC_URI += "file://xen-overlay.dtsi"

(XEN) Checking for initrd in /chosen
(XEN) Initrd 02bd7000-05fffd97
(XEN) RAM:  - 7fef
(XEN)
(XEN) MODULE[0]: 07ff4000 - 07ffc080 Device Tree
(XEN) MODULE[1]: 02bd7000 - 05fffd97 Ramdisk
(XEN) MODULE[2]: 0008 - 0318 Kernel
(XEN)  RESVD[0]: 07ff4000 - 07ffc000
(XEN)  RESVD[1]: 02bd7000 - 05fffd97
(XEN)
(XEN) Command line: console=dtuart dtuart=serial0 dom0_mem=768M bootscrub=0 
dom0_max_vcpus=1 dom0_vcpus_pin=true timer_slop=0 sched=null vwfi=native
(XEN) Placing Xen at 0x7fc0-0x7fe0
(XEN) Update BOOTMOD_XEN from 0600-06108d81 => 
7fc0-7fd08d81
(XEN) Domain heap initialised
(XEN) Booting using Device Tree
(XEN) Looking for dtuart at "serial0", options ""
 Xen 4.11.1-pre
(XEN) Xen version 4.11.1-pre (milan@) (aarch64-xilinx-linux-gcc (GCC) 7.2.0) 
debug=n  Sat Oct 13 16:34:51 CEST 2018
(XEN) Latest ChangeSet: Mon Sep 24 16:07:33 2018 -0700 git:8610a91abc-dirty
(XEN) 

Re: [Xen-devel] Xen optimization

2018-10-12 Thread Stefano Stabellini
On Fri, 12 Oct 2018, Milan Boberic wrote:
> Hi Stefano, glad to have you back :D,
> this is my setup:
>         - dom0 is PetaLinux, has 1 vCPU and it's pinned to pCPU0
>         - there is only one domU and this is my bare-metal app that also has
> one vCPU and it's pinned to pCPU1
> so yeah, there is only dom0 and bare-metal app on the board.
> 
> Jitter is the same with and without Dario's patch.
> 
> I'm still not sure about the timer's passthrough because there is no mention
> of the triple timer counter in the device tree, so I added:
> 
> &ttc0 {
>    xen,passthrough = <0x1>;
> };
> 
> at the end of the xen-overlay.dtsi file which I included in attachment.

This is definitely wrong. Can you please also post the full host device
tree with your modifications that you are using for Xen and Dom0?  You
should have something like:


timer@ff110000 {
compatible = "cdns,ttc";
interrupt-parent = <0x2>;
interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>;
reg = <0x0 0xff110000 0x0 0x1000>;
timer-width = <0x20>;
power-domains = <0x3b>;
xen,passthrough;
};

For each of the nodes of the devices you are assigning to the DomU.


> About the patch you sent: I can't find the function vgic_inject_irq in the
> /xen/arch/arm/vgic.c file. This is the link of the git repository
> from which I build my Xen, so you can take a look and see if that printk
> can be put somewhere else.
> 
> https://github.com/Xilinx/xen/

It's here: 
https://github.com/Xilinx/xen/blob/xilinx/stable-4.9/xen/arch/arm/vgic.c#L462

BTW you are using a pretty old branch, I suggest moving to:

https://github.com/Xilinx/xen/tree/xilinx/versal/xen/arch/arm

It will work on your board too and it is based on the much newer Xen
4.11.


> I ran some more testing and realized that the results are the same with or
> without vwfi=native, which I think again points out that the passthrough I
> need to provide in the device tree isn't valid.

In reality, the results are the same with and without vwfi=native only
if the baremetal app never issues any wfi instructions.
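
That is, the option only matters on a code path like the following (a
minimal sketch of a guest idle loop, not code from this thread): with
vwfi=native the wfi executes natively; without it, Xen traps the
instruction and may deschedule the vcpu.

/* guest idle loop: each wfi is where vwfi=native changes behaviour; a
 * baremetal app that busy-polls instead never executes wfi at all, so
 * the option makes no measurable difference for it */
static inline void cpu_idle_loop(void)
{
    for ( ;; )
        __asm__ volatile("wfi" ::: "memory");
}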


> And of course, the higher the frequency of interrupts, the higher the jitter.
> I'm still battling with the Xilinx SDK and the triple timer
> counter, that's why I can't figure out what exact frequency is set (I'm
> just raising and lowering it). I'll do my best to
> solve that ASAP because we need to know the exact value of the frequency set.

Yep, that's important :-)


> 
> Thanks in advance!
> 
> Milan
>  
> 
> 
> On Fri, Oct 12, 2018 at 12:29 AM Stefano Stabellini 
>  wrote:
>   On Thu, 11 Oct 2018, Milan Boberic wrote:
>   > On Wed, Oct 10, 2018 at 6:41 PM Meng Xu  wrote:
>   > >
>   > > The jitter may come from Xen or the OS in dom0.
>   > > It will be useful to know what is the jitter if you run the test on 
> PetaLinux.
>   > > (It's understandable the jitter is gone without OS. It is also 
> common
>   > > that OS introduces various interferences.)
>   >
>   > Hi Meng,
>   > well... I'm using a bare-metal application and I need it exclusively to
>   > run on one CPU as domU (guest) without an OS (and I'm not sure how
>   > I would make the same app run on PetaLinux dom0 :D haha).
>   > Is there a chance that PetaLinux as dom0 is creating this jitter and
>   > how? Is there a way of decreasing it?
>   >
>   > Yes, there are no prints.
>   >
>   > I'm not sure about this timer interrupt passthrough because I didn't
>   > find any example of it. In the attachment I included the xen-overlay.dtsi
>   > file, which I edited to add passthrough; in earlier replies there is the
>   > bare-metal configuration file. It would be helpful to know if those
>   > settings are correct. If they are not, it would explain the
>   > jitter.
>   >
>   > Thanks in advance, Milan Boberic!
> 
>   Hi Milan,
> 
>   Sorry for taking so long to go back to this thread. But I am here now :)
> 
>   First, let me ask a couple of questions to understand the scenario
>   better: is there any interference from other virtual machines while you
>   measure the jitter? Or is the baremetal app the only thing actively
>   running on the board?
> 
>   Second, it would be worth double-checking that Dario's patch to fix
>   sched=null is not having unexpected side effects. I don't think so, it
>   would be worth testing with it and without it to be sure.
> 
>   I gave a look at your VM configuration. The configuration looks correct.
>   There is no dtdev settings, but given that none of the devices you are
>   assigning to the guest does any DMA, it should be OK. You want to make
>   sure that Dom0 is not trying to use those same devices -- make sure to
>   add "xen,passthrough;" to each corresponding node on the host device
>   tree.
> 
>   The error messages "No valid vCPU found" are due to the baremetal

Re: [Xen-devel] Xen optimization

2018-10-12 Thread Julien Grall
Hi,

Sorry for the formatting.

On Fri, 12 Oct 2018, 17:36 Milan Boberic,  wrote:

> Hi Stefano, glad to have you back :D,
> this is my setup:
> - dom0 is PetaLinux, has 1 vCPU and it's pinned to pCPU0
> - there is only one domU and this is my bare-metal app that also
> has one vCPU and it's pinned to pCPU1
> so yeah, there is only dom0 and bare-metal app on the board.
>
> Jitter is the same with and without Dario's patch.
>
> I'm still not sure about the timer's passthrough because there is no mention
> of the triple timer counter in the device tree, so I added:
>
> &ttc0 {
>    xen,passthrough = <0x1>;
> };
>

Would you mind explaining what the triple timer counter is?



> at the end of the xen-overlay.dtsi file which I included in attachment.
>
> About the patch you sent: I can't find the function vgic_inject_irq in
> the /xen/arch/arm/vgic.c file. This is the link of the git repository from
> which I build my Xen, so you can take a look and see if that printk can be
> put somewhere else.
>

There was some vGIC rework in Xen 4.11. There was also a new vGIC added
(selectable using NEW_VGIC). It might be worth to look at it.


> https://github.com/Xilinx/xen/
>

This is not the official Xen repository, and it looks like patches have been
applied on top. I am afraid I am not going to be able to help here. Could you
do the same experiment with Xen 4.11?


>
> I ran some more testing and realized that the results are the same with or
> without vwfi=native, which I think again points out that the passthrough I
> need to provide in the device tree isn't valid.
>

This could also mean that wfi is not used by the guest, or that you never go to
the idle vCPU.


> And of course, the higher the frequency of interrupts, the higher the
> jitter. I'm still battling with the Xilinx SDK and the triple timer counter,
> that's why I can't figure out what exact frequency is set (I'm just raising
> and lowering it). I'll do my best to solve that ASAP because we need to
> know the exact value of the frequency set.
>
> Thanks in advance!
>
> Milan
>
>
>
> On Fri, Oct 12, 2018 at 12:29 AM Stefano Stabellini <
> stefano.stabell...@xilinx.com> wrote:
>
>> On Thu, 11 Oct 2018, Milan Boberic wrote:
>> > On Wed, Oct 10, 2018 at 6:41 PM Meng Xu  wrote:
>> > >
>> > > The jitter may come from Xen or the OS in dom0.
>> > > It will be useful to know what is the jitter if you run the test on
>> PetaLinux.
>> > > (It's understandable the jitter is gone without OS. It is also common
>> > > that OS introduces various interferences.)
>> >
>> > Hi Meng,
>> > well... I'm using a bare-metal application and I need it exclusively to
>> > run on one CPU as domU (guest) without an OS (and I'm not sure how
>> > I would make the same app run on PetaLinux dom0 :D haha).
>> > Is there a chance that PetaLinux as dom0 is creating this jitter and
>> > how? Is there a way of decreasing it?
>> >
>> > Yes, there are no prints.
>> >
>> > I'm not sure about this timer interrupt passthrough because I didn't
>> > find any example of it. In the attachment I included the xen-overlay.dtsi
>> > file, which I edited to add passthrough; in earlier replies there is the
>> > bare-metal configuration file. It would be helpful to know if those
>> > settings are correct. If they are not, it would explain the
>> > jitter.
>> >
>> > Thanks in advance, Milan Boberic!
>>
>> Hi Milan,
>>
>> Sorry for taking so long to go back to this thread. But I am here now :)
>>
>> First, let me ask a couple of questions to understand the scenario
>> better: is there any interference from other virtual machines while you
>> measure the jitter? Or is the baremetal app the only thing actively
>> running on the board?
>>
>> Second, it would be worth double-checking that Dario's patch to fix
>> sched=null is not having unexpected side effects. I don't think so, it
>> would be worth testing with it and without it to be sure.
>>
>> I gave a look at your VM configuration. The configuration looks correct.
>> There is no dtdev settings, but given that none of the devices you are
>> assigning to the guest does any DMA, it should be OK. You want to make
>> sure that Dom0 is not trying to use those same devices -- make sure to
>> add "xen,passthrough;" to each corresponding node on the host device
>> tree.
>>
>> The error messages "No valid vCPU found" are due to the baremetal
>> applications trying to configure as target cpu for the interrupt cpu1
>> (the second cpu in the system), while actually only 1 vcpu is assigned
>> to the VM. Hence, only cpu0 is allowed. I don't think it should cause
>> any jitter issues, because the request is simply ignored. Just to be
>> safe, you might want to double check that the physical interrupt is
>> delivered to the right physical cpu, which would be cpu1 in your
>> configuration, the one running the only vcpu of the baremetal app. You
>> can do that by adding a printk to xen/arch/arm/vgic.c:vgic_inject_irq,
>> for example:
>>
>> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
>> index 5a4f082..208fde7 

Re: [Xen-devel] Xen optimization

2018-10-12 Thread Milan Boberic
Hi Stefano, glad to have you back :D,
this is my setup:
- dom0 is PetaLinux, has 1 vCPU and it's pinned to pCPU0
- there is only one domU and this is my bare-metal app that also
has one vCPU and it's pinned to pCPU1
so yeah, there is only dom0 and bare-metal app on the board.

Jitter is the same with and without Dario's patch.

I'm still not sure about the timer's passthrough because there is no mention of
the triple timer counter in the device tree, so I added:

&ttc0 {
   xen,passthrough = <0x1>;
};

at the end of the xen-overlay.dtsi file which I included in attachment.

About the patch you sent: I can't find the function vgic_inject_irq in
the /xen/arch/arm/vgic.c file. This is the link of the git repository from
which I build my Xen, so you can take a look and see if that printk can be
put somewhere else.

https://github.com/Xilinx/xen/

I ran some more testing and realized that the results are the same with or
without vwfi=native, which I think again points out that the passthrough I
need to provide in the device tree isn't valid.

And of course, the higher the frequency of interrupts, the higher the
jitter. I'm still battling with the Xilinx SDK and the triple timer counter,
that's why I can't figure out what exact frequency is set (I'm just raising
and lowering it). I'll do my best to solve that ASAP because we need to
know the exact value of the frequency set.

Thanks in advance!

Milan



On Fri, Oct 12, 2018 at 12:29 AM Stefano Stabellini <
stefano.stabell...@xilinx.com> wrote:

> On Thu, 11 Oct 2018, Milan Boberic wrote:
> > On Wed, Oct 10, 2018 at 6:41 PM Meng Xu  wrote:
> > >
> > > The jitter may come from Xen or the OS in dom0.
> > > It will be useful to know what is the jitter if you run the test on
> PetaLinux.
> > > (It's understandable the jitter is gone without OS. It is also common
> > > that OS introduces various interferences.)
> >
> > Hi Meng,
> > well... I'm using a bare-metal application and I need it exclusively to
> > run on one CPU as domU (guest) without an OS (and I'm not sure how
> > I would make the same app run on PetaLinux dom0 :D haha).
> > Is there a chance that PetaLinux as dom0 is creating this jitter and
> > how? Is there a way of decreasing it?
> >
> > Yes, there are no prints.
> >
> > I'm not sure about this timer interrupt passthrough because I didn't
> > find any example of it. In the attachment I included the xen-overlay.dtsi
> > file, which I edited to add passthrough; in earlier replies there is the
> > bare-metal configuration file. It would be helpful to know if those
> > settings are correct. If they are not, it would explain the
> > jitter.
> >
> > Thanks in advance, Milan Boberic!
>
> Hi Milan,
>
> Sorry for taking so long to go back to this thread. But I am here now :)
>
> First, let me ask a couple of questions to understand the scenario
> better: is there any interference from other virtual machines while you
> measure the jitter? Or is the baremetal app the only thing actively
> running on the board?
>
> Second, it would be worth double-checking that Dario's patch to fix
> sched=null is not having unexpected side effects. I don't think so, it
> would be worth testing with it and without it to be sure.
>
> I gave a look at your VM configuration. The configuration looks correct.
> There is no dtdev settings, but given that none of the devices you are
> assigning to the guest does any DMA, it should be OK. You want to make
> sure that Dom0 is not trying to use those same devices -- make sure to
> add "xen,passthrough;" to each corresponding node on the host device
> tree.
>
> The error messages "No valid vCPU found" are due to the baremetal
> applications trying to configure as target cpu for the interrupt cpu1
> (the second cpu in the system), while actually only 1 vcpu is assigned
> to the VM. Hence, only cpu0 is allowed. I don't think it should cause
> any jitter issues, because the request is simply ignored. Just to be
> safe, you might want to double check that the physical interrupt is
> delivered to the right physical cpu, which would be cpu1 in your
> configuration, the one running the only vcpu of the baremetal app. You
> can do that by adding a printk to xen/arch/arm/vgic.c:vgic_inject_irq,
> for example:
>
> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> index 5a4f082..208fde7 100644
> --- a/xen/arch/arm/vgic.c
> +++ b/xen/arch/arm/vgic.c
> @@ -591,6 +591,7 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,
>  out:
>      spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
> 
> +    if ( v != current ) printk("DEBUG irq slow path!\n");
>      /* we have a new higher priority irq, inject it into the guest */
>      vcpu_kick(v);
>
> You don't want "DEBUG irq slow path!" to get printed.
>
> Finally, I would try to set the timer to generate events less frequently
> than every 1us and see what happens, maybe every 5-10us. In my tests,
> the IRQ latency overhead caused by Xen is around 1us, so injecting 1
> interrupt every 1us, plus 1us of latency caused 

Re: [Xen-devel] Xen optimization

2018-10-11 Thread Stefano Stabellini
On Thu, 11 Oct 2018, Milan Boberic wrote:
> On Wed, Oct 10, 2018 at 6:41 PM Meng Xu  wrote:
> >
> > The jitter may come from Xen or the OS in dom0.
> > It will be useful to know what is the jitter if you run the test on 
> > PetaLinux.
> > (It's understandable the jitter is gone without OS. It is also common
> > that OS introduces various interferences.)
> 
> Hi Meng,
> well... I'm using a bare-metal application and I need it exclusively to
> run on one CPU as domU (guest) without an OS (and I'm not sure how
> I would make the same app run on PetaLinux dom0 :D haha).
> Is there a chance that PetaLinux as dom0 is creating this jitter and
> how? Is there a way of decreasing it?
> 
> Yes, there are no prints.
> 
> I'm not sure about this timer interrupt passthrough because I didn't
> find any example of it. In the attachment I included the xen-overlay.dtsi
> file, which I edited to add passthrough; in earlier replies there is the
> bare-metal configuration file. It would be helpful to know if those
> settings are correct. If they are not, it would explain the
> jitter.
> 
> Thanks in advance, Milan Boberic!

Hi Milan,

Sorry for taking so long to go back to this thread. But I am here now :)

First, let me ask a couple of questions to understand the scenario
better: is there any interference from other virtual machines while you
measure the jitter? Or is the baremetal app the only thing actively
running on the board?

Second, it would be worth double-checking that Dario's patch to fix
sched=null is not having unexpected side effects. I don't think so, it
would be worth testing with it and without it to be sure.

I gave a look at your VM configuration. The configuration looks correct.
There is no dtdev settings, but given that none of the devices you are
assigning to the guest does any DMA, it should be OK. You want to make
sure that Dom0 is not trying to use those same devices -- make sure to
add "xen,passthrough;" to each corresponding node on the host device
tree.

The error messages "No valid vCPU found" are due to the baremetal
applications trying to configure as target cpu for the interrupt cpu1
(the second cpu in the system), while actually only 1 vcpu is assigned
to the VM. Hence, only cpu0 is allowed. I don't think it should cause
any jitter issues, because the request is simply ignored. Just to be
safe, you might want to double check that the physical interrupt is
delivered to the right physical cpu, which would be cpu1 in your
configuration, the one running the only vcpu of the baremetal app. You
can do that by adding a printk to xen/arch/arm/vgic.c:vgic_inject_irq,
for example:

diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 5a4f082..208fde7 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -591,6 +591,7 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,
 out:
     spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
 
+    if ( v != current ) printk("DEBUG irq slow path!\n");
     /* we have a new higher priority irq, inject it into the guest */
     vcpu_kick(v);
 
You don't want "DEBUG irq slow path!" to get printed.

Finally, I would try to set the timer to generate events less frequently
than every 1us and see what happens, maybe every 5-10us. In my tests,
the IRQ latency overhead caused by Xen is around 1us, so injecting 1
interrupt every 1us, plus 1us of latency caused by Xen, cannot lead to
good results.
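
The arithmetic behind that, as a back-of-the-envelope check:

#include <stdio.h>

/* if each interrupt costs about 1us of injection overhead and the
 * timer fires every 1us, injection alone would consume the whole CPU */
int main(void)
{
    double period_us = 1.0;     /* programmed interrupt period */
    double overhead_us = 1.0;   /* approximate per-interrupt Xen cost */
    printf("injection load: %.0f%%\n", 100.0 * overhead_us / period_us);
    return 0;
}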

I hope this helps, please keep us updated with your results, they are
very interesting!

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-11 Thread Dario Faggioli
Hey,

Be a bit more careful about not top posting, please? :-)

On Thu, 2018-10-11 at 14:17 +0200, Milan Boberic wrote:
> I misunderstood the passthrough concept; it only allows the guest domain
> to use certain interrupts and memory.
>
I'm afraid we totally rely on people with much more experience than me
(and, I guess, Meng) on how things work on ARM.

> Is there a way to somehow
> route interrupts from domU (the bare-metal app) to hw?
>
Don't interrupts _come_ from hardware and get routed to
hypervisor/os/app?

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/


signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-11 Thread Meng Xu
Hi Milan,

On Thu, Oct 11, 2018 at 12:36 AM Milan Boberic  wrote:
>
> On Wed, Oct 10, 2018 at 6:41 PM Meng Xu  wrote:
> >
> > The jitter may come from Xen or the OS in dom0.
> > It will be useful to know what is the jitter if you run the test on 
> > PetaLinux.
> > (It's understandable the jitter is gone without OS. It is also common
> > that OS introduces various interferences.)
>
> Hi Meng,
> well... I'm using a bare-metal application and I need it exclusively to
> run on one CPU as domU (guest) without an OS (and I'm not sure how
> I would make the same app run on PetaLinux dom0 :D haha).
> Is there a chance that PetaLinux as dom0 is creating this jitter and
> how? Is there a way of decreasing it?

I'm not familiar with PetaLinux. :(
From my previous experience in measuring the rt-test in the
virtualization environment, I found:
Even though the app. is the only one running on the CPU, the CPU may
be used to handle other interrupts and its context (such as TLB and
cache) might be flushed by other components. When these happen, the
interrupt handling latency can vary a lot.

Hopefully, it helps. :)

Meng

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-11 Thread Milan Boberic
I misunderstood the passthrough concept; it only allows the guest domain
to use certain interrupts and memory. Is there a way to somehow
route interrupts from domU (the bare-metal app) to hw?
On Thu, Oct 11, 2018 at 9:36 AM Milan Boberic  wrote:
>
> On Wed, Oct 10, 2018 at 6:41 PM Meng Xu  wrote:
> >
> > The jitter may come from Xen or the OS in dom0.
> > It will be useful to know what is the jitter if you run the test on 
> > PetaLinux.
> > (It's understandable the jitter is gone without OS. It is also common
> > that OS introduces various interferences.)
>
> Hi Meng,
> well... I'm using a bare-metal application and I need it exclusively to
> run on one CPU as domU (guest) without an OS (and I'm not sure how
> I would make the same app run on PetaLinux dom0 :D haha).
> Is there a chance that PetaLinux as dom0 is creating this jitter and
> how? Is there a way of decreasing it?
>
> Yes, there are no prints.
>
> I'm not sure about this timer interrupt passthrough because I didn't
> find any example of it. In the attachment I included the xen-overlay.dtsi
> file, which I edited to add passthrough; in earlier replies there is the
> bare-metal configuration file. It would be helpful to know if those
> settings are correct. If they are not, it would explain the
> jitter.
>
> Thanks in advance, Milan Boberic!

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-11 Thread Milan Boberic
On Wed, Oct 10, 2018 at 6:41 PM Meng Xu  wrote:
>
> The jitter may come from Xen or the OS in dom0.
> It will be useful to know what is the jitter if you run the test on PetaLinux.
> (It's understandable the jitter is gone without OS. It is also common
> that OS introduces various interferences.)

Hi Meng,
well... I'm using a bare-metal application and I need it exclusively to
run on one CPU as domU (guest) without an OS (and I'm not sure how
I would make the same app run on PetaLinux dom0 :D haha).
Is there a chance that PetaLinux as dom0 is creating this jitter and
how? Is there a way of decreasing it?

Yes, there are no prints.

I'm not sure about this timer interrupt passthrough because I didn't
find any example of it. In the attachment I included the xen-overlay.dtsi
file, which I edited to add passthrough; in earlier replies there is the
bare-metal configuration file. It would be helpful to know if those
settings are correct. If they are not, it would explain the
jitter.

Thanks in advance, Milan Boberic!
/ {
chosen {
#address-cells = <2>;
#size-cells = <1>;

xen,xen-bootargs = "console=dtuart dtuart=serial0 dom0_mem=768M bootscrub=0 dom0_max_vcpus=1 dom0_vcpus_pin=true timer_slop=0 sched=null vwfi=native";
xen,dom0-bootargs = "console=hvc0 earlycon=xen earlyprintk=xen maxcpus=1 clk_ignore_unused";

dom0 {
compatible = "xen,linux-zimage", "xen,multiboot-module";
reg = <0x0 0x8 0x310>;
};
};

};

&smmu {
status = "okay";
mmu-masters = <&gem0 0x874
&gem1 0x875
&gem2 0x876
&gem3 0x877
&usb_0 0x860
&usb_1 0x861
&qspi 0x873
&lpd_dma_chan1 0x868
&lpd_dma_chan2 0x869
&lpd_dma_chan3 0x86a
&lpd_dma_chan4 0x86b
&lpd_dma_chan5 0x86c
&lpd_dma_chan6 0x86d
&lpd_dma_chan7 0x86e
&lpd_dma_chan8 0x86f
&fpd_dma_chan1 0x14e8
&fpd_dma_chan2 0x14e9
&fpd_dma_chan3 0x14ea
&fpd_dma_chan4 0x14eb
&fpd_dma_chan5 0x14ec
&fpd_dma_chan6 0x14ed
&fpd_dma_chan7 0x14ee
&fpd_dma_chan8 0x14ef
&sdhci0 0x870
&sdhci1 0x871
&nand0 0x872>;
};

&uart1 {
   xen,passthrough = <0x1>;
};

&ttc0 {
   xen,passthrough = <0x1>;
};

&ttc1 {
   xen,passthrough = <0x1>;
};

&ttc2 {
   xen,passthrough = <0x1>;
};

&ttc3 {
   xen,passthrough = <0x1>;
};

&gpio {
   xen,passthrough = <0x1>;
};
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-10 Thread Meng Xu
[Just add some thoughts on this.]

On Wed, Oct 10, 2018 at 4:22 AM Milan Boberic  wrote:
>
> Hi,
> sorry, my explanation wasn't precise and I missed the point.
> I put vCPU pinning on top of sched=null "just in case", because it doesn't hurt.
>
> Yes, PetaLinux domain is dom0.


The jitter may come from Xen or the OS in dom0.
It will be useful to know what is the jitter if you run the test on PetaLinux.
(It's understandable the jitter is gone without OS. It is also common
that OS introduces various interferences.)

Another thing you might have already done: make sure there is no print
information from either Xen or OS during your experiment. print causes
long delay.

Meng

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen optimization

2018-10-10 Thread Milan Boberic
Attachments.
name = "test"
kernel = "timer.bin"
memory = 8
vcpus = 1
cpus = [1]
irqs = [ 48, 54, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79 ]
iomem = [ "0xff010,1", "0xff110,1", "0xff120,1", "0xff130,1", "0xff140,1", "0xff0a0,1" ]

[0.00] Booting Linux on physical CPU 0x0
[0.00] Linux version 4.14.0-xilinx-v2018.2 (oe-user@oe-host) (gcc 
version 7.2.0 (GCC)) #1 SMP Mon Oct 1 16:41:32 CEST 2018
[0.00] Boot CPU: AArch64 Processor [410fd034]
[0.00] Machine model: xlnx,zynqmp
[0.00] Xen 4.10 support found
[0.00] efi: Getting EFI parameters from FDT:
[0.00] efi: UEFI not found.
[0.00] cma: Reserved 256 MiB at 0x6000
[0.00] On node 0 totalpages: 196608
[0.00]   DMA zone: 2688 pages used for memmap
[0.00]   DMA zone: 0 pages reserved
[0.00]   DMA zone: 196608 pages, LIFO batch:31
[0.00] psci: probing for conduit method from DT.
[0.00] psci: PSCIv1.1 detected in firmware.
[0.00] psci: Using standard PSCI v0.2 function IDs
[0.00] psci: Trusted OS migration not required
[0.00] random: fast init done
[0.00] percpu: Embedded 21 pages/cpu @ffc03ffb7000 s46488 r8192 
d31336 u86016
[0.00] pcpu-alloc: s46488 r8192 d31336 u86016 alloc=21*4096
[0.00] pcpu-alloc: [0] 0
[0.00] Detected VIPT I-cache on CPU0
[0.00] CPU features: enabling workaround for ARM erratum 845719
[0.00] Built 1 zonelists, mobility grouping on.  Total pages: 193920
[0.00] Kernel command line: console=hvc0 earlycon=xen earlyprintk=xen 
maxcpus=1 clk_ignore_unused
[0.00] PID hash table entries: 4096 (order: 3, 32768 bytes)
[0.00] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
[0.00] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
[0.00] Memory: 423788K/786432K available (9980K kernel code, 644K 
rwdata, 3132K rodata, 512K init, 2168K bss, 100500K reserved, 262144K 
cma-reserved)
[0.00] Virtual kernel memory layout:
[0.00] modules : 0xff80 - 0xff800800   (   128 
MB)
[0.00] vmalloc : 0xff800800 - 0xffbebfff   (   250 
GB)
[0.00]   .text : 0xff800808 - 0xff8008a4   (  9984 
KB)
[0.00] .rodata : 0xff8008a4 - 0xff8008d6   (  3200 
KB)
[0.00]   .init : 0xff8008d6 - 0xff8008de   (   512 
KB)
[0.00]   .data : 0xff8008de - 0xff8008e81200   (   645 
KB)
[0.00].bss : 0xff8008e81200 - 0xff800909f2b0   (  2169 
KB)
[0.00] fixed   : 0xffbefe7fd000 - 0xffbefec0   (  4108 
KB)
[0.00] PCI I/O : 0xffbefee0 - 0xffbeffe0   (16 
MB)
[0.00] vmemmap : 0xffbf - 0xffc0   ( 4 
GB maximum)
[0.00]   0xffbf0070 - 0xffbf0188   (17 
MB actual)
[0.00] memory  : 0xffc02000 - 0xffc07000   (  1280 
MB)
[0.00] Hierarchical RCU implementation.
[0.00]  RCU event tracing is enabled.
[0.00]  RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=1.
[0.00] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[0.00] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[0.00] arch_timer: cp15 timer(s) running at 99.99MHz (virt).
[0.00] clocksource: arch_sys_counter: mask: 0xff 
max_cycles: 0x171015c90f, max_idle_ns: 440795203080 ns
[0.03] sched_clock: 56 bits at 99MHz, resolution 10ns, wraps every 
4398046511101ns
[0.000287] Console: colour dummy device 80x25
[0.283041] console [hvc0] enabled
[0.286513] Calibrating delay loop (skipped), value calculated using timer 
frequency.. 199.99 BogoMIPS (lpj=36)
[0.296969] pid_max: default: 32768 minimum: 301
[0.301730] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes)
[0.308393] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes)
[0.316319] ASID allocator initialised with 65536 entries
[0.321502] xen:grant_table: Grant tables using version 1 layout
[0.327092] Grant table initialized
[0.330637] xen:events: Using FIFO-based ABI
[0.334961] Xen: initializing cpu0
[0.338469] Hierarchical SRCU implementation.
[0.343130] EFI services will not be available.
[0.347423] zynqmp_plat_init Platform Management API v1.0
[0.352852] zynqmp_plat_init Trustzone version v1.0
[0.357828] smp: Bringing up secondary CPUs ...
[0.362366] smp: Brought up 1 node, 1 CPU
[0.366430] SMP: Total of 1 processors activated.
[0.371189] CPU features: detected feature: 32-bit EL0 Support
[0.377073] CPU: All CPU(s) started at EL1
[0.381231] alternatives: patching kernel code
[0.386133] devtmpfs: initialized
[0.392766] clocksource: jiffies: mask: 0x max_cycles: 

Re: [Xen-devel] Xen optimization

2018-10-10 Thread Milan Boberic
Hi,
sorry, my explanation wasn't precise and I missed the point.
I put vCPU pinning on top of sched=null "just in case", because it doesn't hurt.

Yes, PetaLinux domain is dom0.

I tested with the Credit scheduler before (it was just the LED blink
application, but anyway); it results in bigger jitter than the null
scheduler. For example, with the Credit scheduler LED blinking results in
approximately 3 us jitter, whereas with the null scheduler there is no jitter.
vwfi=native was giving the domain destruction problem, which you fixed
by sending me a patch approximately 2 weeks ago, if you recall :) but I
still didn't test its impact on performance; I will do it ASAP and
share results (I think that without vwfi=native the jitter will be the
same or even bigger).

 When I say "without Xen", yes, I mean without any OS. Just hardware
and this bare-metal app. I do expect latency to be higher in the Xen
case, and I'm curious how much exactly (which is the point of my work
and also of my master's thesis :D).

Now, the point is that when I set up only the LED blinking (without the
timer) in my application there is no jitter (in the Xen case), but when
I add the timer which generates an interrupt every 1us, a jitter of 3 us
occurs. The timer I use is the Zynq UltraScale+'s triple timer counter.
I suspect that the timer interrupt is creating that jitter.

For the interrupts I use passthrough in the bare-metal application's
configuration file. That works for the GPIO LED, because there is no
jitter there: the interrupt can "freely go" from the guest domain
directly to the GPIO LED.
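
For reference, the passthrough part of that configuration file boils
down to a fragment like the one below (a sketch with placeholder values,
not my actual file; irqs takes GIC interrupt numbers and iomem takes
"start,count" in units of 4K page frames, so the real numbers depend on
the board's device tree):

  vcpus = 1
  cpus  = "1"              # pin the guest's vCPU to pCPU1
  irqs  = [ 48 ]           # GIC SPI of the GPIO controller (placeholder)
  iomem = [ "0xff0a0,1" ]  # one 4K page of GPIO MMIO space (placeholder)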

Also, when I create the guest domain (which is this bare-metal
application) I get these messages:

(XEN) printk: 54 messages suppressed.
(XEN) d2v0 No valid vCPU found for vIRQ32 in the target list (0x2). Skip it
(XEN) d2v0 No valid vCPU found for vIRQ33 in the target list (0x2). Skip it
root@uz3eg-iocc-2018-2:~# (XEN) d2v0 No valid vCPU found for vIRQ34 in
the target list (0x2). Skip it
(XEN) d2v0 No valid vCPU found for vIRQ35 in the target list (0x2). Skip it
(XEN) d2v0 No valid vCPU found for vIRQ36 in the target list (0x2). Skip it
(XEN) d2v0 No valid vCPU found for vIRQ37 in the target list (0x2). Skip it
(XEN) d2v0 No valid vCPU found for vIRQ38 in the target list (0x2). Skip it
(XEN) d2v0 No valid vCPU found for vIRQ39 in the target list (0x2). Skip it
(XEN) d2v0 No valid vCPU found for vIRQ40 in the target list (0x2). Skip it
(XEN) d2v0 No valid vCPU found for vIRQ41 in the target list (0x2). Skip it

In the attachments I included dmesg, xl dmesg and the bare-metal
application's configuration file.

Thanks in advance, Milan Boberic.

On Tue, Oct 9, 2018 at 6:46 PM Dario Faggioli  wrote:
>
> On Tue, 2018-10-09 at 12:59 +0200, Milan Boberic wrote:
> > Hi,
> >
> Hi Milan,
>
> > I'm testing Xen Hypervisor 4.10 performance on an UltraZed-EG board
> > with a carrier card.
> > I created a bare-metal application in the Xilinx SDK.
> > In the bm application I:
> >- start the triple timer counter (ttc), which generates an
> > interrupt every 1us
> >- turn on the PS LED
> >- call a function 100 times in a for loop (a function that sets
> > some values)
> >- turn off the LED
> >- stop the triple timer counter
> >- reset the counter value
> >
> Ok, I'm adding Stefano, Julien, and a couple of other people interested
> in RT/lowlat on Xen.
>
> > I ran this bare-metal application under the Xen hypervisor with the
> > following settings:
> > - used the null scheduler (sched=null) and vwfi=native
> > - the bare-metal application has one vCPU and it is pinned to pCPU1
> > - the domain which is PetaLinux also has one vCPU pinned to pCPU0;
> > the other pCPUs are unused.
> > Under Xen I can see 3us of jitter on the oscilloscope.
> >
> So, this is probably me not being familiar with Xen on Xilinx (and with
> Xen on ARM as a whole), but there's a few things I'm not sure I
> understand:
> - you say you use sched=null _and_ pinning? That should not be
>   necessary (although, it shouldn't hurt either)
> - "domain which is PetaLinux", is that dom0?
>
> IAC, if it's not terribly hard to run this kind of test, I'd say, try
> without 'vwfi=native', and also with another scheduler, like Credit
> (but then do make sure you use pinning).
>
> > When I run the same bm application via JTAG from the Xilinx SDK
> > (without Xen, directly on the board) there is no jitter.
> >
> Here, when you say "without Xen", do you also mean without any
> baremetal OS at all?
>
> > I'm curious what causes this 3us jitter in Xen (which isn't a small
> > jitter at all), and is there any way of decreasing it?
> >
> Right. So, I'm not sure I've understood the test scenario either. But
> yeah, 3us jitter seems significant. Still, if we're comparing with
> bare-hw, without even an OS at all, I think it could have been expected
> for latency and jitter to be higher in the Xen case.
>
> Anyway, I am not sure anyone has done the kind of analysis that could
> help us accurately identify where things like that come from, and in
> what proportions.
>
> It would be really awesome to have something like that, so do go ahead
> if you feel like it. :-)

Re: [Xen-devel] Xen optimization

2018-10-09 Thread Dario Faggioli
On Tue, 2018-10-09 at 12:59 +0200, Milan Boberic wrote:
> Hi,
>
Hi Milan,

> I'm testing Xen Hypervisor 4.10 performance on an UltraZed-EG board
> with a carrier card.
> I created a bare-metal application in the Xilinx SDK.
> In the bm application I:
>- start the triple timer counter (ttc), which generates an
> interrupt every 1us
>- turn on the PS LED
>- call a function 100 times in a for loop (a function that sets
> some values)
>- turn off the LED
>- stop the triple timer counter
>- reset the counter value
> 
Ok, I'm adding Stefano, Julien, and a couple of other people interested
in RT/lowlat on Xen.

> I ran this bare-metal application under the Xen hypervisor with the
> following settings:
> - used the null scheduler (sched=null) and vwfi=native
> - the bare-metal application has one vCPU and it is pinned to pCPU1
> - the domain which is PetaLinux also has one vCPU pinned to pCPU0;
> the other pCPUs are unused.
> Under Xen I can see 3us of jitter on the oscilloscope.
> 
So, this is probably me not being familiar with Xen on Xilinx (and with
Xen on ARM as a whole), but there's a few things I'm not sure I
understand:
- you say you use sched=null _and_ pinning? That should not be 
  necessary (although, it shouldn't hurt either)
- "domain which is PetaLinux", is that dom0?

IAC, if it's not terribly hard to run this kind of test, I'd say, try
without 'vwfi=native', and also with another scheduler, like Credit
(but then do make sure you use pinning).
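
Concretely, that would be something like this (a sketch; the guest's
domain name is a placeholder):

  # boot Xen with sched=credit instead of sched=null (and, for one
  # run, drop vwfi=native), then pin explicitly:
  xl vcpu-pin Domain-0 0 0   # dom0's vCPU0  -> pCPU0
  xl vcpu-pin bm-guest 0 1   # guest's vCPU0 -> pCPU1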

> When I run the same bm application via JTAG from the Xilinx SDK
> (without Xen, directly on the board) there is no jitter.
> 
Here, when you say "without Xen", do you also mean without any
baremetal OS at all?

> I'm curious what causes this 3us jitter in Xen (which isn't a small
> jitter at all), and is there any way of decreasing it?
> 
Right. So, I'm not sure I've understood the test scenario either. But
yeah, 3us jitter seems significant. Still, if we're comparing with
bare-hw, without even an OS at all, I think it could have been expected
for latency and jitter to be higher in the Xen case.

Anyway, I am not sure anyone has done the kind of analysis that could
help us accurately identify where things like that come from, and in
what proportions.

It would be really awesome to have something like that, so do go ahead
if you feel like it. :-)

I think tracing could help a little (although we don't have a super-
sophisticated tracing infrastructure like Linux's perf and such), but
sadly enough, that's still not available on ARM, I think. :-/
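
Lacking that, a poor man's alternative is to timestamp inside the guest
with the generic timer's virtual counter and track the worst-case gap
between timer interrupts. A sketch, assuming an AArch64 guest and an ISR
hook of your own (the hook name is a placeholder):

  #include <stdint.h>

  static inline uint64_t read_cntvct(void)
  {
      uint64_t v;
      __asm__ volatile("mrs %0, cntvct_el0" : "=r"(v)); /* AArch64 only */
      return v;
  }

  static uint64_t last, max_delta;

  void timer_isr(void)             /* placeholder ISR hook */
  {
      uint64_t now = read_cntvct();
      if (last && now - last > max_delta)
          max_delta = now - last;  /* worst-case inter-IRQ gap, in ticks */
      last = now;
  }

At the 99.99MHz counter frequency from the boot log, one tick is about
10ns, so max_delta * 10 gives the worst-case gap in ns.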

Regards,
Dario
-- 
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/



[Xen-devel] Xen optimization

2018-10-09 Thread Milan Boberic
Hi,
I'm testing Xen Hypervisor 4.10 performance on an UltraZed-EG board with
a carrier card.
I created a bare-metal application in the Xilinx SDK.
In the bm application I:
   - start the triple timer counter (ttc), which generates an interrupt
every 1us
   - turn on the PS LED
   - call a function 100 times in a for loop (a function that sets some
values)
   - turn off the LED
   - stop the triple timer counter
   - reset the counter value
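
In C, the loop body looks roughly like this (a sketch: the MMIO address,
the LED bit and the ttc_* helpers are placeholders, not the actual
Xilinx SDK code):

  #include <stdint.h>

  /* Placeholder MMIO access to the PS GPIO data register. */
  #define GPIO_DATA (*(volatile uint32_t *)0xFF0A0044u)
  #define LED_MASK  (1u << 0)

  static void ttc_start(void) { /* placeholder: program a 1us TTC
                                   interval and enable its interrupt */ }
  static void ttc_stop(void)  { /* placeholder: stop the TTC and reset
                                   the counter */ }

  static volatile uint32_t values[100];

  static void set_values(void) /* the "function that sets some values" */
  {
      for (unsigned int i = 0; i < 100; i++)
          values[i] = i;
  }

  void measure_once(void)
  {
      ttc_start();
      GPIO_DATA |= LED_MASK;   /* LED on: rising edge on the scope */
      for (unsigned int i = 0; i < 100; i++)
          set_values();
      GPIO_DATA &= ~LED_MASK;  /* LED off: pulse width = loop duration */
      ttc_stop();
  }

The jitter is the variation in that LED pulse width across runs.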

I ran this bare-metal application under the Xen hypervisor with the
following settings:
- used the null scheduler (sched=null) and vwfi=native
- the bare-metal application has one vCPU and it is pinned to pCPU1
- the domain which is PetaLinux also has one vCPU pinned to pCPU0; the
other pCPUs are unused.
Under Xen I can see 3us of jitter on the oscilloscope.

When I run the same bm application via JTAG from the Xilinx SDK (without
Xen, directly on the board) there is no jitter.

I'm curious what causes this 3us jitter in Xen (which isn't a small
jitter at all), and is there any way of decreasing it?

Also, I would gladly accept any suggestions about increasing
performance, decreasing jitter, decreasing interrupt latency, etc.

Thanks in advance, Milan Boberic.
