Re: [Xen-devel] Xen optimization
On Wed, 12 Dec 2018, Andrii Anisov wrote:
> Hello Stefano,
>
> On 12.12.18 19:39, Stefano Stabellini wrote:
> > Thanks for the good work, Andrii!
> >
> > The WARM_MAX improvements for vwfi=native with your optimizations are
> > impressive.
> I really hope you are not speaking about these numbers:
> >
> > max=840 warm_max=120 min=120 avg=127
> Those are TBM baremetal numbers in hyp mode.

I know, I was referring to your older results, sorry for the confusion.

> Did you try my RFC on your HW?

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Xen optimization
On Wed, 2018-12-12 at 19:32 +0200, Andrii Anisov wrote:
> On 12.12.18 19:10, Dario Faggioli wrote:
> > I think only bisection could shed some light on this. And it would be
> > wonderful if you could do that, but I understand that it takes time. :-/
> Well, bisect might help. But I'm really confused why MemTotal may be
> reduced.

Yeah, and although difficult to admit/see the reason why, I think this
looks like it is coming from something we do in Xen. And since you say you
have an old Xen version that works, I really see bisection as the way to
go...

> > Are you absolutely sure about that? That is, are you "just" assuming
> > the scheduler won't move stuff, or have you put some debugging or
> > printing in place to verify that to be the case?
> Being honest, I did not check for exactly this setup. I verified it
> for 4.10.

Not sure I'm getting it. Are you saying that you somehow verified that on
4.10 vcpus don't move? But on 4.10 you have pinning that works, don't you?
Or are you saying you've verified that vcpus don't move, on 4.10, even
without doing the pinning? If yes, can I ask how?

As for staging, I really can't tell, as indeed there would be no need for
them to move, but they actually could, for a number of reasons. So, unless
you, like, put printk()-s (if you can) or ASSERT()s when v->processor
changes, I wouldn't take that for granted. :-(

Regards,
Dario
--
<> (Raistlin Majere)
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/
Re: [Xen-devel] Xen optimization
Hello Stefano,

On 12.12.18 19:39, Stefano Stabellini wrote:
> Thanks for the good work, Andrii!
>
> The WARM_MAX improvements for vwfi=native with your optimizations are
> impressive.

I really hope you are not speaking about these numbers:

> max=840 warm_max=120 min=120 avg=127

Those are TBM baremetal numbers in hyp mode.

Did you try my RFC on your HW?

--
Sincerely,
Andrii Anisov.
Re: [Xen-devel] Xen optimization
On Wed, 12 Dec 2018, Andrii Anisov wrote:
> On 12.12.18 11:46, Andrii Anisov wrote:
> > Digging into that now.
> I got it. My u-boot starts TBM in hyp mode. But they both miss setting
> HCR_EL2.IMO, so no interrupt exception was taken in hyp.
> OK, for my baremetal TBM in hyp, the numbers are:
>
> max=840 warm_max=120 min=120 avg=127
>
> I guess warm_max and min are one tick of the system timer. And it seems
> to me that one tick of the system timer is the lower limit of the irq
> latency by HW design.

Thanks for the good work, Andrii!

The WARM_MAX improvements for vwfi=native with your optimizations are
impressive.
Re: [Xen-devel] Xen optimization
Hello Dario,

On 12.12.18 19:10, Dario Faggioli wrote:
> Ah, yes... I've seen the thread. I haven't commented, as it is really,
> really weird, and I don't know what to think/say. I think only bisection
> could shed some light on this. And it would be wonderful if you could do
> that, but I understand that it takes time. :-/

Well, bisect might help. But I'm really confused why MemTotal may be
reduced.

> Are you absolutely sure about that? That is, are you "just" assuming
> the scheduler won't move stuff, or have you put some debugging or
> printing in place to verify that to be the case?

Being honest, I did not check for exactly this setup. I verified it for
4.10.

> I'm asking because, yes, in theory that is what one would expect. But, as
> I think you know very well, although in theory there is no difference
> between theory and practice, in practice, there is. :-)

I know it very well :)

--
Sincerely,
Andrii Anisov.
Re: [Xen-devel] Xen optimization
On Wed, 2018-12-12 at 11:39 +0200, Andrii Anisov wrote:
> Hello Dario,

Hi,

> On 11.12.18 18:56, Dario Faggioli wrote:
> > Also, what about Xen numbers, sched=null.
> Didn't check, will put on the list.

:-)

> > I don't expect much improvement, considering pinning is in-place
> > already.
> Actually, I faced a strange issue with explicit pinning of Dom0.
> Didn't sort out the cause yet. And Julien says it is not reproducible
> on his desk.

Ah, yes... I've seen the thread. I haven't commented, as it is really,
really weird, and I don't know what to think/say. I think only bisection
could shed some light on this. And it would be wonderful if you could do
that, but I understand that it takes time. :-/

> But yes, with VCPU number less than PCPUs - there is no migration of
> Dom0 VCPUs.

Are you absolutely sure about that? That is, are you "just" assuming the
scheduler won't move stuff, or have you put some debugging or printing in
place to verify that to be the case?

I'm asking because, yes, in theory that is what one would expect. But, as I
think you know very well, although in theory there is no difference between
theory and practice, in practice, there is. :-)

Regards,
Dario

> [1] https://lists.xenproject.org/archives/html/xen-devel/2018-12/msg00435.html

--
<> (Raistlin Majere)
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/
Re: [Xen-devel] Xen optimization
On 12.12.18 11:46, Andrii Anisov wrote:
> Digging into that now.

I got it. My u-boot starts TBM in hyp mode. But they both miss setting
HCR_EL2.IMO, so no interrupt exception was taken in hyp.

OK, for my baremetal TBM in hyp, the numbers are:

max=840 warm_max=120 min=120 avg=127

I guess warm_max and min are one tick of the system timer. And it seems to
me that one tick of the system timer is the lower limit of the irq latency
by HW design.

--
Sincerely,
Andrii Anisov.
Re: [Xen-devel] Xen optimization
On 11.12.18 21:29, Stefano Stabellini wrote:
> Yes, I think the uart driver could be sufficient, but it has only the
> Xilinx uart, the pl011 and the Xen emergency console. If I recall
> correctly, Renesas needs a different driver. Any platform specific
> initialization would also need to be added to it.

Actually the console driver (putchar) is really trivial in TBM, and for
platform initialization I rely on u-boot's leftovers.

But I faced a strange issue with a timer interrupt. Despite the fact that
TBM sets up the MMU, the exception handler table and VBAR, the interrupt
does not cause TBM's code to be called. But I see the interrupt fire and
become active in the GIC registers.

Digging into that now.

--
Sincerely,
Andrii Anisov.
Re: [Xen-devel] Xen optimization
Hello Dario,

On 11.12.18 18:56, Dario Faggioli wrote:
> Also, what about Xen numbers, sched=null.

Didn't check, will put it on the list.

> I don't expect much improvement, considering pinning is in-place already.

Actually, I faced a strange issue with explicit pinning of Dom0. Didn't
sort out the cause yet. And Julien says it is not reproducible on his desk.

But yes, with the VCPU number less than PCPUs there is no migration of
Dom0 VCPUs.

[1] https://lists.xenproject.org/archives/html/xen-devel/2018-12/msg00435.html

--
Sincerely,
Andrii Anisov.
Re: [Xen-devel] Xen optimization
Hello Julien,

On 11.12.18 14:27, Julien Grall wrote:
> I would like to have performance per patch so we can make the decisions
> whether the implementation cost is worth it for upstream.

I'll check baremetal numbers first. Then will get numbers per patch.

--
Sincerely,
Andrii Anisov.
Re: [Xen-devel] Xen optimization
On Tue, 11 Dec 2018, Julien Grall wrote:
> On 11/12/2018 18:39, Stefano Stabellini wrote:
> > On Tue, 11 Dec 2018, Julien Grall wrote:
> > > On 10/12/2018 12:23, Andrii Anisov wrote:
> > > > Hello Julien,
> > > >
> > > > On 10.12.18 13:54, Julien Grall wrote:
> > > > > What are the numbers without Xen?
> > > > Good question. Didn't try. At least putchar should be implemented
> > > > for that.
> > >
> > > I think we need the baremetal numbers to be able to compare properly
> > > the old and new vGIC.
> >
> > That might prove very hard for Andrii to do because TBM is made to run
> > on Xilinx hardware and Xen VMs only. It is probably lacking necessary
> > drivers to run on other boards natively.
>
> Really? What sort of platform specific driver do you need? Shouldn't the
> UART be sufficient?

Yes, I think the uart driver could be sufficient, but it has only the
Xilinx uart, the pl011 and the Xen emergency console. If I recall
correctly, Renesas needs a different driver. Any platform specific
initialization would also need to be added to it.

> When you speak about interrupt latency, you need to compare to baremetal.
> Otherwise it has no meaning at all. So what is your solution?

When I used it, I ran on Xilinx hardware, that was my solution :-D

Andrii would have to port a uart driver to it. Maybe the early_printk
trivial driver could be easy enough to port.
Re: [Xen-devel] Xen optimization
On 11/12/2018 18:39, Stefano Stabellini wrote:
> On Tue, 11 Dec 2018, Julien Grall wrote:
> > On 10/12/2018 12:23, Andrii Anisov wrote:
> > > Hello Julien,
> > >
> > > On 10.12.18 13:54, Julien Grall wrote:
> > > > What are the numbers without Xen?
> > > Good question. Didn't try. At least putchar should be implemented for
> > > that.
> >
> > I think we need the baremetal numbers to be able to compare properly
> > the old and new vGIC.
>
> That might prove very hard for Andrii to do because TBM is made to run on
> Xilinx hardware and Xen VMs only. It is probably lacking necessary
> drivers to run on other boards natively.

Really? What sort of platform specific driver do you need? Shouldn't the
UART be sufficient?

When you speak about interrupt latency, you need to compare to baremetal.
Otherwise it has no meaning at all. So what is your solution?

Cheers,

--
Julien Grall
Re: [Xen-devel] Xen optimization
On Tue, 11 Dec 2018, Julien Grall wrote:
> On 10/12/2018 12:23, Andrii Anisov wrote:
> > Hello Julien,
> >
> > On 10.12.18 13:54, Julien Grall wrote:
> > > What are the numbers without Xen?
> > Good question. Didn't try. At least putchar should be implemented for
> > that.
>
> I think we need the baremetal numbers to be able to compare properly the
> old and new vGIC.

That might prove very hard for Andrii to do because TBM is made to run on
Xilinx hardware and Xen VMs only. It is probably lacking necessary drivers
to run on other boards natively.
Re: [Xen-devel] Xen optimization
On Tue, 2018-12-11 at 12:27 +, Julien Grall wrote:
> On 10/12/2018 12:23, Andrii Anisov wrote:
> > On 10.12.18 13:54, Julien Grall wrote:
> > > What are the numbers without Xen?
> > Good question. Didn't try. At least putchar should be implemented
> > for that.
>
> I think we need the baremetal numbers to be able to compare properly
> the old and new vGIC.

Agreed.

Also, what about Xen numbers, sched=null. I don't expect much improvement,
considering pinning is in-place already. Still...

Regards,
Dario
--
<> (Raistlin Majere)
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/
Re: [Xen-devel] Xen optimization
On 10/12/2018 12:23, Andrii Anisov wrote:
> Hello Julien,
>
> On 10.12.18 13:54, Julien Grall wrote:
> > What are the numbers without Xen?
> Good question. Didn't try. At least putchar should be implemented for
> that.

I think we need the baremetal numbers to be able to compare properly the
old and new vGIC.

> > Which version of Xen are you using?
> This morning's staging, commit-id 58eb90a9650a8ea73533bc2b87c13b8ca7bbe35a.
>
> > This also tells you that in the trap case the vGIC is not the bigger
> > overhead.
> Indeed, not the bigger. But significant even in this trivial case
> (receiving an interrupt twice a second).

To confirm, in your use-case you have the interrupt firing every 500ms,
right? But I am not sure what you are trying to argue here... I never said
it was insignificant, I only pointed out that the context switch/trap has a
strong impact. This means that focusing on optimizing the context
switch/trap is probably more worth it at the moment than trying to
micro-optimize the vGIC.

What matters at the end is the overhead of virtualization (i.e. Xen).
Without those baremetal numbers, it is quite difficult to get an idea of
whether this is significant.

> > This is with all your series applied but [4], correct?
> Right.
>
> > Did you try to see the performance improvement patch by patch?
> No. Not yet.

I would like to have performance per patch so we can make the decisions
whether the implementation cost is worth it for upstream.

Cheers,

--
Julien Grall
Re: [Xen-devel] Xen optimization
Hello Julien,

On 10.12.18 13:54, Julien Grall wrote:
> What are the numbers without Xen?

Good question. Didn't try. At least putchar should be implemented for that.

> Which version of Xen are you using?

This morning's staging, commit-id 58eb90a9650a8ea73533bc2b87c13b8ca7bbe35a.

> This also tells you that in the trap case the vGIC is not the bigger
> overhead.

Indeed, not the bigger. But significant even in this trivial case
(receiving an interrupt twice a second).

> This is with all your series applied but [4], correct?

Right.

> Did you try to see the performance improvement patch by patch?

No. Not yet.

--
Sincerely,
Andrii Anisov.
Re: [Xen-devel] Xen optimization
(sorry for the formatting)

On Mon, 10 Dec 2018, 12:00 Andrii Anisov, wrote:
> Hello All,
>
> On 27.11.18 23:27, Stefano Stabellini wrote:
> > See the following:
> >
> > https://marc.info/?l=xen-devel&m=148668817704668
>
> So I did port that stuff to the current staging [1].
> Also, the corresponding TBM itself is here [2].
> Having 4 big cores on my SoC, I run Xen with the following command line:
>
>     dom0_mem=3G console=dtuart dtuart=serial0 dom0_max_vcpus=2
>     bootscrub=0 loglvl=all cpufreq=none tbuf_size=8192 loglvl=all/none
>     guest_loglvl=all/none
>
> The TBM domain configuration file is as follows:
>
>     seclabel='system_u:system_r:domU_t'
>     name = "DomP"
>     kernel = "/home/root/ctest-bare.bin"
>     extra = "console=hvc0 rw"
>     memory = 128
>     vcpus = 1
>     cpus = "3"
>
> This gives me a setup where Domain-0 runs solely on cores 0 and 1 and TBM
> runs exclusively on core 3, so we can rely on it showing us the pure IRQ
> latency of the hypervisor.
> My board is a Renesas Salvator-X with H3 ES3.0 SoC and 8GB RAM. The
> generic timer runs at 8.333 MHz, which gives me 120ns resolution for
> measurements.
> The Xen hypervisor is built without debug, and TBM does wfi in the idle
> loop for all experiments.
> With that setup, the IRQ latency numbers are (in ns):

What are the numbers without Xen? Which version of Xen are you using?

> Old vgic:
>                          AVG   MIN   MAX    WARM MAX
>     credit,  vwfi=trap   7706  7560  9480   8400
>     credit,  vwfi=native 2908  2880  3120   4800
>     credit2, vwfi=trap   7221  7200  9240   7440
>     credit2, vwfi=native 2906  2880  3120   5040
>
> New vgic:
>                          AVG   MIN   MAX    WARM MAX
>     credit,  vwfi=trap   8481  8040  10200  8880
>     credit,  vwfi=native 4115  3960  4800   4200
>     credit2, vwfi=trap   8425  8400  9600   9000
>     credit2, vwfi=native 4227  3960  5040   4680
>
> Here we can see that the new vgic underperforms the old one in a trivial
> use-case modeled with TBM.

The vwfi=trap case does not look so bad (10%) but indeed vwfi=native adds a
bigger overhead. This also tells you that in the trap case the vGIC is not
the bigger overhead. I am pretty sure that this can be optimized because we
mostly focused on reliability and specification compliance for the first
draft. So yes, the old vGIC performs better, but at the price of
unreliability and non-compliance.

> Old vgic with optimizations [3] (without [4], because it breaks the
> setup):
>
>                          AVG   MIN   MAX   WARM MAX
>     credit,  vwfi=trap   7309  7080  8760  7680
>     credit,  vwfi=native 3007  3000  4320  3120
>     credit2, vwfi=trap   6877  6720  8880  7200
>     credit2, vwfi=native 2680  2640  4440  2880

This is with all your series applied but [4], correct? Did you try to see
the performance improvement patch by patch?

Cheers

> [1] https://github.com/aanisov/xen/tree/4tbm
> [2] https://github.com/aanisov/tbm/commits/4xen
> [3] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03328.html
> [4] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03288.html
Re: [Xen-devel] Xen optimization
Hello All,

On 27.11.18 23:27, Stefano Stabellini wrote:
> See the following:
>
> https://marc.info/?l=xen-devel&m=148668817704668

So I did port that stuff to the current staging [1].
Also, the corresponding TBM itself is here [2].
Having 4 big cores on my SoC, I run Xen with the following command line:

    dom0_mem=3G console=dtuart dtuart=serial0 dom0_max_vcpus=2
    bootscrub=0 loglvl=all cpufreq=none tbuf_size=8192 loglvl=all/none
    guest_loglvl=all/none

The TBM domain configuration file is as follows:

    seclabel='system_u:system_r:domU_t'
    name = "DomP"
    kernel = "/home/root/ctest-bare.bin"
    extra = "console=hvc0 rw"
    memory = 128
    vcpus = 1
    cpus = "3"

This gives me a setup where Domain-0 runs solely on cores 0 and 1 and TBM
runs exclusively on core 3, so we can rely on it showing us the pure IRQ
latency of the hypervisor.

My board is a Renesas Salvator-X with H3 ES3.0 SoC and 8GB RAM. The generic
timer runs at 8.333 MHz, which gives me 120ns resolution for measurements.
The Xen hypervisor is built without debug, and TBM does wfi in the idle
loop for all experiments.

With that setup, the IRQ latency numbers are (in ns):

Old vgic:
                         AVG   MIN   MAX    WARM MAX
    credit,  vwfi=trap   7706  7560  9480   8400
    credit,  vwfi=native 2908  2880  3120   4800
    credit2, vwfi=trap   7221  7200  9240   7440
    credit2, vwfi=native 2906  2880  3120   5040

New vgic:
                         AVG   MIN   MAX    WARM MAX
    credit,  vwfi=trap   8481  8040  10200  8880
    credit,  vwfi=native 4115  3960  4800   4200
    credit2, vwfi=trap   8425  8400  9600   9000
    credit2, vwfi=native 4227  3960  5040   4680

Here we can see that the new vgic underperforms the old one in a trivial
use-case modeled with TBM.
Old vgic with optimizations [3] (without [4], because it breaks the setup):

                         AVG   MIN   MAX   WARM MAX
    credit,  vwfi=trap   7309  7080  8760  7680
    credit,  vwfi=native 3007  3000  4320  3120
    credit2, vwfi=trap   6877  6720  8880  7200
    credit2, vwfi=native 2680  2640  4440  2880

[1] https://github.com/aanisov/xen/tree/4tbm
[2] https://github.com/aanisov/tbm/commits/4xen
[3] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03328.html
[4] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03288.html

--
Sincerely,
Andrii Anisov.
Re: [Xen-devel] Xen optimization
Hello Stefano,

On 27.11.18 23:27, Stefano Stabellini wrote:
> Hi Andrii,
>
> See the following:
>
> https://marc.info/?l=xen-devel&m=148668817704668

Thank you for the pointer. I remember this email, but missed that it also
gives details to set up the experiment. It looks like the bare-metal app is
not SoC specific, so I am going to take it in use.

> The numbers have improved now thanks to vwfi=native and other
> optimizations, but the mechanism to set up the experiment is the same.

I know about `vwfi=native` but it does not fit our requirements :(

--
Sincerely,
Andrii Anisov.
Re: [Xen-devel] Xen optimization
On Tue, 20 Nov 2018, Andrii Anisov wrote:
> Hello Stefano,
>
> On 01.11.18 22:20, Stefano Stabellini wrote:
> > No, I haven't had any time. Aside from the Xen version, another
> > difference is the interrupt source. I used the physical timer for
> > testing.
>
> Could you share your approach for interrupt latency measurement? Are you
> using any HW specifics or is it SoC independent?
>
> I would like to get more evidence for the optimizations of
> gic/vgic/gic-v2 code I did for our customer (it's about the old vgic, we
> are still on Xen 4.10).

Hi Andrii,

See the following:

https://marc.info/?l=xen-devel&m=148668817704668

The numbers have improved now thanks to vwfi=native and other
optimizations, but the mechanism to set up the experiment is the same.

Cheers,

Stefano
Re: [Xen-devel] Xen optimization
Hello Stefano,

On 01.11.18 22:20, Stefano Stabellini wrote:
> No, I haven't had any time. Aside from the Xen version, another
> difference is the interrupt source. I used the physical timer for
> testing.

Could you share your approach for interrupt latency measurement? Are you
using any HW specifics or is it SoC independent?

I would like to get more evidence for the optimizations of gic/vgic/gic-v2
code I did for our customer (it's about the old vgic, we are still on Xen
4.10).

--
Sincerely,
Andrii Anisov.
Re: [Xen-devel] Xen optimization
Hi Dario,

On 09/10/2018 17:46, Dario Faggioli wrote:
> On Tue, 2018-10-09 at 12:59 +0200, Milan Boberic wrote:
> > Hi,
> Hi Milan,
>
> > I'm testing Xen Hypervisor 4.10 performance on an UltraZed-EG board
> > with carrier card. I created a bare-metal application in Xilinx SDK.
> > In the bm application I:
> > - start the triple timer counter (ttc) which generates an interrupt
> >   every 1us
> > - turn on the PS LED
> > - call a function 100 times in a for loop (a function that sets some
> >   values)
> > - turn off the LED
> > - stop the triple timer counter
> > - reset the counter value
> Ok, I'm adding Stefano, Julien, and a couple of other people interested
> in RT/lowlat on Xen.
>
> > I ran this bare-metal application under the Xen Hypervisor with the
> > following settings:
> > - used the null scheduler (sched=null) and vwfi=native
> > - the bare-metal application has one vCPU and it is pinned to pCPU1
> > - the domain which is PetaLinux also has one vCPU pinned to pCPU0;
> >   other pCPUs are unused.
> > Under the Xen Hypervisor I can see 3us jitter on the oscilloscope.
> So, this is probably me not being familiar with Xen on Xilinx (and with
> Xen on ARM as a whole), but there's a few things I'm not sure I
> understand:
> - you say you use sched=null _and_ pinning? That should not be necessary
>   (although, it shouldn't hurt either)
> - "domain which is PetaLinux", is that dom0?
>
> IAC, if it's not terribly hard to run this kind of test, I'd say, try
> without 'vwfi=native', and also with another scheduler, like Credit (but
> then do make sure you use pinning).
>
> > When I ran the same bm application with JTAG from Xilinx SDK (without
> > the Xen Hypervisor, directly on the board) there is no jitter.
> Here, when you say "without Xen", do you also mean without any baremetal
> OS at all?
>
> > I'm curious what causes this 3us jitter in Xen (which isn't small
> > jitter at all) and is there any way of decreasing it?
> Right. So, I'm not sure I've understood the test scenario either. But
> yeah, 3us jitter seems significant. Still, if we're comparing with
> bare-hw, without even an OS at all, I think it could have been expected
> for latency and jitter to be higher in the Xen case.
>
> Anyway, I am not sure anyone has done the kind of analysis that could
> help us identify accurately from where things like that come, and in
> what proportions. It would be really awesome to have something like
> that, so do go ahead if you feel like it. :-)
>
> I think tracing could help a little (although we don't have a super-
> sophisticated tracing infrastructure like Linux's perf and such), but
> sadly enough, that's still not available on ARM, I think. :-/
>
> Regards,
> Dario

FWIW, I just posted a series to add xentrace support on Arm [1]. Hopefully
we can get this merged for Xen 4.12.

Cheers,

[1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg00563.html

--
Julien Grall
Re: [Xen-devel] Xen optimization
Hi Stefano,

On 11/1/18 8:20 PM, Stefano Stabellini wrote:
> On Wed, 31 Oct 2018, Julien Grall wrote:
> > On 10/31/18 8:35 PM, Milan Boberic wrote:
> > > All Xens I used are from the Xilinx git repository because I have an
> > > UltraZed-EG board which has a Zynq UltraScale SoC.
> > > Under branches you can find Xen 4.8, 4.9, etc.
> > > I always used the latest commit:
> > > c227fe68589bdfb36b85f7b78c034a40c95b9a30
> > > Here is a link to it:
> > > https://github.com/Xilinx/xen/tree/xilinx/stable-4.9
> >
> > This branch is quite ahead of the branch Stefano used. There are 94
> > commits more just for Arm specific code.
> >
> > What I am interested in is to see if we are able to reproduce
> > Stefano's numbers with the same branch. So we can have a clue whether
> > there is a slowdown introduced in new code.
> >
> > Stefano, you mentioned you would look at reproducing the numbers. Do
> > you have any update on this?
>
> No, I haven't had any time. Aside from the Xen version, another
> difference is the interrupt source. I used the physical timer for
> testing.

I would actually be surprised if the interrupt latency varied with
virtualization depending on the interrupt... If that were the case, then
measuring the latency on the physical interrupt (unlikely to be used by a
virtualized guest) was quite pointless.

Cheers,

--
Julien Grall
Re: [Xen-devel] Xen optimization
On Wed, 31 Oct 2018, Julien Grall wrote:
> On 10/31/18 8:35 PM, Milan Boberic wrote:
> > Hi,
> >
> > > Interesting. Could you confirm the commit you were using (or the
> > > point release)? Stefano's numbers were based on commit "fuzz: update
> > > README.afl example" 55a04feaa1f8ab6ef7d723fbb1d39c6b96ad184a which
> > > is an unreleased version of Xen.
> >
> > All Xens I used are from the Xilinx git repository because I have an
> > UltraZed-EG board which has a Zynq UltraScale SoC.
> > Under branches you can find Xen 4.8, 4.9, etc.
> > I always used the latest commit:
> > c227fe68589bdfb36b85f7b78c034a40c95b9a30
> > Here is a link to it:
> > https://github.com/Xilinx/xen/tree/xilinx/stable-4.9
>
> This branch is quite ahead of the branch Stefano used. There are 94
> commits more just for Arm specific code.
>
> What I am interested in is to see if we are able to reproduce Stefano's
> numbers with the same branch. So we can have a clue whether there is a
> slowdown introduced in new code.
>
> Stefano, you mentioned you would look at reproducing the numbers. Do you
> have any update on this?

No, I haven't had any time. Aside from the Xen version, another difference
is the interrupt source. I used the physical timer for testing.
Re: [Xen-devel] Xen optimization
On 10/31/18 8:35 PM, Milan Boberic wrote:
> Hi,
>
> > Interesting. Could you confirm the commit you were using (or the point
> > release)? Stefano's numbers were based on commit "fuzz: update
> > README.afl example" 55a04feaa1f8ab6ef7d723fbb1d39c6b96ad184a which is
> > an unreleased version of Xen.
>
> All Xens I used are from the Xilinx git repository because I have an
> UltraZed-EG board which has a Zynq UltraScale SoC.
> Under branches you can find Xen 4.8, 4.9, etc.
> I always used the latest commit: c227fe68589bdfb36b85f7b78c034a40c95b9a30
> Here is a link to it:
> https://github.com/Xilinx/xen/tree/xilinx/stable-4.9

This branch is quite ahead of the branch Stefano used. There are 94 commits
more just for Arm specific code.

What I am interested in is to see if we are able to reproduce Stefano's
numbers with the same branch. So we can have a clue whether there is a
slowdown introduced in new code.

Stefano, you mentioned you would look at reproducing the numbers. Do you
have any update on this?

Cheers,

--
Julien Grall
Re: [Xen-devel] Xen optimization
Hi,

> Interesting. Could you confirm the commit you were using (or the point
> release)?
> Stefano's numbers were based on commit "fuzz: update README.afl example"
> 55a04feaa1f8ab6ef7d723fbb1d39c6b96ad184a which is an unreleased version
> of Xen.

All Xens I used are from the Xilinx git repository because I have an
UltraZed-EG board which has a Zynq UltraScale SoC.
Under branches you can find Xen 4.8, 4.9, etc.
I always used the latest commit: c227fe68589bdfb36b85f7b78c034a40c95b9a30
Here is a link to it:
https://github.com/Xilinx/xen/tree/xilinx/stable-4.9

Best regards,
Milan
Re: [Xen-devel] Xen optimization
Hi Milan,

On 10/29/18 12:29 PM, Milan Boberic wrote:
> Sorry for the late reply,

Don't worry, thank you for the testing and sending the .config.

> > I am afraid no. .config is generated during building time. So can you
> > paste it here please?
> The ".config" file is in the attachment.
>
> I also tried Xen 4.9 and I got almost the same numbers; jitter is
> smaller by 150ns, which isn't a significant change at all.

Interesting. Could you confirm the commit you were using (or the point
release)? Stefano's numbers were based on commit "fuzz: update README.afl
example" 55a04feaa1f8ab6ef7d723fbb1d39c6b96ad184a which is an unreleased
version of Xen.

Cheers,

--
Julien Grall
Re: [Xen-devel] Xen optimization
Sorry for the late reply,

> I am afraid no. .config is generated during building time. So can you
> paste it here please?

The ".config" file is in the attachment.

I also tried Xen 4.9 and I got almost the same numbers; jitter is smaller
by 150ns, which isn't a significant change at all.

Milan

#
# Automatically generated file; DO NOT EDIT.
# Xen/arm 4.11.1-pre Configuration
#
CONFIG_64BIT=y
CONFIG_ARM_64=y
CONFIG_ARM=y
CONFIG_ARCH_DEFCONFIG="arch/arm/configs/arm64_defconfig"

#
# Architecture Features
#
CONFIG_NR_CPUS=128
# CONFIG_ACPI is not set
CONFIG_GICV3=y
# CONFIG_HAS_ITS is not set
# CONFIG_NEW_VGIC is not set
CONFIG_SBSA_VUART_CONSOLE=y

#
# ARM errata workaround via the alternative framework
#
CONFIG_ARM64_ERRATUM_827319=y
CONFIG_ARM64_ERRATUM_824069=y
CONFIG_ARM64_ERRATUM_819472=y
CONFIG_ARM64_ERRATUM_832075=y
CONFIG_ARM64_ERRATUM_834220=y
CONFIG_HARDEN_BRANCH_PREDICTOR=y
CONFIG_ARM64_HARDEN_BRANCH_PREDICTOR=y
CONFIG_ALL_PLAT=y
# CONFIG_QEMU is not set
# CONFIG_RCAR3 is not set
# CONFIG_MPSOC is not set
CONFIG_ALL64_PLAT=y
# CONFIG_ALL32_PLAT is not set
CONFIG_MPSOC_PLATFORM=y

#
# Common Features
#
CONFIG_HAS_ALTERNATIVE=y
CONFIG_HAS_DEVICE_TREE=y
# CONFIG_MEM_ACCESS is not set
CONFIG_HAS_PDX=y
CONFIG_TMEM=y
# CONFIG_XSM is not set

#
# Schedulers
#
CONFIG_SCHED_CREDIT=y
CONFIG_SCHED_CREDIT2=y
CONFIG_SCHED_RTDS=y
# CONFIG_SCHED_ARINC653 is not set
CONFIG_SCHED_NULL=y
CONFIG_SCHED_CREDIT_DEFAULT=y
# CONFIG_SCHED_CREDIT2_DEFAULT is not set
# CONFIG_SCHED_RTDS_DEFAULT is not set
# CONFIG_SCHED_NULL_DEFAULT is not set
CONFIG_SCHED_DEFAULT="credit"
# CONFIG_LIVEPATCH is not set
CONFIG_SUPPRESS_DUPLICATE_SYMBOL_WARNINGS=y
CONFIG_CMDLINE=""

#
# Device Drivers
#
CONFIG_HAS_NS16550=y
CONFIG_HAS_CADENCE_UART=y
CONFIG_HAS_MVEBU=y
CONFIG_HAS_PL011=y
CONFIG_HAS_SCIF=y
CONFIG_HAS_PASSTHROUGH=y
CONFIG_ARM_SMMU=y
CONFIG_VIDEO=y
CONFIG_HAS_ARM_HDLCD=y
CONFIG_DEFCONFIG_LIST="$ARCH_DEFCONFIG"

#
# Debugging Options
#
# CONFIG_DEBUG is not set
# CONFIG_FRAME_POINTER is not set
# CONFIG_COVERAGE is not set
# CONFIG_LOCK_PROFILE is not set
# CONFIG_PERF_COUNTERS is not set
# CONFIG_VERBOSE_DEBUG is not set
# CONFIG_DEVICE_TREE_DEBUG is not set
# CONFIG_SCRUB_DEBUG is not set
Re: [Xen-devel] Xen optimization
On Fri, 26 Oct 2018, Julien Grall wrote: > Hi Stefano, > > On 10/25/18 5:15 PM, Stefano Stabellini wrote: > > On Thu, 25 Oct 2018, Julien Grall wrote: > > > Hi Stefano, > > > > > > On 10/24/18 1:24 AM, Stefano Stabellini wrote: > > > > On Tue, 23 Oct 2018, Milan Boberic wrote: > > > > I don't have any other things to suggest right now. You should be able > > > > to measure an overall 2.5us IRQ latency (if the interrupt rate is not > > > > too high). > > > > > > Is it number you measured on Xen 4.11 flavored Xilinx? Or are they coming > > > from > > > the blog post [1] which is based on Xen 4.9? > > > > > > If the latter, then I can't rule out we may have introduce a slowdown for > > > good > > > or bad reason... > > > > > > To rule out this possibility, I would recommend to try and reproduce the > > > same > > > number on Xen 4.9 and then try with Xen 4.11. > > > > > > Cheers, > > > > > > [1] https://blog.xenproject.org/2017/03/20/xen-on-arm-interrupt-latency/ > > > > I was talking about the old numbers from Xen 4.9. You are right, we > > cannot rule out the possibility that we introduced a slowdown. > > Can you try to reproduce those number with your setup on Xen 4.11? Yes, I intend to, it is on my TODO. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Xen optimization
Hi Stefano,

On 10/25/18 5:15 PM, Stefano Stabellini wrote:
> On Thu, 25 Oct 2018, Julien Grall wrote:
> > Hi Stefano,
> >
> > On 10/24/18 1:24 AM, Stefano Stabellini wrote:
> > > On Tue, 23 Oct 2018, Milan Boberic wrote:
> > > I don't have any other things to suggest right now. You should be able
> > > to measure an overall 2.5us IRQ latency (if the interrupt rate is not
> > > too high).
> >
> > Is it number you measured on Xen 4.11 flavored Xilinx? Or are they coming
> > from the blog post [1] which is based on Xen 4.9?
> >
> > If the latter, then I can't rule out we may have introduce a slowdown for
> > good or bad reason...
> >
> > To rule out this possibility, I would recommend to try and reproduce the
> > same number on Xen 4.9 and then try with Xen 4.11.
> >
> > Cheers,
> >
> > [1] https://blog.xenproject.org/2017/03/20/xen-on-arm-interrupt-latency/
>
> I was talking about the old numbers from Xen 4.9. You are right, we
> cannot rule out the possibility that we introduced a slowdown.

Can you try to reproduce those numbers with your setup on Xen 4.11?

Cheers,

-- Julien Grall
Re: [Xen-devel] Xen optimization
On Thu, 25 Oct 2018, Julien Grall wrote: > Hi Stefano, > > On 10/24/18 1:24 AM, Stefano Stabellini wrote: > > On Tue, 23 Oct 2018, Milan Boberic wrote: > > I don't have any other things to suggest right now. You should be able > > to measure an overall 2.5us IRQ latency (if the interrupt rate is not > > too high). > > Is it number you measured on Xen 4.11 flavored Xilinx? Or are they coming from > the blog post [1] which is based on Xen 4.9? > > If the latter, then I can't rule out we may have introduce a slowdown for good > or bad reason... > > To rule out this possibility, I would recommend to try and reproduce the same > number on Xen 4.9 and then try with Xen 4.11. > > Cheers, > > [1] https://blog.xenproject.org/2017/03/20/xen-on-arm-interrupt-latency/ I was talking about the old numbers from Xen 4.9. You are right, we cannot rule out the possibility that we introduced a slowdown. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Xen optimization
On 10/25/18 3:47 PM, Milan Boberic wrote:
> > I was asking the Xen configuration (xen/.config) to know what you have
> > enabled in Xen.
>
> Oh, sorry, because I'm building xen from git repository here is the link
> to it where you can check the file you mentioned.
> https://github.com/Xilinx/xen/tree/xilinx/versal/xen

I am afraid not: the .config is generated at build time. So can you paste it here, please?

Cheers,

-- Julien Grall
Re: [Xen-devel] Xen optimization
> I was asking the Xen configuration (xen/.config) to know what you have
> enabled in Xen.

Oh, sorry. Because I'm building Xen from a git repository, here is the link where you can check the file you mentioned:
https://github.com/Xilinx/xen/tree/xilinx/versal/xen

> It might, OTOH, be wise to turn it on when investigating the system
> behavior (but that's a general remark, I don't know to what Julien was
> referring to in this specific case).

I will definitely try enabling DEBUG.

Milan
Re: [Xen-devel] Xen optimization
On 10/25/18 1:36 PM, Milan Boberic wrote:
> On Thu, Oct 25, 2018 at 1:30 PM Julien Grall wrote:
> > Hi Milan,
>
> Hi Julien,
>
> > Sorry if it was already asked. Can you provide your .config for your
> > test?
>
> Yes of course, bare-metal's .cfg file is in the attachment (if that is
> what you asked :) ).

I was asking for the Xen configuration (xen/.config), to know what you have enabled in Xen.

Cheers,

-- Julien Grall
Re: [Xen-devel] Xen optimization
Hi Dario,

On 10/25/18 2:44 PM, Dario Faggioli wrote:
> On Thu, 2018-10-25 at 14:36 +0200, Milan Boberic wrote:
> > On Thu, Oct 25, 2018 at 1:30 PM Julien Grall wrote:
> > > Do you have DEBUG enabled?
> >
> > I'm not sure where exactly should I disable it. If you check line 18
> > in xl dmesg file in attachment it says debug=n, it's output of xl
> > dmesg. I'm not sure if that is the DEBUG you are talking about.
>
> Yes, this mean debug is *not* enabled. Which is the correct setup for
> doing performance/latency evaluation.
>
> It might, OTOH, be wise to turn it on when investigating the system
> behavior (but that's a general remark, I don't know to what Julien was
> referring to in this specific case).

To narrow down the discrepancies during the measurement, I wanted to check whether Milan was doing the performance measurements with debug enabled. Now I can tick off DEBUG as a potential cause of the latency/performance issues.

Cheers,

-- Julien Grall
Re: [Xen-devel] Xen optimization
On Thu, 2018-10-25 at 14:36 +0200, Milan Boberic wrote:
> On Thu, Oct 25, 2018 at 1:30 PM Julien Grall wrote:
> > Do you have DEBUG enabled?
>
> I'm not sure where exactly should I disable it. If you check line 18
> in xl dmesg file in attachment it says debug=n, it's output of xl
> dmesg. I'm not sure if that is the DEBUG you are talking about.
>
Yes, this means debug is *not* enabled, which is the correct setup for doing performance/latency evaluation.

It might, OTOH, be wise to turn it on when investigating the system behavior (but that's a general remark, I don't know to what Julien was referring to in this specific case).

To turn it on, in a recent enough Xen, which I think is what you're using, you can use Kconfig (e.g., `make -C xen/ menuconfig').

> Also if I add prints somewhere in the code, I can see them, does that
> mean that DEBUG is enabled? If yes, can you tell me where exactly
> should I disable it?
>
It depends on the "print". If you add 'printk("bla");', it is correct that you see "bla" in the log, even with debug=n.

Regards,
Dario

-- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Software Engineer @ SUSE https://www.suse.com/
Re: [Xen-devel] Xen optimization
On Thu, Oct 25, 2018 at 1:30 PM Julien Grall wrote: > > Hi Milan, Hi Julien, > Sorry if it was already asked. Can you provide your .config for your > test? Yes of course, bare-metal's .cfg file is in it's in attachment (if that is what you asked :) ). > Do you have DEBUG enabled? I'm not sure where exactly should I disable it. If you check line 18 in xl dmesg file in attachment it says debug=n, it's output of xl dmesg. I'm not sure if that is the DEBUG you are talking about. Also if I add prints somewhere in the code, I can see them, does that mean that DEBUG is enabled? If yes, can you tell me where exactly should I disable it? Thanks in advance! Milan name = "test" kernel = "only_timer.bin" memory = 8 vcpus = 1 cpus = [1] irqs = [ 48, 54, 68, 69, 70 ] iomem = [ "0xff010,1", "0xff110,1", "0xff120,1", "0xff130,1", "0xff140,1", "0xff0a0,1" ](XEN) Checking for initrd in /chosen (XEN) Initrd 02bd7000-05fffd6d (XEN) RAM: - 7fef (XEN) (XEN) MODULE[0]: 07ff4000 - 07ffc080 Device Tree (XEN) MODULE[1]: 02bd7000 - 05fffd6d Ramdisk (XEN) MODULE[2]: 0008 - 0318 Kernel (XEN) RESVD[0]: 07ff4000 - 07ffc000 (XEN) RESVD[1]: 02bd7000 - 05fffd6d (XEN) (XEN) Command line: console=dtuart dtuart=serial0 dom0_mem=1024M bootscrub=0 dom0_max_vcpus=1 dom0_vcpus_pin=true timer_slop=0 sched=null vwfi=native serrors=panic (XEN) Placing Xen at 0x7fc0-0x7fe0 (XEN) Update BOOTMOD_XEN from 0600-06108d81 => 7fc0-7fd08d81 (XEN) Domain heap initialised (XEN) Booting using Device Tree (XEN) Looking for dtuart at "serial0", options "" Xen 4.11.1-pre (XEN) Xen version 4.11.1-pre (milan@) (aarch64-xilinx-linux-gcc (GCC) 7.2.0) debug=n Wed Oct 24 10:11:47 CEST 2018 (XEN) Latest ChangeSet: Mon Sep 24 16:07:33 2018 -0700 git:8610a91abc-dirty (XEN) Processor: 410fd034: "ARM Limited", variant: 0x0, part 0xd03, rev 0x4 (XEN) 64-bit Execution: (XEN) Processor Features: (XEN) Exception Levels: EL3:64+32 EL2:64+32 EL1:64+32 EL0:64+32 (XEN) Extensions: FloatingPoint AdvancedSIMD (XEN) Debug Features: 10305106 
(XEN) Auxiliary Features: (XEN) Memory Model Features: 1122 (XEN) ISA Features: 00011120 (XEN) 32-bit Execution: (XEN) Processor Features: 0131:00011011 (XEN) Instruction Sets: AArch32 A32 Thumb Thumb-2 Jazelle (XEN) Extensions: GenericTimer Security (XEN) Debug Features: 03010066 (XEN) Auxiliary Features: (XEN) Memory Model Features: 10201105 4000 0126 02102211 (XEN) ISA Features: 02101110 13112111 21232042 01112131 00011142 00011121 (XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27 Freq: 9 KHz (XEN) GICv2 initialization: (XEN) gic_dist_addr=f901 (XEN) gic_cpu_addr=f902 (XEN) gic_hyp_addr=f904 (XEN) gic_vcpu_addr=f906 (XEN) gic_maintenance_irq=25 (XEN) GICv2: Adjusting CPU interface base to 0xf902f000 (XEN) GICv2: 192 lines, 4 cpus, secure (IID 0200143b). (XEN) Using scheduler: null Scheduler (null) (XEN) Initializing null scheduler (XEN) WARNING: This is experimental software in development. (XEN) Use at your own risk. (XEN) Allocated console ring of 16 KiB. (XEN) Bringing up CPU1 (XEN) Bringing up CPU2 (XEN) Bringing up CPU3 (XEN) Brought up 4 CPUs (XEN) P2M: 40-bit IPA with 40-bit PA and 8-bit VMID (XEN) P2M: 3 levels with order-1 root, VTCR 0x80023558 (XEN) I/O virtualisation enabled (XEN) - Dom0 mode: Relaxed (XEN) Interrupt remapping enabled (XEN) *** LOADING DOMAIN 0 *** (XEN) Loading kernel from boot module @ 0008 (XEN) Loading ramdisk from boot module @ 02bd7000 (XEN) Allocating 1:1 mappings totalling 1024MB for dom0: (XEN) BANK[0] 0x002000-0x006000 (1024MB) (XEN) Grant table range: 0x007fc0-0x007fc4 (XEN) Allocating PPI 16 for event channel interrupt (XEN) Loading zImage from 0008 to 2008-2318 (XEN) Loading dom0 initrd from 02bd7000 to 0x2820-0x2b628d6d (XEN) Loading dom0 DTB to 0x2800-0x28006e46 (XEN) Initial low memory virq threshold set at 0x4000 pages. (XEN) Std. 
Loglevel: Errors and warnings (XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings) (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen) (XEN) Freed 280kB init memory. (XEN) d0v0: vGICD: unhandled word write 0x to ICACTIVER4 (XEN) d0v0: vGICD: unhandled word write 0x to ICACTIVER8 (XEN) d0v0: vGICD: unhandled word write 0x to ICACTIVER12 (XEN) d0v0: vGICD: unhandled word write 0x to ICACTIVER16 (XEN) d0v0: vGICD: unhandled word write 0x to ICACTIVER20 (XEN)
Re: [Xen-devel] Xen optimization
Hi Milan, On 10/25/18 11:09 AM, Milan Boberic wrote: Hi, On Wed, Oct 24, 2018 at 2:24 AM Stefano Stabellini wrote: It is good that there are no physical interrupts interrupting the cpu. serrors=panic makes the context switch faster. I guess there are not enough context switches to make a measurable difference. Yes, when I did: grep ctxt /proc/2153/status I got: voluntary_ctxt_switches:5 nonvoluntary_ctxt_switches: 3 I don't have any other things to suggest right now. You should be able to measure an overall 2.5us IRQ latency (if the interrupt rate is not too high). This bare-metal application is the most suspicious, indeed. Still waiting answer on Xilinx forum. Sorry if it was already asked. Can you provide your .config for your test? Do you have DEBUG enabled? Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Xen optimization
Hi Stefano,

On 10/24/18 1:24 AM, Stefano Stabellini wrote:
> On Tue, 23 Oct 2018, Milan Boberic wrote:
> I don't have any other things to suggest right now. You should be able
> to measure an overall 2.5us IRQ latency (if the interrupt rate is not
> too high).

Is this the number you measured on the Xen 4.11 flavored Xilinx tree? Or is it coming from the blog post [1], which is based on Xen 4.9?

If the latter, then I can't rule out that we may have introduced a slowdown, for good or bad reasons...

To rule out this possibility, I would recommend trying to reproduce the same number on Xen 4.9 and then trying with Xen 4.11.

Cheers,

[1] https://blog.xenproject.org/2017/03/20/xen-on-arm-interrupt-latency/

-- Julien Grall
Re: [Xen-devel] Xen optimization
Hi,

> On Wed, Oct 24, 2018 at 2:24 AM Stefano Stabellini wrote:
> It is good that there are no physical interrupts interrupting the cpu.
> serrors=panic makes the context switch faster. I guess there are not
> enough context switches to make a measurable difference.

Yes, when I did:

grep ctxt /proc/2153/status

I got:

voluntary_ctxt_switches: 5
nonvoluntary_ctxt_switches: 3

> I don't have any other things to suggest right now. You should be able
> to measure an overall 2.5us IRQ latency (if the interrupt rate is not
> too high).

This bare-metal application is the most suspicious, indeed. Still waiting for an answer on the Xilinx forum.

> Just to be paranoid, we might also want to check the following, again it
> shouldn't get printed:
>
> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> index 5a4f082..6cf6814 100644
> --- a/xen/arch/arm/vgic.c
> +++ b/xen/arch/arm/vgic.c
> @@ -532,6 +532,8 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,
>      struct pending_irq *iter, *n;
>      unsigned long flags;
>
> +    if ( d->domain_id != 0 && virq != 68 )
> +        printk("DEBUG virq=%d local=%d\n", virq, v == current);
>      /*
>       * For edge triggered interrupts we always ignore a "falling edge".
>       * For level triggered interrupts we shouldn't, but do anyways.

Checked it again, no prints. I hoped I would discover some vIRQs or pIRQs slowing things down, but no, no prints. I might try something else instead of this bare-metal application, because this Xilinx SDK example is very suspicious.

Thank you for your time.

Milan
Re: [Xen-devel] Xen optimization
On Tue, 23 Oct 2018, Milan Boberic wrote:
> > Just add an && irq != 1023 to the if check.
>
> Added it and now when I create bare-metal guest it prints only once:
>
> (XEN) DEBUG irq=0
> (XEN) d1v0 No valid vCPU found for vIRQ32 in the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ33 in the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ34 in the target list (0x2). Skip it
> root@uz3eg-iocc-2018-2:~# (XEN) d1v0 No valid vCPU found for vIRQ35 in
> the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ36 in the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ37 in the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ38 in the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ39 in the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ40 in the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ41 in the target list (0x2). Skip it
>
> This part always prints only once when I create this bare-metal guest
> like I mentioned in earlier replies and we said it doesn't do any
> harm:
>
> (XEN) d1v0 No valid vCPU found for vIRQ32 in the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ33 in the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ34 in the target list (0x2). Skip it
> root@uz3eg-iocc-2018-2:~# (XEN) d1v0 No valid vCPU found for vIRQ35 in
> the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ36 in the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ37 in the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ38 in the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ39 in the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ40 in the target list (0x2). Skip it
> (XEN) d1v0 No valid vCPU found for vIRQ41 in the target list (0x2). Skip it
>
> Now, from this patch I get:
>
> (XEN) DEBUG irq=0
>
> also printed only once.
>
> Forgot to mention in reply before this one, I added serrors=panic and
> it didn't make any change, numbers are the same.
>
> Thanks in advance!

It is good that there are no physical interrupts interrupting the cpu.
serrors=panic makes the context switch faster. I guess there are not
enough context switches to make a measurable difference.

I don't have any other things to suggest right now. You should be able
to measure an overall 2.5us IRQ latency (if the interrupt rate is not
too high).

Just to be paranoid, we might also want to check the following, again it
shouldn't get printed:

diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 5a4f082..6cf6814 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -532,6 +532,8 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,
     struct pending_irq *iter, *n;
     unsigned long flags;
 
+    if ( d->domain_id != 0 && virq != 68 )
+        printk("DEBUG virq=%d local=%d\n", virq, v == current);
     /*
      * For edge triggered interrupts we always ignore a "falling edge".
      * For level triggered interrupts we shouldn't, but do anyways.
Re: [Xen-devel] Xen optimization
> Just add an && irq != 1023 to the if check. Added it and now when I create bare-metal guest it prints only once: (XEN) DEBUG irq=0 (XEN) d1v0 No valid vCPU found for vIRQ32 in the target list (0x2). Skip it (XEN) d1v0 No valid vCPU found for vIRQ33 in the target list (0x2). Skip it (XEN) d1v0 No valid vCPU found for vIRQ34 in the target list (0x2). Skip it root@uz3eg-iocc-2018-2:~# (XEN) d1v0 No valid vCPU found for vIRQ35 in the target list (0x2). Skip it (XEN) d1v0 No valid vCPU found for vIRQ36 in the target list (0x2). Skip it (XEN) d1v0 No valid vCPU found for vIRQ37 in the target list (0x2). Skip it (XEN) d1v0 No valid vCPU found for vIRQ38 in the target list (0x2). Skip it (XEN) d1v0 No valid vCPU found for vIRQ39 in the target list (0x2). Skip it (XEN) d1v0 No valid vCPU found for vIRQ40 in the target list (0x2). Skip it (XEN) d1v0 No valid vCPU found for vIRQ41 in the target list (0x2). Skip it This part always prints only once when I create this bare-metal guest like I mentioned in earlier replies and we said it doesn't do any harm: (XEN) d1v0 No valid vCPU found for vIRQ32 in the target list (0x2). Skip it (XEN) d1v0 No valid vCPU found for vIRQ33 in the target list (0x2). Skip it (XEN) d1v0 No valid vCPU found for vIRQ34 in the target list (0x2). Skip it root@uz3eg-iocc-2018-2:~# (XEN) d1v0 No valid vCPU found for vIRQ35 in the target list (0x2). Skip it (XEN) d1v0 No valid vCPU found for vIRQ36 in the target list (0x2). Skip it (XEN) d1v0 No valid vCPU found for vIRQ37 in the target list (0x2). Skip it (XEN) d1v0 No valid vCPU found for vIRQ38 in the target list (0x2). Skip it (XEN) d1v0 No valid vCPU found for vIRQ39 in the target list (0x2). Skip it (XEN) d1v0 No valid vCPU found for vIRQ40 in the target list (0x2). Skip it (XEN) d1v0 No valid vCPU found for vIRQ41 in the target list (0x2). Skip it Now, from this patch I get: (XEN) DEBUG irq=0 also printed only once. 
Forgot to mention in reply before this one, I added serrors=panic and it didn't make any change, numbers are the same. Thanks in advance! Milan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Xen optimization
On Mon, 22 Oct 2018, Milan Boberic wrote:
> Hi,
>
> > I think we want to fully understand how many other interrupts the
> > baremetal guest is receiving. To do that, we can modify my previous
> > patch to suppress any debug messages for virq=68. That way, we should
> > only see the other interrupts. Ideally there would be none.
> >
> > diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> > index 5a4f082..b7a8e17 100644
> > --- a/xen/arch/arm/vgic.c
> > +++ b/xen/arch/arm/vgic.c
> > @@ -577,7 +577,11 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,
> >      /* the irq is enabled */
> >      if ( test_bit(GIC_IRQ_GUEST_ENABLED, &n->status) )
> > +    {
> >          gic_raise_guest_irq(v, virq, priority);
> > +        if ( d->domain_id != 0 && virq != 68 )
> > +            printk("DEBUG virq=%d local=%d\n", virq, v == current);
> > +    }
> >      list_for_each_entry ( iter, &v->arch.vgic.inflight_irqs, inflight )
> >      {
>
> when I apply this patch there are no prints nor debug messages in xl
> dmesg. So bare-metal receives only interrupt 68, which is good.

Yes, good!

> > Next step would be to verify that there are no other physical interrupts
> > interrupting the vcpu execution other the irq=68. We should be able to
> > check that with the following debug patch:
> >
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index e524ad5..b34c3e4 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -381,6 +381,13 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
> >      /* Reading IRQ will ACK it */
> >      irq = gic_hw_ops->read_irq();
> >
> > +    if ( current->domain->domain_id > 0 && irq != 68 )
> > +    {
> > +        local_irq_enable();
> > +        printk("DEBUG irq=%d\n", irq);
> > +        local_irq_disable();
> > +    }
> > +
> >      if ( likely(irq >= 16 && irq < 1020) )
> >      {
> >          local_irq_enable();
>
> But when I apply this patch it prints forever:
>
> (XEN) DEBUG irq=1023
>
> Thanks in advance!

I know why! It's because we always loop around until we read the spurious interrupt. Just add an && irq != 1023 to the if check.
Re: [Xen-devel] Xen optimization
Hi, > I think we want to fully understand how many other interrupts the > baremetal guest is receiving. To do that, we can modify my previous > patch to suppress any debug messages for virq=68. That way, we should > only see the other interrupts. Ideally there would be none. > diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c > index 5a4f082..b7a8e17 100644 > --- a/xen/arch/arm/vgic.c > +++ b/xen/arch/arm/vgic.c > @@ -577,7 +577,11 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, > unsigned int virq, > /* the irq is enabled */ > if ( test_bit(GIC_IRQ_GUEST_ENABLED, >status) ) > +{ > gic_raise_guest_irq(v, virq, priority); > +if ( d->domain_id != 0 && virq != 68 ) > +printk("DEBUG virq=%d local=%d\n",virq,v == current); > +} > list_for_each_entry ( iter, >arch.vgic.inflight_irqs, inflight ) > { when I apply this patch there are no prints nor debug messages in xl dmesg. So bare-metal receives only interrupt 68, which is good. > Next step would be to verify that there are no other physical interrupts > interrupting the vcpu execution other the irq=68. We should be able to > check that with the following debug patch: > > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > index e524ad5..b34c3e4 100644 > --- a/xen/arch/arm/gic.c > +++ b/xen/arch/arm/gic.c > @@ -381,6 +381,13 @@ void gic_interrupt(struct cpu_user_regs *regs, int > is_fiq) > /* Reading IRQ will ACK it */ > irq = gic_hw_ops->read_irq(); > +if (current->domain->domain_id > 0 && irq != 68) > +{ > +local_irq_enable(); > +printk("DEBUG irq=%d\n",irq); > +local_irq_disable(); > +} > + > if ( likely(irq >= 16 && irq < 1020) ) > { > local_irq_enable(); But when I apply this patch it prints forever: (XEN) DEBUG irq=1023 Thanks in advance! Milan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Xen optimization
On Fri, 2018-10-19 at 14:02 -0700, Stefano Stabellini wrote:
> On Wed, 17 Oct 2018, Milan Boberic wrote:
> > I checked interrupt frequency with oscilloscope just to be sure
> > (toggling LED on/off when interrupts occur). So, when I set:
> > - interrupts to be generated every 8 us I get jitter of 6 us
> > - interrupts to be generated every 10 us I get jitter of 3 us (after
> > 2-3mins it jumps to 6 us)
> > - interrupts to be generated every 15 us jitter is the same as when
> > only bare-metal application runs on board (without Xen or any OS)
>
> These are very interesting numbers!
>
Indeed.

> Thanks again for running these experiments. I don't want to jump to
> conclusions but they seem to verify the theory that if the interrupt
> frequency is too high, we end up spending too much time handling
> interrupts, the system cannot cope, hence jitter increases.
>
Yep, this makes a lot of sense.

> However, I would have thought that the threshold should be lower than
> 15us, given that it takes 2.5us to inject an interrupt. I have a
> couple of experiments suggestions below.
>
FWIW, I know that numbers are always relative (hw platform, workload, etc), and I'm happy to see that you're quite confident that we can improve further... but these numbers seem rather good to me. :-)

Regards,
Dario

-- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Software Engineer @ SUSE https://www.suse.com/
Re: [Xen-devel] Xen optimization
Hi,

> The device tree with everything seems to be system.dts, that was enough
> :-) I don't need the dtsi files you used to build the final dts, I only
> need the one you use in uboot and for your guest.

I wasn't sure, so I sent everything; sorry for bombarding you with all those files. :-)

> It looks like you set xen,passthrough correctly in system.dts for
> timer@ff11, serial@ff01, and gpio@ff0a.

Thank you for taking a look. Now we are sure that passthrough works correctly, because there is no error during guest creation and there are no prints of "DEBUG irq slow path".

> If you are not getting any errors anymore when creating your baremetal
> guest, then yes, it should be working passthrough. I would double-check
> that everything is working as expected using the DEBUG patch for Xen I
> suggested to you in the other email. You might even want to remove the
> "if" check and always print something for every interrupt of your guest
> just to get an idea of what's going on. See the attached patch.

When I apply this patch it prints forever:

(XEN) DEBUG virq=68 local=1

which is a good thing, I guess, because interrupts are being generated non-stop.

> Once everything is as expected I would change the frequency of the
> timer, because 1u is way too frequent. I think it should be at least
> 3us, more like 5us.

Okay, about this... I double-checked my bare-metal application, and it looks like interrupts weren't generated every 1 us; the maximum interrupt frequency is one every 8 us. I checked the interrupt frequency with an oscilloscope just to be sure (toggling an LED on/off when interrupts occur).

So, when I set:
- interrupts to be generated every 8 us, I get jitter of 6 us
- interrupts to be generated every 10 us, I get jitter of 3 us (after 2-3 minutes it jumps to 6 us)
- interrupts to be generated every 15 us, the jitter is the same as when only the bare-metal application runs on the board (without Xen or any OS)

I want to remind you that a bare-metal application that only blinks an LED at high speed gives 1 us jitter; somehow, introducing frequent interrupts causes this jitter, which is why I was unsure about this timer passthrough. Taking into consideration that you measured a Xen overhead of 1 us, I have a feeling that I'm missing something. Is there anything else I could do to get better results, besides sched=null, vwfi=native, hard vCPU pinning (1 vCPU on 1 pCPU) and passthrough (not sure if it affects the jitter)?

I'm forcing frequent interrupts because I'm testing whether this board with Xen on it could be used for real-time simulations, real-time signal processing, etc. If I could get results like yours (1 us Xen overhead) or even better, that would be great! BTW, how did you measure Xen's overhead?

> Keep in mind that jitter is about having
> deterministic IRQ latency, not about having extremely frequent
> interrupts.

Yes, but I want to see exactly where I will lose deterministic IRQ latency, which is extremely important in real-time signal processing. So, what causes this jitter: is it a Xen limit, an ARM limit, etc.? It would be nice to know; I'll share all the results I get.

> I would also double check that you are not using any other devices or
> virtual interfaces in your baremetal app because that could negatively
> affect the numbers.

I checked the bare-metal app, and I think there are no other devices the bm app is using.

> Linux by default uses the virtual
> timer interface ("arm,armv8-timer", I would double check that the
> baremetal app is not doing the same -- you don't want to be using two
> timers when doing your measurements.

Hmm, I'm not sure how to check that. I could send the bare-metal app if that helps; it's created in Xilinx SDK 2017.4. Also, should I move to Xilinx SDK 2018.2, since I'm using PetaLinux 2018.2? I'm also using a hardware description file for the SDK that was created in Vivado 2017.4. Could all this be a "version mismatch" problem (I don't think so, because the bm app works)?

Meng mentioned in some of his earlier posts:

> Even though the app. is the only one running on the CPU, the CPU may
> be used to handle other interrupts and its context (such as TLB and
> cache) might be flushed by other components. When these happen, the
> interrupt handling latency can vary a lot.

What do you think about this? I don't know how I would check this.

I also tried using the default scheduler (removed sched=null and vwfi=native), and the jitter is 10 us when an interrupt is generated every 10 us.

Thanks in advance!

Milan
Re: [Xen-devel] Xen optimization
On 10/13/2018 05:01 PM, Milan Boberic wrote:
> Hi,

Hi,

> > Don't interrupt _come_ from hardware and go/are routed to
> > hypervisor/os/app?
>
> Yes they do, sorry, I reversed the order because I'm a newbie :) .
>
> > Would you mind to explain what is the triple timer counter?
>
> On this link on page 342 is explanation.

Which link?

> > This is not the official Xen repository and look like patches have
> > been applied on top. I am afraid, I am not going to be able help
> > here. Could you do the same experiment with Xen 4.11?
>
> I think I have to get Xen from Xilinx because I use board that has
> Zynq Ultrascale. Stefano sent branch with Xen 4.11 so I built with it.

The board should be fully supported upstream. If Xilinx has more patches on top, then you would need to seek support from them, because I don't know what they changed in Xen.

Cheers,

-- Julien Grall
Re: [Xen-devel] Xen optimization
On Mon, 15 Oct 2018, Milan Boberic wrote: > In attachment are device-tree files I found in my project: > > device-tree.bbappend - under > /uz3eg_iocc_2018_2/project-spec/meta-user/recipes-bsp/device-tree/ > > xen-overlay.dtsi , system-user.dtsi and zunqmp-qemu-arm.dts - under > /uz3eg_iocc_2018_2/project-spec/meta-user/recipes-bsp/device-tree/files > > zynqmp-qemu-multiarch-arm and zynqmp-qemu-pmu - under > /uz3eg_iocc_2018_2/project-spec/meta-user/recipes-bsp/device-tree/files/multi-arch > > pcw.dtsi , pl.dtsi , system-conf.dtsi , sistem-top.dts , > zynqmp-clk-ccf.dtsi and zynqmp.dtsi - > under/uz3eg_iocc_2018_2/components/plnx_workspace/device-tree/device-tree/ > > In system-conf.dtsi file first line says: > /* > * CAUTION: This file is automatically generated by PetaLinux SDK. > * DO NOT modify this file > */ > and there is no sigh of timer. > If you could take a look at this and other files in attachment it > would be great. The device tree with everything seems to be system.dts, that was enough :-) I don't need the dtsi files you used to build the final dts, I only need the one you use in uboot and for your guest. 
In system.dts, the timers are all there:

timer@ff11 {
	compatible = "cdns,ttc";
	status = "okay";
	interrupt-parent = <0x4>;
	interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>;
	reg = <0x0 0xff11 0x0 0x1000>;
	timer-width = <0x20>;
	power-domains = <0x3b>;
	clocks = <0x3 0x1f>;
	xen,passthrough;
};

timer@ff12 {
	compatible = "cdns,ttc";
	status = "disabled";
	interrupt-parent = <0x4>;
	interrupts = <0x0 0x27 0x4 0x0 0x28 0x4 0x0 0x29 0x4>;
	reg = <0x0 0xff12 0x0 0x1000>;
	timer-width = <0x20>;
	power-domains = <0x3b>;
	clocks = <0x3 0x1f>;
};

timer@ff13 {
	compatible = "cdns,ttc";
	status = "disabled";
	interrupt-parent = <0x4>;
	interrupts = <0x0 0x2a 0x4 0x0 0x2b 0x4 0x0 0x2c 0x4>;
	reg = <0x0 0xff13 0x0 0x1000>;
	timer-width = <0x20>;
	power-domains = <0x3c>;
	clocks = <0x3 0x1f>;
};

timer@ff14 {
	compatible = "cdns,ttc";
	status = "disabled";
	interrupt-parent = <0x4>;
	interrupts = <0x0 0x2d 0x4 0x0 0x2e 0x4 0x0 0x2f 0x4>;
	reg = <0x0 0xff14 0x0 0x1000>;
	timer-width = <0x20>;
	power-domains = <0x3d>;
	clocks = <0x3 0x1f>;
};

It looks like you set xen,passthrough correctly in system.dts for timer@ff11, serial@ff01, and gpio@ff0a.

> I also tried to run bare-metal app with this changes and it worked, added:
>
> {
> 	status = "okay";
> 	compatible = "cdns,ttc";
> 	interrupt-parent = <0x4>;
> 	interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>;
> 	reg = <0x0 0xff11 0x0 0x1000>;
> 	timer-width = <0x20>;
> 	power-domains = <0x3b>;
> 	xen,passthrough;
> };
>
> in xen-overlay.dtsi file, because it's overlay it shouldn't duplicate
> timer node, right?

As I wrote, system.dts looks correct.

> After build I ran:
> dtc -I dtb -O dts -o system.dts system.dtb
> and checked for ttc0, it seems okay except interrupt-parent is <0x4>
> not <0x2> like in your example:

I don't know what you are referring to.
In the system.dts you attached, interrupt-parent is <0x4>, which is correct:

timer@ff11 {
	compatible = "cdns,ttc";
	status = "okay";
	interrupt-parent = <0x4>;

> timer@ff11 {
> 	compatible = "cdns,ttc";
> 	status = "okay";
> 	interrupt-parent = <0x4>;
> 	interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>;
> 	reg = <0x0 0xff11 0x0 0x1000>;
> 	timer-width = <0x20>;
> 	power-domains = <0x3b>;
> 	clocks = <0x3 0x1f>;
> 	xen,passthrough;
> };
> status was "disable" before.
> system.dts is also added in attachment.

status is "okay" in the system.dts you attached. That is important, because status = "disabled" means the device cannot be used.

> Is this the working passthrough? Because jitter is the same.
>
> When legit, working passthrough is set correctly, jitter should be
> smaller, right?

If you are not getting any errors anymore when creating your baremetal guest, then yes, it should be working.
Re: [Xen-devel] Xen optimization
On 15/10/2018 14:01, Milan Boberic wrote:
> On 15/10/2018 09:14, Julien Grall wrote:
> > Which link?
>
> I made a hyperlink on the word "link"; looks like it somehow got lost.
> Here is the link:
> https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf

HTML should be avoided on the mailing list. Most of us are using text-only clients.

Cheers,

-- Julien Grall
Re: [Xen-devel] Xen optimization
> On 15/10/2018 09:14, Julien Grall wrote:
> Which link?

I made a hyperlink on the word "link"; looks like it somehow got lost. Here is the link: https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf

> The board should be fully supported upstreamed. If Xilinx has more patch
> on top, then you would need to seek support from them because I don't
> know what they changed in Xen.

I think Stefano can help, thanks for the suggestion.

Cheers, Milan
Re: [Xen-devel] Xen optimization
On Sat, 13 Oct 2018, Milan Boberic wrote: > > This is definitely wrong. Can you please also post the full host device > > tree with your modifications that you are using for Xen and Dom0? You > > should have something like: > > > > timer@ff11 { > > compatible = "cdns,ttc"; > > interrupt-parent = <0x2>; > > interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>; > > reg = <0x0 0xff11 0x0 0x1000>; > > timer-width = <0x20>; > > power-domains = <0x3b>; > > xen,passthrough; > > }; > > For each of the nodes of the devices you are assigning to the DomU. > > I put > { >xen,passthrough = <0x1>; > }; > because when I was making bm app I was following this guide. Now I see > it's wrong. When I copied directly: > timer@ff11 { > compatible = "cdns,ttc"; > interrupt-parent = <0x2>; > interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>; > reg = <0x0 0xff11 0x0 0x1000>; > timer-width = <0x20>; > power-domains = <0x3b>; > xen,passthrough; > }; > in to the xen-overlay.dtsi file it resulted an error during > device-tree build. I modified it a little bit so I can get successful > build, there are all device-tree files included in attachment. I'm not > sure how to set this passthrough properly, if you could take a look at > those files in attachment I'd be more then grateful. > > > It's here: > > https://github.com/Xilinx/xen/blob/xilinx/stable-4.9/xen/arch/arm/vgic.c#L462 > Oh, about that. I sent you wrong branch, I was using Xen 4.10. Anyway > now I moved to Xen 4.11 like you suggested and applied your patch and > Dario's also. 
> > Okay, now when I want to xl create my domU (bare-metal app) I get error: > > Parsing config from timer.cfg > (XEN) IRQ 68 is already used by domain 0 > libxl: error: libxl_create.c:1354:domcreate_launch_dm: Domain 1:failed > give domain access to irq 68: Device or resource busy > libxl: error: libxl_domain.c:1034:libxl__destroy_domid: Domain > 1:Non-existant domain > libxl: error: libxl_domain.c:993:domain_destroy_callback: Domain > 1:Unable to destroy guest > libxl: error: libxl_domain.c:920:domain_destroy_cb: Domain > 1:Destruction of domain failed That means that the "xen,passthrough" addition to the host device tree went wrong. > I guess my modifications of: > timer@ff11 { > compatible = "cdns,ttc"; > interrupt-parent = <0x2>; > interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>; > reg = <0x0 0xff11 0x0 0x1000>; > timer-width = <0x20>; > power-domains = <0x3b>; > xen,passthrough; > }; > are not correct. Right > I tried to change interrupts to: > interrupts = <0x0 0x44 0x4 0x0 0x45 0x4 0x0 0x46 0x4>; > because if you check here on page 310 interrupts for TTC0 are 68:70. > But that didn't work either I still get same error. The interrupt numbers specified in the DTS are the real interrupt minus 32: 68-32 = 36 = 0x24. The DTS was correct. > I also tried to change xen,passthrough; line with: > xen,passthrough = <0x1>; > but also without success, still the same error. > > Are you sure about this line: > reg = <0x0 0xff11 0x0 0x1000>; ? > Or it should be like this? > reg = <0x0 0xff11 0x1000>; Yes, that could be a problem. The format depends on the #address-cells and #size-cells parameters. You didn't send me system-conf.dtsi, so I don't know for sure which one of the two is right. In any case, you should not duplicate the timer@ff11 node in device tree. You should only add "xen,passthrough;" to the existing timer@ff11 node, which is probably in system-conf.dtsi. So, avoid adding a new timer node to xen-overlay.dtsi, and instead modify system-conf.dtsi. 
> I also included xl dmesg and dmesg in attachments (after xl create of bm app).
>
> Thanks in advance!
>
> Milan
Re: [Xen-devel] Xen optimization
Hi,

> Don't interrupt _come_ from hardware and go/are routed to
> hypervisor/os/app?

Yes they do, sorry, I reversed the order because I'm a newbie :) .

> Would you mind to explain what is the triple timer counter?

On this link on page 342 is the explanation.

> This is not the official Xen repository and look like patches have been
> applied on top. I am afraid, I am not going to be able help here. Could you
> do the same experiment with Xen 4.11?

I think I have to get Xen from Xilinx because I use a board that has Zynq Ultrascale. Stefano sent a branch with Xen 4.11, so I built with it.

> This could also means that wfi is not used by the guest or you never go to
> the idle vCPU.

Right.

> This is definitely wrong. Can you please also post the full host device
> tree with your modifications that you are using for Xen and Dom0? You
> should have something like:
>
> timer@ff11 {
> 	compatible = "cdns,ttc";
> 	interrupt-parent = <0x2>;
> 	interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>;
> 	reg = <0x0 0xff11 0x0 0x1000>;
> 	timer-width = <0x20>;
> 	power-domains = <0x3b>;
> 	xen,passthrough;
> };
>
> For each of the nodes of the devices you are assigning to the DomU.

I put

{
	xen,passthrough = <0x1>;
};

because when I was making the bm app I was following this guide. Now I see it's wrong. When I copied directly:

timer@ff11 {
	compatible = "cdns,ttc";
	interrupt-parent = <0x2>;
	interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>;
	reg = <0x0 0xff11 0x0 0x1000>;
	timer-width = <0x20>;
	power-domains = <0x3b>;
	xen,passthrough;
};

into the xen-overlay.dtsi file, it resulted in an error during the device-tree build. I modified it a little bit so I could get a successful build; all device-tree files are included in the attachment. I'm not sure how to set this passthrough properly; if you could take a look at those files in the attachment I'd be more than grateful.

> It's here:
> https://github.com/Xilinx/xen/blob/xilinx/stable-4.9/xen/arch/arm/vgic.c#L462

Oh, about that.
I sent you the wrong branch; I was using Xen 4.10. Anyway, now I moved to Xen 4.11 like you suggested and applied your patch and Dario's also.

Okay, now when I want to xl create my domU (bare-metal app) I get an error:

Parsing config from timer.cfg
(XEN) IRQ 68 is already used by domain 0
libxl: error: libxl_create.c:1354:domcreate_launch_dm: Domain 1:failed give domain access to irq 68: Device or resource busy
libxl: error: libxl_domain.c:1034:libxl__destroy_domid: Domain 1:Non-existant domain
libxl: error: libxl_domain.c:993:domain_destroy_callback: Domain 1:Unable to destroy guest
libxl: error: libxl_domain.c:920:domain_destroy_cb: Domain 1:Destruction of domain failed

I guess my modifications of:

timer@ff11 {
	compatible = "cdns,ttc";
	interrupt-parent = <0x2>;
	interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>;
	reg = <0x0 0xff11 0x0 0x1000>;
	timer-width = <0x20>;
	power-domains = <0x3b>;
	xen,passthrough;
};

are not correct. I tried to change interrupts to:

interrupts = <0x0 0x44 0x4 0x0 0x45 0x4 0x0 0x46 0x4>;

because if you check here on page 310, interrupts for TTC0 are 68:70. But that didn't work either; I still get the same error. I also tried to replace the xen,passthrough; line with:

xen,passthrough = <0x1>;

but also without success, still the same error.

Are you sure about this line:
reg = <0x0 0xff11 0x0 0x1000>; ?
Or should it be like this?
reg = <0x0 0xff11 0x1000>;

I also included xl dmesg and dmesg in attachments (after xl create of the bm app).

Thanks in advance!
Milan

FILESEXTRAPATHS_prepend := "${THISDIR}/files:"
SRC_URI += "file://system-user.dtsi"
SRC_URI += "file://xen-overlay.dtsi"

(XEN) Checking for initrd in /chosen
(XEN) Initrd 02bd7000-05fffd97
(XEN) RAM: - 7fef
(XEN)
(XEN) MODULE[0]: 07ff4000 - 07ffc080 Device Tree
(XEN) MODULE[1]: 02bd7000 - 05fffd97 Ramdisk
(XEN) MODULE[2]: 0008 - 0318 Kernel
(XEN) RESVD[0]: 07ff4000 - 07ffc000
(XEN) RESVD[1]: 02bd7000 - 05fffd97
(XEN)
(XEN) Command line: console=dtuart dtuart=serial0 dom0_mem=768M bootscrub=0 dom0_max_vcpus=1 dom0_vcpus_pin=true timer_slop=0 sched=null vwfi=native
(XEN) Placing Xen at 0x7fc0-0x7fe0
(XEN) Update BOOTMOD_XEN from 0600-06108d81 => 7fc0-7fd08d81
(XEN) Domain heap initialised
(XEN) Booting using Device Tree
(XEN) Looking for dtuart at "serial0", options ""
Xen 4.11.1-pre
(XEN) Xen version 4.11.1-pre (milan@) (aarch64-xilinx-linux-gcc (GCC) 7.2.0) debug=n Sat Oct 13 16:34:51 CEST 2018
(XEN) Latest ChangeSet: Mon Sep 24 16:07:33 2018 -0700 git:8610a91abc-dirty
(XEN)
Re: [Xen-devel] Xen optimization
On Fri, 12 Oct 2018, Milan Boberic wrote:
> Hi Stefano, glad to have you back :D,
> this is my setup:
> - dom0 is PetaLinux, has 1 vCPU and it's pinned for pCPU0
> - there is only one domU and this is my bare-metal app that also have
> one vCPU and it's pinned for pCPU1
> so yeah, there is only dom0 and bare-metal app on the board.
>
> Jitter is the same with and without Dario's patch.
>
> I'm still not sure about timer's passthrough because there is no mention of
> triple timer counter is device tree so I added:
>
> {
> 	xen,passthrough = <0x1>;
> };
>
> at the end of the xen-overlay.dtsi file which I included in attachment.

This is definitely wrong. Can you please also post the full host device tree with your modifications that you are using for Xen and Dom0? You should have something like:

timer@ff11 {
	compatible = "cdns,ttc";
	interrupt-parent = <0x2>;
	interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>;
	reg = <0x0 0xff11 0x0 0x1000>;
	timer-width = <0x20>;
	power-domains = <0x3b>;
	xen,passthrough;
};

For each of the nodes of the devices you are assigning to the DomU.

> About patch you sent, I can't find this funcion void vgic_inject_irq in
> /xen/arch/arm/vgic.c file, this is link of git repository
> from where I build my xen so you can take a look if that printk can be put
> somewhere else.
>
> https://github.com/Xilinx/xen/

It's here:

https://github.com/Xilinx/xen/blob/xilinx/stable-4.9/xen/arch/arm/vgic.c#L462

BTW you are using a pretty old branch, I suggest moving to:

https://github.com/Xilinx/xen/tree/xilinx/versal/xen/arch/arm

It will work on your board too and it is based on the much newer Xen 4.11.

> I ran some more testing and realized that results are the same with or
> without vwfi=native, which I think again points out that
> passthrough that I need to provide in device tree isn't valid.

In reality, the results are the same with and without vwfi=native only if the baremetal app never issues any wfi instructions.
> And of course, higher the frequency of interrupts results in higher jitter. > I'm still battling with Xilinx SDK and triple timer > counter that's why I can't figure out what is the exact frequency set (I'm > just rising it and lowering it), I'll give my best to > solve that ASAP because we need to know exact value of frequency set. Yep, that's important :-) > > Thanks in advance! > > Milan > > > > On Fri, Oct 12, 2018 at 12:29 AM Stefano Stabellini > wrote: > On Thu, 11 Oct 2018, Milan Boberic wrote: > > On Wed, Oct 10, 2018 at 6:41 PM Meng Xu wrote: > > > > > > The jitter may come from Xen or the OS in dom0. > > > It will be useful to know what is the jitter if you run the test on > PetaLinux. > > > (It's understandable the jitter is gone without OS. It is also > common > > > that OS introduces various interferences.) > > > > Hi Meng, > > well... I'm using bare-metal application and I need it exclusively to > > be ran on one CPU as domU (guest) without OS (and I'm not sure how > > would I make the same app to be ran on PetaLinux dom0 :D haha). > > Is there a chance that PetaLinux as dom0 is creating this jitter and > > how? Is there a way of decreasing it? > > > > Yes, there are no prints. > > > > I'm not sure about this timer interrupt passthrough because I didn't > > find any example of it, in attachment I included xen-overlay.dtsi file > > which I edited to add passthrough, in earlier replies there are > > bare-metal configuration file. It would be helpful to know if those > > setting are correct. If they are not correct it would explain the > > jitter. > > > > Thanks in advance, Milan Boberic! > > Hi Milan, > > Sorry for taking so long to go back to this thread. But I am here now :) > > First, let me ask a couple of questions to understand the scenario > better: is there any interference from other virtual machines while you > measure the jitter? Or is the baremetal app the only thing actively > running on the board? 
> > Second, it would be worth double-checking that Dario's patch to fix > sched=null is not having unexpected side effects. I don't think so, it > would be worth testing with it and without it to be sure. > > I gave a look at your VM configuration. The configuration looks correct. > There is no dtdev settings, but given that none of the devices you are > assigning to the guest does any DMA, it should be OK. You want to make > sure that Dom0 is not trying to use those same devices -- make sure to > add "xen,passthrough;" to each corresponding node on the host device > tree. > > The error messages "No valid vCPU found" are due to the baremetal
Re: [Xen-devel] Xen optimization
Hi,

Sorry for the formatting.

On Fri, 12 Oct 2018, 17:36 Milan Boberic, wrote:
> Hi Stefano, glad to have you back :D,
> this is my setup:
> - dom0 is PetaLinux, has 1 vCPU and it's pinned for pCPU0
> - there is only one domU and this is my bare-metal app that also
> have one vCPU and it's pinned for pCPU1
> so yeah, there is only dom0 and bare-metal app on the board.
>
> Jitter is the same with and without Dario's patch.
>
> I'm still not sure about timer's passthrough because there is no mention
> of triple timer counter is device tree so I added:
>
> {
>    xen,passthrough = <0x1>;
> };

Would you mind explaining what the triple timer counter is?

> at the end of the xen-overlay.dtsi file which I included in attachment.
>
> About patch you sent, I can't find this funcion void vgic_inject_irq in
> /xen/arch/arm/vgic.c file, this is link of git repository from where I
> build my xen so you can take a look if that printk can be put somewhere
> else.

There was some vGIC rework in Xen 4.11. There was also a new vGIC added (selectable using NEW_VGIC). It might be worth looking at it.

> https://github.com/Xilinx/xen/

This is not the official Xen repository, and it looks like patches have been applied on top. I am afraid I am not going to be able to help here. Could you do the same experiment with Xen 4.11?

> I ran some more testing and realized that results are the same with or
> without vwfi=native, which I think again points out that passthrough that I
> need to provide in device tree isn't valid.

This could also mean that wfi is not used by the guest, or that you never go to the idle vCPU.

> And of course, higher the frequency of interrupts results in higher
> jitter. I'm still battling with Xilinx SDK and triple timer counter that's
> why I can't figure out what is the exact frequency set (I'm just rising it
> and lowering it), I'll give my best to solve that ASAP because we need to
> know exact value of frequency set.
>
> Thanks in advance!
> > Milan > > > > On Fri, Oct 12, 2018 at 12:29 AM Stefano Stabellini < > stefano.stabell...@xilinx.com> wrote: > >> On Thu, 11 Oct 2018, Milan Boberic wrote: >> > On Wed, Oct 10, 2018 at 6:41 PM Meng Xu wrote: >> > > >> > > The jitter may come from Xen or the OS in dom0. >> > > It will be useful to know what is the jitter if you run the test on >> PetaLinux. >> > > (It's understandable the jitter is gone without OS. It is also common >> > > that OS introduces various interferences.) >> > >> > Hi Meng, >> > well... I'm using bare-metal application and I need it exclusively to >> > be ran on one CPU as domU (guest) without OS (and I'm not sure how >> > would I make the same app to be ran on PetaLinux dom0 :D haha). >> > Is there a chance that PetaLinux as dom0 is creating this jitter and >> > how? Is there a way of decreasing it? >> > >> > Yes, there are no prints. >> > >> > I'm not sure about this timer interrupt passthrough because I didn't >> > find any example of it, in attachment I included xen-overlay.dtsi file >> > which I edited to add passthrough, in earlier replies there are >> > bare-metal configuration file. It would be helpful to know if those >> > setting are correct. If they are not correct it would explain the >> > jitter. >> > >> > Thanks in advance, Milan Boberic! >> >> Hi Milan, >> >> Sorry for taking so long to go back to this thread. But I am here now :) >> >> First, let me ask a couple of questions to understand the scenario >> better: is there any interference from other virtual machines while you >> measure the jitter? Or is the baremetal app the only thing actively >> running on the board? >> >> Second, it would be worth double-checking that Dario's patch to fix >> sched=null is not having unexpected side effects. I don't think so, it >> would be worth testing with it and without it to be sure. >> >> I gave a look at your VM configuration. The configuration looks correct. 
>> There is no dtdev settings, but given that none of the devices you are >> assigning to the guest does any DMA, it should be OK. You want to make >> sure that Dom0 is not trying to use those same devices -- make sure to >> add "xen,passthrough;" to each corresponding node on the host device >> tree. >> >> The error messages "No valid vCPU found" are due to the baremetal >> applications trying to configure as target cpu for the interrupt cpu1 >> (the second cpu in the system), while actually only 1 vcpu is assigned >> to the VM. Hence, only cpu0 is allowed. I don't think it should cause >> any jitter issues, because the request is simply ignored. Just to be >> safe, you might want to double check that the physical interrupt is >> delivered to the right physical cpu, which would be cpu1 in your >> configuration, the one running the only vcpu of the baremetal app. You >> can do that by adding a printk to xen/arch/arm/vgic.c:vgic_inject_irq, >> for example: >> >> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c >> index 5a4f082..208fde7
Re: [Xen-devel] Xen optimization
Hi Stefano, glad to have you back :D,

This is my setup:
- dom0 is PetaLinux, has 1 vCPU and it's pinned to pCPU0
- there is only one domU, my bare-metal app, which also has one vCPU and it's pinned to pCPU1

So yeah, there are only dom0 and the bare-metal app on the board.

Jitter is the same with and without Dario's patch.

I'm still not sure about the timer's passthrough, because there is no mention of the triple timer counter in the device tree, so I added:

{
	xen,passthrough = <0x1>;
};

at the end of the xen-overlay.dtsi file, which I included in the attachment.

About the patch you sent, I can't find the function void vgic_inject_irq in the /xen/arch/arm/vgic.c file. This is the link of the git repository from which I build my Xen, so you can take a look at whether that printk can be put somewhere else:

https://github.com/Xilinx/xen/

I ran some more testing and realized that the results are the same with or without vwfi=native, which I think again points out that the passthrough that I need to provide in the device tree isn't valid. And of course, a higher frequency of interrupts results in higher jitter. I'm still battling with the Xilinx SDK and the triple timer counter; that's why I can't figure out what the exact frequency set is (I'm just raising it and lowering it). I'll give my best to solve that ASAP, because we need to know the exact value of the frequency set.

Thanks in advance!

Milan

On Fri, Oct 12, 2018 at 12:29 AM Stefano Stabellini < stefano.stabell...@xilinx.com> wrote:
> On Thu, 11 Oct 2018, Milan Boberic wrote:
> > On Wed, Oct 10, 2018 at 6:41 PM Meng Xu wrote:
> > >
> > > The jitter may come from Xen or the OS in dom0.
> > > It will be useful to know what is the jitter if you run the test on PetaLinux.
> > > (It's understandable the jitter is gone without OS. It is also common
> > > that OS introduces various interferences.)
> >
> > Hi Meng,
> > well...
I'm using bare-metal application and I need it exclusively to > > be ran on one CPU as domU (guest) without OS (and I'm not sure how > > would I make the same app to be ran on PetaLinux dom0 :D haha). > > Is there a chance that PetaLinux as dom0 is creating this jitter and > > how? Is there a way of decreasing it? > > > > Yes, there are no prints. > > > > I'm not sure about this timer interrupt passthrough because I didn't > > find any example of it, in attachment I included xen-overlay.dtsi file > > which I edited to add passthrough, in earlier replies there are > > bare-metal configuration file. It would be helpful to know if those > > setting are correct. If they are not correct it would explain the > > jitter. > > > > Thanks in advance, Milan Boberic! > > Hi Milan, > > Sorry for taking so long to go back to this thread. But I am here now :) > > First, let me ask a couple of questions to understand the scenario > better: is there any interference from other virtual machines while you > measure the jitter? Or is the baremetal app the only thing actively > running on the board? > > Second, it would be worth double-checking that Dario's patch to fix > sched=null is not having unexpected side effects. I don't think so, it > would be worth testing with it and without it to be sure. > > I gave a look at your VM configuration. The configuration looks correct. > There is no dtdev settings, but given that none of the devices you are > assigning to the guest does any DMA, it should be OK. You want to make > sure that Dom0 is not trying to use those same devices -- make sure to > add "xen,passthrough;" to each corresponding node on the host device > tree. > > The error messages "No valid vCPU found" are due to the baremetal > applications trying to configure as target cpu for the interrupt cpu1 > (the second cpu in the system), while actually only 1 vcpu is assigned > to the VM. Hence, only cpu0 is allowed. 
I don't think it should cause
> any jitter issues, because the request is simply ignored. Just to be
> safe, you might want to double check that the physical interrupt is
> delivered to the right physical cpu, which would be cpu1 in your
> configuration, the one running the only vcpu of the baremetal app. You
> can do that by adding a printk to xen/arch/arm/vgic.c:vgic_inject_irq,
> for example:
>
> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> index 5a4f082..208fde7 100644
> --- a/xen/arch/arm/vgic.c
> +++ b/xen/arch/arm/vgic.c
> @@ -591,6 +591,7 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,
>  out:
>      spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
>
> +    if (v != current) printk("DEBUG irq slow path!\n");
>      /* we have a new higher priority irq, inject it into the guest */
>      vcpu_kick(v);
>
> You don't want "DEBUG irq slow path!" to get printed.
>
> Finally, I would try to set the timer to generate events less frequently
> than every 1us and see what happens, maybe every 5-10us. In my tests,
> the IRQ latency overhead caused by Xen is around 1us, so injecting 1
> interrupt every 1us, plus 1us of latency caused
Re: [Xen-devel] Xen optimization
On Thu, 11 Oct 2018, Milan Boberic wrote: > On Wed, Oct 10, 2018 at 6:41 PM Meng Xu wrote: > > > > The jitter may come from Xen or the OS in dom0. > > It will be useful to know what is the jitter if you run the test on > > PetaLinux. > > (It's understandable the jitter is gone without OS. It is also common > > that OS introduces various interferences.) > > Hi Meng, > well... I'm using bare-metal application and I need it exclusively to > be ran on one CPU as domU (guest) without OS (and I'm not sure how > would I make the same app to be ran on PetaLinux dom0 :D haha). > Is there a chance that PetaLinux as dom0 is creating this jitter and > how? Is there a way of decreasing it? > > Yes, there are no prints. > > I'm not sure about this timer interrupt passthrough because I didn't > find any example of it, in attachment I included xen-overlay.dtsi file > which I edited to add passthrough, in earlier replies there are > bare-metal configuration file. It would be helpful to know if those > setting are correct. If they are not correct it would explain the > jitter. > > Thanks in advance, Milan Boberic! Hi Milan, Sorry for taking so long to go back to this thread. But I am here now :) First, let me ask a couple of questions to understand the scenario better: is there any interference from other virtual machines while you measure the jitter? Or is the baremetal app the only thing actively running on the board? Second, it would be worth double-checking that Dario's patch to fix sched=null is not having unexpected side effects. I don't think so, it would be worth testing with it and without it to be sure. I gave a look at your VM configuration. The configuration looks correct. There is no dtdev settings, but given that none of the devices you are assigning to the guest does any DMA, it should be OK. You want to make sure that Dom0 is not trying to use those same devices -- make sure to add "xen,passthrough;" to each corresponding node on the host device tree. 
The error messages "No valid vCPU found" are due to the baremetal application trying to configure cpu1 (the second cpu in the system) as the target cpu for the interrupt, while actually only 1 vcpu is assigned to the VM. Hence, only cpu0 is allowed. I don't think it should cause any jitter issues, because the request is simply ignored. Just to be safe, you might want to double check that the physical interrupt is delivered to the right physical cpu, which would be cpu1 in your configuration, the one running the only vcpu of the baremetal app. You can do that by adding a printk to xen/arch/arm/vgic.c:vgic_inject_irq, for example:

diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 5a4f082..208fde7 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -591,6 +591,7 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,
 out:
     spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
 
+    if (v != current) printk("DEBUG irq slow path!\n");
     /* we have a new higher priority irq, inject it into the guest */
     vcpu_kick(v);

You don't want "DEBUG irq slow path!" to get printed.

Finally, I would try to set the timer to generate events less frequently than every 1us and see what happens, maybe every 5-10us. In my tests, the IRQ latency overhead caused by Xen is around 1us, so injecting 1 interrupt every 1us, plus 1us of latency caused by Xen, cannot lead to good results.

I hope this helps; please keep us updated with your results, they are very interesting!
Re: [Xen-devel] Xen optimization
Hey, Be a bit more careful about not top posting, please? :-) On Thu, 2018-10-11 at 14:17 +0200, Milan Boberic wrote: > I misunderstood the passthrough concept, it only allows guest domain > to use certain interrupts and memory. > On how things work on ARM, I'm afraid we totally rely on people with much more experience than me (and, I guess, Meng). > Is there are way to somehow > route interrupt from domU (bare-metal app) to hw? > Don't interrupts _come_ from hardware, and get routed to hypervisor/OS/app? Regards, Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Software Engineer @ SUSE https://www.suse.com/
Re: [Xen-devel] Xen optimization
Hi Milan, On Thu, Oct 11, 2018 at 12:36 AM Milan Boberic wrote: > > On Wed, Oct 10, 2018 at 6:41 PM Meng Xu wrote: > > > > The jitter may come from Xen or the OS in dom0. > > It will be useful to know what is the jitter if you run the test on > > PetaLinux. > > (It's understandable the jitter is gone without OS. It is also common > > that OS introduces various interferences.) > > Hi Meng, > well... I'm using bare-metal application and I need it exclusively to > be ran on one CPU as domU (guest) without OS (and I'm not sure how > would I make the same app to be ran on PetaLinux dom0 :D haha). > Is there a chance that PetaLinux as dom0 is creating this jitter and > how? Is there a way of decreasing it? I'm not familiar with PetaLinux. :( From my previous experience measuring rt-tests in virtualization environments, I found that even though the app is the only one running on the CPU, the CPU may still be used to handle other interrupts, and its context (such as TLB and cache) might be flushed by other components. When these happen, the interrupt handling latency can vary a lot. Hopefully, it helps. :) Meng
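The variation Meng describes is what latency runs in this thread summarize as min/avg/max (the earlier numbers like max=840 warm_max=120 min=120 avg=127 are exactly this shape). A minimal sketch of such an aggregator follows; the assumption that "warm max" means the maximum after discarding the first cold-cache samples is mine, not something stated in the thread:

```python
def summarize(latencies, warmup=1):
    """Summarize per-interrupt latencies as min/avg/max, plus a 'warm' max
    that ignores the first samples, which are typically inflated by cold
    caches and TLBs (an assumed definition, for illustration)."""
    warm = latencies[warmup:]
    return {
        "min": min(latencies),
        "max": max(latencies),
        "avg": sum(latencies) // len(latencies),
        "warm_max": max(warm),
    }

# One cold first interrupt, then steady-state samples (arbitrary units).
samples = [840, 120, 125, 130, 120]
print(summarize(samples))  # max reflects the cold outlier, warm_max does not
```

A large gap between max and warm_max points at one-off effects (cold caches, first-touch faults) rather than steady-state interference.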
Re: [Xen-devel] Xen optimization
I misunderstood the passthrough concept, it only allows guest domain to use certain interrupts and memory. Is there a way to somehow route interrupts from domU (bare-metal app) to hw? On Thu, Oct 11, 2018 at 9:36 AM Milan Boberic wrote: > > On Wed, Oct 10, 2018 at 6:41 PM Meng Xu wrote: > > > > The jitter may come from Xen or the OS in dom0. > > It will be useful to know what is the jitter if you run the test on > > PetaLinux. > > (It's understandable the jitter is gone without OS. It is also common > > that OS introduces various interferences.) > > Hi Meng, > well... I'm using bare-metal application and I need it exclusively to > be ran on one CPU as domU (guest) without OS (and I'm not sure how > would I make the same app to be ran on PetaLinux dom0 :D haha). > Is there a chance that PetaLinux as dom0 is creating this jitter and > how? Is there a way of decreasing it? > > Yes, there are no prints. > > I'm not sure about this timer interrupt passthrough because I didn't > find any example of it, in attachment I included xen-overlay.dtsi file > which I edited to add passthrough, in earlier replies there are > bare-metal configuration file. It would be helpful to know if those > setting are correct. If they are not correct it would explain the > jitter. > > Thanks in advance, Milan Boberic!
Re: [Xen-devel] Xen optimization
On Wed, Oct 10, 2018 at 6:41 PM Meng Xu wrote: > > The jitter may come from Xen or the OS in dom0. > It will be useful to know what is the jitter if you run the test on PetaLinux. > (It's understandable the jitter is gone without OS. It is also common > that OS introduces various interferences.) Hi Meng, well... I'm using a bare-metal application and I need it exclusively to be run on one CPU as domU (guest) without an OS (and I'm not sure how I would make the same app run on PetaLinux dom0 :D haha). Is there a chance that PetaLinux as dom0 is creating this jitter and how? Is there a way of decreasing it? Yes, there are no prints. I'm not sure about this timer interrupt passthrough because I didn't find any example of it; in the attachment I included the xen-overlay.dtsi file which I edited to add passthrough, and in earlier replies there is the bare-metal configuration file. It would be helpful to know if those settings are correct. If they are not correct it would explain the jitter. Thanks in advance, Milan Boberic!
/ {
	chosen {
		#address-cells = <2>;
		#size-cells = <1>;
		xen,xen-bootargs = "console=dtuart dtuart=serial0 dom0_mem=768M bootscrub=0 dom0_max_vcpus=1 dom0_vcpus_pin=true timer_slop=0 sched=null vwfi=native";
		xen,dom0-bootargs = "console=hvc0 earlycon=xen earlyprintk=xen maxcpus=1 clk_ignore_unused";
		dom0 {
			compatible = "xen,linux-zimage", "xen,multiboot-module";
			reg = <0x0 0x8 0x310>;
		};
	};
};

{
	status = "okay";
	mmu-masters = < 0x874 0x875 0x876 0x877
		_0 0x860 _1 0x861 0x873
		_dma_chan1 0x868 _dma_chan2 0x869 _dma_chan3 0x86a _dma_chan4 0x86b
		_dma_chan5 0x86c _dma_chan6 0x86d _dma_chan7 0x86e _dma_chan8 0x86f
		_dma_chan1 0x14e8 _dma_chan2 0x14e9 _dma_chan3 0x14ea _dma_chan4 0x14eb
		_dma_chan5 0x14ec _dma_chan6 0x14ed _dma_chan7 0x14ee _dma_chan8 0x14ef
		0x870 0x871 0x872 >;
};

{ xen,passthrough = <0x1>; };
{ xen,passthrough = <0x1>; };
{ xen,passthrough = <0x1>; };
{ xen,passthrough = <0x1>; };
{ xen,passthrough = <0x1>; };
{ xen,passthrough = <0x1>; };
Re: [Xen-devel] Xen optimization
[Just add some thoughts on this.] On Wed, Oct 10, 2018 at 4:22 AM Milan Boberic wrote: > > Hi, > sorry, my explanation wasn't precise and I missed the point. > vCPU pinning with sched=null I put "just in case", because it doesn't hurt. > > Yes, PetaLinux domain is dom0. The jitter may come from Xen or the OS in dom0. It will be useful to know what is the jitter if you run the test on PetaLinux. (It's understandable the jitter is gone without OS. It is also common that OS introduces various interferences.) Another thing you might have already done: make sure there is no print information from either Xen or the OS during your experiment. Prints cause long delays. Meng
Re: [Xen-devel] Xen optimization
Attachments.

name = "test"
kernel = "timer.bin"
memory = 8
vcpus = 1
cpus = [1]
irqs = [ 48, 54, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79 ]
iomem = [ "0xff010,1", "0xff110,1", "0xff120,1", "0xff130,1", "0xff140,1", "0xff0a0,1" ]

[0.00] Booting Linux on physical CPU 0x0
[0.00] Linux version 4.14.0-xilinx-v2018.2 (oe-user@oe-host) (gcc version 7.2.0 (GCC)) #1 SMP Mon Oct 1 16:41:32 CEST 2018
[0.00] Boot CPU: AArch64 Processor [410fd034]
[0.00] Machine model: xlnx,zynqmp
[0.00] Xen 4.10 support found
[0.00] efi: Getting EFI parameters from FDT:
[0.00] efi: UEFI not found.
[0.00] cma: Reserved 256 MiB at 0x6000
[0.00] On node 0 totalpages: 196608
[0.00] DMA zone: 2688 pages used for memmap
[0.00] DMA zone: 0 pages reserved
[0.00] DMA zone: 196608 pages, LIFO batch:31
[0.00] psci: probing for conduit method from DT.
[0.00] psci: PSCIv1.1 detected in firmware.
[0.00] psci: Using standard PSCI v0.2 function IDs
[0.00] psci: Trusted OS migration not required
[0.00] random: fast init done
[0.00] percpu: Embedded 21 pages/cpu @ffc03ffb7000 s46488 r8192 d31336 u86016
[0.00] pcpu-alloc: s46488 r8192 d31336 u86016 alloc=21*4096
[0.00] pcpu-alloc: [0] 0
[0.00] Detected VIPT I-cache on CPU0
[0.00] CPU features: enabling workaround for ARM erratum 845719
[0.00] Built 1 zonelists, mobility grouping on.
Total pages: 193920
[0.00] Kernel command line: console=hvc0 earlycon=xen earlyprintk=xen maxcpus=1 clk_ignore_unused
[0.00] PID hash table entries: 4096 (order: 3, 32768 bytes)
[0.00] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
[0.00] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
[0.00] Memory: 423788K/786432K available (9980K kernel code, 644K rwdata, 3132K rodata, 512K init, 2168K bss, 100500K reserved, 262144K cma-reserved)
[0.00] Virtual kernel memory layout:
[0.00] modules : 0xff80 - 0xff800800 ( 128 MB)
[0.00] vmalloc : 0xff800800 - 0xffbebfff ( 250 GB)
[0.00] .text : 0xff800808 - 0xff8008a4 ( 9984 KB)
[0.00] .rodata : 0xff8008a4 - 0xff8008d6 ( 3200 KB)
[0.00] .init : 0xff8008d6 - 0xff8008de ( 512 KB)
[0.00] .data : 0xff8008de - 0xff8008e81200 ( 645 KB)
[0.00] .bss : 0xff8008e81200 - 0xff800909f2b0 ( 2169 KB)
[0.00] fixed : 0xffbefe7fd000 - 0xffbefec0 ( 4108 KB)
[0.00] PCI I/O : 0xffbefee0 - 0xffbeffe0 (16 MB)
[0.00] vmemmap : 0xffbf - 0xffc0 ( 4 GB maximum)
[0.00] 0xffbf0070 - 0xffbf0188 (17 MB actual)
[0.00] memory : 0xffc02000 - 0xffc07000 ( 1280 MB)
[0.00] Hierarchical RCU implementation.
[0.00] RCU event tracing is enabled.
[0.00] RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=1.
[0.00] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[0.00] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[0.00] arch_timer: cp15 timer(s) running at 99.99MHz (virt).
[0.00] clocksource: arch_sys_counter: mask: 0xff max_cycles: 0x171015c90f, max_idle_ns: 440795203080 ns
[0.03] sched_clock: 56 bits at 99MHz, resolution 10ns, wraps every 4398046511101ns
[0.000287] Console: colour dummy device 80x25
[0.283041] console [hvc0] enabled
[0.286513] Calibrating delay loop (skipped), value calculated using timer frequency..
199.99 BogoMIPS (lpj=36)
[0.296969] pid_max: default: 32768 minimum: 301
[0.301730] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes)
[0.308393] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes)
[0.316319] ASID allocator initialised with 65536 entries
[0.321502] xen:grant_table: Grant tables using version 1 layout
[0.327092] Grant table initialized
[0.330637] xen:events: Using FIFO-based ABI
[0.334961] Xen: initializing cpu0
[0.338469] Hierarchical SRCU implementation.
[0.343130] EFI services will not be available.
[0.347423] zynqmp_plat_init Platform Management API v1.0
[0.352852] zynqmp_plat_init Trustzone version v1.0
[0.357828] smp: Bringing up secondary CPUs ...
[0.362366] smp: Brought up 1 node, 1 CPU
[0.366430] SMP: Total of 1 processors activated.
[0.371189] CPU features: detected feature: 32-bit EL0 Support
[0.377073] CPU: All CPU(s) started at EL1
[0.381231] alternatives: patching kernel code
[0.386133] devtmpfs: initialized
[0.392766] clocksource: jiffies: mask: 0x max_cycles:
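For reference, the iomem entries in the guest configuration above are in units of 4 KiB pages, so an entry like "0xff110,1" grants one page of MMIO starting at physical address 0xff110000. A small decoder sketch (the helper name is made up; the assumption that both fields parse as hexadecimal follows the xl convention):

```python
PAGE_SHIFT = 12  # xl iomem entries are 4 KiB page numbers, not byte addresses

def decode_iomem(entry):
    """Turn an xl iomem string "START_PFN,NPAGES" into (base, size) in bytes."""
    pfn, npages = (int(x, 16) for x in entry.split(","))
    return pfn << PAGE_SHIFT, npages << PAGE_SHIFT

# The entries from the config above, as byte ranges.
for e in ["0xff010,1", "0xff110,1", "0xff120,1", "0xff130,1", "0xff140,1", "0xff0a0,1"]:
    base, size = decode_iomem(e)
    print(f"{e} -> {base:#x} + {size:#x}")
```

This is handy for cross-checking the granted ranges against the SoC's peripheral register map when debugging passthrough.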
Re: [Xen-devel] Xen optimization
Hi, sorry, my explanation wasn't precise and I missed the point. vCPU pinning with sched=null I put "just in case", because it doesn't hurt. Yes, PetaLinux domain is dom0. I tested with the Credit scheduler before (it was just the LED blink application, but anyway); it results in bigger jitter than the null scheduler. For example, with the Credit scheduler LED blinking results in approximately 3us jitter, whereas with the null scheduler there is no jitter. vwfi=native was giving the domain destruction problem which you fixed by sending me a patch, approximately 2 weeks ago if you recall :) but I still didn't test its impact on performance; I will do it ASAP and share results (I think that without vwfi=native jitter will be the same or even bigger). When I say "without Xen", yes, I mean without any OS. Just hardware and this bare-metal app. I do expect latency to be higher in the Xen case and I'm curious how much exactly (which is the point of my work and also my master's thesis for my faculty :D). Now, the point is that when I set only LED blinking (without the timer) in my application there is no jitter (in the Xen case), but when I add the timer which generates an interrupt every us, a jitter of 3 us occurs. The timer I use is the Zynq UltraScale+'s triple timer counter. I suspect that the timer interrupt is creating that jitter. For interrupts I use passthrough in the bare-metal application's configuration file (which works for the GPIO LED because there is no jitter; the interrupt can "freely go" from the guest domain directly to the GPIO LED). Also, when I create the guest domain (which is this bare-metal application) I get these messages:
(XEN) printk: 54 messages suppressed.
(XEN) d2v0 No valid vCPU found for vIRQ32 in the target list (0x2). Skip it
(XEN) d2v0 No valid vCPU found for vIRQ33 in the target list (0x2). Skip it
root@uz3eg-iocc-2018-2:~# (XEN) d2v0 No valid vCPU found for vIRQ34 in the target list (0x2). Skip it
(XEN) d2v0 No valid vCPU found for vIRQ35 in the target list (0x2). Skip it
(XEN) d2v0 No valid vCPU found for vIRQ36 in the target list (0x2). Skip it
(XEN) d2v0 No valid vCPU found for vIRQ37 in the target list (0x2). Skip it
(XEN) d2v0 No valid vCPU found for vIRQ38 in the target list (0x2). Skip it
(XEN) d2v0 No valid vCPU found for vIRQ39 in the target list (0x2). Skip it
(XEN) d2v0 No valid vCPU found for vIRQ40 in the target list (0x2). Skip it
(XEN) d2v0 No valid vCPU found for vIRQ41 in the target list (0x2). Skip it
In attachments I included dmesg, xl dmesg and the bare-metal application's configuration file. Thanks in advance, Milan Boberic. On Tue, Oct 9, 2018 at 6:46 PM Dario Faggioli wrote: > > On Tue, 2018-10-09 at 12:59 +0200, Milan Boberic wrote: > > Hi, > > > Hi Milan, > > > I'm testing Xen Hypervisor 4.10 performance on UltraZed-EG board with > > carrier card. > > I created bare-metal application in Xilinx SDK. > > In bm application I: > >- start triple timer counter (ttc) which generates > > interrupt every 1us > >- turn on PS LED > >- call function 100 times in for loop (function that sets > > some values) > >- turn off LED > >- stop triple timer counter > >- reset counter value > > > Ok, I'm adding Stefano, Julien, and a couple of other people interested > in RT/lowlat on Xen. > > > I ran this bare-metal application under Xen Hypervisor with following > > settings: > > - used null scheduler (sched=null) and vwfi=native > > - bare-metal application have one vCPU and it is pinned for pCPU1 > > - domain which is PetaLinux also have one vCPU pinned for pCPU0, > > other pCPUs are unused. > > Under Xen Hypervisor I can see 3us jitter on oscilloscope. > > > So, this is probably me not being familiar with Xen on Xilinx (and with > Xen on ARM as a whole), but there's a few things I'm not sure I > understand: > - you say you use sched=null _and_ pinning? That should not be > necessary (although, it shouldn't hurt either) > - "domain which is PetaLinux", is that dom0?
> > IAC, if it's not terrible hard to run this kind of test, I'd say, try > without 'vwfi=native', and also with another scheduler, like Credit, > (but then do make sure you use pinning). > > > When I ran same bm application with JTAG from Xilinx SDK (without Xen > > Hypervisor, directly on the board) there is no jitter. > > > Here, when you say "without Xen", do you also mean without any > baremetal OS at all? > > > I'm curios what causes this 3us jitter in Xen (which isn't small > > jitter at all) and is there any way of decreasing it? > > > Right. So, I'm not sure I've understood the test scenario either. But > yeah, 3us jitter seems significant. Still, if we're comparing with > bare-hw, without even an OS at all, I think it could have been expected > for latency and jitter to be higher in the Xen case. > > Anyway, I am not sure anyone has done a kind of analysis that could > help us identify accurately from where things like that come, and in > what proportions. > > It would be
Re: [Xen-devel] Xen optimization
On Tue, 2018-10-09 at 12:59 +0200, Milan Boberic wrote: > Hi, > Hi Milan, > I'm testing Xen Hypervisor 4.10 performance on UltraZed-EG board with > carrier card. > I created bare-metal application in Xilinx SDK. > In bm application I: >- start triple timer counter (ttc) which generates > interrupt every 1us >- turn on PS LED >- call function 100 times in for loop (function that sets > some values) >- turn off LED >- stop triple timer counter >- reset counter value > Ok, I'm adding Stefano, Julien, and a couple of other people interested in RT/lowlat on Xen. > I ran this bare-metal application under Xen Hypervisor with following > settings: > - used null scheduler (sched=null) and vwfi=native > - bare-metal application have one vCPU and it is pinned for pCPU1 > - domain which is PetaLinux also have one vCPU pinned for pCPU0, > other pCPUs are unused. > Under Xen Hypervisor I can see 3us jitter on oscilloscope. > So, this is probably me not being familiar with Xen on Xilinx (and with Xen on ARM as a whole), but there's a few things I'm not sure I understand: - you say you use sched=null _and_ pinning? That should not be necessary (although, it shouldn't hurt either) - "domain which is PetaLinux", is that dom0? IAC, if it's not terribly hard to run this kind of test, I'd say, try without 'vwfi=native', and also with another scheduler, like Credit, (but then do make sure you use pinning). > When I ran same bm application with JTAG from Xilinx SDK (without Xen > Hypervisor, directly on the board) there is no jitter. > Here, when you say "without Xen", do you also mean without any baremetal OS at all? > I'm curios what causes this 3us jitter in Xen (which isn't small > jitter at all) and is there any way of decreasing it? > Right. So, I'm not sure I've understood the test scenario either. But yeah, 3us jitter seems significant.
Still, if we're comparing with bare-hw, without even an OS at all, I think it could have been expected for latency and jitter to be higher in the Xen case. Anyway, I am not sure anyone has done the kind of analysis that could help us identify accurately where things like that come from, and in what proportions. It would be really awesome to have something like that, so do go ahead if you feel like it. :-) I think tracing could help a little (although we don't have a super-sophisticated tracing infrastructure like Linux's perf and such), but sadly enough, that's still not available on ARM, I think. :-/ Regards, Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Software Engineer @ SUSE https://www.suse.com/
[Xen-devel] Xen optimization
Hi, I'm testing Xen Hypervisor 4.10 performance on an UltraZed-EG board with carrier card. I created a bare-metal application in Xilinx SDK. In the bm application I:
- start the triple timer counter (ttc), which generates an interrupt every 1us
- turn on the PS LED
- call a function 100 times in a for loop (function that sets some values)
- turn off the LED
- stop the triple timer counter
- reset the counter value
I ran this bare-metal application under Xen Hypervisor with the following settings:
- used the null scheduler (sched=null) and vwfi=native
- the bare-metal application has one vCPU and it is pinned to pCPU1
- the domain which is PetaLinux also has one vCPU, pinned to pCPU0; other pCPUs are unused.
Under Xen Hypervisor I can see 3us jitter on the oscilloscope. When I ran the same bm application with JTAG from Xilinx SDK (without Xen Hypervisor, directly on the board) there is no jitter. I'm curious what causes this 3us jitter in Xen (which isn't small jitter at all) and is there any way of decreasing it? Also I would gladly accept any suggestions about increasing performance, decreasing jitter, decreasing interrupt latency, etc. Thanks in advance, Milan Boberic.
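The measurement sequence described in this first mail (timer on, LED on, 100 calls, LED off) can be sketched as host-runnable pseudocode; the led/work functions below are hypothetical stand-ins for the board's driver calls (the real app uses Xilinx SDK APIs not shown in the thread), and "jitter" is simply the spread of the measured loop duration across repetitions:

```python
import time

# Hypothetical stand-ins for the board's driver calls, for illustration only.
def led_on(): pass
def led_off(): pass
def work(): pass   # the "function that sets some values"

def run_once(iterations=100):
    """One measurement: LED on, do the work loop, LED off; return elapsed ns.
    On the board the elapsed time is what the oscilloscope sees between the
    LED edges; here we use the host clock instead."""
    start = time.perf_counter_ns()
    led_on()
    for _ in range(iterations):
        work()
    led_off()
    return time.perf_counter_ns() - start

# Jitter = spread between the fastest and slowest of repeated runs.
durations = [run_once() for _ in range(50)]
jitter_ns = max(durations) - min(durations)
print(f"jitter: {jitter_ns} ns")
```

On the real setup, any interrupt taken inside the loop (such as the 1us ttc interrupt discussed later in the thread) stretches some runs and widens this spread.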