Re: [XenPPC] copy_page speedup using dcbz on target
On Sat, 16 Dec 2006 11:34, Jimi Xenidis wrote:
> If you really want to explore mem/page copy for XenPPC then you have to
> understand that since we run without an MMU, profiling code with MMU on,
> _including_ RMA, is not helpful because the access is guarded ...
> Please run your experiments _in_ Xen ...

Timing code has been included in Xen, setup.c; however, results match prior
timings in userspace:

JS20:
    elapsed time:            0xa8f5
    elapsed time using dcbz: 0x5410
    elapsed time:            0xa987
    elapsed time using dcbz: 0x5361

JS21:
    elapsed time:            0x0862
    elapsed time using dcbz: 0x0420
    elapsed time:            0x0859
    elapsed time using dcbz: 0x0424

> ... You will probably find that grouping (as Hollis suggests) by cache
> line will be much better. but also prefetch the next line somehow.

Somewhat better... (The following observations were made running in user
space.)

Unrolling the copy loop (by cache line) improves performance by a few
percent. (I did not record the time; the unrolled loop also still used the
same number of registers and no touching.)

However, including dcbz at the beginning of the loop slowed things down.
Perhaps dcbz needs to be issued a couple of lines ahead of usage?

___
Xen-ppc-devel mailing list
Xen-ppc-devel@lists.xensource.com
http://lists.xensource.com/xen-ppc-devel
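To put the posted numbers in proportion, here is a minimal sketch that turns each pair of elapsed times into a speedup ratio. The hex values are copied from the message above; the struct and function names are made up for this illustration, and the message does not state the time units, so only the ratios mean anything.

```c
#include <assert.h>

/* Elapsed times as posted above; the units are not stated in the message. */
struct timing {
    const char *machine;  /* blade model the run was made on */
    unsigned plain;       /* page copy without dcbz */
    unsigned with_dcbz;   /* page copy with dcbz on the target */
};

static const struct timing timings[] = {
    { "JS20", 0xa8f5, 0x5410 },
    { "JS20", 0xa987, 0x5361 },
    { "JS21", 0x0862, 0x0420 },
    { "JS21", 0x0859, 0x0424 },
};

#define N_TIMINGS (sizeof(timings) / sizeof(timings[0]))

/* Ratio of plain time to dcbz time; a value above 1.0 means dcbz was faster. */
static double speedup(const struct timing *t)
{
    assert(t->with_dcbz != 0);
    return (double)t->plain / (double)t->with_dcbz;
}
```

All four pairs work out to a ratio slightly above 2, so on both machines the dcbz variant took roughly half the time of the plain copy.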
Re: [XenPPC] copy_page speedup using dcbz on target
On Fri, 2006-12-15 at 11:50 -0500, poff wrote:
> Using dcbz avoids first reading a cache line from memory before writing
> to the line.
>
> Timing results (starting with clean cache, ie no write-backs for dirty
> lines):

So do you have a patch for copy_page()?

-- 
Hollis Blanchard
IBM Linux Technology Center
Re: [XenPPC] copy_page speedup using dcbz on target
> So do you have a patch for copy_page()?

In Xen for PPC, the only copy_page() is in arch/powerpc/mm.c:

    extern void copy_page(void *dp, void *sp)
    {
        if (on_systemsim()) {
            systemsim_memcpy(dp, sp, PAGE_SIZE);
        } else {
            memcpy(dp, sp, PAGE_SIZE);
        }
    }

1) Also copy_page is not referenced in current Xen sources?

2) dcbz depends on cacheability and cache alignment.
   Should a new name be given to this version of copy_page()?

3) Useful when PPC must do page copies in place of 'page flipping'.
Re: [XenPPC] copy_page speedup using dcbz on target
On Fri, 2006-12-15 at 16:40 -0500, poff wrote:
> > So do you have a patch for copy_page()?
>
> In Xen for PPC, the only copy_page() is in arch/powerpc/mm.c:
>
>     extern void copy_page(void *dp, void *sp)
>     {
>         if (on_systemsim()) {
>             systemsim_memcpy(dp, sp, PAGE_SIZE);
>         } else {
>             memcpy(dp, sp, PAGE_SIZE);
>         }
>     }

Correct.

> 1) Also copy_page is not referenced in current Xen sources?

In that case, why are you playing with it?

> 2) dcbz depends on cacheability and cache alignment.
>    Should a new name be given to this version of copy_page()?

"page" indicates cacheline-aligned. Who calls copy_page() with
non-cacheable memory?

> 3) Useful when PPC must do page copies in place of 'page flipping'.

So you're saying we should worry about it later?

-- 
Hollis Blanchard
IBM Linux Technology Center
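The point that a page implies cache-line alignment can be made concrete with a small sketch. The 4096-byte page and 128-byte line are assumptions for illustration, not values taken from this thread, and the helper names are hypothetical. Cacheability cannot be checked from an address and is not covered here.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE  4096u  /* assumed page size */
#define CACHE_LINE 128u   /* assumed cache line size */

/* A page holds a whole number of cache lines under the assumed sizes. */
_Static_assert(PAGE_SIZE % CACHE_LINE == 0,
               "page size must be a multiple of the cache line size");

/* Nonzero if p is aligned to align, which must be a power of two. */
static int is_aligned(const void *p, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/*
 * Nonzero if page is page-aligned and, as a consequence, every cache
 * line inside it starts on a line boundary. The loop is redundant once
 * the first check passes; it is spelled out only to show the implication.
 */
static int all_lines_aligned(const void *page)
{
    const unsigned char *p = page;

    if (!is_aligned(p, PAGE_SIZE))
        return 0;
    for (uintptr_t off = 0; off < PAGE_SIZE; off += CACHE_LINE)
        if (!is_aligned(p + off, CACHE_LINE))
            return 0;
    return 1;
}
```

A caller that only ever passes page-aligned pointers therefore needs no separate line-alignment check before zeroing whole lines.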
Re: [XenPPC] copy_page speedup using dcbz on target
> > 3) Useful when PPC must do page copies in place of 'page flipping'.
>
> So you're saying we should worry about it later?

For the future, copy_page using dcbz:

diff -r 7669fca80bfc xen/arch/powerpc/mm.c
--- a/xen/arch/powerpc/mm.c	Mon Dec 04 11:46:53 2006 -0500
+++ b/xen/arch/powerpc/mm.c	Fri Dec 15 17:52:58 2006 -0500
@@ -280,7 +280,8 @@ extern void copy_page(void *dp, void *sp
     if (on_systemsim()) {
         systemsim_memcpy(dp, sp, PAGE_SIZE);
     } else {
-        memcpy(dp, sp, PAGE_SIZE);
+        clear_page(dp);
+        __copy_page(dp, sp);
     }
 }
diff -r 7669fca80bfc xen/include/asm-powerpc/page.h
--- a/xen/include/asm-powerpc/page.h	Mon Dec 04 11:46:53 2006 -0500
+++ b/xen/include/asm-powerpc/page.h	Fri Dec 15 17:52:58 2006 -0500
@@ -90,6 +90,25 @@ 1:  dcbz    0,%0\n\
 extern void copy_page(void *dp, void *sp);
+static __inline__ void __copy_page(void *dp, void *sp)
+{
+    ulong dwords, dword_size;
+
+    dword_size = 8;
+    dwords = (PAGE_SIZE / dword_size) - 1;
+
+    __asm__ __volatile__(
+    "mtctr  %2      # copy_page\n\
+    ld      %2,0(%1)\n\
+    std     %2,0(%0)\n\
+1:  ldu     %2,8(%1)\n\
+    stdu    %2,8(%0)\n\
+    bdnz    1b"
+    : /* no result */
+    : "r" (dp), "r" (sp), "r" (dwords)
+    : "%ctr", "memory");
+}
+
 #define linear_pg_table linear_l1_table
 static inline int get_order(unsigned long size)
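For readers less familiar with the ld/ldu/stdu/bdnz idiom, here is a rough portable C rendering of what the assembly loop in the patch does. It is a sketch for reading the loop count, not a replacement for the patch: the function name and the 4096-byte PAGE_SIZE are assumptions, and the dcbz pass done by clear_page() is left out. Separately, GCC's extended-asm rules expect operands that the asm modifies to be declared as outputs, which the posted fragment does not appear to do; the C version sidesteps that question.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u  /* assumed page size */

/* Copy one page as 64-bit doublewords, mirroring the asm loop structure. */
static void copy_page_dwords(uint64_t *dp, const uint64_t *sp)
{
    /* One doubleword is copied before the loop, hence the "- 1". */
    unsigned long dwords = (PAGE_SIZE / sizeof(uint64_t)) - 1;

    assert(dp != 0 && sp != 0);

    *dp = *sp;               /* ld/std at offset 0 */
    while (dwords-- != 0) {  /* mtctr + bdnz: body runs dwords times */
        *++dp = *++sp;       /* ldu/stdu: advance by 8 bytes, then access */
    }
}
```

With a 4096-byte page the loop body runs 511 times, which together with the initial copy moves all 512 doublewords.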
Re: [XenPPC] copy_page speedup using dcbz on target
On Fri, 2006-12-15 at 17:50 -0500, poff wrote:
> > > 3) Useful when PPC must do page copies in place of 'page flipping'.
> >
> > So you're saying we should worry about it later?
>
> For the future, copy_page using dcbz:
>
> diff -r 7669fca80bfc xen/arch/powerpc/mm.c
> --- a/xen/arch/powerpc/mm.c	Mon Dec 04 11:46:53 2006 -0500
> +++ b/xen/arch/powerpc/mm.c	Fri Dec 15 17:52:58 2006 -0500
> @@ -280,7 +280,8 @@ extern void copy_page(void *dp, void *sp
>      if (on_systemsim()) {
>          systemsim_memcpy(dp, sp, PAGE_SIZE);
>      } else {
> -        memcpy(dp, sp, PAGE_SIZE);
> +        clear_page(dp);
> +        __copy_page(dp, sp);
>      }
>  }
> diff -r 7669fca80bfc xen/include/asm-powerpc/page.h
> --- a/xen/include/asm-powerpc/page.h	Mon Dec 04 11:46:53 2006 -0500
> +++ b/xen/include/asm-powerpc/page.h	Fri Dec 15 17:52:58 2006 -0500
> @@ -90,6 +90,25 @@ 1:  dcbz    0,%0\n\
>  extern void copy_page(void *dp, void *sp);
> +static __inline__ void __copy_page(void *dp, void *sp)
> +{
> +    ulong dwords, dword_size;
> +
> +    dword_size = 8;
> +    dwords = (PAGE_SIZE / dword_size) - 1;
> +
> +    __asm__ __volatile__(
> +    "mtctr  %2      # copy_page\n\
> +    ld      %2,0(%1)\n\
> +    std     %2,0(%0)\n\
> +1:  ldu     %2,8(%1)\n\
> +    stdu    %2,8(%0)\n\
> +    bdnz    1b"
> +    : /* no result */
> +    : "r" (dp), "r" (sp), "r" (dwords)
> +    : "%ctr", "memory");
> +}
> +

I'd rather have copy_page() do

    dcbz; stdu; stdu; stdu; ... stdu;

in each loop iteration.

It would also be nice to improve memcpy, though that one is certainly
more difficult due to alignment, varying lengths, etc. Perhaps we can
borrow code from
http://penguinppc.org/dev/glibc/glibc-powerpc-cpu-addon.html

-- 
Hollis Blanchard
IBM Linux Technology Center
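The per-iteration shape suggested above (zero one target line, then store a full line of doublewords) could look roughly like this in portable C. This is only a sketch: zero_line() is a hypothetical stand-in that uses memset where real PowerPC code would issue a single dcbz, and the 4096-byte page and 128-byte cache line are assumptions rather than values stated in the thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE       4096u  /* assumed page size */
#define CACHE_LINE      128u   /* assumed cache line size */
#define DWORDS_PER_LINE (CACHE_LINE / sizeof(uint64_t))

/* Hypothetical stand-in for dcbz: establish one target line as zeros. */
static void zero_line(uint64_t *line)
{
    memset(line, 0, CACHE_LINE);
}

/* Copy a page one cache line at a time: zero the target line, then fill it. */
static void copy_page_by_line(uint64_t *dp, const uint64_t *sp)
{
    /* Zeroing whole lines is only safe on a line-aligned target. */
    assert(((uintptr_t)dp % CACHE_LINE) == 0);
    assert(dp != sp);

    for (size_t line = 0; line < PAGE_SIZE / CACHE_LINE; line++) {
        zero_line(dp);                       /* "dcbz" */
        for (size_t i = 0; i < DWORDS_PER_LINE; i++)
            dp[i] = sp[i];                   /* "stdu; stdu; ... stdu" */
        dp += DWORDS_PER_LINE;
        sp += DWORDS_PER_LINE;
    }
}
```

Compared with calling clear_page() and then copying, this touches each target line once while it is still fresh in the cache. The inner loop has a fixed trip count, so a compiler is free to unroll it into the straight run of stores described in the message.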