Re: [XenPPC] copy_page speedup using dcbz on target

2006-12-18 Thread poff
On Sat, 16 Dec 2006 11:34, Jimi Xenidis wrote:

 If you really want to explore mem/page copy for XenPPC then you have  
 to understand that since we run without an MMU, profiling code with  
 MMU on, _including_ RMA, is not helpful because the access is guarded ... 

 Please run your experiments _in_ Xen ...

I have added the timing code to Xen itself (in setup.c); however, the results
match the earlier timings taken in userspace:

JS20:
elapsed time: 0xa8f5
elapsed time using dcbz: 0x5410

elapsed time: 0xa987
elapsed time using dcbz: 0x5361


JS21:
elapsed time: 0x0862
elapsed time using dcbz: 0x0420

elapsed time: 0x0859
elapsed time using dcbz: 0x0424
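To read the hex timings above, it helps to convert them and take the ratio of plain copy time to dcbz copy time. The post does not say which timer or unit produced these numbers, so this minimal sketch only compares the two runs on the same machine and makes no cross-machine comparison.

```c
#include <assert.h>
#include <stdio.h>

/* Elapsed times exactly as posted (hex). The timer unit is not stated,
 * so only the plain/dcbz ratio within one machine is meaningful. */
struct run {
    const char *machine;
    unsigned plain; /* elapsed time without dcbz */
    unsigned dcbz;  /* elapsed time using dcbz */
};

static const struct run runs[] = {
    { "JS20", 0xa8f5, 0x5410 },
    { "JS20", 0xa987, 0x5361 },
    { "JS21", 0x0862, 0x0420 },
    { "JS21", 0x0859, 0x0424 },
};

/* Speedup = time without dcbz / time with dcbz. */
static double speedup(const struct run *r)
{
    assert(r->dcbz != 0);
    return (double)r->plain / (double)r->dcbz;
}

void print_speedups(void)
{
    for (size_t i = 0; i < sizeof runs / sizeof runs[0]; i++)
        printf("%s: %5u -> %5u units, speedup %.2fx\n", runs[i].machine,
               runs[i].plain, runs[i].dcbz, speedup(&runs[i]));
}
```

The four pairs work out to 2.01, 2.03, 2.03 and 2.02, so on both blades the dcbz version takes roughly half the time.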

...

 You will probably find that grouping (as Hollis suggests) by cache  
 line will be much better, but also prefetch the next line somehow.

Somewhat better... (the following observations were made running in user space).
Unrolling the copy loop by cache line improves performance by a few percent.
(I did not record the times; also, the unrolled loop still used the same number
of registers and did no cache-line touching, i.e. no prefetching.)
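A portable sketch of "group by cache line and prefetch the next line" might look like the following. It assumes a 4 KiB page and a 128-byte cache line (both assumptions, not taken from the measurements), uses the GCC/Clang builtin `__builtin_prefetch` as a stand-in for a cache touch instruction, and the function name `copy_page_unrolled` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_LINE_SIZE 128u  /* assumed cache-line size */
#define DWORDS_PER_LINE  (SKETCH_LINE_SIZE / sizeof(uint64_t))
#define LINES_PER_PAGE   (SKETCH_PAGE_SIZE / SKETCH_LINE_SIZE)

/* Copy one page a cache line per outer iteration, hinting the next
 * source line into the cache before copying the current one. */
void copy_page_unrolled(uint64_t *d, const uint64_t *s)
{
    assert(((uintptr_t)d % sizeof(uint64_t)) == 0);
    assert(((uintptr_t)s % sizeof(uint64_t)) == 0);

    for (size_t line = 0; line < LINES_PER_PAGE; line++) {
        /* GCC/Clang builtin: read prefetch (0) with high temporal
         * locality (3). It never faults and may compile to nothing. */
        if (line + 1 < LINES_PER_PAGE)
            __builtin_prefetch(s + DWORDS_PER_LINE, 0, 3);

        /* Constant trip count (16 dwords with the assumed line size),
         * so the compiler is free to unroll it into straight-line code. */
        for (size_t i = 0; i < DWORDS_PER_LINE; i++)
            d[i] = s[i];

        d += DWORDS_PER_LINE;
        s += DWORDS_PER_LINE;
    }
}
```

Whether the prefetch helps at all depends on how far ahead it runs relative to memory latency; one line ahead is only a starting point.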

However, issuing dcbz at the beginning of each loop iteration slowed things down.
Perhaps the dcbz needs to run a couple of cache lines ahead of the stores?
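The "dcbz a couple of lines ahead" idea can be sketched as a copy loop with a lookahead distance `ahead`. Portable C has no dcbz, so the hypothetical helper `zero_line()` emulates only its effect with memset; on PowerPC it would be a single `dcbz` on the line. The sketch shows the loop structure (each destination line is zeroed exactly once, `ahead` lines before it is written), not the cache behaviour. The 128-byte line size is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PG_SIZE   4096u
#define LINE_SIZE 128u   /* assumed cache-line size */
#define NLINES    (PG_SIZE / LINE_SIZE)

/* Hypothetical stand-in for dcbz: on PowerPC this would establish the
 * line in the cache as zeros without reading memory. Here we can only
 * reproduce the result (a zeroed line). */
static void zero_line(unsigned char *line)
{
    memset(line, 0, LINE_SIZE);
}

/* Copy a page, zeroing each destination line `ahead` lines before it
 * is stored to. ahead == 0 means "zero, then store" in one iteration. */
void copy_page_zero_ahead(void *dp, const void *sp, size_t ahead)
{
    unsigned char *d = dp;
    const unsigned char *s = sp;
    assert(ahead < NLINES);

    /* Prologue: establish the first `ahead` destination lines. */
    for (size_t i = 0; i < ahead; i++)
        zero_line(d + i * LINE_SIZE);

    for (size_t i = 0; i < NLINES; i++) {
        /* Zero the line we will store to `ahead` iterations from now. */
        if (i + ahead < NLINES)
            zero_line(d + (i + ahead) * LINE_SIZE);
        memcpy(d + i * LINE_SIZE, s + i * LINE_SIZE, LINE_SIZE);
    }
}
```

Trying several values of `ahead` on real hardware (with a real dcbz in place of `zero_line()`) would show whether the slowdown came from the stores stalling right behind the dcbz.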

___
Xen-ppc-devel mailing list
Xen-ppc-devel@lists.xensource.com
http://lists.xensource.com/xen-ppc-devel


Re: [XenPPC] copy_page speedup using dcbz on target

2006-12-15 Thread Hollis Blanchard
On Fri, 2006-12-15 at 11:50 -0500, poff wrote:
 Using dcbz avoids first reading a cache line from memory before
 writing to the line.
 Timing results (starting with a clean cache, i.e. no write-backs of dirty
 lines):
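The quoted point about dcbz avoiding a read can be put in numbers with a toy model of memory-bus line transfers for one page copy. It assumes a write-allocate, write-back data cache that starts clean, a 4 KiB page and a 128-byte line; the function `page_copy_traffic()` is hypothetical and only counts transfers, so it says nothing about latency and is not expected to reproduce the measured ratios.

```c
#include <assert.h>

struct traffic {
    unsigned reads;      /* lines read from memory */
    unsigned writebacks; /* dirty lines eventually written back */
};

static struct traffic page_copy_traffic(unsigned page, unsigned line, int use_dcbz)
{
    assert(line != 0 && page % line == 0);
    unsigned lines = page / line;
    struct traffic t;

    /* Every source line must be read. Without dcbz, a write-allocate
     * cache also reads each destination line before the stores land;
     * dcbz establishes the destination line without that read. */
    t.reads = lines + (use_dcbz ? 0 : lines);

    /* Every destination line ends up dirty either way. */
    t.writebacks = lines;
    return t;
}
```

With these assumptions a plain copy moves 64 + 32 = 96 lines and the dcbz copy 32 + 32 = 64 lines.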

So do you have a patch for copy_page()?

-- 
Hollis Blanchard
IBM Linux Technology Center




Re: [XenPPC] copy_page speedup using dcbz on target

2006-12-15 Thread poff
 So do you have a patch for copy_page()?

In Xen for PPC, the only copy_page() is in arch/powerpc/mm.c:

extern void copy_page(void *dp, void *sp)
{
    if (on_systemsim()) {
        systemsim_memcpy(dp, sp, PAGE_SIZE);
    } else {
        memcpy(dp, sp, PAGE_SIZE);
    }
}


1) Also, copy_page() does not seem to be referenced anywhere in the current Xen sources?

2) dcbz depends on cacheability and cache alignment.
   Should this version of copy_page() be given a new name?

3) It would be useful when PPC must do page copies in place of 'page flipping'.
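Point 2 could be handled with a guarded entry point that takes the dcbz path only when the destination is cache-line aligned, since dcbz acts on the destination line, and falls back to memcpy otherwise. This is a minimal sketch: `copy_page_checked()` and `copy_page_dcbz()` are hypothetical names, memset stands in for dcbz, the 128-byte line is assumed, and cacheability cannot be tested from portable C, so that stays the caller's responsibility.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PG_SIZE   4096u
#define LINE_SIZE 128u   /* assumed cache-line size */

/* A page is a whole number of lines, so any page-aligned address is
 * also line-aligned. */
_Static_assert(PG_SIZE % LINE_SIZE == 0, "page must be a whole number of lines");

static int line_aligned(const void *p)
{
    return ((uintptr_t)p % LINE_SIZE) == 0;
}

/* Hypothetical dcbz-style copy; memset stands in for dcbz here. */
static void copy_page_dcbz(unsigned char *d, const unsigned char *s)
{
    for (unsigned off = 0; off < PG_SIZE; off += LINE_SIZE) {
        memset(d + off, 0, LINE_SIZE);        /* would be: dcbz */
        memcpy(d + off, s + off, LINE_SIZE);
    }
}

/* Returns 1 if the dcbz path was used, 0 for the memcpy fallback. */
int copy_page_checked(void *dp, const void *sp)
{
    assert(dp != NULL && sp != NULL);
    if (line_aligned(dp)) {
        copy_page_dcbz(dp, sp);
        return 1;
    }
    memcpy(dp, sp, PG_SIZE);
    return 0;
}
```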



Re: [XenPPC] copy_page speedup using dcbz on target

2006-12-15 Thread Hollis Blanchard
On Fri, 2006-12-15 at 16:40 -0500, poff wrote:
  So do you have a patch for copy_page()?
 
 In Xen for PPC, the only copy_page() is in arch/powerpc/mm.c:
 
 extern void copy_page(void *dp, void *sp)
 {
 if (on_systemsim()) {
 systemsim_memcpy(dp, sp, PAGE_SIZE);
 } else {
 memcpy(dp, sp, PAGE_SIZE);
 }
 }

Correct.

 1) Also copy_page is not referenced in current Xen sources?

In that case, why are you playing with it?

 2) dcbz depends on cacheability and cache alignment.
    Should a new name be given to this version of copy_page()?

"page" implies cache-line aligned.

Who calls copy_page() with non-cacheable memory?

 3) Useful when PPC must do page copies in place of 'page flipping'.

So you're saying we should worry about it later?

-- 
Hollis Blanchard
IBM Linux Technology Center




Re: [XenPPC] copy_page speedup using dcbz on target

2006-12-15 Thread poff
  3) Useful when PPC must do page copies in place of 'page flipping'.
 
 So you're saying we should worry about it later?


For the future, here is a copy_page() that uses dcbz:

diff -r 7669fca80bfc xen/arch/powerpc/mm.c
--- a/xen/arch/powerpc/mm.c Mon Dec 04 11:46:53 2006 -0500
+++ b/xen/arch/powerpc/mm.c Fri Dec 15 17:52:58 2006 -0500
@@ -280,7 +280,8 @@ extern void copy_page(void *dp, void *sp
     if (on_systemsim()) {
         systemsim_memcpy(dp, sp, PAGE_SIZE);
     } else {
-        memcpy(dp, sp, PAGE_SIZE);
+        clear_page(dp);
+        __copy_page(dp, sp);
     }
 }
 
diff -r 7669fca80bfc xen/include/asm-powerpc/page.h
--- a/xen/include/asm-powerpc/page.h Mon Dec 04 11:46:53 2006 -0500
+++ b/xen/include/asm-powerpc/page.h Fri Dec 15 17:52:58 2006 -0500
@@ -90,6 +90,25 @@ 1:  dcbz    0,%0\n\
 
 extern void copy_page(void *dp, void *sp);
 
+static __inline__ void __copy_page(void *dp, void *sp)
+{
+    ulong dwords, dword_size;
+
+    dword_size = 8;
+    dwords = (PAGE_SIZE / dword_size) - 1;
+
+    __asm__ __volatile__(
+    "mtctr  %2      # copy_page\n\
+    ld      %2,0(%1)\n\
+    std     %2,0(%0)\n\
+1:  ldu     %2,8(%1)\n\
+    stdu    %2,8(%0)\n\
+    bdnz    1b"
+    : "+r" (dp), "+r" (sp), "+r" (dwords) /* all three are modified */
+    : /* no inputs */
+    : "ctr", "memory");
+}
+
 #define linear_pg_table linear_l1_table
 
 static inline int get_order(unsigned long size)
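To check the loop count in the __copy_page() asm above, here is a plain C model of what it does: one ld/std pair for dword 0, then CTR = PAGE_SIZE/8 - 1 = 511 iterations of ldu/stdu, each of which first advances both pointers by 8 bytes and then loads or stores, so exactly 512 dwords (4096 bytes) are copied. Because the loop updates the registers holding both pointers and the count, GCC has to be told those operands are modified (for example with "+r" constraints). `copy_page_model()` is a hypothetical name and the 4 KiB page is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PG_SIZE 4096u

void copy_page_model(uint64_t *d, const uint64_t *s)
{
    unsigned long ctr = PG_SIZE / 8 - 1;  /* mtctr: 511 */
    uint64_t tmp;

    tmp = *s;        /* ld   tmp,0(s) */
    *d = tmp;        /* std  tmp,0(d) */
    do {
        tmp = *++s;  /* ldu  tmp,8(s): s += 8 bytes, then load */
        *++d = tmp;  /* stdu tmp,8(d): d += 8 bytes, then store */
    } while (--ctr); /* bdnz 1b: decrement CTR, loop while non-zero */
}
```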




Re: [XenPPC] copy_page speedup using dcbz on target

2006-12-15 Thread Hollis Blanchard
On Fri, 2006-12-15 at 17:50 -0500, poff wrote:
   3) Useful when PPC must do page copies in place of 'page flipping'.
 
  So you're saying we should worry about it later?
 
 

I'd rather have copy_page() do "dcbz; stdu; stdu; stdu; ... stdu;" in each
loop iteration.
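The structure Hollis describes (one dcbz, then a full cache line of stores, per loop iteration) could be sketched in C like this. It assumes a 128-byte line (16 dwords) and a line-aligned destination; `dcbz_emul()` is a hypothetical stand-in that reproduces only the zeroing effect of dcbz, not the avoided memory read, and the 16 assignments play the role of the unrolled stdu sequence.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PG_SIZE     4096u
#define LINE_SIZE   128u                            /* assumed */
#define DW_PER_LINE (LINE_SIZE / sizeof(uint64_t))

_Static_assert(DW_PER_LINE == 16, "unrolled body below assumes 16 dwords per line");

/* Hypothetical stand-in for "dcbz 0,d". */
static void dcbz_emul(uint64_t *line)
{
    memset(line, 0, LINE_SIZE);
}

/* One cache line per iteration: claim the destination line, then fill
 * it with 16 straight-line dword stores. */
void copy_page_line_at_a_time(uint64_t *d, const uint64_t *s)
{
    assert(((uintptr_t)d % LINE_SIZE) == 0);
    for (unsigned n = 0; n < PG_SIZE / LINE_SIZE; n++) {
        dcbz_emul(d);
        d[0]  = s[0];  d[1]  = s[1];  d[2]  = s[2];  d[3]  = s[3];
        d[4]  = s[4];  d[5]  = s[5];  d[6]  = s[6];  d[7]  = s[7];
        d[8]  = s[8];  d[9]  = s[9];  d[10] = s[10]; d[11] = s[11];
        d[12] = s[12]; d[13] = s[13]; d[14] = s[14]; d[15] = s[15];
        d += DW_PER_LINE;
        s += DW_PER_LINE;
    }
}
```

Compared with clear_page() followed by a separate copy pass, this touches each destination line once, right before it is filled.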

It would also be nice to improve memcpy, though that one is certainly
more difficult due to alignment, varying lengths, etc. Perhaps we can
borrow code from
http://penguinppc.org/dev/glibc/glibc-powerpc-cpu-addon.html
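The alignment and length problem for memcpy is usually split into head, body and tail. This sketch (hypothetical name `memcpy_aligned_sketch()`) byte-copies until the destination is 8-byte aligned, copies whole dwords, then byte-copies the remainder. The source may stay misaligned, so each dword is fetched through an 8-byte memcpy, which compilers typically lower to a single load. The glibc PowerPC versions linked above are far more elaborate; this only shows the basic shape.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

void *memcpy_aligned_sketch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: align the destination to 8 bytes. */
    while (n && ((uintptr_t)d & 7)) {
        *d++ = *s++;
        n--;
    }

    /* Body: whole dwords; the source may still be misaligned. */
    while (n >= 8) {
        uint64_t v;
        memcpy(&v, s, 8);
        memcpy(d, &v, 8);
        d += 8;
        s += 8;
        n -= 8;
    }

    /* Tail: remaining 0..7 bytes. */
    while (n--)
        *d++ = *s++;

    return dst;
}
```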

-- 
Hollis Blanchard
IBM Linux Technology Center

