Re: [PATCH RFC] x86:Improve memset with general 64bit instruction

2014-04-14 Thread Andi Kleen
On Mon, Apr 14, 2014 at 09:31:16PM +0800, Ling Ma wrote:
> The kernel version 3.14 shows memcpy, memset occur 19622 and 14189
> times respectively.
> so memset is still important for us, correct?

Did you ever see it in a profile log as being hot? I haven't.
Static counts don't mean much.

-Andi

Re: [PATCH RFC] x86:Improve memset with general 64bit instruction

2014-04-14 Thread Ling Ma
The kernel version 3.14 shows memcpy, memset occur 19622 and 14189 times respectively.
so memset is still important for us, correct?

Thanks
Ling

2014-04-14 6:03 GMT+08:00, Andi Kleen:
> On Sun, Apr 13, 2014 at 11:11:59PM +0800, Ling Ma wrote:
>> Any further comments ?
>
> It would be good to t

Re: [PATCH RFC] x86:Improve memset with general 64bit instruction

2014-04-13 Thread Andi Kleen
On Sun, Apr 13, 2014 at 11:11:59PM +0800, Ling Ma wrote:
> Any further comments ?

It would be good to test on some more machines that you don't cause regressions.
But I'm not aware of any workload that is doing a lot of memset (not counting clear_page). Copies likely matter much more.

-Andi

Re: [PATCH RFC] x86:Improve memset with general 64bit instruction

2014-04-13 Thread Ling Ma
Any further comments ?

Thanks
Ling

2014-04-08 22:00 GMT+08:00, Ling Ma:
> Andi,
>
> The below is compared result on older machine (cpu info is attached):
> That shows new code get better performance up to 1.6x.
>
> Bytes:  ORG_TIME:  NEW_TIME:  ORG vs NEW:
> 7       0.87       0.76       1.14
> 16

Re: [PATCH RFC] x86:Improve memset with general 64bit instruction

2014-04-08 Thread Ling Ma
Andi,

The below is compared result on older machine (cpu info is attached):
That shows new code get better performance up to 1.6x.

Bytes:  ORG_TIME:  NEW_TIME:  ORG vs NEW:
7       0.87       0.76       1.14
16      0.99       0.68       1.45
18      1.07       0.77       1.38
21      1.09       0.78       1.39
25      1.11

Re: [PATCH RFC] x86:Improve memset with general 64bit instruction

2014-04-07 Thread Andi Kleen
ling.ma.prog...@gmail.com writes:
> From: Ling Ma
>
> In this patch we manage to reduce miss branch prediction by
> avoiding using branch instructions and force destination to be aligned
> with general 64bit instruction.
> Below compared results shows we improve performance up to 1.8x
> (We mod

Re: [PATCH RFC] x86:Improve memset with general 64bit instruction

2014-04-07 Thread Ling Ma
Append test suite after tar, run ./test command please.

thanks

2014-04-07 22:50 GMT+08:00, ling.ma.prog...@gmail.com:
> From: Ling Ma
>
> In this patch we manage to reduce miss branch prediction by
> avoiding using branch instructions and force destination to be aligned
> with general 64bit inst
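The patch description quoted above names two ideas: cutting down on hard-to-predict branches, and aligning the destination so the bulk of the fill uses general-purpose 64-bit stores. The patch itself is x86 assembly and is not reproduced in this thread, so the following is only a rough C sketch of those two ideas, not the submitted code. The names `memset_sketch`, `store64`, `store32` and `store16` are made up for illustration, and the size thresholds are assumptions. Short lengths are handled with two overlapping stores instead of a byte loop, and longer lengths use an unaligned head store, aligned 8-byte stores in the middle, and an overlapping tail store.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: store 8/4/2 bytes at a possibly unaligned address.
 * A fixed-size memcpy like this is typically compiled to a single store. */
static inline void store64(unsigned char *p, uint64_t v) { memcpy(p, &v, 8); }
static inline void store32(unsigned char *p, uint32_t v) { memcpy(p, &v, 4); }
static inline void store16(unsigned char *p, uint16_t v) { memcpy(p, &v, 2); }

/* Illustrative sketch only; not the code from the patch. */
void *memset_sketch(void *dst, int c, size_t n)
{
    unsigned char *d = dst;
    /* Replicate the fill byte into all eight bytes of a 64-bit word. */
    uint64_t pat = 0x0101010101010101ULL * (unsigned char)c;

    if (n >= 16) {
        unsigned char *end = d + n;
        store64(d, pat);          /* head: covers [d, d+8), may be unaligned */
        store64(end - 8, pat);    /* tail: covers [end-8, end), may overlap */
        /* First 8-byte-aligned address strictly after d (at most d + 8). */
        unsigned char *p =
            (unsigned char *)(((uintptr_t)d + 8) & ~(uintptr_t)7);
        /* Aligned 64-bit stores; whatever is left is covered by the tail. */
        for (; p < end - 8; p += 8)
            store64(p, pat);
    } else if (n >= 8) {
        /* 8..15 bytes: two overlapping 8-byte stores, no loop. */
        store64(d, pat);
        store64(d + n - 8, pat);
    } else if (n >= 4) {
        /* 4..7 bytes: two overlapping 4-byte stores. */
        store32(d, (uint32_t)pat);
        store32(d + n - 4, (uint32_t)pat);
    } else if (n >= 2) {
        /* 2..3 bytes: two overlapping 2-byte stores. */
        store16(d, (uint16_t)pat);
        store16(d + n - 2, (uint16_t)pat);
    } else if (n == 1) {
        d[0] = (unsigned char)c;
    }
    return dst;
}
```

The sketch still branches on the length class, but each class runs straight-line code, so the number of branches no longer depends on the exact byte count within a class. Whether that pays off in practice is the question raised in the replies: it would need to show up in a profile of a real workload, not just in a microbenchmark.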