Re: Improve spinlock performance by moving work to one core

2015-12-06 Thread Ling Ma
Longman, We further optimized the kernel spinlock in ali-spin-lock.patch as attachment based on kernel 4.3.0-rc4. Run thread.c in user space with kernel patch(ali-spin-lock.patch) on E5-2699v3, compare with original spinlock: The printed data indicates the performance in critical path is

Re: Improve spinlock performance by moving work to one core

2015-11-30 Thread Waiman Long
On 11/30/2015 01:17 AM, Ling Ma wrote:
> Any comments, the patch is acceptable ? Thanks Ling
Ling, The core idea of your current patch hasn't changed from your previous patch. My comment is that you should not attempt to sell it as a replacement of the current spinlock mechanism. I just

Re: Improve spinlock performance by moving work to one core

2015-11-29 Thread Ling Ma
Any comments, the patch is acceptable ? Thanks Ling
2015-11-26 17:00 GMT+08:00 Ling Ma :
> Run thread.c with clean kernel 4.3.0-rc4, perf top -G also indicates
> cache_flusharray and cache_alloc_refill functions spend 25.6% time
> on queued_spin_lock_slowpath totally. it means the compared data

Re: Improve spinlock performance by moving work to one core

2015-11-26 Thread Ling Ma
Run thread.c with clean kernel 4.3.0-rc4, perf top -G also indicates cache_flusharray and cache_alloc_refill functions spend 25.6% time on queued_spin_lock_slowpath totally. it means the compared data from our spinlock-test.patch is reliable. Thanks Ling
2015-11-26 11:49 GMT+08:00 Ling Ma :
>

Re: Improve spinlock performance by moving work to one core

2015-11-25 Thread Ling Ma
Hi Longman, All compared data is from the below operation in spinlock-test.patch:
+#if ORG_QUEUED_SPINLOCK
+ org_queued_spin_lock((struct qspinlock *)>list_lock);
+ refill_fn();
+ org_queued_spin_unlock((struct qspinlock *)>list_lock);
+#else
+ new_spin_lock((struct

Re: Improve spinlock performance by moving work to one core

2015-11-25 Thread Waiman Long
On 11/23/2015 04:41 AM, Ling Ma wrote:
> Hi Longman,
>
> Attachments include user space application thread.c and kernel patch
> spinlock-test.patch based on kernel 4.3.0-rc4
>
> we run thread.c with kernel patch, test original and new spinlock
> respectively,
> perf top -G indicates thread.c

Re: Improve spinlock performance by moving work to one core

2015-11-24 Thread Ling Ma
Any comments about it ? Thanks Ling
2015-11-23 17:41 GMT+08:00 Ling Ma :
> Hi Longman,
>
> Attachments include user space application thread.c and kernel patch
> spinlock-test.patch based on kernel 4.3.0-rc4
>
> we run thread.c with kernel patch, test original and new spinlock
> respectively,
>

Re: Improve spinlock performance by moving work to one core

2015-11-23 Thread Ling Ma
Hi Longman, Attachments include user space application thread.c and kernel patch spinlock-test.patch based on kernel 4.3.0-rc4 we run thread.c with kernel patch, test original and new spinlock respectively, perf top -G indicates thread.c cause cache_alloc_refill and cache_flusharray functions to

Re: Improve spinlock performance by moving work to one core

2015-11-06 Thread Waiman Long
On 11/05/2015 11:28 PM, Ling Ma wrote:
> Longman Thanks for your suggestion. We will look for real scenario to test, and could you please introduce some benchmarks on spinlock ? Regards Ling
The kernel has been well optimized for most common workloads that spinlock contention is usually not

Re: Improve spinlock performance by moving work to one core

2015-11-05 Thread Ling Ma
Longman Thanks for your suggestion. We will look for real scenario to test, and could you please introduce some benchmarks on spinlock ? Regards Ling
>
> Your new spinlock code completely change the API and the semantics of the
> existing spinlock calls. That requires changes to thousands of

Re: Improve spinlock performance by moving work to one core

2015-11-04 Thread Ling Ma
Hi All, (send again for linux-kernel@vger.kernel.org) Spinlock caused cache line ping-pong between cores, we have to spend lots of time to get serialized execution. However if we present the serialized work to one core, it will help us save much time. In the attachment we changed code based on
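
The idea in the message above (hand the serialized work to one core instead of bouncing the lock's cache line between cores) can be illustrated with a small user-space sketch. This is not the code from the attached patch, which is not shown in this archive. It is a simplified delegation scheme in the spirit of flat combining, written with C11 atomics and pthreads. All names here (slot, delegate, run_delegation_demo, NTHREADS, ITERS) are hypothetical, and the 64-byte alignment is an assumed cache-line size.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

#define NTHREADS 4
#define ITERS    10000

typedef void (*work_fn)(void *);

/* One request slot per thread; 64 bytes is an assumed cache-line size. */
struct slot {
    _Alignas(64) atomic_int pending;  /* 1 while a request waits to be run */
    work_fn fn;
    void *arg;
};

static struct slot slots[NTHREADS];
static atomic_flag combiner = ATOMIC_FLAG_INIT;
static long counter;                  /* shared data, touched only by the combiner */

static void add_one(void *arg) { *(long *)arg += 1; }

/* Publish a request, then wait until some thread (possibly this one) runs it. */
static void delegate(int self, work_fn fn, void *arg)
{
    struct slot *me = &slots[self];

    me->fn = fn;
    me->arg = arg;
    atomic_store(&me->pending, 1);    /* request becomes visible after fn/arg */

    while (atomic_load(&me->pending)) {
        if (atomic_flag_test_and_set(&combiner)) {
            sched_yield();            /* another thread is combining; wait */
            continue;
        }
        /* This thread is the combiner: run every pending request here. */
        for (int i = 0; i < NTHREADS; i++) {
            if (atomic_load(&slots[i].pending)) {
                slots[i].fn(slots[i].arg);
                atomic_store(&slots[i].pending, 0);
            }
        }
        atomic_flag_clear(&combiner);
    }
}

static void *worker(void *p)
{
    int self = *(int *)p;

    for (int i = 0; i < ITERS; i++)
        delegate(self, add_one, &counter);
    return NULL;
}

/* Runs NTHREADS workers and returns the final counter value. */
long run_delegation_demo(void)
{
    pthread_t tid[NTHREADS];
    int ids[NTHREADS];

    counter = 0;
    for (int i = 0; i < NTHREADS; i++) {
        ids[i] = i;
        int rc = pthread_create(&tid[i], NULL, worker, &ids[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    return counter;
}
```

Calling run_delegation_demo() from a main() and building with -pthread should give a final count of NTHREADS * ITERS. The shared counter is only ever modified by whichever thread currently holds the combiner flag, so its cache line tends to stay on that core while the other threads touch only their own slots. Whether this helps in practice depends on the workload, which is the point debated in the replies.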
