Re: [PATCH v6] lib: optimize cpumask_local_spread()

2020-11-16 Thread Shaokun Zhang
Hi Dave, On 2020/11/16 22:48, Dave Hansen wrote: > On 11/15/20 11:59 PM, Shaokun Zhang wrote: >>> Do you want to take another pass at submitting this patch? >> 'Another pass'? Sorry for my poor understanding, I don't quite follow. > > Could you please incorporate the feedback that I've given

Re: [PATCH v6] lib: optimize cpumask_local_spread()

2020-11-16 Thread Dave Hansen
On 11/15/20 11:59 PM, Shaokun Zhang wrote: >> Do you want to take another pass at submitting this patch? > 'Another pass'? Sorry for my poor understanding, I don't quite follow. Could you please incorporate the feedback that I've given about this version of the patch and write a new version?

Re: [PATCH v6] lib: optimize cpumask_local_spread()

2020-11-16 Thread Shaokun Zhang
Hi Dave, On 2020/11/14 0:02, Dave Hansen wrote: > On 11/12/20 6:06 PM, Shaokun Zhang wrote: On a Huawei Kunpeng 920 server, there are 4 NUMA nodes (0 - 3) in the 2-CPU system (0 - 1). The topology of this server is as follows: >>> >>> This is with a feature enabled that Intel calls

Re: [PATCH v6] lib: optimize cpumask_local_spread()

2020-11-13 Thread Dave Hansen
On 11/12/20 6:06 PM, Shaokun Zhang wrote: >>> On a Huawei Kunpeng 920 server, there are 4 NUMA nodes (0 - 3) in the 2-CPU >>> system (0 - 1). The topology of this server is as follows: >> >> This is with a feature enabled that Intel calls sub-NUMA-clustering >> (SNC), right? Explaining *that* feature

Re: [PATCH v6] lib: optimize cpumask_local_spread()

2020-11-12 Thread Shaokun Zhang
Hi Dave, On 2020/11/5 0:10, Dave Hansen wrote: > On 11/3/20 5:39 AM, Shaokun Zhang wrote: >> Currently, Intel DDIO affects only local sockets, so its performance >> improvement is due to the relative difference in performance between the >> local socket I/O and remote socket I/O. To ensure that Intel

Re: [PATCH v6] lib: optimize cpumask_local_spread()

2020-11-04 Thread Dave Hansen
On 11/3/20 5:39 AM, Shaokun Zhang wrote: > Currently, Intel DDIO affects only local sockets, so its performance > improvement is due to the relative difference in performance between the > local socket I/O and remote socket I/O. To ensure that Intel DDIO’s > benefits are available to applications

[PATCH v6] lib: optimize cpumask_local_spread()

2020-11-03 Thread Shaokun Zhang
From: Yuqi Jin

In a multi-processor NUMA system, an I/O driver looks for the CPU cores to which its IRQs should be bound. When the CPU cores in the local NUMA node have all been used, it is better for performance to pick CPUs from the node closest to the local node instead of immediately choosing any online CPU.