Re: Prefetching and false sharing

2017-02-06 Thread 'Nitsan Wakart' via mechanical-sympathy
I'm a bit late to the party, but yes, JCTools pads to 128 bytes. This was based on measurement and was visible on non-NUMA setups. See notes here: http://psy-lob-saw.blogspot.co.za/2013/10/spsc-revisited-part-iii-fastflow-sparse.html Look for the comparison between Y8 and Y83, which compare 2 identical queues

Re: Prefetching and false sharing

2017-01-29 Thread Duarte Nunes
On Mon, Jan 30, 2017 at 12:00 AM Greg Young wrote:
> I am not sure how much I would trust an AWS machine in general for benchmarks
I'll repeat the tests on real hardware, but since I was mainly interested in relative numbers, I think it's okay.
> On Sun, Jan 29,

Re: Prefetching and false sharing

2017-01-29 Thread Greg Young
I am not sure how much I would trust an AWS machine in general for benchmarks
On Sun, Jan 29, 2017 at 7:34 PM, Duarte Nunes wrote:
> Forgot to mention that I had to use an AWS machine and CPU counters are not available, so not posting those.
>
> On Sunday, January

Re: Prefetching and false sharing

2017-01-29 Thread Duarte Nunes
On Sunday, January 29, 2017 at 8:16:53 PM UTC+1, Vitaly Davidovich wrote:
> This.
>
> Also, I think the (Intel) adjacent sector prefetch is a feature enabled through BIOS. I think that will pull the adjacent line to L1, whereas the spatial prefetcher is probably for streaming accesses

Re: Prefetching and false sharing

2017-01-29 Thread Vitaly Davidovich
This. Also, I think the (Intel) adjacent sector prefetch is a feature enabled through BIOS. I think that will pull the adjacent line to L1, whereas the spatial prefetcher is probably for streaming accesses that are loading L2. Also, I'd run the bench without atomic ops - just relaxed (atomic)
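A minimal sketch of what "relaxed" might look like in C11, for contrast with a full read-modify-write. The message is cut off, so which operations were meant is an assumption here, and the function names `bump_relaxed` and `bump_rmw` are hypothetical. The relaxed variant is only safe when each counter has a single writer.

```c
#include <stdatomic.h>

/* Single-writer increment: a relaxed load followed by a relaxed store.
 * On x86 this typically compiles to plain loads and stores, so the
 * benchmark would measure cache-line traffic rather than the cost of a
 * locked instruction. Not safe with multiple writers to the same counter. */
void bump_relaxed(_Atomic long *counter) {
    long v = atomic_load_explicit(counter, memory_order_relaxed);
    atomic_store_explicit(counter, v + 1, memory_order_relaxed);
}

/* Sequentially consistent read-modify-write, shown for comparison. */
void bump_rmw(_Atomic long *counter) {
    atomic_fetch_add(counter, 1);
}
```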

Re: Prefetching and false sharing

2017-01-29 Thread Avi Kivity
You should test with multiple NUMA nodes, or false sharing becomes true sharing at LLC.
On 01/29/2017 07:04 PM, Duarte Nunes wrote:
Hi all, In the latest Intel optimization manual, we can read in section "2.3.5.4 Data Prefetching": Spatial Prefetcher: This prefetcher strives to

Re: Prefetching and false sharing

2017-01-29 Thread Duarte Nunes
On Sunday, January 29, 2017 at 6:26:14 PM UTC+1, Rajiv Kurian wrote:
> I don't think your code does proper alignment. You malloc the array of padded_long structs. Malloc does not respect the aligned attribute on structs as far as I remember. The alignment only works for stack allocated

Re: Prefetching and false sharing

2017-01-29 Thread Rajiv Kurian
I don't think your code does proper alignment. You malloc the array of padded_long structs. Malloc does not respect the aligned attribute on structs as far as I remember. The alignment only works for stack allocated structs AFAIR. Maybe put in an assert on the address to verify alignment. I
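A minimal sketch of the suggested address check, assuming C11 `aligned_alloc` is available and a 128-byte padding target. The original benchmark code is not shown in this thread, so the layout of `padded_long` and the helpers `is_aligned` and `alloc_padded` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PAD 128 /* assumed padding and alignment target, in bytes */

/* Hypothetical layout: one long padded out to a full 128-byte slot. */
struct padded_long {
    long value;
    char pad[PAD - sizeof(long)];
};

_Static_assert(sizeof(struct padded_long) == PAD, "slot must be PAD bytes");

/* Returns nonzero if p is a multiple of alignment. */
int is_aligned(const void *p, size_t alignment) {
    return ((uintptr_t)p % alignment) == 0;
}

/* Allocates n slots starting on a PAD-byte boundary. aligned_alloc wants
 * the size to be a multiple of the alignment, which holds because each
 * slot is exactly PAD bytes. Release the result with free(). */
struct padded_long *alloc_padded(size_t n) {
    return aligned_alloc(PAD, n * sizeof(struct padded_long));
}
```

Calling `assert(is_aligned(p, PAD))` on the returned pointer verifies the first slot. Every later slot is then aligned as well, since the slot size equals the alignment.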

Prefetching and false sharing

2017-01-29 Thread Duarte Nunes
Hi all,
In the latest Intel optimization manual, we can read in section "2.3.5.4 Data Prefetching":
> Spatial Prefetcher: This prefetcher strives to complete every cache line fetched to the L2 cache with the pair line that completes it to a 128-byte aligned chunk.
I take this to mean that
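The pairing described in the quoted manual text can be illustrated with plain address arithmetic. This is a sketch assuming 64-byte cache lines, which the quote does not state; the helper names are hypothetical.

```c
#include <stdint.h>

#define CACHE_LINE 64u  /* assumed line size in bytes */
#define PAIR_CHUNK 128u /* chunk size named in the manual quote */

/* Start address of the cache line containing addr. */
uintptr_t line_base(uintptr_t addr) {
    return addr & ~(uintptr_t)(CACHE_LINE - 1);
}

/* Start address of the other line in the same 128-byte aligned chunk:
 * flipping bit 6 selects the buddy line. */
uintptr_t pair_line(uintptr_t addr) {
    return line_base(addr) ^ CACHE_LINE;
}

/* Nonzero if both addresses fall in the same 128-byte aligned chunk. */
int same_chunk(uintptr_t a, uintptr_t b) {
    return (a / PAIR_CHUNK) == (b / PAIR_CHUNK);
}
```

Under these assumptions, two variables padded to 64 bytes can still land in one chunk (for example at offsets 0x1000 and 0x1040), while padding to 128 bytes with 128-byte alignment keeps them in separate chunks.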