On Mon, Jan 30, 2017 at 12:00 AM Greg Young wrote:
> I am not sure how much I would trust an AWS machine in general for
> benchmarks
>
I'll repeat the tests on real hardware, but since I was mainly interested
in relative numbers, I think it's okay.
I am not sure how much I would trust an AWS machine in general for benchmarks
On Sun, Jan 29, 2017 at 7:34 PM, Duarte Nunes wrote:
> Forgot to mention that I had to use an AWS machine and CPU counters are not
> available, so not posting those.
On Sunday, January 29, 2017 at 8:16:53 PM UTC+1, Vitaly Davidovich wrote:
This.
Also, I think the (Intel) adjacent sector prefetch is a feature enabled
through BIOS. I think that will pull the adjacent line to L1, whereas the
spatial prefetcher is probably for streaming accesses that are loading L2.
Also, I'd run the bench without atomic ops - just relaxed (atomic)
You should test with multiple NUMA nodes, or false sharing becomes true
sharing at LLC.
On 01/29/2017 07:04 PM, Duarte Nunes wrote:
> Hi all,
> In the latest Intel optimization manual, we can read in section
> "2.3.5.4 Data Prefetching":
> Spatial Prefetcher: This prefetcher strives to
On Sunday, January 29, 2017 at 6:26:14 PM UTC+1, Rajiv Kurian wrote:
I don't think your code does proper alignment. You malloc the array of
padded_long structs. Malloc does not respect the aligned attribute on
structs as far as I remember. The alignment only works for stack allocated
structs AFAIR. Maybe put in an assert on the address to verify alignment. I
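To illustrate the alignment point above, here is a minimal sketch in C. It assumes a 64-byte cache line and a C11 toolchain that provides aligned_alloc (glibc does; MSVC does not). The struct layout and the helper names is_aligned and alloc_padded are hypothetical, since the benchmark code itself is not shown in the thread. The background is that malloc only guarantees alignment suitable for fundamental types (typically 16 bytes on x86-64), so a 64-byte-aligned array has to be requested explicitly and is worth checking with an assert.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64 /* assumed cache line size; typical for x86 */

/* Hypothetical stand-in for the padded_long struct discussed in the thread:
 * one long padded out to a full cache line. */
struct padded_long {
    long value;
    char pad[CACHE_LINE - sizeof(long)];
};

/* Returns nonzero if p is aligned to `alignment`, which must be a power of two. */
static int is_aligned(const void *p, size_t alignment) {
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Allocates n cache-line-aligned elements; release with free().
 * C11 aligned_alloc wants the size to be a multiple of the alignment,
 * which holds here because sizeof(struct padded_long) == CACHE_LINE. */
static struct padded_long *alloc_padded(size_t n) {
    struct padded_long *a = aligned_alloc(CACHE_LINE, n * sizeof *a);
    /* The assert suggested in the thread: verify the address, do not assume it. */
    assert(a == NULL || is_aligned(a, CACHE_LINE));
    return a;
}
```

Calling alloc_padded(8) in place of malloc(8 * sizeof(struct padded_long)) should give each element its own cache line. posix_memalign would be a reasonable alternative on POSIX systems without C11 support.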
Hi all,
In the latest Intel optimization manual, we can read in section "2.3.5.4
Data Prefetching":
> Spatial Prefetcher: This prefetcher strives to complete every cache line
> fetched to the L2 cache with the pair line that completes it to a 128-byte
> aligned chunk.
I take this to mean that