Related to change 18278[1], I was wondering if there is really a benefit of 
dealing with 128-byte cachelines like we do today.
Compiling VPP with the cacheline size set to 128 will basically just add 64 bytes
of unused space at the end of each cacheline, so
vlib_buffer_t, for example, will grow from 128 bytes to 256 bytes, but we will
still need to prefetch 2 cachelines like we do by default.
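A minimal sketch of that growth, in plain C11 and using a hypothetical two-half header rather than the real vlib_buffer_t layout. Each half holds 64 bytes and must start on a cacheline boundary, so raising the alignment to 128 inserts 64 bytes of padding after each half. `DEFINE_BUFFER_HDR`, `buffer_hdr_cl64_t` and `buffer_hdr_cl128_t` are illustrative names only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a buffer header that, like vlib_buffer_t in a
 * 64-byte build, is split into two cacheline-aligned 64-byte halves. */
#define DEFINE_BUFFER_HDR(name, line_bytes)   \
  typedef struct                              \
  {                                           \
    _Alignas (line_bytes) uint8_t cl0[64];    \
    _Alignas (line_bytes) uint8_t cl1[64];    \
  } name

DEFINE_BUFFER_HDR (buffer_hdr_cl64_t, 64);   /* cacheline size 64 */
DEFINE_BUFFER_HDR (buffer_hdr_cl128_t, 128); /* cacheline size 128 */

/* 64-byte build: the second half follows immediately, 128 bytes total. */
_Static_assert (offsetof (buffer_hdr_cl64_t, cl1) == 64, "no gap");
_Static_assert (sizeof (buffer_hdr_cl64_t) == 128, "two 64-byte lines");

/* 128-byte build: 64 bytes of padding after each half, 256 bytes total,
 * although the useful data is still only 128 bytes. */
_Static_assert (offsetof (buffer_hdr_cl128_t, cl1) == 128, "64-byte gap");
_Static_assert (sizeof (buffer_hdr_cl128_t) == 256, "padded to 256");
```

Compiling with `cc -std=c11 -c` is enough to check it; all the size checks run at compile time.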

What will happen if we just leave that at 64?

1. sometimes (and not very frequently) we will issue 2 prefetch instructions
for the same cacheline, but I hope the hardware is smart enough to just ignore the 2nd one

2. we may face false sharing issues if the first 64 bytes are touched by one thread
and the other 64 bytes are touched by another one
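To illustrate point 1, a minimal sketch of what a 64-byte image does when prefetching a 128-byte header, and how many distinct lines those prefetches actually cover on hardware with 128-byte lines. `SW_CACHE_LINE_BYTES`, `prefetch_object` and `distinct_hw_lines` are hypothetical names; `__builtin_prefetch` is the GCC/Clang builtin, which is only a hint and does not change program behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Line size the image was compiled for (assumption: the 64-byte default). */
#define SW_CACHE_LINE_BYTES 64

/* Prefetch every 64-byte line of an n-byte object, as a 64-byte image
 * would, and return how many prefetch instructions were issued. */
static inline size_t
prefetch_object (const void *p, size_t n)
{
  const char *c = (const char *) p;
  size_t issued = 0;
  for (size_t off = 0; off < n; off += SW_CACHE_LINE_BYTES, issued++)
    __builtin_prefetch (c + off);
  return issued;
}

/* Number of distinct hardware lines covered by [addr, addr + n) when the
 * real line size is hw_line_bytes. If this is smaller than the number of
 * prefetches issued, some prefetches target an already-requested line. */
static inline size_t
distinct_hw_lines (uintptr_t addr, size_t n, size_t hw_line_bytes)
{
  assert (n > 0 && hw_line_bytes > 0);
  uintptr_t first = addr / hw_line_bytes;
  uintptr_t last = (addr + n - 1) / hw_line_bytes;
  return (size_t) (last - first + 1);
}
```

For a header starting at a 128-byte boundary, the 2 prefetches fall into a single 128-byte line; if it starts at an odd 64-byte offset, they still cover 2 lines, so the duplicate only happens some of the time.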

The second one sounds to me like a real problem, but it can be solved by aligning
all per-thread data structures to 2 x cacheline size.
Actually, if I remember correctly, even on x86 some of the hardware prefetchers
deal with blocks of 2 cachelines.
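A minimal sketch of the "align per-thread data to 2 x cacheline" idea, in plain C11 rather than VPP's own macros. `CACHE_LINE_BYTES`, `PER_THREAD_ALIGN`, `per_thread_counters_t` and `alloc_per_thread` are hypothetical names. Because the element type is aligned to 128 bytes, its size is a multiple of 128, so two threads' elements never share a 128-byte block, whether the hardware line is 64 or 128 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE_BYTES 64                     /* compiled line size */
#define PER_THREAD_ALIGN (2 * CACHE_LINE_BYTES) /* proposed alignment */

/* Hypothetical per-thread state; the aligned first member forces the whole
 * struct to 128-byte alignment and pads it to 128 bytes. */
typedef struct
{
  _Alignas (PER_THREAD_ALIGN) uint64_t packets;
  uint64_t bytes;
} per_thread_counters_t;

_Static_assert (sizeof (per_thread_counters_t) % PER_THREAD_ALIGN == 0,
                "per-thread element must fill whole 128-byte blocks");

/* Allocate one element per thread (caller frees). C11 aligned_alloc needs
 * the size to be a multiple of the alignment, which the assert guarantees. */
static per_thread_counters_t *
alloc_per_thread (size_t n_threads)
{
  per_thread_counters_t *v = aligned_alloc (
    PER_THREAD_ALIGN, n_threads * sizeof (per_thread_counters_t));
  assert (v == NULL || (uintptr_t) v % PER_THREAD_ALIGN == 0);
  return v;
}
```

The cost is that a 16-byte per-thread struct now occupies 128 bytes, which seems acceptable for per-thread data, where the element count equals the thread count.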

So unless I missed something, my proposal here is: instead of maintaining
special 128-byte images for some ARM64 machines,
let’s just align all per-thread data structures to 128 and have just one ARM
image.

Thoughts?

-- 
Damjan


[1] https://gerrit.fd.io/r/#/c/18278/
