because your component's already pretty compact. There is no cache thrashing,
since you use every member field. Note that ECS is not about this low level of
granularity, where you would split its direction into different components,
although that would make sense if you plan to do manual vectorization.
@demotomohiro thanks, that nails it. So basically my AoS variant flushes 4
times more memory than actually needed. I guess it's safe to conclude that it's
better not to interleave memory that is written with memory that is only read.
In `testSOA`, only the content of seq `m` needs to be written to main memory. I
think in `testAOS`, not only field `m` in object S but also fields `x`, `y`, `z`
need to be written to main memory, because they are also likely to be in the
cache line containing field `m`.
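To put rough numbers on this, here is a minimal sketch. The layout of `S` is an assumption (four `float32` fields, which is what the "4 times" figure suggests), and so is the 64-byte cache line, which is typical on x86-64 but not guaranteed.

```nim
# Assumed layout of S: four equally sized fields. The real object in the
# benchmark may differ.
type S = object
  x, y, z, m: float32

# Assumed cache line size in bytes (typical for x86-64).
const cacheLineBytes = 64

# How many whole objects share one cache line.
let objectsPerLine = cacheLineBytes div sizeof(S)

# Bytes written back per byte of `m` actually modified in the AoS layout:
# the whole object travels with the dirty line, not just `m`.
let writeAmplification = sizeof(S) div sizeof(float32)

echo "sizeof(S) = ", sizeof(S)
echo "objects per cache line = ", objectsPerLine
echo "bytes written back per modified byte = ", writeAmplification
```

In the SoA layout the same cache line would hold only `m` values, so every byte written back is a byte that actually changed.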
According to this Wikipedia entry:
I think that your test is very special in that you really use all the fields of
your object in your test. I would assume that for a larger object, with more
fields that are not actually used in your test loops, those fields would
pollute the data cache.
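A rough sketch of what I mean, with a made-up object (`Entity`, its `name` field and the 64-byte cache line are all assumptions for illustration, not taken from the benchmark):

```nim
# Hypothetical larger object: the loop only needs `pos` and `mass`,
# but the unused `name` bytes are pulled into the cache along with them.
type Entity = object
  pos: array[3, float32]   # hot: read in the loop
  mass: float32            # hot: written in the loop
  name: array[48, char]    # cold: never touched by the loop

# Assumed cache line size in bytes (typical for x86-64).
const cacheLineBytes = 64

# Bytes per object that the loop actually uses.
let hotBytes = sizeof(array[3, float32]) + sizeof(float32)

# Share of each fetched cache line that is useful to the loop, in percent.
let usefulPercent = hotBytes * 100 div sizeof(Entity)

echo "sizeof(Entity) = ", sizeof(Entity)
echo "hot bytes per object = ", hotBytes
echo "useful share of fetched data = ", usefulPercent, "%"
```

With this layout each object fills a whole assumed cache line while the loop uses only a quarter of it, which is the kind of pollution a compact four-field object does not show.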
In general SoA is more cache-friendly than AoS and lends itself more to SIMD
optimizations. This is why in video processing, the preferred format is YUV or
YV12 instead of RGB.
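For reference, the two layouts could be sketched like this. The type and proc names are mine, not from the original benchmark, and the `float32` fields are an assumption.

```nim
type
  # AoS: one object per particle, fields interleaved in memory.
  ParticleAoS = object
    x, y, z, m: float32

  # SoA: one contiguous seq per field.
  ParticlesSoA = object
    x, y, z, m: seq[float32]

proc totalMassAoS(ps: seq[ParticleAoS]): float32 =
  # Strided access: each `m` is separated by the other three fields.
  for p in ps:
    result += p.m

proc totalMassSoA(ps: ParticlesSoA): float32 =
  # Contiguous access: consecutive `m` values sit next to each other,
  # which is the pattern that vectorizes most easily.
  for v in ps.m:
    result += v

let aos = @[
  ParticleAoS(x: 0.0, y: 0.0, z: 0.0, m: 1.0),
  ParticleAoS(x: 0.0, y: 0.0, z: 0.0, m: 2.0),
  ParticleAoS(x: 0.0, y: 0.0, z: 0.0, m: 3.0)
]
let soa = ParticlesSoA(m: @[1.0'f32, 2.0'f32, 3.0'f32])

echo totalMassAoS(aos)
echo totalMassSoA(soa)
```

Both procs compute the same sum; only the memory access pattern differs.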
Now regarding your code, hardware prefetchers can follow up to 12 forward
streams. It's very easy to catch the pattern
While migrating our [game engine](https://github.com/yglukhov/rod/) to an ECS
architecture we set out to validate some simple ideas about CPU caches, and
were a bit surprised. I'm hoping someone can explain what's happening here.
import random, std/monotimes
const MAX = 9000