On 21 Jan 2011, at 05:02 , Eduardo Horvath wrote:
> On Thu, 20 Jan 2011, Dennis Ferguson wrote:
>
>> On 20 Jan 2011, at 11:59 , Dennis Ferguson wrote:
>>> Is there a way to obtain the correct cache line size for the machine
>>> code is running on, both in the kernel and at user level?
>>
>> I found it. It is "coherency_unit" in the kernel (it is an
>> appropriately small number, rather than the cache line size,
>> in uniprocessor kernels), but doesn't seem to be exposed outside
>> of there.
>
> What type of machine are we talking here? The powerpc ports have a system
> call to tell userland what the CPU's cache line size is. I needed to add
> it to support IBM403s (or was it 401s?) which ISTR have a 16-byte cache
> line. Look at the libc memcpy/memset code.

Any type of machine, actually. I'm interested in knowing how to lay out
data structures to avoid false sharing in threaded code, and the solution
for that is the same on pretty much every multiprocessor of any
architecture I'm aware of.

I know PowerPCs have a different issue related to cache lines: they have
a cache line prefetch instruction which can be used to write brutally
quick memcpy() code if you know the cache line size, so you can run the
right chunk of code. NetBSD's PowerPC memcpy() doesn't use that, however,
and the only special case I see is that the 403 compiles the generic .c
versions of those functions rather than the (equally generic) PowerPC .S
versions.

I know Linux uses a highly optimized PPC memcpy() and that their binaries
always have the cache line size available as a global variable for
memcpy() and friends to use, so the system call you are thinking of could
be in the Linux compat code.

Dennis Ferguson
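
P.S. To make the false-sharing layout concern above concrete, here is a
minimal C11 sketch of per-thread counters padded so that no two of them
share a cache line. The 64-byte line size, the thread count, and every
name in it (CACHE_LINE_ASSUMED, NTHREADS_ASSUMED, padded_counter,
counter_stride) are assumptions made up for illustration; the real value
is exactly what would have to be obtained from the system.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* uint64_t */

/* ASSUMPTION: 64 bytes is only a guess at the cache line size.
 * The correct value is machine-dependent. */
#define CACHE_LINE_ASSUMED 64

/* ASSUMPTION: hypothetical number of worker threads. */
#define NTHREADS_ASSUMED 4

/* Aligning the member to the assumed line size also rounds the
 * struct size up to that size, so adjacent array elements land on
 * different cache lines. */
struct padded_counter {
    _Alignas(CACHE_LINE_ASSUMED) uint64_t value;
};

/* One slot per thread; thread i only ever writes counters[i]. */
struct padded_counter counters[NTHREADS_ASSUMED];

static_assert(sizeof(struct padded_counter) == CACHE_LINE_ASSUMED,
              "each counter should occupy exactly one assumed line");

/* Byte distance between two adjacent per-thread slots. */
size_t counter_stride(void)
{
    return (size_t)((char *)&counters[1] - (char *)&counters[0]);
}
```

If the assumed size is smaller than the real line size, neighbouring
counters can still share a line; if it is larger, the only cost is some
wasted memory, which is why guessing high is the usual fallback.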