On 24.10.2012 21:30, Alexander Motin wrote:
On 24.10.2012 22:16, Andre Oppermann wrote:
On 24.10.2012 20:56, Jim Harris wrote:
On Wed, Oct 24, 2012 at 11:41 AM, Adrian Chadd <adr...@freebsd.org>
wrote:
On 24 October 2012 11:36, Jim Harris <jimhar...@freebsd.org> wrote:

   Pad tdq_lock to avoid false sharing with tdq_load and tdq_cpu_idle.

Ok, but..


         struct mtx      tdq_lock;               /* run queue lock. */
+       char            pad[64 - sizeof(struct mtx)];

.. don't we have an existing compile time macro for the cache line
size, which can be used here?

Yes, but I didn't use it for a couple of reasons:

1) struct tdq itself is currently using __aligned(64), so I wanted to
keep it consistent.
2) CACHE_LINE_SIZE is currently defined as 128 on x86, due to
NetBurst-based processors having 128-byte cache sectors a while back.
I had planned to start a separate thread on arch@ about this today on
whether this was still appropriate.

See also the discussion on svn-src-all regarding global struct mtx
alignment.
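
For reference, the CACHE_LINE_SIZE-based variant Adrian is asking about
would presumably look something like this (a sketch only, not the
committed patch; tdq_load and tdq_cpu_idle are taken from the commit
message, their types are guessed, and the rest of struct tdq is elided):

        /* CACHE_LINE_SIZE comes from <machine/param.h> via <sys/param.h>. */
        struct tdq {
                struct mtx      tdq_lock;       /* run queue lock. */
                /* Keep the remotely accessed fields off the lock's line. */
                char            tdq_pad[CACHE_LINE_SIZE - sizeof(struct mtx)];
                volatile int    tdq_load;       /* accessed by remote CPUs */
                volatile int    tdq_cpu_idle;   /* accessed by remote CPUs */
                /* ... remaining scheduler fields ... */
        } __aligned(CACHE_LINE_SIZE);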

Thank you for proving my point. ;)

Let's go back and see how we can do this the sanest way.  These are
the options I see at the moment:

  1. sprinkle __aligned(CACHE_LINE_SIZE) all over the place
  2. use a macro like MTX_ALIGN that can be SMP/UP aware and in
     the future possibly change to a different compiler dependent
     align attribute
  3. embed __aligned(CACHE_LINE_SIZE) into struct mtx itself so it
     automatically gets aligned in all cases, even when dynamically
     allocated.

Personally I'm undecided between #2 and #3.  #1 is ugly.  In favor
of #3 is that there possibly isn't any case where you'd actually
want the mutex to share a cache line with anything else, even a data
structure.
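
To make the trade-off concrete, here are rough sketches of #2 and #3.
MTX_ALIGN is a hypothetical name, not an existing kernel macro, and the
struct mtx layout below is abbreviated:

        /* #2: an SMP/UP-aware macro applied at each declaration site. */
        #ifdef SMP
        #define MTX_ALIGN       __aligned(CACHE_LINE_SIZE)
        #else
        #define MTX_ALIGN                       /* no false sharing on UP */
        #endif

        struct tdq {
                struct mtx      tdq_lock MTX_ALIGN;     /* run queue lock. */
                /* Note: this only aligns where the lock starts; keeping the
                 * rest of the line free still relies on field ordering or
                 * explicit padding. */
                /* ... */
        };

        /* #3: align struct mtx itself.  The attribute also rounds
         * sizeof(struct mtx) up to a multiple of the alignment, so an
         * embedded mutex occupies whole cache lines and the next member
         * starts on a fresh line -- no per-site padding needed. */
        struct mtx {
                struct lock_object      lock_object;    /* common lock state */
                volatile uintptr_t      mtx_lock;       /* owner and flags */
        } __aligned(CACHE_LINE_SIZE);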

I'm sorry, could you point me at some theory here?  I can agree that cache
line sharing is a problem for spin locks: the waiting thread will constantly
try to access a line being modified by the other CPU, which I guess forces
cache line write-backs.  But why is it so bad to share a lock with its data
in the case of non-spin locks?  Won't the benefit of the free prefetch of
the right data while grabbing the lock compensate for the penalty of
relatively rare collisions?

Cliff Click describes it in detail:
 http://www.azulsystems.com/blog/cliff/2009-04-14-odds-ends

For a classic mutex it likely doesn't make much difference since the
cache line is exclusive anyway while the lock is held.  On LL/SC systems
there may be cache line dirtying on a failed locking attempt.

For spin mutexes it hurts badly as you noted.

Especially on RW mutexes it hurts because taking a read lock dirties the
cache line for all other CPUs.  Here the RW mutex should be on its own
cache line in all cases.
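
As an illustration of the rw lock case (foo_softc and its fields are made-up
names for the example): every rw_rlock()/rw_runlock() atomically updates the
reader count in the lock word, so hot read-mostly data sitting on the same
line gets invalidated on every CPU each time someone takes the read lock.
Keeping the lock on a line of its own avoids that:

        struct foo_softc {
                struct rwlock   sc_lock;        /* rlock/runlock write here */
                char            sc_pad[CACHE_LINE_SIZE - sizeof(struct rwlock)];
                /* Read-mostly data, untouched by lock traffic, starts here. */
                int             sc_flags;
                /* ... */
        } __aligned(CACHE_LINE_SIZE);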

--
Andre
