On 2012/1/19 11:24, David Xu wrote:
On 2012/1/18 23:09, John Baldwin wrote:
On Tuesday, January 17, 2012 9:09:25 pm David Xu wrote:
On 2012/1/17 22:57, John Baldwin wrote:
On Monday, January 16, 2012 1:15:14 am David Xu wrote:
Author: davidxu
Date: Mon Jan 16 06:15:14 2012
New Revision: 230201
URL: http://svn.freebsd.org/changeset/base/230201

Log:
    Insert read memory barriers.
I think using atomic_load_acq() on sem->nwaiters would be clearer, as it would indicate which variable you need to ensure is read after other operations. In general I think raw rmb/wmb usage should be avoided when possible, as it
does not describe the programmer's intent as well.
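
To illustrate the readability argument, here is a minimal sketch in portable C11 atomics rather than FreeBSD's atomic(9) macros. The struct and function names are hypothetical; only the nwaiters field name comes from the thread, and the exact placement of the barrier in r230201 is not shown here, so this is an analogy and not the committed code.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the semaphore; only "nwaiters" is from the thread. */
struct sem_sketch {
    atomic_uint count;
    atomic_uint nwaiters;
};

/*
 * Variant A: plain (relaxed) load followed by a stand-alone acquire fence.
 * The fence is not tied to any variable, so the reader has to work out
 * which load it is meant to order.
 */
static unsigned
waiters_with_fence(struct sem_sketch *sem)
{
    unsigned n = atomic_load_explicit(&sem->nwaiters, memory_order_relaxed);

    atomic_thread_fence(memory_order_acquire);
    return n;
}

/*
 * Variant B: the ordering is attached to the load itself, so the code
 * names the variable whose read must be ordered.
 */
static unsigned
waiters_with_load_acq(struct sem_sketch *sem)
{
    return atomic_load_explicit(&sem->nwaiters, memory_order_acquire);
}
```

In both variants, later memory accesses cannot be reordered before the load of nwaiters; the difference is only in how clearly the source says so.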

Yes, I had considered using atomic_load_acq(), but at the time I thought it emits a bus-locked instruction, right? So I just picked rmb(), which
only affects the current CPU. Maybe atomic_load_acq() does the same thing as
rmb()?
It is still unclear to me.
atomic_load_acq() is the same as rmb().  Right now it uses a locked
instruction on amd64, but it could easily switch to lfence/sfence instead. I had patches to do that but I think bde@ had done some benchmarks that showed
that change made no difference.
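
As a rough sketch of the two options being compared, the functions below show a load done through a locked instruction and a load followed by lfence. This assumes GCC or Clang inline assembly on x86; the function names are hypothetical, and the code is not copied from FreeBSD's machine/atomic.h. On other architectures the sketch falls back to the compiler's acquire load builtin.

```c
#if defined(__x86_64__) || defined(__i386__)

/*
 * Load through a locked instruction. "lock cmpxchg" compares %eax with *p:
 * if they match, the same value is written back; if not, *p is copied
 * into %eax. Either way %eax ends up holding *p, and the lock prefix
 * makes the instruction a full barrier.
 */
static inline unsigned
load_acq_locked(volatile unsigned *p)
{
    unsigned res = 0;

    __asm__ __volatile__("lock; cmpxchgl %0,%1"
        : "+a" (res), "+m" (*p)
        :
        : "memory", "cc");
    return res;
}

/* Plain load followed by lfence; no locked bus or cache-line operation. */
static inline unsigned
load_acq_lfence(volatile unsigned *p)
{
    unsigned v = *p;

    __asm__ __volatile__("lfence" : : : "memory");
    return v;
}

#else

/* Fallback for non-x86 targets: let the compiler pick the instructions. */
static inline unsigned
load_acq_locked(volatile unsigned *p)
{
    return __atomic_load_n(p, __ATOMIC_SEQ_CST);
}

static inline unsigned
load_acq_lfence(volatile unsigned *p)
{
    return __atomic_load_n(p, __ATOMIC_ACQUIRE);
}

#endif
```

Note that the locked variant needs the target memory to be writable, since cmpxchg is a read-modify-write instruction even when the value does not change.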

I wish there were a version of atomic_load_acq() that uses lfence. I have always thought bus locking is expensive on a multi-core machine. Working on a large
machine here, we found that even the current rwlock in libthr does not scale well if
most threads are readers; we have to implement a CSNZI-like rwlock to avoid
CPU conflicts.
http://people.csail.mit.edu/mareko/spaa09-scalablerwlocks.pdf

I have just run a benchmark on my notebook, which has a Sandy Bridge
i3-2310M CPU with 4 SMT threads.
http://people.freebsd.org/~davidxu/bench/semaphore/

The load_acq version, which uses a locked atomic instruction, is much slower than the lfence version:
http://people.freebsd.org/~davidxu/bench/semaphore/ministat.txt

benchmark program:
http://people.freebsd.org/~davidxu/bench/semaphore/sem_test.c

rdtsc() may not be reliable on SMP, so I have updated the benchmark to use clock_gettime() to measure the total time:
http://people.freebsd.org/~davidxu/bench/semaphore2/

Still, lfence is a lot faster than the locked atomic instruction.
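
For readers who want to reproduce this kind of measurement, a minimal timing harness based on clock_gettime() might look like the sketch below. It is not the sem_test.c linked above; the helper names (elapsed_ns, time_loop, bump) are hypothetical, and CLOCK_MONOTONIC is an assumed choice of clock.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdint.h>
#include <time.h>

/* Nanoseconds elapsed between two timespec readings. */
static int64_t
elapsed_ns(const struct timespec *start, const struct timespec *end)
{
    return (int64_t)(end->tv_sec - start->tv_sec) * 1000000000LL +
        (int64_t)(end->tv_nsec - start->tv_nsec);
}

/*
 * Call fn(arg) iters times and return the total wall-clock time in
 * nanoseconds, or -1 if the clock could not be read. Unlike raw rdtsc(),
 * the result does not depend on which CPU the thread happens to run on.
 */
static int64_t
time_loop(void (*fn)(void *), void *arg, unsigned long iters)
{
    struct timespec t0, t1;
    unsigned long i;

    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
        return -1;
    for (i = 0; i < iters; i++)
        fn(arg);
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
        return -1;
    return elapsed_ns(&t0, &t1);
}

/* Hypothetical workload: increment a counter through a volatile pointer. */
static void
bump(void *arg)
{
    (*(volatile unsigned long *)arg)++;
}
```

A caller would run time_loop() once per variant (for example, a semaphore post/wait pair built with each kind of barrier) and feed the per-run totals to ministat for comparison.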

