On Thu, 19 Jan 2012, John Baldwin wrote:

On Thursday, January 19, 2012 12:57:50 am David Xu wrote:
rdtsc() may not work on SMP, so I have updated it to use clock_gettime
to get total time.
http://people.freebsd.org/~davidxu/bench/semaphore2/
<http://people.freebsd.org/%7Edavidxu/bench/semaphore2/>

Still, lfence is a lot faster than atomic lock.

I hope it does non-microbenchmarks.  IIRC, jhb found that it was
actually slower in some cases.  I only did micro-benchmarks on Athlon64.

http://www.freebsd.org/~jhb/patches/amd64_fence.patch

This the patch I've had for quite a while.  Can you retest with this?  You'll
probably have to install the updated header in /usr/include as well.

% --- //depot/projects/smpng/sys/amd64/include/atomic.h 2011-01-05 
17:06:25.000000000 0000
% +++ //depot/user/jhb/ktrace/amd64/include/atomic.h    2011-01-05 
22:08:56.000000000 0000
% ...
% @@ -213,13 +213,12 @@
%  #if defined(_KERNEL) && !defined(SMP)
% % /*
% - * We assume that a = b will do atomic loads and stores.  However, on a
% - * PentiumPro or higher, reads may pass writes, so for that case we have
% - * to use a serializing instruction (i.e. with LOCK) to do the load in
% - * SMP kernels.  For UP kernels, however, the cache of the single processor
% + * We assume that a = b will do atomic loads and stores.  However, reads
% + * may pass writes, so we have to use fences in SMP kernels to preserve
% + * ordering.  For UP kernels, however, the cache of the single processor
%   * is always consistent, so we only need to take care of compiler.
%   */
% -#define      ATOMIC_STORE_LOAD(TYPE, LOP, SOP)               \
% +#define      ATOMIC_STORE_LOAD(TYPE)                         \
%  static __inline u_##TYPE                             \
%  atomic_load_acq_##TYPE(volatile u_##TYPE *p)         \
%  {                                                    \

It also has some simplifications from removing the use of different
operators for load and store.  These simplifications seem to be not
quite generic, since "lock; cmpxchg*" seems to be 2 cycles faster
than "xchg*" in Athlon64.  The ATOMIC_STORE_LOAD() macro is obfuscatory.
Better to have separate macros for load/store, like we do for
set/clear/add/subtract.  (The latter can be obfuscated even better
using 4 parameters for the ops.  Then better yet by making the type
a parameter.)

% @@ -240,32 +239,22 @@
% % #else /* !(_KERNEL && !SMP) */ % % -#define ATOMIC_STORE_LOAD(TYPE, LOP, SOP) \
% +#define      ATOMIC_STORE_LOAD(TYPE)                         \
%  static __inline u_##TYPE                             \
%  atomic_load_acq_##TYPE(volatile u_##TYPE *p)         \
%  {                                                    \
% -     u_##TYPE res;                                   \
% +     u_##TYPE v;                                     \
%                                                       \
% -     __asm __volatile(MPLOCKED LOP                   \
% -     : "=a" (res),                 /* 0 */         \
% -       "=m" (*p)                   /* 1 */         \
% -     : "m" (*p)                    /* 2 */         \
% -     : "memory", "cc");                          \
% -                                                     \
% -     return (res);                                   \
% +     v = *p;                                         \
% +     __asm __volatile("lfence" ::: "memory");    \

Style bug (missing spaces around binary operator `:') which becomes
a syntax error for C++.  Other places in this file use ` : : : '.

lfence() should be in cpufunc.h if it can be done separately
like this.  However, I think it can't be done separately --
it needs to be done in the same asm as the load/store, since
separate C statements may be reordered.  This is the same
problem that forces us to write __asm volatile("sti; hlt");
instead of sti(); hlt(); in too many places for idling in
machdep.c.  BTW, sti() and hlt() are bogusly not in cpufunc.h
either:
- I misnamed sti() as disable_intr() since disable_intr() was
  supposed to be MI and I didn't understand that any MI
  interface should not be direct in cpufunc.h.
- someone misnamed hlt() as halt().

Bruce
_______________________________________________
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"

Reply via email to