Travis Vitek wrote:
Since I couldn't find a string performance test, I wrote a quick and
dirty one that repeatedly makes many copies of the same string to
exercise the atomic increment/decrement. The results show a 3%
performance penalty when using the newer atomic functions. This test
was run with an 8d configuration, so the atomic functions were
compiled into the stdcxx DLL. The test hardware is a Lenovo T60p [Intel
Core 2 T7600 2.33 GHz CPU, 2 GB RAM].
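A minimal sketch of the kind of copy test described above, assuming a reference-counted (copy-on-write) string like the one in stdcxx. It is not Travis's actual test, whose source is not included here; `run_copy_test` and `copy_result` are hypothetical names. With a library whose `std::string` deep-copies (for example, current libstdc++ or MSVC's implementation), the loop measures allocation and copying rather than atomic reference-count updates.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Result of one run of the hypothetical copy test.
struct copy_result {
    std::size_t copies;  // total number of string copies made
    double      ms;      // elapsed wall-clock time in milliseconds
};

// Each thread copies the same shared string `iterations` times.
// With a COW string, each copy bumps the shared reference count and
// each destruction drops it, which is what exercises the atomics.
copy_result run_copy_test(const std::string& shared,
                          unsigned threads, std::size_t iterations)
{
    assert(threads > 0);
    std::vector<std::size_t> counts(threads, 0);  // one slot per thread
    std::vector<std::thread> pool;

    const auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&shared, &counts, t, iterations] {
            std::size_t n = 0;
            for (std::size_t i = 0; i < iterations; ++i) {
                std::string copy(shared);  // copy constructed here
                n += copy.size() != 0;     // count copies of a non-empty string
            }                              // copy destroyed here
            counts[t] = n;                 // each thread writes its own slot
        });
    }
    for (auto& th : pool) th.join();
    const auto stop = std::chrono::steady_clock::now();

    std::size_t total = 0;
    for (auto c : counts) total += c;
    return { total,
             std::chrono::duration<double, std::milli>(stop - start).count() };
}
```

For example, `run_copy_test(std::string(64, 'x'), 4, 1u << 24)` runs four threads of 2^24 copies each; dividing `ms` by `copies` gives a per-copy cost.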
8d is not thread-safe, so the atomic function templates should
be implemented in terms of ordinary increments and decrements
(if they aren't, it's a bug). They should only expand to the
atomic assembly (or the Win32 Interlocked) functions in 12X
and 15X build types.
Martin
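A minimal sketch of the split described above, using a hypothetical `ref_count` template in place of the stdcxx `__rw_atomic_*` function templates: the single-threaded specialization (standing in for 8d) uses ordinary `++`/`--`, and the thread-safe one (standing in for 12X/15X) uses `std::atomic` as a stand-in for the Interlocked or assembly primitives. In the library the selection would come from the build configuration rather than a template argument; the template only puts both paths side by side.

```cpp
#include <atomic>
#include <cassert>

// ThreadSafe == false: single-threaded build such as 8d.
// ThreadSafe == true:  thread-safe builds such as 12X/15X.
template <bool ThreadSafe>
struct ref_count;

// Single-threaded build: plain increments and decrements, no atomics.
template <>
struct ref_count<false> {
    int n = 1;
    int increment() { return ++n; }
    int decrement() { assert(n > 0); return --n; }
};

// Thread-safe build: atomic read-modify-write operations.
template <>
struct ref_count<true> {
    std::atomic<int> n{1};
    // Taking a new reference needs no ordering with other memory.
    int increment() { return n.fetch_add(1, std::memory_order_relaxed) + 1; }
    // Releasing a reference must order prior writes before a possible free.
    int decrement() { return n.fetch_sub(1, std::memory_order_acq_rel) - 1; }
};
```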
           ------- Old --------    --- New [patched] ---
Threads    ms       ms/op          ms       ms/op
1          714      0.00004256     737      0.00004393
2          3911     0.00023311     4024     0.00023985
4          7660     0.00045657     7865     0.00046879
8          15192    0.00090551     15585    0.00092894
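As a quick check of the 3% figure, the sketch below recomputes the relative slowdown for each row of the table: about 3.2% with 1 thread, 2.9% with 2, 2.7% with 4, and 2.6% with 8. It also divides ms by ms/op; every row implies roughly 16.78 million operations, consistent with 2^24 (16,777,216) operations per run. That operation count is an inference from the numbers, not something the message states. `row`, `results`, `penalty_percent`, and `implied_ops` are illustrative names.

```cpp
#include <cassert>

// One row of the timing table: old and new [patched] results.
struct row { int threads; double old_ms, new_ms, old_ms_per_op, new_ms_per_op; };

// Values copied from the table.
const row results[] = {
    { 1,   714.0,   737.0, 0.00004256, 0.00004393 },
    { 2,  3911.0,  4024.0, 0.00023311, 0.00023985 },
    { 4,  7660.0,  7865.0, 0.00045657, 0.00046879 },
    { 8, 15192.0, 15585.0, 0.00090551, 0.00092894 },
};

// Relative slowdown of the patched build, in percent.
double penalty_percent(const row& r)
{
    assert(r.old_ms > 0.0);
    return (r.new_ms - r.old_ms) / r.old_ms * 100.0;
}

// Operation count implied by total ms divided by ms per operation.
double implied_ops(double ms, double ms_per_op)
{
    assert(ms_per_op > 0.0);
    return ms / ms_per_op;
}
```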
I'm wondering whether the cost would be reduced if we used inline
assembly for the __rw_atomic_* functions. We could also evaluate the
intrinsic pragma (#pragma intrinsic) that is available on MSVC.
Travis
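To make the two options in the question above concrete, here is a minimal sketch, assuming MSVC or a GCC-compatible compiler, of header-inline helpers built on compiler intrinsics rather than hand-written assembly: `_InterlockedIncrement`/`_InterlockedDecrement` from `<intrin.h>` (expanded inline via `#pragma intrinsic`) on MSVC, and the `__atomic_add_fetch`/`__atomic_sub_fetch` builtins on GCC and Clang. The names `atomic_preincrement` and `atomic_predecrement` are hypothetical and do not match the stdcxx `__rw_atomic_*` signatures; whether this recovers the measured 3% has not been tested.

```cpp
#include <cassert>  // used by the usage example below

#if defined(_MSC_VER)
#  include <intrin.h>
// Ask MSVC to expand these calls inline as intrinsics.
#  pragma intrinsic(_InterlockedIncrement, _InterlockedDecrement)
#endif

// Hypothetical header-inline helpers: the locked instruction is emitted
// at each call site instead of going through an out-of-line function.
inline long atomic_preincrement(long& x)
{
#if defined(_MSC_VER)
    return _InterlockedIncrement(&x);                     // new value
#elif defined(__GNUC__)
    return __atomic_add_fetch(&x, 1L, __ATOMIC_SEQ_CST);  // new value
#else
#  error "no atomic increment available for this compiler"
#endif
}

inline long atomic_predecrement(long& x)
{
#if defined(_MSC_VER)
    return _InterlockedDecrement(&x);                     // new value
#elif defined(__GNUC__)
    return __atomic_sub_fetch(&x, 1L, __ATOMIC_SEQ_CST);  // new value
#else
#  error "no atomic decrement available for this compiler"
#endif
}
```

Usage: `long refs = 1; assert(atomic_preincrement(refs) == 2); assert(atomic_predecrement(refs) == 1);`. Intrinsics also sidestep a portability problem with inline assembly: MSVC does not support `__asm` blocks when targeting x64, so an intrinsic keeps one code path for 32- and 64-bit builds.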
-----Original Message-----
I will do a quick run using the string performance test after lunch.
I'll report the results on that later. I've pasted the source for the
bulk of my test below. If someone wants the entire thing, let me know
and I'll provide everything.
Travis