The mental note to take:

If anyone is going to provide a binary package for Windows, it would make 
more sense to build it with ICC instead of MSVC.
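
As a rough check on the percentages in the quoted benchmark, here is a minimal Python sketch that recomputes the gains from the approximate RunAndReturn timings. The names percent_gain and gains are made up for illustration, and the inputs are the rounded "~" figures from the message, so the results can differ from the quoted percentages by a point or so.

```python
# Sketch only: recompute relative gains from the approximate timings quoted
# in the benchmark. Function and variable names are hypothetical.

def percent_gain(baseline_ms, improved_ms):
    """Reduction in run time relative to the baseline, as a percentage."""
    return (baseline_ms - improved_ms) / baseline_ms * 100.0


# Approximate RunAndReturn timings in milliseconds, copied from the message.
GCC = {"serial": 3580, "parallel": 930}
ICC = {"serial": 2590, "parallel": 700}
ICC_FBUILTIN = {"serial": 2510, "parallel": 640}


def gains(baseline, candidate):
    """Per-mode gain of candidate over baseline, rounded to one decimal."""
    return {
        mode: round(percent_gain(baseline[mode], candidate[mode]), 1)
        for mode in baseline
    }


if __name__ == "__main__":
    print("ICC vs GCC:", gains(GCC, ICC))
    print("ICC -fbuiltin vs GCC:", gains(GCC, ICC_FBUILTIN))
```

The gain here is measured as the reduction in run time relative to the GCC build, which appears to be how the quoted percentages were derived.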

Martin

Oliver Smith wrote:
> On 7/22/2010 4:10 AM, Martin Sustrik wrote:
>>> This is a somewhat weak example because the work being done by the
>>> worker is so trivial, but even so on a virtual quad-core machine
>>> building with -O0 I see a 35-40% reduction in processing time.
>>>      
>> The worker being trivial, the large reduction in processing time is even more
>> impressive.
>>    
> Just to follow up on that, I thought I'd post the findings of my 
> benchmark comparisons of GCC vs the Intel C++ Compiler; they're kinda 
> impressive:
> 
> Virtual Ubuntu 10.04 guest machine running under VMware 7.0 on an i7 
> host running Windows 7, with 2 virtual CPUs of 2 cores each:
> 
> Async-Worker tests with GCC v4.4.3 with -O3 -msse -msse2 -msse3 -mssse3 
> -msse4 -msse4.1 -msse4.2 -mfpmath=sse -mtune=core2 -march=core2:
> (NOTE: I used Acovea to find these optimal settings; I wouldn't 
> ordinarily use -mtune/-march because I always find they make things worse :)
> 
>      ~3580ms for serial RunAndReturn, ~3580ms for serial RunAndReturnLocal,
>      ~930ms for parallel RunAndReturn, ~940ms for parallel RunAndReturnLocal
> 
> Async-Worker tests with Intel C++ compiler 11.1 72 with -O3 -xHOST -ipo:
> 
>      ~2590ms for serial RunAndReturn, ~2580ms for serial RunAndReturnLocal (27% gain),
>      ~700ms for parallel RunAndReturn, ~700ms for parallel RunAndReturnLocal (25% gain)
> 
> Building ZeroMQ with "icpc -O3 -ipo -xHOST" instead of GCC shaved an 
> extra 4-10ms off parallel results.
> 
> Building both Async::Worker examples and ZeroMQ with "icpc -O3 -ipo 
> -xHOST -fbuiltin" reduces benchmark times by up to 50ms.
> 
> Async-Worker tests with Intel C++ compiler 11.1 72 with -O3 -xHOST -ipo 
> -fbuiltin and ZeroMQ compiled with same flags:
> 
>      ~2510ms for serial RunAndReturn, ~2510ms for serial RunAndReturnLocal (30% gain),
>      ~640ms for parallel RunAndReturn, ~650ms for parallel RunAndReturnLocal (32% gain)
> 
> Given the trivial workloads, these are fairly impressive benchmarks.
> 
> The Intel C++ compiler is dual-licensed; you can download the Linux 
> version for free:
> 
> http://software.intel.com/en-us/intel-compilers/
> 
> Compared to the Microsoft Visual C++ compiler (2008) we found performance 
> improvements of 15-50%. The 2010 VSCC is significantly 
> improved, but Intel's compiler still produces 10-30% improvements.
> 
> You may be aware there was some controversy over the Intel compiler 
> generating code that didn't work as well on AMD chips. This only 
> occurred when you built "alternate code paths" for SSE instructions etc., 
> and the 9.x version of the compiler would tend not to use the 
> alternate code paths unless you had an Intel CPU.
> 
> That option is now called "Build Intel specific optimizations", and the 
> alternate code paths now apply fairly to any CPU that claims to have 
> the feature set you are targeting.
> 
> - Oliver
> 
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev@lists.zeromq.org
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
