Re: 0.9.8: cfb_enc.c bug? and AES speed on Win64/x64

2005-07-09 Thread Tomas Svensson
>>> For reference, note that Linux version avoids __intel_fast_memcpy with -Dmemcpy=__builtin_memcpy, because libirc.a caused griefs when linked into shared library. __intel_fast_memcpy feels as overkill in OpenSSL context and inlined code [movs or unrolled loop] should do better job. Ca

Re: 0.9.8: cfb_enc.c bug? and AES speed on Win64/x64

2005-07-08 Thread Andy Polyakov
For reference, note that Linux version avoids __intel_fast_memcpy with -Dmemcpy=__builtin_memcpy, because libirc.a caused griefs when linked into shared library. __intel_fast_memcpy feels as overkill in OpenSSL context and inlined code [movs or unrolled loop] should do better job. Can you try to c

Re: 0.9.8: cfb_enc.c bug? and AES speed on Win64/x64

2005-07-08 Thread Andy Polyakov
Hmm. I am sort of jumping into the middle of things here. The question is how portable the code needs to be? If it's using inline assembly, It was *not* using inline assembly, but function which is *known* to be implemented as intrinsic by *a number* of compilers. So it was portable, yet le

Re: 0.9.8: cfb_enc.c bug? and AES speed on Win64/x64

2005-07-08 Thread Brian Hurt
On Fri, 8 Jul 2005, Andy Polyakov wrote: Do note "[when] num [as in memcpy(ovec,ovec+num,8)] is guaranteed to be positive." Question was can you imagine memcpy implementation that would fail to handle overlapping regions when source address is *larger* than destination? Question was *not* if

Re: 0.9.8: cfb_enc.c bug? and AES speed on Win64/x64

2005-07-08 Thread Andy Polyakov
Do note "[when] num [as in memcpy(ovec,ovec+num,8)] is guaranteed to be positive." Question was can you imagine memcpy implementation that would fail to handle overlapping regions when source address is *larger* than destination? Question was *not* if you can imagine memcpy implementation that

Re: 0.9.8: cfb_enc.c bug? and AES speed on Win64/x64

2005-07-08 Thread Andy Polyakov
| Some OpenSSL-algorithms are slower on x64, like RSA. SHA1 and RC4 seem to | be faster [...] You said that RSA is slower on AMD64 ? This is not what he said. The claim was that RSA is slower on *EM64T* core and Win64 and this is not surprising. First of all note that 64-bit C implementation

Re: 0.9.8: cfb_enc.c bug? and AES speed on Win64/x64

2005-07-08 Thread Richard Levitte
David C. Partridge writes: If you want to copy from one mem location to another even if they overlap *and* preserve the contents, then you should use memmove and pay the overhead of the temporary buffer it probably allocates. Just a note: memmove doesn't need any temporary storage. It just ha
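To illustrate the point that an overlap-safe copy needs no temporary buffer, here is a minimal sketch. The name sketch_memmove is hypothetical and this is not the code of any particular libc or of OpenSSL; it only shows the general idea of choosing the copy direction from the relative position of the two pointers. Comparing the pointers through uintptr_t assumes a flat address space.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from any libc or from OpenSSL): an
 * overlap-safe byte copy that uses no temporary storage. It only
 * decides which end of the region to start copying from. */
void *sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if ((uintptr_t)d < (uintptr_t)s) {
        /* Destination is below source: copy forward, so each source
         * byte is read before it can be overwritten. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else if ((uintptr_t)d > (uintptr_t)s) {
        /* Destination is above source: copy backward for the same
         * reason. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    /* d == s: nothing to do. */
    return dst;
}
```

Real implementations typically copy in wider units and are far more tuned; the direction test is the only part this sketch is meant to show.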

RE: 0.9.8: cfb_enc.c bug? and AES speed on Win64/x64

2005-07-08 Thread David C. Partridge
>Instead of doing what was intended, moving the string up one place, the code has different behaviour. Yes, it will fill the buffer with "H" which is what I would expect to happen - not immediately obvious, but sensible. (any 370 assembler guys will recognise MVC as doing this). If you want to c
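The behaviour described here can be reproduced with a plain byte-at-a-time forward copy. The sketch below is an assumption made for illustration: forward_copy is a hypothetical name, the string "Hello" is an assumed example, and a real memcpy may copy in any order or unit size, so nothing here describes what a given memcpy actually does with overlapping arguments.

```c
#include <stddef.h>

/* Hypothetical helper: copies strictly one byte at a time, from the
 * lowest address to the highest. Unlike memcpy, its behaviour on
 * overlapping regions is fully determined by this loop. */
void forward_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}
```

With char buf[] = "Hello", the call forward_copy(buf + 1, buf, 4) propagates the first byte and leaves "HHHHH" in the buffer, while forward_copy(buf, buf + 1, 4) shifts the contents down and leaves "elloo". The second case, where the source address is larger than the destination, is the one discussed elsewhere in the thread for memcpy(ovec,ovec+num,8).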

Re: 0.9.8: cfb_enc.c bug? and AES speed on Win64/x64

2005-07-08 Thread Marc Bevand
Tomas Svensson wrote: | | Some OpenSSL-algorithms are slower on x64, like RSA. SHA1 and RC4 seem to | be faster [...] Cases where an OpenSSL algorithm is slower on AMD64 than on i386 are almost always due to a substandard AMD64 implementation. For example, some algorithms are written using hand-co

Re: 0.9.8: cfb_enc.c bug? and AES speed on Win64/x64

2005-07-07 Thread Tomas Svensson
>> If I use the Intel C++ Compiler 9.0 for EM64T with /O2 or higher, it replaces the above memcpy with the optimized function __intel_fast_memcpy, which breaks DES in OpenSSL.
>
> For reference, note that Linux version avoids __intel_fast_memcpy with -Dmemcpy=__builtin_memcpy, because li

Re: 0.9.8: cfb_enc.c bug? and AES speed on Win64/x64

2005-07-07 Thread Brian Hurt
On Thu, 7 Jul 2005, Andy Polyakov wrote: 1) In openssl-0.9.8/crypto/des/cfb_enc.c line 170 there is "memcpy (ovec,ovec+num,8);" and since ovec and ovec+num will overlap sometimes, this function relies on undocumented/undefined behavior of memcpy? The original reason for choosing of memcpy

Re: 0.9.8: cfb_enc.c bug? and AES speed on Win64/x64

2005-07-07 Thread Andy Polyakov
1) In openssl-0.9.8/crypto/des/cfb_enc.c line 170 there is "memcpy (ovec,ovec+num,8);" and since ovec and ovec+num will overlap sometimes, this function relies on undocumented/undefined behavior of memcpy? The original reason for choosing of memcpy was a) it's commonly inlined by compilers [most

Re: 0.9.8: cfb_enc.c bug? and AES speed on Win64/x64

2005-07-07 Thread Brian Hurt
On Thu, 7 Jul 2005, Jack Lloyd wrote: On Thu, Jul 07, 2005 at 07:42:37PM +0200, Andy Polyakov wrote: 1) In openssl-0.9.8/crypto/des/cfb_enc.c line 170 there is "memcpy (ovec,ovec+num,8);" and since ovec and ovec+num will overlap sometimes, this function relies on undocumented/undefined behavi

Re: 0.9.8: cfb_enc.c bug? and AES speed on Win64/x64

2005-07-07 Thread Jack Lloyd
On Thu, Jul 07, 2005 at 07:42:37PM +0200, Andy Polyakov wrote:
> > 1) In openssl-0.9.8/crypto/des/cfb_enc.c line 170 there is "memcpy (ovec,ovec+num,8);" and since ovec and ovec+num will overlap sometimes, this function relies on undocumented/undefined behavior of memcpy?
>
> The original rea

Re: 0.9.8: cfb_enc.c bug? and AES speed on Win64/x64

2005-07-07 Thread Andy Polyakov
1) In openssl-0.9.8/crypto/des/cfb_enc.c line 170 there is "memcpy (ovec,ovec+num,8);" and since ovec and ovec+num will overlap sometimes, this function relies on undocumented/undefined behavior of memcpy? The original reason for choosing of memcpy was a) it's commonly inlined by compilers [most

0.9.8: cfb_enc.c bug? and AES speed on Win64/x64

2005-07-07 Thread Tomas Svensson
Hi, I have some questions/observations: 1) In openssl-0.9.8/crypto/des/cfb_enc.c line 170 there is "memcpy (ovec,ovec+num,8);" and since ovec and ovec+num will overlap sometimes, this function relies on undocumented/undefined behavior of memcpy? If I use the Intel C++ Compiler 9.0 for EM64T with
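For readers unfamiliar with the issue in question 1: the C standard leaves memcpy undefined when source and destination overlap, and defines memmove for that case. A minimal sketch of the overlap-safe form of the quoted call follows. The helper name shift_window and the assumption that the buffer holds at least num + 8 bytes are illustrative only; this is not presented as the code OpenSSL uses or later adopted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: slide the 8 bytes starting at ovec + num down
 * to the start of ovec. Assumes ovec points to at least num + 8
 * bytes. memmove is defined for overlapping regions, so this is
 * valid for any num, including num < 8. */
void shift_window(unsigned char *ovec, size_t num)
{
    memmove(ovec, ovec + num, 8);
}
```

For example, with a 16-byte buffer holding the values 0 through 15, shift_window(buf, 3) leaves the values 3 through 10 in the first eight positions and the remaining bytes unchanged.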