Re: [openssl-dev] ARM optimised montgomery multiplication (armv4-mont)

2015-06-17 Thread Ravichandra
er would have to be "no, ARMv4 code cannot be > used in aarch64 build". But it's also possible to generalize and > consider "this" more like "this thing, Montgomery multiplication module, > you are talking about, is there aarch64 equivalent?" And answer to

Re: [openssl-dev] ARM optimised montgomery multiplication (armv4-mont)

2015-06-17 Thread Andy Polyakov
-mont. And then the answer would have to be "no, ARMv4 code cannot be used in aarch64 build". But it's also possible to generalize and consider "this" more like "this thing, Montgomery multiplication module, you are talking about, is there aarch64 equivalent?" And answe

Re: [openssl-dev] ARM optimised montgomery multiplication (armv4-mont)

2015-06-17 Thread Ravichandra
Hi Andy, When using on armv8 architecture, does this mont mul ASM code have any optimization with linux-aarch64 configuration? Thanks Ravichandra On Wed, Jun 17, 2015 at 3:06 PM, Andy Polyakov wrote: > Hi, > > > With some experimentation, it turns out that if I *stop* using the > >

Re: [openssl-dev] ARM optimised montgomery multiplication (armv4-mont)

2015-06-17 Thread Andy Polyakov
Hi, > With some experimentation, it turns out that if I *stop* using the > crypto/bn/asm/armv4-mont.pl generated asm "optimised" version, the > time for > a simplish test to establish and close a simple SSL connection went from > 28 > seconds to 18. (It's quite a slow

Re: [openssl-dev] ARM optimised montgomery multiplication (armv4-mont)

2015-06-16 Thread Jonathan Larmour
On 16/06/15 22:12, Andy Polyakov wrote: With some experimentation, it turns out that if I *stop* using the crypto/bn/asm/armv4-mont.pl generated asm "optimised" version, the time for a simplish test to establish and close a simple SSL connection went from 28 seconds to

Re: [openssl-dev] ARM optimised montgomery multiplication (armv4-mont)

2015-06-16 Thread Andy Polyakov
> What's more, I dug out a Cortex-A9 target (Atmel CycloneV board, operating
> with single core only) and got this without armv4-mont.pl:
>                   sign      verify    sign/s  verify/s
> rsa 2048 bits  0.127342s  0.003628s     7.9    275.6
> dsa 2048 bits  0.035971s  0.042778s    27.8     23.4
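The two right-hand columns of the quoted figures are the reciprocals of the two left-hand columns (operations per second = 1 / seconds per operation). A minimal sketch of that arithmetic, using only the four timings quoted above; the helper name `ops_per_second` is made up for this illustration and is not part of any OpenSSL tool:

```python
def ops_per_second(seconds_per_op: float) -> float:
    """Convert a time per operation (in seconds) into a rate per second."""
    return 1.0 / seconds_per_op

# Timings as quoted in the message (seconds per operation).
timings = {
    "rsa 2048 sign": 0.127342,
    "rsa 2048 verify": 0.003628,
    "dsa 2048 sign": 0.035971,
    "dsa 2048 verify": 0.042778,
}

for name, seconds in timings.items():
    # Rounded to one decimal, the same precision as the quoted rates.
    print(f"{name}: {ops_per_second(seconds):.1f} ops/s")
```

This is a quick way to sanity-check a flattened benchmark line: each rate should match the reciprocal of its timing to the printed precision.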

Re: [openssl-dev] ARM optimised montgomery multiplication (armv4-mont)

2015-06-16 Thread Andy Polyakov
>>> With some experimentation, it turns out that if I *stop* using the >>> crypto/bn/asm/armv4-mont.pl generated asm "optimised" version, the time >>> for >>> a simplish test to establish and close a simple SSL connection went from 28 >>> seconds to 18. (It's quite a slow target at any time). >

Re: [openssl-dev] ARM optimised montgomery multiplication (armv4-mont)

2015-06-16 Thread Jonathan Larmour
Hi, Thanks for the reply. On 16/06/15 13:09, Andy Polyakov wrote: >> >> With some experimentation, it turns out that if I *stop* using the >> crypto/bn/asm/armv4-mont.pl generated asm "optimised" version, the time >> for >> a simplish test to establish and close a simple SSL connection went f

Re: [openssl-dev] ARM optimised montgomery multiplication (armv4-mont)

2015-06-16 Thread Andy Polyakov
it just that compilers have become better (I'm only using gcc 4.7.3, so not bleeding edge even). I don't think so. BIGNUM performance can be a delicate balance between multiple factors and it's not impossible to end up on the other side of breaking point. What breaking point? If you e

[openssl-dev] ARM optimised montgomery multiplication (armv4-mont)

2015-06-15 Thread Jonathan Larmour
Hi, After the changes to DH requiring longer key lengths, I switched to 2048-bit keys, but was finding this was now making my test runs on an embedded ARM9 target annoyingly slow; so thought I'd investigate to see if there was anything to improve. With some experimentation, it turns out that if I

Re: Montgomery multiplication on nVidia GPUs

2007-08-29 Thread Chris Rapier
Andy Polyakov wrote: http://netflow.internet2.edu/weekly/20070820/ Given this context I have to admit that I have effectively misused "throughput" term in my posts. I should have written "amount of private key operation requests per time unit" and speculate about its effect on overall ser

Re: Montgomery multiplication on nVidia GPUs

2007-08-29 Thread Andy Polyakov
The obvious application is web serving, which I assume (perhaps naively) accounts for a substantial majority of private decrypts performed using openssl While this data isn't applicable to the internet as a whole it may provide you with some insight into this. http://netflow.internet2.edu/wee

Re: Montgomery multiplication on nVidia GPUs

2007-08-29 Thread Chris Rapier
The obvious application is web serving, which I assume (perhaps naively) accounts for a substantial majority of private decrypts performed using openssl While this data isn't applicable to the internet as a whole it may provide you with some insight into this. http://netflow.internet2.edu/w

Re: Montgomery multiplication on nVidia GPUs

2007-08-29 Thread Eben
> Of course your CPU is a lot slower to perform 2048 signs, but it's a lot faster to perform one. I mean if you simply don't get more than 1 sign request within 240ms and if you insist on always using GPU, you'd have to ask it to perform 1 real and 2047 bogus signs. And so you'll have GPU spending

Re: Montgomery multiplication on nVidia GPUs

2007-08-29 Thread Andy Polyakov
I agree with most of that. However, based on benchmarks on my desktop (a Core 2 Duo E6400) the 32-bit x86 assembler mont exp implementation in OpenSSL seems a _lot_ slower than my GPU. Of course your CPU is a lot slower to perform 2048 signs, but it's a lot faster to perform one. I mean if you

Re: Montgomery multiplication on nVidia GPUs

2007-08-28 Thread Eben
>> I've put together some code to do parallel 512-bit montgomery >> multiplication for nVidia GPUs. On my 8800GTX I get about 12k of these >> per second in 2k batches, >What does "12k in 2k" mean? I mean given that 512 bits is 64 bytes does 2k >mean that

Re: Montgomery multiplication on nVidia GPUs

2007-08-28 Thread Andy Polyakov
I've put together some code to do parallel 512-bit montgomery multiplication for nVidia GPUs. On my 8800GTX I get about 12k of these per second in 2k batches, What does "12k in 2k" mean? I mean given that 512 bits is 64 bytes does 2k mean that you process 32 vectors at once and

Re: Montgomery multiplication on nVidia GPUs

2007-08-27 Thread Eric Fritzges
Eben, I don't have nearly the resources to take it over, but I'd be very curious to hear more about it! Thanks, -Eric Fritzges On 8/27/2007 5:55 PM Eben spoke thusly: I've put together some code to do parallel 512-bit montgomery multiplication for nVi

Montgomery multiplication on nVidia GPUs

2007-08-27 Thread Eben
I've put together some code to do parallel 512-bit montgomery multiplication for nVidia GPUs. On my 8800GTX I get about 12k of these per second in 2k batches, so enough to do 6k 1024-bit RSA private decrypts. I guess this Christmas's generation of cards (nV9) should push past the 10k mar
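The halving from 12k 512-bit operations to 6k 1024-bit private decrypts is consistent with the Chinese Remainder Theorem (CRT) form of RSA, where one private-key operation on a 1024-bit modulus is split into two modular exponentiations on its 512-bit prime factors. That reading is an assumption; the message does not spell it out. A minimal sketch with toy-sized textbook parameters (nothing here comes from the poster's GPU code):

```python
def rsa_crt_decrypt(c: int, p: int, q: int, d: int) -> int:
    """RSA private operation via CRT: two half-size exponentiations."""
    dp = d % (p - 1)          # exponent reduced for the prime p
    dq = d % (q - 1)          # exponent reduced for the prime q
    q_inv = pow(q, -1, p)     # q^{-1} mod p (needs Python 3.8+)

    m1 = pow(c, dp, p)        # first half-size modular exponentiation
    m2 = pow(c, dq, q)        # second half-size modular exponentiation

    # Garner recombination of the two residues into m mod (p * q).
    h = (q_inv * (m1 - m2)) % p
    return m2 + h * q


# Toy parameters for illustration only; real keys use large primes.
p, q, e, d = 61, 53, 17, 2753
n = p * q
message = 65
ciphertext = pow(message, e, n)
print(rsa_crt_decrypt(ciphertext, p, q, d) == message)
```

The two `pow` calls are independent of each other, which is why a batch of half-size operations maps naturally onto parallel hardware.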

Re: Montgomery multiplication

2000-01-28 Thread Ben Laurie
Ulf Möller wrote: > > > > BN_mod_mul_montgomery() first does a full multiplication, then a > > > Montgomery reduction. Would the speedup for RSA etc be significant > > > if we changed that? > > > > I think you are misinterpreting the code! > > Hm, I haven't read the paper cited in the source, b

Re: Montgomery multiplication

2000-01-27 Thread Ulf Möller
> > BN_mod_mul_montgomery() first does a full multiplication, then a > > Montgomery reduction. Would the speedup for RSA etc be significant > > if we changed that? > > I think you are misinterpreting the code! Hm, I haven't read the paper cited in the source, but if you have a look at Algorithm

Re: Montgomery multiplication

2000-01-27 Thread Ben Laurie
Ulf Möller wrote: > > BN_mod_mul_montgomery() first does a full multiplication, then a > Montgomery reduction. Would the speedup for RSA etc be significant > if we changed that? I think you are misinterpreting the code! Cheers, Ben.

Montgomery multiplication

2000-01-26 Thread Ulf Möller
BN_mod_mul_montgomery() first does a full multiplication, then a Montgomery reduction. Would the speedup for RSA etc be significant if we changed that?
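For readers unfamiliar with the two-step scheme the question describes, here is a minimal sketch of Montgomery multiplication done as a full multiplication followed by a separate Montgomery reduction. It illustrates the general technique only and is not a transcription of the OpenSSL code; all names (`montgomery_setup`, `redc`, `mont_mul`) and the toy modulus are made up for this example.

```python
def montgomery_setup(n: int, bits: int):
    """Return R = 2**bits and n' = -n^{-1} mod R for an odd modulus n < R."""
    r = 1 << bits
    n_prime = (-pow(n, -1, r)) % r   # needs Python 3.8+
    return r, n_prime


def redc(t: int, n: int, bits: int, n_prime: int) -> int:
    """Montgomery reduction: compute t * R^{-1} mod n for 0 <= t < n * R."""
    mask = (1 << bits) - 1
    m = ((t & mask) * n_prime) & mask   # m = (t mod R) * n' mod R
    u = (t + m * n) >> bits             # t + m*n is divisible by R
    return u - n if u >= n else u       # one conditional subtraction


def mont_mul(a: int, b: int, n: int, bits: int, n_prime: int) -> int:
    """Step 1: full product a*b. Step 2: separate Montgomery reduction."""
    return redc(a * b, n, bits, n_prime)


# Toy example: odd modulus below 2**64, operands moved into Montgomery form.
n, bits = (1 << 61) - 1, 64
r, n_prime = montgomery_setup(n, bits)
a, b = 123456789, 987654321
a_m, b_m = (a * r) % n, (b * r) % n          # into Montgomery form
prod_m = mont_mul(a_m, b_m, n, bits, n_prime)
result = redc(prod_m, n, bits, n_prime)      # back out of Montgomery form
print(result == (a * b) % n)
```

The alternative the question alludes to interleaves the reduction with the multiplication word by word, so the double-width product is never fully materialised; whether that is faster depends on the implementation and the platform.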