er would have to be "no, ARMv4 code cannot be
> used in an aarch64 build". But it's also possible to generalize and
> consider "this" more like "this thing, Montgomery multiplication module,
> you are talking about, is there aarch64 equivalent?" And answer to
-mont. And then the answer would have to be "no, ARMv4 code cannot be
used in an aarch64 build". But it's also possible to generalize and
consider "this" more like "this thing, Montgomery multiplication module,
you are talking about, is there aarch64 equivalent?" And answe
Hi Andy,
When running on the armv8 architecture, does this mont mul ASM code have any
optimization with the linux-aarch64 configuration?
Thanks
Ravichandra
On Wed, Jun 17, 2015 at 3:06 PM, Andy Polyakov wrote:
> Hi,
>
> > With some experimentation, it turns out that if I *stop* using the
> >
Hi,
> With some experimentation, it turns out that if I *stop* using the
> crypto/bn/asm/bn/armv4-mont.pl generated asm "optimised" version, the
> time for
> a simplish test to establish and close a simple SSL connection went from
> 28
> seconds to 18. (It's quite a slow
On 16/06/15 22:12, Andy Polyakov wrote:
With some experimentation, it turns out that if I *stop* using the
crypto/bn/asm/bn/armv4-mont.pl generated asm "optimised" version, the time
for
a simplish test to establish and close a simple SSL connection went from 28
seconds to
> What's more, I dug out a Cortex-A9 target (Atmel CycloneV board, operating
> with single core only) and got this without armv4-mont.pl:
>                   sign    verify    sign/s verify/s
> rsa 2048 bits 0.127342s 0.003628s      7.9    275.6
> dsa 2048 bits 0.035971s 0.042778s     27.8     23.4
>>> With some experimentation, it turns out that if I *stop* using the
>>> crypto/bn/asm/bn/armv4-mont.pl generated asm "optimised" version, the time
>>> for
>>> a simplish test to establish and close a simple SSL connection went from 28
>>> seconds to 18. (It's quite a slow target at any time).
>
Hi,
Thanks for the reply.
On 16/06/15 13:09, Andy Polyakov wrote:
>>
>> With some experimentation, it turns out that if I *stop* using the
>> crypto/bn/asm/bn/armv4-mont.pl generated asm "optimised" version, the time
>> for
>> a simplish test to establish and close a simple SSL connection went f
it just that compilers have become better (I'm only using gcc
> 4.7.3, so not bleeding edge even).
I don't think so. BIGNUM performance can be a delicate balance between
multiple factors and it's not impossible to end up on the other side of
the breaking point. What breaking point? If you e
Hi,
After the changes to DH requiring longer key lengths, I switched to 2048-bit
keys, but was finding this was now making my test runs on an embedded ARM9
target annoyingly slow; so thought I'd investigate to see if there was
anything to improve.
With some experimentation, it turns out that if I
Andy Polyakov wrote:
http://netflow.internet2.edu/weekly/20070820/
Given this context I have to admit that I have effectively misused the
term "throughput" in my posts. I should have written "amount of private
key operation requests per time unit" and speculated about its effect on
overall ser
The obvious application is web serving, which I assume (perhaps naively)
accounts for a substantial majority of private decrypts performed using
openssl
While this data isn't applicable to the internet as a whole it may
provide you with some insight into this.
http://netflow.internet2.edu/wee
The obvious application is web serving, which I assume (perhaps naively)
accounts for a substantial majority of private decrypts performed using
openssl
While this data isn't applicable to the internet as a whole it may
provide you with some insight into this.
http://netflow.internet2.edu/w
> Of course your CPU is a lot slower to perform 2048 signs, but it's a lot
faster to perform one. I mean if you simply don't get more than 1 sign
request within 240ms and if you insist on always using GPU, you'd have
to ask it to perform 1 real and 2047 bogus signs. And so you'll have GPU
spending
I agree with most of that. However, based on benchmarks on my desktop (a
Core 2 Duo E6400) the 32-bit x86 assembler mont exp implementation in
OpenSSL seems a _lot_ slower than my GPU.
Of course your CPU is a lot slower to perform 2048 signs, but it's a lot
faster to perform one. I mean if you
>> I've put together some code to do parallel 512-bit montgomery
>> multiplication for nVidia GPUs. On my 8800GTX I get about 12k of these
>> per second in 2k batches,
>What does "12k in 2k" mean? I mean given that 512 bits is 64 bytes does 2k
>mean that
I've put together some code to do parallel 512-bit montgomery
multiplication for nVidia GPUs. On my 8800GTX I get about 12k of these per
second in 2k batches,
What does "12k in 2k" mean? I mean given that 512 bits is 64 bytes does
2k mean that you process 32 vectors at once and
Eben,
I don't have nearly the resources to take it over, but I'd be very
curious to hear more about it!
Thanks,
-Eric Fritzges
On 8/27/2007 5:55 PM Eben spoke thusly:
I've put together some code to do parallel 512-bit montgomery
multiplication for nVi
I've put together some code to do parallel 512-bit montgomery
multiplication for nVidia GPUs. On my 8800GTX I get about 12k of these per
second in 2k batches, so enough to do 6k 1024-bit RSA private decrypts. I
guess this Christmas's generation of cards (nV9) should push past the 10k
mar
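One possible reading of the 2:1 ratio above (12k 512-bit operations giving 6k 1024-bit private decrypts) is the Chinese Remainder Theorem shortcut, where one 1024-bit private operation is split into two independent 512-bit exponentiations. The post itself does not say this, so treat the following as a minimal Python sketch under that assumption. The function name `rsa_crt_decrypt` and its parameters are hypothetical and are not taken from the GPU code being discussed.

```python
def rsa_crt_decrypt(c, p, q, d):
    """RSA private operation via CRT (illustrative sketch, not constant-time).

    c    : ciphertext, 0 <= c < p*q
    p, q : the two secret primes (each half the modulus size)
    d    : private exponent
    """
    # Reduced exponents: each half-size exponentiation uses its own exponent.
    dp = d % (p - 1)
    dq = d % (q - 1)
    # Precomputable CRT coefficient q^{-1} mod p (needs Python 3.8+).
    q_inv = pow(q, -1, p)

    # Two independent half-size modular exponentiations.
    # These are the parts that could run in parallel in a batch.
    m1 = pow(c % p, dp, p)
    m2 = pow(c % q, dq, q)

    # Garner recombination of the two half-size results.
    h = (q_inv * (m1 - m2)) % p
    return m2 + h * q
```

Under this reading, a batch of N half-size exponentiations yields N/2 full private operations, which matches the figures quoted if "these" refers to exponentiations rather than single multiplications.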
Ulf Möller wrote:
>
> > > BN_mod_mult_montgomery() first does a full multiplication, then a
> > > Montgomery reduction. Would the speedup for RSA etc be significant
> > > if we changed that?
> >
> > I think you are misinterpreting the code!
>
> Hm, I haven't read the paper cited in the source, b
> > BN_mod_mult_montgomery() first does a full multiplication, then a
> > Montgomery reduction. Would the speedup for RSA etc be significant
> > if we changed that?
>
> I think you are misinterpreting the code!
Hm, I haven't read the paper cited in the source, but if you have a
look at Algorithm
Ulf Möller wrote:
>
> BN_mod_mult_montgomery() first does a full multiplication, then a
> Montgomery reduction. Would the speedup for RSA etc be significant
> if we changed that?
I think you are misinterpreting the code!
Cheers,
Ben.
--
SECURE HOSTING AT THE BUNKER! http://www.thebunker.net/h
BN_mod_mult_montgomery() first does a full multiplication, then a
Montgomery reduction. Would the speedup for RSA etc be significant
if we changed that?
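For readers following the thread, here is a minimal Python sketch of the "separated" form the question describes: a full product followed by a separate Montgomery reduction. It illustrates the textbook technique only. It is not a transcription of the OpenSSL source, and the replies in this thread dispute whether that is what the library actually does. All function names below are hypothetical.

```python
def montgomery_setup(n, bits):
    """Return (R, n') with R = 2**bits and n' = -n^{-1} mod R, for odd n < R."""
    r = 1 << bits
    n_prime = (-pow(n, -1, r)) % r  # modular inverse needs Python 3.8+
    return r, n_prime


def redc(t, n, bits, n_prime):
    """Montgomery reduction: t * R^{-1} mod n, valid for 0 <= t < n * R."""
    mask = (1 << bits) - 1
    m = ((t & mask) * n_prime) & mask   # m = (t mod R) * n' mod R
    u = (t + m * n) >> bits             # t + m*n is divisible by R exactly
    return u - n if u >= n else u       # single conditional subtraction


def mont_mul(a, b, n, bits, n_prime):
    """Separated form: full double-width product, then one reduction."""
    return redc(a * b, n, bits, n_prime)


def to_mont(x, n, bits):
    """Convert x into the Montgomery domain: x * R mod n."""
    return (x << bits) % n


def from_mont(x, n, bits, n_prime):
    """Convert back out of the Montgomery domain."""
    return redc(x, n, bits, n_prime)
```

The alternative the question alludes to interleaves multiplication and reduction word by word, so the intermediate value never grows to double width. Whether that gives a significant speedup depends on the implementation and platform.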
__
OpenSSL Project http://www.openssl.org
Dev