[openssl-dev] [openssl.org #4032] [PATCH] Fast 1536-bit modular exponentiation with the new VPMADD52 instructions

2015-09-14 Thread Gueron, Shay via RT
Hello everyone, This patch is a contribution to OpenSSL. It extends the patch "Id 3590" (from Nov 04, 2014; by Gueron and Krasnov) entitled "Fast modular exponentiation with the new VPMADD52 instructions". This contribution includes 1536-bit modular exponentiation (constant time) with the RSA

[openssl-dev] [openssl.org #3850] [PATCH] Improved performance Multi Block CBC-SHA1 and CBC-SHA256

2015-05-14 Thread Gueron, Shay via RT
Hello all, This patch is a contribution to OpenSSL. It concerns the Multi Block (MB) CBC SHA1/SHA256 implementations (the function "tls1_1_multi_block_encrypt" in "e_aes_cbc_hmac_sha1.c" and "e_aes_cbc_hmac_sha256.c"). The patch addresses a slow derivation of the multiple random IVs for the CBC

[openssl-dev] [openssl.org #3810] [PATCH] Improved P256 ECC performance by means of a dedicated function for modular inversion modulo the P256 group order

2015-04-17 Thread Gueron, Shay via RT
Hello all, This patch is a contribution to OpenSSL. It concerns the P256 ECC implementation. The patch improves upon our previous submission, by providing a dedicated function to perform modular inversion modulo the P256 group order. Results: The performance improvements, for single threaded a

[openssl.org #3590] [PATCH] Fast modular exponentiation with the new VPMADD52 instructions

2014-11-04 Thread Gueron, Shay via RT
Hello everyone, This is an OpenSSL patch with functions that use the new VPMADD52 instructions (VPMADD52LUQ and VPMADD52HUQ) announced in https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf (see also the Intel(r) Software Development Emulator at https://software.intel.com/e
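
For readers unfamiliar with the two instructions named above, the following is a minimal scalar sketch of what one 64-bit lane of VPMADD52LUQ and VPMADD52HUQ computes: the low 52 bits of both sources are multiplied into a 104-bit product, and either the low or the high 52 bits of that product are added to a 64-bit accumulator. The function names (madd52lo, madd52hi, mul52) are hypothetical and chosen for illustration only; this is a model of the arithmetic, not code from the patch.

```python
MASK52 = (1 << 52) - 1  # low 52 bits of a lane
MASK64 = (1 << 64) - 1  # one 64-bit lane


def madd52lo(acc, b, c):
    """Model one lane of VPMADD52LUQ: acc += low 52 bits of (b * c)."""
    product = (b & MASK52) * (c & MASK52)  # up to 104 bits
    return (acc + (product & MASK52)) & MASK64


def madd52hi(acc, b, c):
    """Model one lane of VPMADD52HUQ: acc += high 52 bits of (b * c)."""
    product = (b & MASK52) * (c & MASK52)  # up to 104 bits
    return (acc + (product >> 52)) & MASK64


def mul52(b, c):
    """Rebuild the full 104-bit product from the two halves (illustration only)."""
    lo = madd52lo(0, b, c)
    hi = madd52hi(0, b, c)
    return lo + (hi << 52)
```

Because each half of the product occupies only 52 of the 64 bits in a lane, several partial products can be accumulated before a carry has to be propagated, which is what makes a radix-2^52 representation attractive for big-number multiplication.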

RE: [openssl.org #3113] OpenSSL’s DH implementation uses an unnecessarily long exponent, leading to significant performance loss

2014-08-27 Thread Gueron, Shay via RT
Hello Rich, I would recommend doing that. Otherwise there will be "unsuspecting users" who will (unintentionally) use the long exponent ...for example, this is what happened to me in the first attempts, and I did not understand why it was so slow :)... It does not really cost anything signifi

RE: [openssl.org #3149] [patch] Fast and side channel protected implementation of the NIST P-256 Elliptic Curve, for x86-64 platforms

2013-11-12 Thread Gueron, Shay via RT
Do you have any comment from Intel on the concerns regarding the scattering technique (http://cryptojedi.org/peter/data/chesrump-20130822.pdf)? First, a comment: it is difficult to actually understand the precise claim by the authors, from these 6 slides. The code snippet

RE: [openssl.org #3149] [patch] Fast and side channel protected implementation of the NIST P-256 Elliptic Curve, for x86-64 platforms

2013-10-29 Thread Gueron, Shay via RT
Thank you, Bodo, for the comments. Here are some quick answers >>> It seems that the BN_MONT_CTX-related code The optimization made for the computation of the modular inverse in the ECDSA sign, is using const-time mod-exp. Indeed, this is independent of the rest of the patch, and it can be use

[openssl.org #3149] [patch] Fast and side channel protected implementation of the NIST P-256 Elliptic Curve, for x86-64 platforms

2013-10-22 Thread Gueron, Shay via RT
Hello all, This patch is a contribution to OpenSSL. It offers an efficient and constant-time implementation of EC multiplication for the NIST P256 curve. It accelerates ECDSA (sign and verify) as well as ECDH (compute and generate key), for the P256 curve. The implementation is based on Montgomery

[openssl.org #3113] OpenSSL’s DH implementation uses an unnecessarily long exponent, leading to significant performance loss

2013-08-20 Thread Gueron, Shay via RT
Hello all, OpenSSL’s DH implementation uses an unnecessarily long exponent, leading to significant performance loss OpenSSL handles the Diffie Hellman (DH) protocol in a very conservative way. By default, the length of the private key equals the bit-length of the prime modulus. For example
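
To see why the exponent length matters for performance, here is a minimal sketch of plain left-to-right square-and-multiply that counts its modular operations. The modulus (the Mersenne prime 2^521 - 1), the base, and the 160-bit "short" exponent are toy values assumed for illustration; they are not the parameters discussed in the message, and the routine is neither constant-time nor what OpenSSL uses.

```python
def square_and_multiply(base, exponent, modulus):
    """Compute base**exponent % modulus, returning (result, squarings, multiplications)."""
    result = 1
    squarings = 0
    multiplications = 0
    for bit in bin(exponent)[2:]:  # scan exponent bits, most significant first
        result = (result * result) % modulus
        squarings += 1
        if bit == "1":
            result = (result * base) % modulus
            multiplications += 1
    return result, squarings, multiplications


# Toy parameters (assumptions for illustration, not from the original message).
p = (1 << 521) - 1             # modulus: a 521-bit Mersenne prime
g = 3                          # base
long_exp = (1 << 520) | 1      # exponent as long as the modulus (521 bits)
short_exp = (1 << 159) | 1     # hypothetical short exponent (160 bits)

long_result, long_sq, long_mul = square_and_multiply(g, long_exp, p)
short_result, short_sq, short_mul = square_and_multiply(g, short_exp, p)
```

The number of squarings equals the bit-length of the exponent, so the cost of the exponentiation grows roughly linearly with the private-key length when the modulus is held fixed.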

RE: [openssl.org #3054] AutoReply: [PATCH] Efficient and side channel analysis resistant 1024-bit and 2048-bit modular exponentiation, optimizing RSA, DSA and DH of compatible sizes, for AVX2 capable

2013-06-09 Thread Gueron, Shay via RT
--- Performance numbers update for [openssl.org #3054] --- Following the recent release of the 4th Generation Intel® Core™ processor family, we provide performance numbers for the patch. All measurements were carried out on a Core™ i7-4770K processor, running at 3.5 GHz, with Turbo boost and

[openssl.org #3054] [PATCH] Efficient and side channel analysis resistant 1024-bit and 2048-bit modular exponentiation, optimizing RSA, DSA and DH of compatible sizes, for AVX2 capable x86_64 platform

2013-05-27 Thread Gueron, Shay via RT
Hello all, This patch is a contribution to OpenSSL. It offers an efficient and constant-time implementation of 1024-bit and 2048-bit Modular Exponentiation. When the patch is applied to the OpenSSL library, it accelerates RSA1024 (verify), RSA2048 (verify and sign), DSA1024 (verify and sign),

[openssl.org #3042] [PATCH] Fast implementation of AES-XTS mode for AVX capable x86-64 processors

2013-05-06 Thread Gueron, Shay via RT
Hello all - This patch is a contribution to OpenSSL. It demonstrates an efficient implementation of AES-XTS, using Intel's AES-NI and AVX architecture. The performance improvement provided here is achieved via: a slightly improved reduction technique, encryption of several (here, 8) blocks i

[openssl.org #3021] [PATCH] Fast implementation of AES-CTR mode for AVX capable x86-64 processors

2013-03-22 Thread Gueron, Shay via RT
Hello all - This patch is a contribution to OpenSSL. It offers an efficient implementation of AES-CTR, using Intel's AES-NI and AVX architecture. This contribution also improves the performance of AES-GCM. While faster AES-GCM can be achieved by interleaving the CTR and GHASH, we understand fr