Maamoun TK <maamoun...@googlemail.com> writes:

> I got a speedup of almost 12% by optimizing the sha3_permute() function
> using the SHA hardware accelerator of s390x. Is it worth adding that
> assembly implementation?

For such a small assembly function, I think it's worth the effort (more
questionable if it was worth adding the special instructions for it...).

If you have the time, you could also try out doing it with vector
registers, like on x86_64 and arm/neon. Some difficulties in the x86_64
implementation were (i) xmm register shortage, (ii) moving 64-bit pieces
between the 128-bit xmm registers, and (iii) rotating the 64-bit pieces
of an xmm register by different shift counts.
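
To illustrate difficulty (iii), here is a minimal portable C sketch, not
the actual nettle code. It assumes a vector shift that applies one count
to every lane. The vec128 type and the vshl, vshr, vor, vblend and vrot2
helpers are hypothetical names that model a 128-bit register as two
64-bit lanes. Under that assumption, rotating the two lanes by different
counts takes two uniform rotations plus a blend:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit vector register: two 64-bit lanes. */
typedef struct { uint64_t lane[2]; } vec128;

/* Uniform shifts: both lanes always use the same count (assumed 1..63). */
static vec128 vshl(vec128 v, unsigned n)
{
  assert(n >= 1 && n <= 63);
  v.lane[0] <<= n; v.lane[1] <<= n;
  return v;
}

static vec128 vshr(vec128 v, unsigned n)
{
  assert(n >= 1 && n <= 63);
  v.lane[0] >>= n; v.lane[1] >>= n;
  return v;
}

/* Lane-wise OR of two registers. */
static vec128 vor(vec128 a, vec128 b)
{
  a.lane[0] |= b.lane[0]; a.lane[1] |= b.lane[1];
  return a;
}

/* Take lane 0 from a and lane 1 from b. */
static vec128 vblend(vec128 a, vec128 b)
{
  a.lane[1] = b.lane[1];
  return a;
}

/* Rotate both lanes left by the same count n. */
static vec128 vrot_uniform(vec128 v, unsigned n)
{
  return vor(vshl(v, n), vshr(v, 64 - n));
}

/* Rotate lane 0 left by r0 and lane 1 left by r1: each uniform
   rotation computes one useful lane and one that is thrown away. */
static vec128 vrot2(vec128 v, unsigned r0, unsigned r1)
{
  return vblend(vrot_uniform(v, r0), vrot_uniform(v, r1));
}
```

Half of the shift work in vrot2 is discarded, which is the cost of having
only a single shift count per instruction in this model.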

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
_______________________________________________
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
