Maamoun TK <maamoun...@googlemail.com> writes:

> I think the design could be as simple as always padding each block with
> 0x01 in _nettle_poly1305_update while keeping _nettle_poly1305_block that
> is responsible for processing last block takes variable padding values (0
> or 1). I committed an update in
> https://git.lysator.liu.se/nettle/nettle/-/merge_requests/48 that applies
> that design.

I've tried out this refactoring on its own branch. There's a new
_nettle_poly1305_update, in C only, which deals with partial blocks and
is called from both poly1305-aes and chacha-poly1305.

It calls a new function _nettle_poly1305_blocks, with the interface we
have been discussing. And I've implemented the new function for x86_64.
Conclusions:

1. There's some code duplication between _block and _blocks, which seems
   hard to avoid (but *maybe* some m4 macrology for shared logic could
   be a good idea).

2. When benchmarking on my laptop, it's 70% (!) faster. I had expected
   only a minor improvement, and I'm not yet convinced it isn't too good
   to be true; the tests need improvement. The numbers I get are a speed
   increase from 3 GB/s to 5 GB/s for the poly1305 update function, or
   44 cycles/block reduced to 25.

   If this improvement is real, my best explanation is that avoiding
   load and store of the state between iterations makes out-of-order
   execution across iterations work a *lot* better, e.g., letting the next
   iteration's multiplies involving H0 and H1 start in parallel with the
   final imul that H2 depends on.

See https://git.lysator.liu.se/nettle/nettle/-/commits/refactor-poly1305.
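
To make the structure concrete, here is a rough, self-contained C sketch
of the split described above. It is not the code on the branch: the names
toy_ctx, toy_step, toy_blocks and toy_update, the modulus 2^31 - 1, and
the way bytes are folded into a number are all made up for illustration,
and the arithmetic is only a toy stand-in for the real poly1305
multiply-and-reduce. What it is meant to show is an update wrapper that
handles partial blocks, and a multi-block function that loads the state
once, keeps it in locals for the whole loop, and stores it once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16
#define TOY_P 2147483647ULL /* 2^31 - 1, so all products fit in 64 bits */

/* Hypothetical context, not Nettle's struct poly1305_ctx. */
struct toy_ctx {
  uint64_t r;                /* key-dependent multiplier, must be < TOY_P */
  uint64_t h;                /* running state, always < TOY_P */
  uint8_t block[BLOCK_SIZE]; /* buffered partial block */
  unsigned index;            /* number of bytes currently buffered */
};

/* One toy iteration: h <- (h + x) * r mod p, where x is the full block
   with a leading 1 playing the role of the padding bit. */
static uint64_t
toy_step (uint64_t h, uint64_t r, const uint8_t *m)
{
  uint64_t x = 1;
  unsigned i;
  for (i = 0; i < BLOCK_SIZE; i++)
    x = (x * 256 + m[i]) % TOY_P;
  return ((h + x) % TOY_P) * r % TOY_P;
}

/* Process a run of full blocks. The state is read from the context once,
   carried in locals across iterations, and written back once at the end.
   Returns a pointer just past the processed data. */
static const uint8_t *
toy_blocks (struct toy_ctx *ctx, size_t blocks, const uint8_t *m)
{
  uint64_t h = ctx->h;
  uint64_t r = ctx->r;
  for (; blocks > 0; blocks--, m += BLOCK_SIZE)
    h = toy_step (h, r, m);
  ctx->h = h;
  return m;
}

/* Update wrapper in plain C: completes a buffered partial block if
   possible, hands all whole blocks to toy_blocks in a single call, and
   buffers whatever is left over. */
static void
toy_update (struct toy_ctx *ctx, size_t length, const uint8_t *data)
{
  assert (ctx->index < BLOCK_SIZE);
  if (ctx->index > 0)
    {
      size_t left = BLOCK_SIZE - ctx->index;
      if (length < left)
        {
          /* Still not a full block; just buffer and return. */
          memcpy (ctx->block + ctx->index, data, length);
          ctx->index += (unsigned) length;
          return;
        }
      memcpy (ctx->block + ctx->index, data, left);
      toy_blocks (ctx, 1, ctx->block);
      data += left;
      length -= left;
      ctx->index = 0;
    }
  data = toy_blocks (ctx, length / BLOCK_SIZE, data);
  length %= BLOCK_SIZE;
  memcpy (ctx->block, data, length);
  ctx->index = (unsigned) length;
}
```

With this shape, feeding a message in one call or in arbitrary pieces
leaves the context in the same state, and any final partial block stays
in the buffer for a separate single-block function with variable padding.
A C compiler may well keep h in a register in either formulation, so the
sketch illustrates the structure only, not the speedup measured for the
x86_64 assembly.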

At the moment, the new _blocks method is not optional. It could be made
optional with a bit of configure hacking, but given the promising (and
surprising, at least to me) results on x86_64, I think it would be good
to try out adding it for ppc as well to see if it brings a small or
large improvement. Do you already have multiblock radix-2^64 code in
your merge request, or only the new radix-2^44 variant?

Regards,
/Niels

-- 
Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
Internet email is subject to wholesale government surveillance.