Re: Old ARM Neon code for salsa20 and chacha

2021-02-06 Michael Weiser
Hello Niels,
On Thu, Jan 28, 2021 at 07:26:46PM +0100, Niels Möller wrote:
> > With the new 2-way or 3-way functions, performance of the single-block
> > functions isn't that critical, so deletion may be ok even if it causes
> > some small regression on some processors (e.g., single-block chacha

Re: Old ARM Neon code for salsa20 and chacha

2021-01-28 Niels Möller
ni...@lysator.liu.se (Niels Möller) writes:
> For processors that can issue two instructions per cycle, and with
> shorter latency, scalar code (i.e., code using only the general purpose
> 32-bit registers) could get more or less the same throughput. The scalar
> code also gets the advantage that

Old ARM Neon code for salsa20 and chacha (was: Re: Release of Nettle-3.7?)

2021-01-13 Niels Möller
ni...@lysator.liu.se (Niels Möller) writes:
> I've done a benchmark run of nettle-3.6 on the GMP "nanot2" system, with
> a Cortex-A9 processor. The installed compiler is gcc-5.4 (a few years
> old).
I chose Cortex-A9 for this test in an attempt to reproduce my old numbers. Even if it's probably not