On Tue, Dec 29, 2020 at 5:15 PM Michael Weiser wrote:
>
> ...
> Do you (or anybody else) have a hardware arm board for testing, possibly
> with a Cortex A8 or A9 implementation to see how it behaves there?
I've got a Wandboard/Cortex-A9 and Tinkerboard/Cortex-A17 hanging off
the internet with

Michael Weiser writes:
> It comes out at around seven cycles per block slowdown for chacha-3core
> and five for salsa20-2core. I trace this to vst1.8. It's just slower
> than vstm (in contrast to vldm vs. vld1.32). I managed to save a
> cumulative two cycles by rescheduling instructions so that