On Sun, Jan 23, 2022 at 4:41 PM Maamoun TK wrote:
>
> On Sun, Jan 23, 2022 at 9:10 PM Niels Möller wrote:
>
> > ni...@lysator.liu.se (Niels Möller) writes:
> >
> > > The current C implementation uses radix 26, and 25 multiplies (32x32
> > > --> 64) per block. And quite a lot of shifts. A radix
ni...@lysator.liu.se (Niels Möller) writes:
> The current C implementation uses radix 26, and 25 multiplies (32x32
> --> 64) per block. And quite a lot of shifts. A radix 32 variant
> analogous to the above would need 16 long multiplies and 4 short. I'd
> expect that to be faster on most
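
The multiply counts quoted above can be reproduced with a small counting sketch. It rests on assumptions that are not stated in the excerpt: that the value being accumulated is 130 bits wide and the multiplier 128 bits wide (as in Poly1305), that "short" means a product where one operand is only a few bits wide, and that a plain schoolbook multiply is used. The names limb_widths, count_products and short_max are made up for this illustration. Carries, shifts and modular reduction are not modeled.

```python
def limb_widths(bits, radix_bits):
    # Split a `bits`-wide integer into limbs of `radix_bits` bits each;
    # the top limb holds whatever is left over.
    return [min(radix_bits, bits - lo) for lo in range(0, bits, radix_bits)]


def count_products(radix_bits, h_bits=130, r_bits=128, short_max=8):
    # Schoolbook multiply: one partial product per (h limb, r limb) pair.
    # ASSUMPTION: a product counts as "short" when one operand limb is at
    # most `short_max` bits wide; the threshold is arbitrary.
    long_count = short_count = 0
    for a in limb_widths(h_bits, radix_bits):
        for b in limb_widths(r_bits, radix_bits):
            if min(a, b) <= short_max:
                short_count += 1
            else:
                long_count += 1
    return long_count, short_count


print(count_products(26))  # (25, 0): five limbs times five limbs
print(count_products(32))  # (16, 4): 4x4 full limbs, plus a 2-bit top limb times 4
```

Under these assumptions, radix 26 splits both operands into five limbs, giving 25 products, while radix 32 leaves only a 2-bit top limb on the 130-bit operand, which accounts for the 4 short products next to the 16 long ones.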