Re: drivers/char/random.c: More futzing about

2014-06-11 Thread George Spelvin
Just to add to my total confusion about the totally disparate performance numbers we're seeing, I did some benchmarks on other machines. The speedup isn't as good one-pass as it is iterated, and as I mentioned it's slower on a P4, but it's not 7 times slower by any stretch. There are all

Re: drivers/char/random.c: More futzing about

2014-06-11 Thread Theodore Ts'o
On Wed, Jun 11, 2014 at 08:32:49PM -0400, George Spelvin wrote:
> Comparable, but slightly slower. Clearly, I need to do better.
> And you can see the first-iteration effects clearly. Still,
> nothing *remotely* like 7x!

I redid my numbers, and I can no longer reproduce the 7x slowdown. I do

Re: drivers/char/random.c: More futzing about

2014-06-11 Thread George Spelvin
> Sadly I can't find the tree, but I'm 94% sure it was Skein-256
> (specifically the SHA3-256 candidate parameter set.)

It would be nice to have two hash functions, optimized separately for 32- and 64-bit processors. As the Skein report says, the algorithm can be adapted to 32 bits easily

Re: drivers/char/random.c: More futzing about

2014-06-11 Thread H. Peter Anvin
On 06/11/2014 01:41 PM, H. Peter Anvin wrote:
> On 06/11/2014 12:25 PM, Theodore Ts'o wrote:
>> On Wed, Jun 11, 2014 at 09:48:31AM -0700, H. Peter Anvin wrote:
>>> While talking about performance, I did a quick prototype of random using
>>> Skein instead of SHA-1, and it was measurably faster, in

Re: drivers/char/random.c: More futzing about

2014-06-11 Thread George Spelvin
> ... but how did you measure the "2/3 the time"? I've done some
> measurements, using both "time calling fast_mix() and fast_mix2() N
> times and divide by N (where N needs to be quite large). Using that
> metric, fast_mix2() takes seven times as long to run.

Wow, *massively* different
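
For readers following the measurement dispute, here is a minimal user-space sketch of the two timing styles the thread contrasts: averaging over N back-to-back calls versus timing a single first call. Everything in it is an assumption made for illustration. `toy_mix()` is a hypothetical rotate-and-add stand-in, not the kernel's fast_mix() or the proposed fast_mix2(), and `now_ns()`, `bench_iterated()`, and `bench_first_call()` are made-up helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Rotate left; r must be in 1..31. */
static inline uint32_t rol32(uint32_t x, unsigned r)
{
	return (x << r) | (x >> (32 - r));
}

/* Hypothetical function under test. This is NOT fast_mix() from
 * drivers/char/random.c; it only gives the timing loop some work to do. */
static void toy_mix(uint32_t pool[4], const uint32_t in[4])
{
	uint32_t a = pool[0] ^ in[0], b = pool[1] ^ in[1];
	uint32_t c = pool[2] ^ in[2], d = pool[3] ^ in[3];

	a += b; b = rol32(b, 6) ^ a;
	c += d; d = rol32(d, 27) ^ c;
	a += d; d = rol32(d, 16) ^ a;
	c += b; b = rol32(b, 14) ^ c;

	pool[0] = a; pool[1] = b; pool[2] = c; pool[3] = d;
}

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* "Call it N times and divide by N": average cost of a warm, iterated call. */
static double bench_iterated(size_t n, uint32_t pool[4])
{
	const uint32_t in[4] = { 1, 2, 3, 4 };
	uint64_t t0 = now_ns();

	for (size_t i = 0; i < n; i++)
		toy_mix(pool, in);
	return (double)(now_ns() - t0) / (double)n;
}

/* Cost of one isolated call, which includes any cold-cache effects
 * as well as the overhead of reading the clock twice. */
static uint64_t bench_first_call(uint32_t pool[4])
{
	const uint32_t in[4] = { 1, 2, 3, 4 };
	uint64_t t0 = now_ns();

	toy_mix(pool, in);
	return now_ns() - t0;
}
```

Calling `bench_first_call(pool)` once before `bench_iterated(n, pool)` with a large `n` gives one number of each kind. The two can differ widely on the same machine, so any comparison should state which one was measured, and the caller should read the pool afterwards so the compiler cannot discard the work.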

Re: drivers/char/random.c: More futzing about

2014-06-11 Thread H. Peter Anvin
On 06/11/2014 12:25 PM, Theodore Ts'o wrote:
> On Wed, Jun 11, 2014 at 09:48:31AM -0700, H. Peter Anvin wrote:
>> While talking about performance, I did a quick prototype of random using
>> Skein instead of SHA-1, and it was measurably faster, in part because
>> Skein produces more output per

Re: drivers/char/random.c: More futzing about

2014-06-11 Thread Theodore Ts'o
On Wed, Jun 11, 2014 at 09:48:31AM -0700, H. Peter Anvin wrote:
> While talking about performance, I did a quick prototype of random using
> Skein instead of SHA-1, and it was measurably faster, in part because
> Skein produces more output per hash.

Which Skein parameters did you use, and how

Re: drivers/char/random.c: More futzing about

2014-06-11 Thread H. Peter Anvin
On 06/11/2014 09:38 AM, Theodore Ts'o wrote:
> On Mon, Jun 09, 2014 at 09:17:38AM -0400, George Spelvin wrote:
>> Here's an example of a smaller, faster, and better fast_mix() function.
>> The mix is invertible (thus preserving entropy), but causes each input
>> bit or pair of bits to avalanche to

Re: drivers/char/random.c: More futzing about

2014-06-11 Thread Theodore Ts'o
On Mon, Jun 09, 2014 at 09:17:38AM -0400, George Spelvin wrote:
> Here's an example of a smaller, faster, and better fast_mix() function.
> The mix is invertible (thus preserving entropy), but causes each input
> bit or pair of bits to avalanche to at least 43 bits after 2 rounds and
> 120 bits

drivers/char/random.c: More futzing about

2014-06-09 Thread George Spelvin
Just as an example of some more ambitious changes I'm playing with... I really think the polynomial + twist has outlived its usefulness. In particular, table lookups in infrequently accessed code are called D-cache misses and are undesirable. And the input_rotate is an ugly kludge to compensate