> At the end of this loop, key[b] contains two copies of the cyclically
> permuted skey next to each other. When building the cache, you scan
> through the bits of val, xor the corresponding keys in if they're set
> and then throw away half of the 32 bits when assigning
> scache->bytes[val] = res;
>
> So I think you can use "uint16_t keys[NBBY];" and "uint16_t res = 0;",
> replace j < 32 by j < 16 and 31 - j by 15 - j and you'll get the exact
> same result.
In other words, the first nested loop can be simplified to this:
	for (b = 0; b < NBBY; ++b)
		key[b] = skey << b | skey >> (NBSK - b);
and instead of populating the key[] array up front, you could do:
void
stoeplitz_cache_init(struct stoeplitz_cache *scache, stoeplitz_key skey)
{
	unsigned int b, shift, val;

	/*
	 * Cache the results of all possible bit combinations of
	 * one byte.
	 */
	for (val = 0; val < 256; ++val) {
		uint16_t res = 0;

		for (b = 0; b < NBBY; ++b) {
			shift = NBBY - b - 1;
			if (val & (1 << shift))
				res ^= skey << b | skey >> (NBSK - b);
		}
		scache->bytes[val] = res;
	}
}
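
A quick way to convince yourself that the key[] variant and the on-the-fly
variant fill the cache identically is a standalone check along the lines
below. This is only a sketch under assumptions that are not taken from the
tree: NBBY = 8, NBSK = 16, stoeplitz_key as a uint16_t, and a struct
stoeplitz_cache holding 256 uint16_t entries. The names rotl16,
cache_init_table, cache_init_inline and cache_mismatches are made up for
the comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values and types, not copied from the real header. */
#define NBBY	8	/* bits per byte */
#define NBSK	16	/* bits in a stoeplitz_key */

typedef uint16_t stoeplitz_key;

struct stoeplitz_cache {
	uint16_t bytes[256];
};

/* Hypothetical helper: rotate a 16-bit key left by b bits, 0 <= b < NBSK. */
static uint16_t
rotl16(stoeplitz_key skey, unsigned int b)
{
	uint32_t k = skey;

	assert(b < NBSK);
	/* Done in 32 bits so the b == 0 case (right shift by 16) is defined. */
	return (uint16_t)(k << b | k >> (NBSK - b));
}

/* Variant 1: build key[] up front, then xor the selected entries. */
static void
cache_init_table(struct stoeplitz_cache *scache, stoeplitz_key skey)
{
	uint16_t key[NBBY];
	unsigned int b, shift, val;

	for (b = 0; b < NBBY; ++b)
		key[b] = rotl16(skey, b);

	for (val = 0; val < 256; ++val) {
		uint16_t res = 0;

		for (b = 0; b < NBBY; ++b) {
			shift = NBBY - b - 1;
			if (val & (1U << shift))
				res ^= key[b];
		}
		scache->bytes[val] = res;
	}
}

/* Variant 2: compute each rotation where it is needed. */
static void
cache_init_inline(struct stoeplitz_cache *scache, stoeplitz_key skey)
{
	unsigned int b, shift, val;

	for (val = 0; val < 256; ++val) {
		uint16_t res = 0;

		for (b = 0; b < NBBY; ++b) {
			shift = NBBY - b - 1;
			if (val & (1U << shift))
				res ^= rotl16(skey, b);
		}
		scache->bytes[val] = res;
	}
}

/* Count the cache slots on which the two variants disagree for skey. */
static unsigned int
cache_mismatches(stoeplitz_key skey)
{
	struct stoeplitz_cache a, b;
	unsigned int val, bad = 0;

	cache_init_table(&a, skey);
	cache_init_inline(&b, skey);
	for (val = 0; val < 256; ++val)
		if (a.bytes[val] != b.bytes[val])
			++bad;
	return bad;
}
```

Calling cache_mismatches() for whatever keys you care about should return 0
each time. The check only compares the two 16-bit variants with each other;
it does not exercise the original 32-bit code.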