On Sat, 14 May 2022, Luke Small wrote:

> Look at my code. I don’t even use a modulus operator. I perform hit and
> miss with a random bitstream.
> 
> How can I have a bias of something I don’t do? I return a bitstream which
> meets the parameters of being a value less than the upper bound. Much like
> arc4random_buf().
> 
> If I use arc4random_uniform() repeatedly to create a random distribution of
> say numbers less than 0x1000 or even something weird like 0x1300 will the
> random distribution be better with arc4random_uniform() or with mine? For
> 0x1000 mine will simply pluck 12 bits of random data straight from the
> arc4random() (and preserve the remaining 20 bits for later) on the first
> try, just like it’s arc4random_buf().
> 
> arc4random_uniform() will perform a modulus of a 32 bit number which adds
> data to the bitstream. Does it make it better? Perhaps it makes it harder
> to guess the source bits.
> 
> I don’t know; and I’m not going to pretend to be a cryptologist. But I’m
> looking at modulo bias.
> 
> I didn’t know what it was, before, but I basically “rejection sample”:
> 
> https://research.kudelskisecurity.com/2020/07/28/the-definitive-guide-to-modulo-bias-and-how-to-avoid-it/

No, you aren't:

>         for (;;) {
>                 if (rand_bits < bits) {
>                         rand_holder |= ((uint64_t)arc4random()) <<
> rand_bits;
>                         
>                         /* 
>                          * rand_bits will be a number between 0 and 31 here
>                          * so the 0x20 bit will be empty
>                          * rand_bits += 32;
>                          */ 
>                         rand_bits |= 32;
>                 }
>                 
>                 ret = rand_holder & uuu;
>                 rand_holder >>= bits;
>                 rand_bits -= bits;
>                 
>                 if (ret < upper_bound)
>                         return ret;
>         }

This isn't rejection sampling. This is reusing part of the rejected
sample.

Think of it like this: you want to uniformly generate a number in the
range [2:10] by rolling 2x 6-sided dice. What do you do when you roll
11 or 12? You can't just reroll one of the dice because the other dice
is constrained to be have rolled either 5 or 6, and so proceeding with
it would force the output to be in the range [6:11] for these ~5.6%
of initial rolls. Your output is no longer uniform.

BTW the existing code already implements the prefered approach of the
article you quoted.

-d

Reply via email to