Status: New
Owner: ----
Labels: Type-Defect Priority-Low

New issue 3927 by dana.jac...@gmail.com: randprime is not uniform
http://code.google.com/p/sympy/issues/detail?id=3927

randprime uses the PRIMEINC method for generating random primes, which output primes with a probability related to their primegap size. Simple example:

{
from sympy import randprime
from collections import Counter

# Note that 37, 53, and 59 are selected 3x more often than 43
Counter([randprime(2**5,2**6-1) for i in xrange(1000000)])
Counter({37: 183002, 53: 182149, 59: 181035, 61: 151219, 47: 121250, 41: 120884, 43: 60461})

# this is a more obnoxious example because I've looked at a prime gap table. See how 31469 gets selected over 30x more often than some of the twins.
Counter([randprime(31300,31907) for i in xrange(1000000)])
Counter({31469: 118072, 31847: 49291, 31891: 40960, ..., 31729: 3248, 31723: 3244, 31321: 3182})
}

Fortunately the code has guards to keep out of range primes from being output, so it isn't all bad. See the paper Pierre-Alain Fouque and Mehdi Tibouchi, "Close to Uniform Prime Number Generation With Fewer Random Bits", 2011. [http://eprint.iacr.org/2011/481] for a discussion of distributions. Their method looks nice, but is specific to generating a power of 2 range.

An alternate implementation with uniform distribution but slower would be to (1) bring the ranges in to low = next_prime(low-1), high = prev_prime(high+1). Check that low <= high (if not, there is no prime in the range). Now repeat selecting a random number in the interval and return it if it is prime. For example right after the map:

{
    a = nextprime(a-1)
    b = prevprime(b+1)
    if a > b:
        raise ValueError("no primes exist in the specified range")
    while True:
       n = random.randint(a, b)
       if isprime(n):
           return n
}

This gives a uniform distribution (see either of the examples above), but it is 2-4x slower (having PR 3809 helps as it speeds up isprime, nextprime, and prevprime). I'd like to see the above perhaps have a loop limit just in case there's something screwy with randint.

For small inputs, another way that works well and can be faster, is to select the nth prime in the range primecount(2,low), primecount(low,high). This relies on having really fast ranged primecount and nth_prime for small values.

--
You received this message because this project is configured to send all issue notifications to this address.
You may adjust your notification preferences at:
https://code.google.com/hosting/settings

--
You received this message because you are subscribed to the Google Groups 
"sympy-issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sympy-issues+unsubscr...@googlegroups.com.
To post to this group, send email to sympy-issues@googlegroups.com.
Visit this group at http://groups.google.com/group/sympy-issues.
For more options, visit https://groups.google.com/groups/opt_out.


Reply via email to