Comment #37 on issue 3809 by dana.jac...@gmail.com: isprime can be faster
http://code.google.com/p/sympy/issues/detail?id=3809

I've been working on primality tests and proofs for a Perl module, so this is something I've been looking at a lot recently. Here are my long-winded thoughts, with the caveat that I'm not a sympy developer.

Comment 1 & 2 mention AKS. AKS is theoretically important, but is still basically useless in any practical sense. It is insanely slow, even with Lenstra and Bernstein's improvements to the 'r' selection. Brent 2010 estimates 840 years for a 308 digit number (on a 1GHz machine, but even with 100x faster hardware that's ridiculous). For the deterministic variants I think you'll find that time isn't out of line.

Comment 3 mentions the "pypy ran a deterministic test in 18 minutes" comment. He ran a single Fermat test, which is much less accurate than a single M-R test. Try 561 through his function. or 1105, 1729, 2465, etc. So after 18 minutes we've not accomplished very much. This is definitely not the direction to go.


For small numbers (<2^64), you can use deterministic M-R tests, such as the ones from https://miller-rabin.appspot.com/. That would be faster than the existing code and remove the table of exceptions. BPSW would probably be faster for the upper range. BPSW (standard, strong, extra strong, or the oddball variants some packages implement) has no counterexamples < 2^64, and this is typically the same speed or faster than running 4 M-R tests. With my native (C without GMP) implementations I cut over to BPSW if the number is over 32 bit in size -- your mileage may vary (this is purely a benchmark-oriented performance decision at this size).

For large values, I think you really should use a variant of BPSW (see Nicely's page or http://en.wikipedia.org/w/index.php?title=Baillie%E2%80%93PSW_primality_test). Here is a number from Arnault 1993 and another from Arnault 1995 that fool sympy's current implementation:

8038374574536394912570796143419421081388376882875581458374889175222974273765333652186502336163960045457915042023603208766569966760987284043965408232928738791850869166857328267761771029389697739470167082304286871099974399765441448453411558724506334092790222752962294149842306881685404326457534018329786111298960644845216191652872597534901

2887148238050771212671429597130393991977609459279722700926516024197432303799152733116328983144639225941977803110929349655578418949441740933805615113979999421542416933972905423711002751042080134966731755152859226962916775325475044445856101949404200039904432116776619949629539250452698719329070373564032273701278453899126120309244841494728976885406024976768122077071687938121709811322297802059565867

Running lots of M-R tests to many fixed bases like sympy's current implementation or libtommath does (and hence Perl 6) is problematic. Arnault 1995 gives an example (the second above) that will pass MR tests with all bases up to 306. That paper gives instructions on how to construct such numbers. Using random bases mostly solves this problem. However, we're still left with it being terribly inefficient -- running 46 M-R tests is 10-15 times slower than a BPSW test. If you're feeling paranoid about letting a pseudoprime through, you can add a couple more random-base M-R tests (only Mathematica and Math::Prime::Util seem to go to this length that I'm aware of).

Why BPSW? Nicely's page should cover lots of the reasons. Almost all math packages switched to using it, mostly in the 1990s. Maple, Mathematica (with an added base 3 M-R test), Pari, SAGE, Maxima, FLINT, MPIR, PRIMO are some examples that use a BPSW variant for their probable prime test. In 33 years nobody has published a concrete counterexample (though we know they exist). For some reason the tests act anti-correlated.

Python's gmpy2 includes David Cleaver's C+GMP code for primality functions including Lucas tests and a strong BPSW test, but is_prime still uses mpz_probab_prime_p. I spoke to some of the Perl 6 people last month, and I need to write up some code for NQP as well as start a discussion on the API.

Here are examples and times on my computer. The Perl tests use Math::Prime::Util + Math::Prime::Util::GMP. This is C+GMP under the hood, just like the gmpy and gmpy2 modules. Python 2.7. Based on the time required for one run of mr with the latest git version (that uses pow(base, t, n)), it looks like this could be done in a reasonable length of time.

# sympy from git, mr(int("1808010808"*1560 + "1"),[2])
12.2s

# sympy from git, isprime(int("1808010808"*1560 + "1"))
9 minutes 16.4s

Hmm, this is a few minutes. Of course I have the latest code so it's presumably much more efficient than earlier, but I'm concerned about the comment "[...]there is no way (with MR) that the prime given in the OP will return in a few minutes." Let's look at some other systems.


# Pari/gp 2.6.0 ispseudoprime(....) uses a BPSW variant
60.12s

# gmpy 1.15's is_prime (mpz_probab_prime_p(n,25), or M-R with 25 fixed-random bases)
# Installed via Fedora
8 minutes 29.3s

# gmpy 2.0's is_prime (mpz_probab_prime_p(n,25), or M-R with 25 fixed-random bases)
# Built from sources along with latest MPFR, MPC, and GMP.
5 minutes 12.8s

# gmpy 2.0 is_strong_prp(int("1808010808"*1560 + "1"),2)
12.03s

# gmpy 2.0 is_selfridge_prp(int("1808010808"*1560 + "1"))
64.95s

# gmpy 2.0 is_strong_selfridge_prp(int("1808010808"*1560 + "1"))
64.85s

I wasn't able to get gmpy2's is_strong_bpsw_prp function to work, but it would be about 77s based on the two tests it has to do.


My Perl module, with GMP implementations that are a little faster than the ones in gmpy2:

# Just the SPRP-2 test:
time perl -Mbigint -MMath::Prime::Util=:all -E 'my $n = 0+("1808010808" x 1560 . "1"); say is_strong_pseudoprime($n,2);'
12.06s

# Strong Lucas-Selfridge test:
time perl -Mbigint -MMath::Prime::Util=:all -E 'my $n = 0+("1808010808" x 1560 . "1"); say is_strong_lucas_pseudoprime($n);'
47.17s

# Extra Strong Lucas test -- typically a bit faster:
time perl -Mbigint -MMath::Prime::Util=:all -E 'my $n = 0+("1808010808" x 1560 . "1"); say is_extra_strong_lucas_pseudoprime($n);'
31.84s

# is_prob_prime includes some quick tests for weeding out composites, then SPRP-2 and strong Lucas-Selfridge: time perl -Mbigint -MMath::Prime::Util=:all -E 'my $n = 0+("1808010808" x 1560 . "1"); say is_prob_prime($n);'
59.32s

# is_prime is is_prob_prime plus 2-5 extra M-R tests (2 for this size):
time perl -Mbigint -MMath::Prime::Util=:all -E 'my $n = 0+("1808010808" x 1560 . "1"); say is_prime($n);'
82.97s

So it's possible to do a BPSW variant (SPRP-2 + Grantham Extra Strong test) and have it done in under 44 seconds, assuming modular math including exponentiation is fast. Add ~12s per extra M-R test if you want.

--
You received this message because this project is configured to send all issue notifications to this address.
You may adjust your notification preferences at:
https://code.google.com/hosting/settings

--
You received this message because you are subscribed to the Google Groups 
"sympy-issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sympy-issues+unsubscr...@googlegroups.com.
To post to this group, send email to sympy-issues@googlegroups.com.
Visit this group at http://groups.google.com/group/sympy-issues.
For more options, visit https://groups.google.com/groups/opt_out.


Reply via email to