Here are some interesting timing results for the Moller code. Note that
after roughly 16,000 decimal digits the bgcd algorithm becomes the
fastest.
approx 0256 digits
gcd 57488.00 cycles
rgcd 57160.00 cycles
bgcd
I've been running some speed tests with the Moller patches, and it
looks to me like they work just fine for larger numbers, but at
smaller limb counts they are slower than the original code. I've
attached my test code so that you can see what I'm doing.
I suspect that I need to add some tuning to