Fair enough!
I replied as I found a factor of 60 a bit surprising. Though I shouldn't have
been -- my n digits of pi scheme program was consistently about 16-17 times
slower than gmp based one (at least up to 1E9), using the same algorithm
(Chudnovsky's), which is also highly dependent on
This is getting interesting but maybe only for me, so I'll stop here.
I did some profiling and the majority of the time is spent in
math/big.addMulVVW. Thinking I might be doing something stupid, I compared
it with the Go implementation at
Oh, I did say my implementation was straightforward. It's free of any
clever multiplication algorithms or mathematical delights. It could easily
be giving up 10x or more for that reason alone. And I haven't even profiled
it yet.
-rob
On Sat, Jan 13, 2024 at 7:04 PM Bakul Shah wrote:
> FYI
FYI Julia (on M1 MBP) seems much faster:
julia> @which factorial(big(1))
factorial(x::BigInt) in Base.GMP at gmp.jl:645
julia> @time begin; factorial(big(1)); 1; end
27.849116 seconds (1.39 M allocations: 11.963 GiB, 0.22% gc time)
Probably they use Schönhage-Strassen
Thanks for the tip. A fairly straightforward implementation of this
algorithm gives me about a factor of two speedup for pretty much any value.
I went up to 1e8!, which took about half an hour compared to nearly an hour
for MulRange.
I'll probably stick in ivy after a little more tuning. I may
I wrote the single allocation version and added it to the issue, but Bakul
is absolutely right that the recursive solution has merit when the input is
large enough. By building up large values on each "side" of the recursion,
Karatsuba gets used for the larger multiplies and the recursive version
For that you may wish to explore Peter Luschny's "prime swing" factorial
algorithm and variations!
https://oeis.org/A000142/a000142.pdf
And implementations in various languages including go:
https://github.com/PeterLuschny/Fast-Factorial-Functions
> On Jan 8, 2024, at 9:22 PM, Rob Pike wrote:
Here's an example where it's the bottleneck: ivy factorial
!1e7
1.20242340052e+65657059
)cpu
1m10s (1m10s user, 167.330ms sys)
-rob
On Tue, Jan 9, 2024 at 2:21 PM Bakul Shah wrote:
> Perhaps you were thinking of this?
>
> At iteration number k, the value xk contains O(klog(k)) digits,
Perhaps you were thinking of this?
At iteration number k, the value xk contains O(klog(k)) digits, thus the
computation of xk+1 = kxk has cost O(klog(k)). Finally, the total cost with
this basic approach is O(2log(2)+¼+n log(n)) = O(n2log(n)).
A better approach is the binary splitting : it just
The overflow fix is pending: CL 554617
I filed https://github.com/golang/go/issues/65027 for a possibly faster
mulRange implementation.
- gri
On Mon, Jan 8, 2024 at 11:47 AM Robert Griesemer wrote:
> Hello John;
>
> Thanks for your interest in this code.
>
> In a (long past) implementation of
Hello John;
Thanks for your interest in this code.
In a (long past) implementation of the factorial function, I noticed that
computing a * (a+1) * (a+2) * ... (b-1) * b was much faster when computed
in a recursive fashion than when computed iteratively: the reason (I
believed) was that the
Actually, both implementations have bugs!
The recursive implementation ends with:
```
m := (a + b) / 2
return z.mul(nat(nil).mulRange(a, m), nat(nil).mulRange(m+1, b))
```
That's a bug whenever `(a+b)` overflows, making `m` small.
FIX: `m := a + (b-a)/2`
My iterative implementation went into an
I'm equally curious.
FWIW, I realized the loop should perhaps be
```
mb := nat(nil).setUint64(b) // ensure mb starts big enough for b, even on
32-bit arch
for m := a + 1; m <= b; m++ {
mb.setUint64(m)
z = z.mul(z, mb)
}
```
to avoid allocating repeatedly for `m`, which yields:
It seems reasonable but first I'd like to understand why the recursive
method is used. I can't deduce why, but the CL that adds it, by gri, does
Karatsuba multiplication, which implies something deep is going on. I'll
add him to the conversation.
-rob
On Sun, Jan 7, 2024 at 5:46 PM John
I enjoy bignum implementations, so I was looking through nat.go and saw
that `mulRange` is implemented in a surprising, recursive way,. In the
non-base case, `mulRange(a, b)` returns `mulrange(a, (a+b)/2) *
mulRange(1+(a+b)/2, b)` (lots of big.Int ceremony elided).
That's fine, but I didn't
15 matches
Mail list logo