On Nov 17, 2006, at 7:24 PM, Claus Reinke wrote:
> it seems that haskell versions of bignums are pretty much gone from
> more recent discussions of gmp replacements. now, I assume that there
> are lots of optimizations that keep gmp popular that one wouldn't want
> to have to reproduce, so that a haskell variant might not be
> competitive even if one had an efficient representation, but

> - do all those who want to distribute binaries, but not dynamically
>   linked, need bignums?

You are right: most don't. Even when working with larger numbers, I have very rarely used bignum libraries myself, mostly because there is usually a clever--and often faster--way to deal with large numbers, especially when you don't need all that extra precision. These methods were better known and more widely used before multi-precision libraries became so widespread, and they have become even more useful since 64-bit machines and C99's 64-bit integers came around. Integers are mostly a convenience. Arbitrary precision numbers are really only necessary for very high precision mathematical calculations or for cryptography; and for that matter, high precision mathematics often benefits more from arbitrary precision decimal (fixed or floating point) for certain calculations.
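
To give a concrete (if toy) example of what I mean--a minimal sketch,
not production code--here is modular exponentiation in Haskell that
never builds the full power, assuming the modulus is small enough
(2 <= m < 2^32) that every intermediate product fits in a Word64:

    import Data.Word (Word64)

    -- powMod b e m computes (b ^ e) `mod` m by square-and-multiply,
    -- reducing after every step so no intermediate exceeds 64 bits.
    -- Assumes 2 <= m < 2^32, so base * base cannot overflow Word64.
    powMod :: Word64 -> Word64 -> Word64 -> Word64
    powMod b e m = go (b `mod` m) e 1
      where
        go _    0 acc = acc
        go base n acc
          | odd n     = go base' (n `div` 2) (acc * base `mod` m)
          | otherwise = go base' (n `div` 2) acc
          where base' = base * base `mod` m

No Integer needed, and this is far faster than computing b ^ e at full
width and reducing at the end.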

The simple problem with Haskell and Integer is that, according to the Standard, Integer is a primitive: it is consequently implemented as part of the runtime system (RTS), not the Prelude or any library (though the interface to Integer is in the base library). For GHC, compiling with -fno-implicit-prelude and explicitly importing only those functions and types you need still won't get rid of Integer. One possible solution would be to implement the Integer 'primitive' as a separate library and import it into the Prelude or base libraries, then perform an optimisation step where base functions are only linked in when needed. Except for the optimisation step, this actually makes the job easier, since Integer functions would be called through the FFI and held in ForeignPtrs. (I have already done the FFI work for other libraries and a primitive version of the replacement.)
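
To give a flavour of the library route--this is only a sketch, and the
C entry points (bn_new, bn_free, bn_add) and the MPZ handle are
made-up names, not my actual code--the FFI side might look like this:

    {-# LANGUAGE ForeignFunctionInterface, EmptyDataDecls #-}
    import Foreign.ForeignPtr (ForeignPtr, newForeignPtr, withForeignPtr)
    import Foreign.Ptr (Ptr, FunPtr)

    data MPZ  -- opaque handle to the C-side number

    -- hypothetical C entry points; a real library's would differ
    foreign import ccall unsafe "bn_new"   c_bn_new  :: IO (Ptr MPZ)
    foreign import ccall unsafe "&bn_free" c_bn_free :: FunPtr (Ptr MPZ -> IO ())
    foreign import ccall unsafe "bn_add"   c_bn_add  :: Ptr MPZ -> Ptr MPZ -> Ptr MPZ -> IO ()

    newtype BigInt = BigInt (ForeignPtr MPZ)

    -- allocate a fresh number; the garbage collector runs bn_free
    -- for us once the ForeignPtr becomes unreachable
    newBigInt :: IO BigInt
    newBigInt = fmap BigInt (newForeignPtr c_bn_free =<< c_bn_new)

    addBigInt :: BigInt -> BigInt -> IO BigInt
    addBigInt (BigInt a) (BigInt b) = do
      r@(BigInt rf) <- newBigInt
      withForeignPtr a $ \pa ->
        withForeignPtr b $ \pb ->
          withForeignPtr rf $ \pr ->
            c_bn_add pr pa pb
      return r

The nice part is that memory management comes for free: the finaliser
attached to each ForeignPtr releases the C-side number, so the
Haskell side never leaks.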

> - it would be nice to know just how far off a good haskell version
>   would be performance-wise..

There is actually a relatively recent (2005, revised) Haskell version of an old Miranda library for "infinite" precision floating point numbers by Martin Guy, called BigFloat, at http://bignum.sourceforge.net/. Of course, it is floating point, and Integers would be faster, but the general speed difference between the two would probably be proportional to the speed difference in C--and so just as disappointing. The BigFloat library (using the Haskell version) came in last place at the Many Digits "Friendly" Competition in 2005 (see http://www.cs.ru.nl/~milad/manydigits/final_timings_rankings.html), though you would probably be more interested in the actual timing results, which give a better idea. (The fastest competitors were MPFR, which uses GMP, and the Wolfram team, makers of Mathematica; BigFloat actually beat the iRRAM and Maple entries on several problems.)

The real problem with an Integer library written in *pure* Haskell--especially for Integers--is simple: Haskell is too high-level, and no current Haskell compiler, not even JHC, has remotely decent support for low-level optimisations such as unrolling a loop over two arrays of uint32_t and carrying the result of adding the first pair of elements straight into the addition of the next pair, in two machine instructions (an add followed by an add-with-carry). I shouldn't even have to mention parallelisation of operations. In short, if you look at the assembler produced by any Haskell compiler, it is *very* ugly, and Arrays are even uglier. (For a simple comparison to the Integer problem, try implementing a fast bucket sort in Haskell.)
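
To make the point concrete, here is the obvious pure-Haskell rendering
of limb-wise addition (lists instead of arrays, purely for brevity).
Each limb's carry feeds the next addition--on x86 that is one adc
instruction per limb, but no Haskell compiler will come close to
producing that from this definition:

    import Data.Bits (shiftR)
    import Data.Word (Word32, Word64)

    -- add two equal-length, little-endian limb sequences; the carry
    -- out of each limb feeds directly into the next addition
    addLimbs :: [Word32] -> [Word32] -> [Word32]
    addLimbs = go 0
      where
        go carry []     []     = [fromIntegral carry | carry /= 0]
        go carry (x:xs) (y:ys) =
          let s = fromIntegral x + fromIntegral y + carry :: Word64
          in fromIntegral s : go (s `shiftR` 32) xs ys
        go _ _ _ = error "addLimbs: operands must have equal length"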

GMP uses hand-written assembler routines for many supported architectures, partly because GMP was originally created for earlier versions of GCC, which could not optimise as well as current versions. Even GMP cannot compare to an optimised library using SIMD (AltiVec, SSE)--in my tests, SIMD-optimised algorithms are between 2x and 10x faster. SIMD and small assembler routines (particularly for architectures without SIMD) are where I have been doing the bulk of my work. I doubt I have the ability to extend the current state of the art with regard to higher-level polynomial optimisations, so I am always trying out any algorithm I can find. (For very high precision multiplication (more than 30,000 bits), not much beats a SIMD-enabled Fast Fourier Transform; a specially coded Toom-3 algorithm would be faster, but for very large operands the algorithm becomes prohibitively complex. Division is another labour-intensive area.)
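
For the curious: the simplest member of that family of algorithms,
Karatsuba--effectively 2-way Toom-Cook--fits in a few lines. This
sketch runs on non-negative Integers purely to show the recursion;
a real library would of course work on raw limbs:

    -- Karatsuba: split each operand at 2^k, turning one size-n
    -- multiplication into three of roughly size n/2
    karatsuba :: Integer -> Integer -> Integer
    karatsuba x y
      | x < cutoff || y < cutoff = x * y   -- schoolbook regime
      | otherwise = z2 * b * b + z1 * b + z0
      where
        cutoff   = 2 ^ (64 :: Int)
        k        = max (bitLen x) (bitLen y) `div` 2
        b        = 2 ^ k
        (x1, x0) = x `divMod` b
        (y1, y0) = y `divMod` b
        z2 = karatsuba x1 y1
        z0 = karatsuba x0 y0
        z1 = karatsuba (x1 + x0) (y1 + y0) - z2 - z0
        bitLen n = length (takeWhile (> 0) (iterate (`div` 2) n))

Toom-3 splits into thirds and interpolates through five point
evaluations instead of three, which is where the complexity (and the
difficulty of mapping it onto SIMD lanes) starts to bite.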

> - what would be a killer for numerical programming, might still be
>   quite acceptable for a substantial part of haskell uses?

Quite possibly, yes. I haven't done my own study of what most users actually require, but according to notes in the source code of the Python multi-precision integer implementation (LongObject), most users of that library never needed more than 2 or 3 uint32_t in length. (They had originally reserved 5 uint32_t of space for a newly created PyLongObject.) Of course, even ordinary users would notice a change in speed if they used a much slower library on larger numbers (79 decimal digits, 256 bits, or more) in an interactive session.

> of course, the real gmp replacement project might be going so well
> that a haskell version would be obsolete rather sooner than later,
> and i certainly don't want to interfere with that effort.

Not at all. All ideas gratefully received :) It's going slowly, partly because I am constantly having to learn more, test and refactor so the result isn't a naive solution. I could reimplement GMP, of course, but many of the GMP algorithms are not readily amenable to SIMD operations--I am trying to *beat* GMP in speed.

> all that being said, it occurred to me that the representations and
> fusions described in the nice "rewriting haskell strings" paper would
> be a good foundation for a haskell bignum project, wouldn't they?

They would. Thanks for the link--I had been working from the source code. A combined Haskell-C solution may sometimes be faster--I noticed this a while back when experimenting with a random number generator library written in C, and came to the conclusion that the speed was largely due to Haskell's ability to cache the results of many operations, sometimes better than I could predict on my own. Term rewriting for a hybrid Haskell-C library could be very useful for complex polynomials. It might use equational transformations or Template Haskell; GHC's own Ian Lynagh did some work on this--see http://web.comlab.ox.ac.uk/oucl/work/ian.lynagh/Fraskell/ (on Fraskell, with some code examples), and especially http://web.comlab.ox.ac.uk/oucl/work/ian.lynagh/papers/Template_Haskell-A_Report_From_The_Field.ps (PostScript file). The one problem I have with using equational transformations over arrays is that it seems very difficult--I would say impossible without modifying the compiler--to perform a transformation that works across two combined arrays when the array operands may be of different sizes.
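
As a small taste of the equational-transformation route, GHC's rewrite
rules can already express simple algebraic identities over a bignum
API. This is illustration only--Big, bigZero and bigAdd below are
stand-in names with a placeholder representation:

    module BigNum (Big, bigZero, bigAdd) where

    -- placeholder representation purely for the sketch; a real
    -- library would hold a ForeignPtr to the C-side number
    newtype Big = Big Integer

    {-# NOINLINE bigZero #-}
    bigZero :: Big
    bigZero = Big 0

    {-# NOINLINE bigAdd #-}  -- keep the call visible to the rule
    bigAdd :: Big -> Big -> Big
    bigAdd (Big a) (Big b) = Big (a + b)

    -- an equational transformation: the simplifier rewrites any call
    -- matching the left-hand side before code generation, so the
    -- (possibly expensive) C addition is never made
    {-# RULES "bigAdd/rightZero" forall x. bigAdd x bigZero = x #-}

The hard cases are exactly the ones I mentioned: rules like this fire
on syntactic structure, so they cannot see the run-time sizes of the
array operands.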

This is similar to the solution FFTW (see http://www.fftw.org) uses, except that they used OCaml to perform the analysis on particular equations and patch the higher-level combinations together again in C--FFTW is essentially an equational compiler. The problem with many of FFTW's optimisations is that much of the analysis relies on constant values: you are writing a program to solve one defined mathematical problem where you know something from the beginning, such as the size of the operands (that's a biggie!). The SPIRAL project uses similar methods for more basic operations, such as multiplierless constant multiplication (transforming multiplication of an unknown value by a constant into an optimal number of left-shifts and additions); see, e.g., http://www.spiral.net/hardware/multless.html. An arbitrary precision Integer library for Haskell must be more general than that: I have to assume no constants are available. It is essentially an optimisation problem--but then this whole endeavour is, in a way.
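
The multiplierless idea itself is easy to picture. For a known
constant such as 10 = 8 + 2, the multiplication dissolves into shifts
and adds--a trivial sketch:

    import Data.Bits (shiftL)
    import Data.Word (Word32)

    -- x * 10 with the constant decomposed as 8 + 2:
    -- two shifts and one addition, no multiplier needed
    times10 :: Word32 -> Word32
    times10 x = (x `shiftL` 3) + (x `shiftL` 1)

With a fully general Integer operand there is no such decomposition to
exploit, which is exactly the generality problem I mean.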

Cheers,
Pete

