Matthew, thank you very much. It seems that the precision switching you added on every function call is not required on 32-bit Windows. I have prepared a small pull request that fixes it. All tests pass on my VirtualBox machines (both 32-bit and 64-bit), and our own FFI-related tests pass too.
https://github.com/plt/racket/pull/280

A small note about the extflonum FFI on Windows platforms:
It is possible to use long double on Windows with the gcc compilers (mingw, mingw-w64), and a DLL compiled that way can be used through the Racket FFI. The one caveat is that long double data must be 16-byte aligned, so you should pass the gcc command-line option -m128bit-long-double on win32. On win64, 16-byte alignment is the default.

Please let me know if I can help further with testing or documentation.

2013/3/18 Matthew Flatt <mfl...@cs.utah.edu>:
> At Mon, 4 Mar 2013 19:06:32 +0300, Michael Filonenko wrote:
>> The following pull request provides long double type (extflonum) on
>> win32: https://github.com/plt/racket/pull/265
>
> Merged --- with some changes, as usual...
>
>> It seems that RacketCGC is supposed to be built without any
>> third-party DLLs (longdouble.dll being one of them), so the
>> following building process seems natural: [...]
>
> I changed the way that "longdouble.dll" is loaded and linked so that
> `extflonum-available?' returns #f if "longdouble.dll" isn't found.
> Since extflonums are not needed to build Racket 3m, that solves the
> build-order problem.
>
>> 1. There is currently a problem with the code generation in
>> foreign.rktc. [...]
>> Probably there is a need for an option in foreign.rktc declarations
>> to turn off casting in this particular case. Matthew, is it possible
>> to add such an option?
>
> Yes, done.
>
>> 2. xform.rkt contains a long list of all long double arithmetic
>> [non-]functions.
>
> Instead of changing "xform.rkt", I added XFORM_NONGCING annotations
> to the function prototypes.
>
>> 3. In numstr.c I could not avoid separating the parsing of
>> double and long double values.
>
> That looks ok.
>
>
> Another problem is that Windows seems unhappy with changing the
> precision mode.
> The _control87() function apparently ignores an attempt
> to change the mode, and when I set the mode using a FLDCW instruction,
> some library function resets it back.
>
> After a brief and unsuccessful attempt to track down where the mode is
> reset, I changed the DLL and JIT to set the mode just before performing
> extflonum arithmetic, and then set it back afterward.
>
> Of course, there can be a cost to changing the mode at such a fine
> granularity. When I run the first program below in Mac OS X 64-bit
> mode, I get
>
>  'flonum
>  cpu time: 483 real time: 483 gc time: 0
>  1.0
>  cpu time: 474 real time: 474 gc time: 0
>  1.0
>  cpu time: 789 real time: 787 gc time: 0
>  1.0
>  'extflonum
>  cpu time: 641 real time: 640 gc time: 0
>  1.0t0
>  cpu time: 885 real time: 884 gc time: 0
>  1.0t0
>  cpu time: 959 real time: 958 gc time: 0
>  1.0t0
>
> but if I force the JIT to set the control word on every extflonum
> operation, I get
>
>  ....
>  'extflonum
>  cpu time: 806 real time: 806 gc time: 0
>  1.0t0
>  cpu time: 1054 real time: 1053 gc time: 0
>  1.0t0
>  cpu time: 959 real time: 957 gc time: 0
>  1.0t0
>
> It looks like division is slow enough to mask the overhead of setting
> the mode. For addition and subtraction, the overhead seems to be on the
> order of the cost of switching flonums to extflonums.
>
> The JIT could be improved to avoid switching between consecutive
> operations, but does the cost of this approach look reasonable as a
> start?
>
> ----------------------------------------
>
> #lang racket
> (require racket/flonum
>          racket/extflonum)
>
> 'flonum
> (time
>  (for/fold ([v 1.0]) ([i (in-range 100000000)])
>    (fl- (fl+ v v) v)))
> (time
>  (for/fold ([v 1.0]) ([i (in-range 100000000)])
>    (fl- (fl+ v v) 1.0)))
> (time
>  (for/fold ([v 1.0]) ([i (in-range 100000000)])
>    (fl/ (fl* v v) v)))
>
> 'extflonum
> (time
>  (for/fold ([v 1.0t0]) ([i (in-range 100000000)])
>    (extfl- (extfl+ v v) v)))
> (time
>  (for/fold ([v 1.0t0]) ([i (in-range 100000000)])
>    (extfl- (extfl+ v v) 1.0t0)))
> (time
>  (for/fold ([v 1.0t0]) ([i (in-range 100000000)])
>    (extfl/ (extfl* v v) v)))
>
>

--
With best regards,
Michael Filonenko
_________________________
  Racket Developers list:
  http://lists.racket-lang.org/dev
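
As a sketch of the alignment note near the top of this message (long double data passed through the FFI must be 16-byte aligned), the C fragment below shows one way to check the layout a compiler actually produces. The names is_aligned_16 and struct ld_box are hypothetical and do not come from either pull request. The explicit alignas(16) only illustrates the requirement on a single struct member. It is not a substitute for -m128bit-long-double, which applies to long double throughout the compilation.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignas (C11) */
#include <stddef.h>    /* offsetof */
#include <stdint.h>    /* uintptr_t */

/* Hypothetical helper: returns 1 if p lies on a 16-byte boundary, else 0. */
static int is_aligned_16(const void *p) {
    return ((uintptr_t)p % 16u) == 0;
}

/* Hypothetical struct that might be handed to a DLL through an FFI.
   alignas(16) forces the long double member onto a 16-byte boundary
   regardless of the platform's default alignment for the type. */
struct ld_box {
    char tag;
    alignas(16) long double value;
};

/* Compile-time check: the member offset is a multiple of 16. */
static_assert(offsetof(struct ld_box, value) % 16 == 0,
              "long double member is not 16-byte aligned");
```

Calling is_aligned_16(&box.value) on a struct ld_box instance is a cheap runtime check to add on the C side of a DLL before blaming the FFI for corrupted extflonum values.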