[Bug libgcc/108279] Improved speed for float128 routines

2023-02-10 Thread already5chosen at yahoo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279 --- Comment #24 from Michael_S --- (In reply to Michael_S from comment #22) > (In reply to Michael_S from comment #8) > > (In reply to Thomas Koenig from comment #6) > > > And there will have to be a decision about 32-bit targets. > > > > > >

[Bug libgcc/108279] Improved speed for float128 routines

2023-01-18 Thread already5chosen at yahoo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279 --- Comment #23 from Michael_S --- (In reply to Jakub Jelinek from comment #19) > So, if stmxcsr/vstmxcsr is too slow, perhaps we should change x86 > sfp-machine.h > #define FP_INIT_ROUNDMODE \ > do {

[Bug libgcc/108279] Improved speed for float128 routines

2023-01-18 Thread already5chosen at yahoo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279 --- Comment #22 from Michael_S --- (In reply to Michael_S from comment #8) > (In reply to Thomas Koenig from comment #6) > > And there will have to be a decision about 32-bit targets. > > > > IMHO, 32-bit targets should be left in their

[Bug libgcc/108279] Improved speed for float128 routines

2023-01-18 Thread wilco at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279 --- Comment #21 from Wilco --- (In reply to Jakub Jelinek from comment #20) > __attribute__((noinline, optimize ("rounding-math"))) static int > round_to_nearest (void) { return 1.0f - __FLT_MIN__ == 1.0f + __FLT_MIN__; } Wouldn't that always

[Bug libgcc/108279] Improved speed for float128 routines

2023-01-18 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279 --- Comment #20 from Jakub Jelinek --- __attribute__((noinline, optimize ("rounding-math"))) static int round_to_nearest (void) { return 1.0f - __FLT_MIN__ == 1.0f + __FLT_MIN__; } and if (round_to_nearest ()) \ _fcw = FP_RND_NEAREST; \
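The check quoted in comment #20 can be tried in isolation. Below is a minimal sketch, not the libgcc code: the function name round_to_nearest_probe is hypothetical, and volatile variables are used here as a stand-in for the optimize ("rounding-math") attribute so that the compiler cannot fold the comparison at compile time. The idea is that __FLT_MIN__ (FLT_MIN) is far smaller than half an ulp of 1.0f, so in round-to-nearest both 1.0f - FLT_MIN and 1.0f + FLT_MIN round back to 1.0f, while any directed rounding mode moves exactly one of the two results away from 1.0f.

```c
#include <float.h>

/* Hypothetical helper modeled on the snippet in comment #20.
   Returns 1 when the current dynamic rounding mode is round-to-nearest,
   0 otherwise. The volatile qualifiers are an assumption of this sketch:
   they force the arithmetic to happen at run time. */
static int round_to_nearest_probe(void)
{
    volatile float one = 1.0f;
    volatile float tiny = FLT_MIN;   /* smallest normal float, 2^-126 */
    volatile float lo = one - tiny;  /* 1.0f in nearest; below 1.0f when rounding down or toward zero */
    volatile float hi = one + tiny;  /* 1.0f in nearest; above 1.0f when rounding up */
    return lo == hi;
}
```

Under the default rounding mode the probe returns 1. After switching to a directed mode with fesetround from <fenv.h>, it is expected to return 0 on hardware that honors the dynamic rounding mode for float arithmetic.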

[Bug libgcc/108279] Improved speed for float128 routines

2023-01-18 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279 --- Comment #19 from Jakub Jelinek --- So, if stmxcsr/vstmxcsr is too slow, perhaps we should change x86 sfp-machine.h #define FP_INIT_ROUNDMODE \ do {

[Bug libgcc/108279] Improved speed for float128 routines

2023-01-18 Thread wilco at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279 Wilco changed: CC: added wilco at gcc dot gnu.org --- Comment #18 from

[Bug libgcc/108279] Improved speed for float128 routines

2023-01-16 Thread joseph at codesourcery dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279 --- Comment #17 from joseph at codesourcery dot com --- It's not part of the ABI for the Arm 32-bit Architecture (AAPCS32). https://github.com/ARM-software/abi-aa/blob/main/aapcs32/aapcs32.rst You can file an issue there if you want, though I

[Bug libgcc/108279] Improved speed for float128 routines

2023-01-15 Thread already5chosen at yahoo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279 --- Comment #16 from Michael_S --- (In reply to Jakub Jelinek from comment #15) > libquadmath is not needed nor useful on aarch64-linux, because long double > type there is already IEEE 754 quad. That's good to know. Thank you. If you are

[Bug libgcc/108279] Improved speed for float128 routines

2023-01-15 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279 --- Comment #15 from Jakub Jelinek --- libquadmath is not needed nor useful on aarch64-linux, because long double type there is already IEEE 754 quad.
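Whether long double is already IEEE 754 quad on a given target can be checked from the <float.h> limits. This is a minimal sketch; the function name long_double_is_binary128 is hypothetical, and the test relies only on the binary128 parameters (113-bit significand, maximum exponent 16384).

```c
#include <float.h>

/* Hypothetical helper: returns 1 when long double has the IEEE 754
   binary128 parameters, 0 otherwise. */
static int long_double_is_binary128(void)
{
    return LDBL_MANT_DIG == 113 && LDBL_MAX_EXP == 16384;
}
```

On aarch64-linux this is expected to return 1, in line with the comment above. On x86-64 Linux, where long double is the 80-bit x87 format, it returns 0.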

[Bug libgcc/108279] Improved speed for float128 routines

2023-01-15 Thread tkoenig at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279 --- Comment #14 from Thomas Koenig --- Seems that libquadmath is not built on that particular Linux/CPU variant, for whatever reason. At least I cannot find any '*quadmath*' files in the build directory. /proc/cpuinfo tells me that processor

[Bug libgcc/108279] Improved speed for float128 routines

2023-01-15 Thread tkoenig at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279 --- Comment #13 from Thomas Koenig --- I tried compiling your tests on Apple silicon using Asahi Linux, but without success. A first step was rather easy; replacing __float128 by _Float128 was required. I then bootstrapped gcc on that machine

[Bug libgcc/108279] Improved speed for float128 routines

2023-01-14 Thread already5chosen at yahoo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279 --- Comment #12 from Michael_S --- (In reply to Thomas Koenig from comment #10) > What we would need for incorporation into gcc is to have several > functions, which would then be called depending on which floating point > options are in force at

[Bug libgcc/108279] Improved speed for float128 routines

2023-01-14 Thread already5chosen at yahoo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279 --- Comment #11 from Michael_S --- (In reply to Thomas Koenig from comment #9) > Created attachment 54273 [details] > matmul_r16.i > > Here is matmul_r16.i from a relatively recent trunk. Thank you. Unfortunately, I was not able to link it

[Bug libgcc/108279] Improved speed for float128 routines

2023-01-14 Thread tkoenig at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279 --- Comment #10 from Thomas Koenig --- What we would need for incorporation into gcc is to have several functions, which would then be called depending on which floating point options are in force at the time of invocation. So, let's go through

[Bug libgcc/108279] Improved speed for float128 routines

2023-01-14 Thread tkoenig at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279 --- Comment #9 from Thomas Koenig --- Created attachment 54273 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54273&action=edit matmul_r16.i Here is matmul_r16.i from a relatively recent trunk.

[Bug libgcc/108279] Improved speed for float128 routines

2023-01-12 Thread already5chosen at yahoo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279 --- Comment #8 from Michael_S --- (In reply to Thomas Koenig from comment #6) > (In reply to Michael_S from comment #5) > > Hi Thomas > > Are you in or out? > > Depends a bit on what exactly you want to do, and if there is > a chance that what

[Bug libgcc/108279] Improved speed for float128 routines

2023-01-12 Thread already5chosen at yahoo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279 --- Comment #7 from Michael_S --- Either here or my yahoo e-mail

[Bug libgcc/108279] Improved speed for float128 routines

2023-01-12 Thread tkoenig at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279 --- Comment #6 from Thomas Koenig --- (In reply to Michael_S from comment #5) > Hi Thomas > Are you in or out? Depends a bit on what exactly you want to do, and if there is a chance that what you want to do will be incorporated into gcc. If

[Bug libgcc/108279] Improved speed for float128 routines

2023-01-11 Thread already5chosen at yahoo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279 --- Comment #5 from Michael_S --- Hi Thomas Are you in or out? If you are still in, I can use your help on several issues. 1. Torture. See if Invalid Operand exception raised properly now. Also if there are still remaining problems with NaN.

[Bug libgcc/108279] Improved speed for float128 routines

2023-01-04 Thread already5chosen at yahoo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279 --- Comment #4 from Michael_S --- (In reply to Jakub Jelinek from comment #2) > From what I can see, they are certainly not portable. > E.g. the relying on __int128 rules out various arches (basically all 32-bit > arches, > ia32, powerpc 32-bit

[Bug libgcc/108279] Improved speed for float128 routines

2023-01-04 Thread tkoenig at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279 --- Comment #3 from Thomas Koenig --- (In reply to Jakub Jelinek from comment #2) > From what I can see, they are certainly not portable. > E.g. the relying on __int128 rules out various arches (basically all 32-bit > arches, > ia32, powerpc

[Bug libgcc/108279] Improved speed for float128 routines

2023-01-04 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279 --- Comment #2 from Jakub Jelinek --- From what I can see, they are certainly not portable. E.g. the relying on __int128 rules out various arches (basically all 32-bit arches, ia32, powerpc 32-bit among others). Not handling exceptions is a

[Bug libgcc/108279] Improved speed for float128 routines

2023-01-03 Thread tkoenig at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279 --- Comment #1 from Thomas Koenig --- Created attachment 54183 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54183&action=edit Example patch with Michael S's code just pasted over the libgcc implementation, for a test A benchmarks: Just