Will GCC eventually learn to use BSR or even TZCNT on AMD/Intel processors?

2023-06-05 Thread Stefan Kanthak
instead of code fiddling with the stack! Stefan Kanthak

Who cares about size? (was: Who cares about performance (or Intel's CPU errata)?)

2023-05-29 Thread Stefan Kanthak
"Andrew Pinski" wrote: > On Sat, May 27, 2023 at 3:54 PM Stefan Kanthak > wrote: >> Nevertheless GCC fails to optimise code properly: >> >> --- .c --- >> int ispowerof2(unsigned long long argument) { >> return __builtin_popcountll(argument) =

Re: Who cares about performance (or Intel's CPU errata)?

2023-05-28 Thread Stefan Kanthak
"Andrew Pinski" wrote: > On Sat, May 27, 2023 at 3:54 PM Stefan Kanthak > wrote: [...] >> Nevertheless GCC fails to optimise code properly: >> >> --- .c --- >> int ispowerof2(unsigned long long argument) { >> return __builtin_popcountll(argu

Re: Who cares about performance (or Intel's CPU errata)?

2023-05-27 Thread Stefan Kanthak
"Andrew Pinski" wrote: > On Sat, May 27, 2023 at 2:25 PM Stefan Kanthak > wrote: >> >> Just to show how SLOPPY, INCONSEQUENTIAL and INCOMPETENT GCC's developers >> are: >> >> --- dontcare.c --- >> int ispowerof2(unsigned __int128 arg

Re: Another epic optimiser failure

2023-05-27 Thread Stefan Kanthak
"Andrew Pinski" wrote: > On Sat, May 27, 2023 at 2:38 PM Stefan Kanthak > wrote: >> >> "Jakub Jelinek" wrote, completely clueless: >> >>> On Sat, May 27, 2023 at 11:04:11PM +0200, Stefan Kanthak wrote: >>>> OUCH: popcnt writes

Re: Another epic optimiser failure

2023-05-27 Thread Stefan Kanthak
"Jakub Jelinek" wrote, completely clueless: > On Sat, May 27, 2023 at 11:04:11PM +0200, Stefan Kanthak wrote: >> OUCH: popcnt writes the WHOLE result register, there is ABSOLUTELY >> no need to clear it beforehand nor to clear the higher 24 bits >>

Who cares about performance (or Intel's CPU errata)?

2023-05-27 Thread Stefan Kanthak
Just to show how SLOPPY, INCONSEQUENTIAL and INCOMPETENT GCC's developers are: --- dontcare.c --- int ispowerof2(unsigned __int128 argument) { return __builtin_popcountll(argument) + __builtin_popcountll(argument >> 64) == 1; } --- EOF --- GCC 13.3 gcc -march=haswell -O3

Another epic optimiser failure

2023-05-27 Thread Stefan Kanthak
--- .c --- int ispowerof2(unsigned long long argument) { return __builtin_popcountll(argument) == 1; } --- EOF --- GCC 13.3 gcc -m32 -march=alderlake -O3 gcc -m32 -march=sapphirerapids -O3 gcc -m32 -mpopcnt -mtune=sapphirerapids -O3
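For readers following the ispowerof2() threads in this listing, here is a minimal sketch (not taken from any of the posts) that puts the two variants side by side: the __builtin_popcountll() form quoted above and the mask-based form from the demo.c posts. The function names ispowerof2_popcount and ispowerof2_mask are made up for illustration; __builtin_popcountll is the GCC/Clang builtin.

```c
#include <stdbool.h>

/* Variant quoted in the post: a power of two has exactly one bit set. */
bool ispowerof2_popcount(unsigned long long argument)
{
    return __builtin_popcountll(argument) == 1;
}

/* Mask-based variant: clearing the lowest set bit of a nonzero value
   leaves zero only if that bit was the only one set. */
bool ispowerof2_mask(unsigned long long argument)
{
    return (argument != 0) && ((argument & (argument - 1)) == 0);
}
```

Both functions should agree for every input; which one compiles to better code depends on the target options, which is what the threads argue about.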

Re: Will GCC eventually support SSE2 or SSE4.1?

2023-05-27 Thread Stefan Kanthak
You wrote: >On 2023-05-26 23:40, Stefan Kanthak wrote: >> Feel free to propose this alternative here (better elsewhere, where you'll >> earn less laughter). >> But don't forget that this 23-bit mantissa will be all zeroes for quite some >> 64-bit (and even 32-bit) integ

Epic code generator/optimiser failures

2023-05-27 Thread Stefan Kanthak
--- demo.c --- int ispowerof2(unsigned long long argument) { return (argument != 0) && ((argument & argument - 1) == 0); } --- EOF --- GCC 13.1 gcc -m32 -mavx -O3 # or -march=native instead of -mavx https://gcc.godbolt.org/z/T31Gzo85W ispowerof2(unsigned long long): vmovq xmm1,

Re: GCC plays "Shell Game", but looses track of the shell covering the nought

2023-05-27 Thread Stefan Kanthak
ions as 12.* Also note the difference to yesterdays demo.c: "thanks" to the added | (argument != 0) GCC does NOT generate SSE2 instructions any more. I don't know yet whether this change is a quirk or WTF, Stefan > Dave > > > On Sat, 27 May 2023 18:23:12 +0200 > "Ste

GCC plays "Shell Game", but looses track of the shell covering the nought

2023-05-27 Thread Stefan Kanthak
--- demo.c --- int ispowerof2(unsigned long long argument) { return (argument != 0) && ((argument & argument - 1) == 0); } --- EOF --- GCC 12.2 gcc -m32 -O3 https://gcc.godbolt.org/z/YWP4zb8jd ispowerof2(unsigned long long): push edi # three registers

Re: Will GCC eventually support SSE2 or SSE4.1?

2023-05-26 Thread Stefan Kanthak
"Jonathan Wakely" wrote: > On Fri, 26 May 2023 at 15:34, Stefan Kanthak wrote: >> >> "Jonathan Wakely" wrote: >> >> > On Fri, 26 May 2023 at 14:55, Stefan Kanthak >> > wrote: >> >> [...] >> >> >> NOT obv

Re: Will GCC eventually support SSE2 or SSE4.1?

2023-05-26 Thread Stefan Kanthak
You wrote: >On 2023-05-26 14:46, Stefan Kanthak wrote: >> OOPS: why does GCC (ab)use the SSE2 alias "Willamette New Instruction Set" >> (... ...) >> OUCH: why does it FAIL to REALLY use SSE2, as shown in the comments on the >>right side? > > Pleas

Re: Will GCC eventually support SSE2 or SSE4.1?

2023-05-26 Thread Stefan Kanthak
"Jonathan Wakely" wrote: > On Fri, 26 May 2023 at 15:48, Stefan Kanthak wrote: >> >> "Jakub Jelinek" wrote: >> >> [...] >> >> > And for -m32 it is also the last option that wins, but as with >> > many other cases

Re: Will GCC eventually support SSE2 or SSE4.1?

2023-05-26 Thread Stefan Kanthak
"Jakub Jelinek" wrote: [...] > And for -m32 it is also the last option that wins, but as with > many other cases just last one from certain set of options. [...] > The -mISA options are processed left to right after as well as BEFORE > setting base from -march=. In other words: although

Re: Will GCC eventually support SSE2 or SSE4.1?

2023-05-26 Thread Stefan Kanthak
"Jonathan Wakely" wrote: > On Fri, 26 May 2023 at 14:55, Stefan Kanthak wrote: [...] >> NOT obvious is but that -m -march= does not clear any >> not supported in , i.e the last one does NOT win here. > > The last -march option selects the base set of instructi

Re: Will GCC eventually support SSE2 or SSE4.1?

2023-05-26 Thread Stefan Kanthak
"Jakub Jelinek" wrote: > On Fri, May 26, 2023 at 02:19:54PM +0200, Stefan Kanthak wrote: >> > I find it very SURPRISING that you're only just learning the basics of >> > how to use gcc NOW, after YELLING about all the OUCH. >> >> I'm NOT surprised

Re: Will GCC eventually support SSE2 or SSE4.1?

2023-05-26 Thread Stefan Kanthak
"Jonathan Wakely" wrote: > On Fri, 26 May 2023 at 13:23, Stefan Kanthak wrote: >> >> "Jonathan Wakely" wrote: >> >> > On Fri, 26 May 2023 at 12:42, Stefan Kanthak wrote: >> >> Why does the documentation FAIL to specify that

Re: Will GCC eventually support SSE2 or SSE4.1?

2023-05-26 Thread Stefan Kanthak
"Jonathan Wakely" wrote: > On Fri, 26 May 2023 at 13:09, Stefan Kanthak wrote: >> >> "Jonathan Wakely" wrote: >> >> > On Fri, 26 May 2023 at 12:29, Stefan Kanthak >> > wrote: >> >> OUCH: as shown in https://godbolt.org/z

Re: Will GCC eventually support SSE2 or SSE4.1?

2023-05-26 Thread Stefan Kanthak
"Jonathan Wakely" wrote: > On Fri, 26 May 2023 at 12:42, Stefan Kanthak wrote: >> Why does the documentation FAIL to specify that CPU features given by >> -m* override -m32 or enables them in ADDITION to those enabled by -march=? > > Because it's obvious. If you

Re: Will GCC eventually support SSE2 or SSE4.1?

2023-05-26 Thread Stefan Kanthak
"Jonathan Wakely" wrote: > On Fri, 26 May 2023 at 12:29, Stefan Kanthak wrote: >> >> "Jakub Jelinek" wrote: >> >> > On Fri, May 26, 2023 at 10:59:03AM +0200, Stefan Kanthak wrote: >> >> 3) SSE4.1 is supported since Core2, but -marc

Re: Will GCC eventually support SSE2 or SSE4.1?

2023-05-26 Thread Stefan Kanthak
"Jakub Jelinek" wrote: > On Fri, May 26, 2023 at 10:59:03AM +0200, Stefan Kanthak wrote: >> 3) SSE4.1 is supported since Core2, but -march=core2 fails to enable it. >>That's bad, REALITY CHECK, please! > > You're wrong. > SSE4.1 first appeared in th

Re: Will GCC eventually support SSE2 or SSE4.1?

2023-05-26 Thread Stefan Kanthak
"Jakub Jelinek" wrote: > On Fri, May 26, 2023 at 10:59:03AM +0200, Stefan Kanthak wrote: >> 3) SSE4.1 is supported since Core2, but -march=core2 fails to enable it. >>That's bad, REALITY CHECK, please! > > You're wrong. > SSE4.1 first appeared in th

Re: Will GCC eventually support SSE2 or SSE4.1?

2023-05-26 Thread Stefan Kanthak
"Jonathan Wakely" wrote: > On Fri, 26 May 2023 at 09:00, Stefan Kanthak wrote: >> >> "Jonathan Wakely" wrote: >> >> > On Fri, 26 May 2023, 08:01 Andrew Pinski via Gcc, wrote: >> > >> >> On Thu, May 25, 2023 at 11:56?

Re: Will GCC eventually support SSE2 or SSE4.1?

2023-05-26 Thread Stefan Kanthak
"Jonathan Wakely" wrote: > On Fri, 26 May 2023, 08:01 Andrew Pinski via Gcc, wrote: > >> On Thu, May 25, 2023 at 11:56?PM Stefan Kanthak >> wrote: >>> >>> Hi, >>> >>> compile the following function on a system with Core

Will GCC eventually support SSE2 or SSE4.1?

2023-05-26 Thread Stefan Kanthak
#ret 14 instructions in 33 bytes# 11 instructions in 32 bytes OUCH: why does GCC abuse EBX (and ECX too) and performs a superfluous memory write? Stefan Kanthak

Re: B^HDEAD code generation (AMD64)

2023-01-09 Thread Stefan Kanthak
"Thomas Koenig" wrote: > On 09.01.23 12:35, Stefan Kanthak wrote: >> 20 superfluous instructions of the total 102 instructions! > > The proper place for bug reports is https://gcc.gnu.org/bugzilla/ . OUCH: there's NO proper place for bugs at all! > Feel free to

Re: Widening multiplication, but no narrowing division [i386/AMD64]

2023-01-09 Thread Stefan Kanthak
"Paul Koning" wrote: >> On Jan 9, 2023, at 10:20 AM, Stefan Kanthak wrote: >> >> "Paul Koning" wrote: >> >>>> On Jan 9, 2023, at 7:20 AM, Stefan Kanthak wrote: >>>> >>>> Hi, >>>> >>&g

Re: Widening multiplication, but no narrowing division [i386/AMD64]

2023-01-09 Thread Stefan Kanthak
"Paul Koning" wrote: >> On Jan 9, 2023, at 7:20 AM, Stefan Kanthak wrote: >> >> Hi, >> >> GCC (and other C compilers too) support the widening multiplication >> of i386/AMD64 processors, but DON'T support their narrowing division: > > I

Re: Widening multiplication, but no narrowing division [i386/AMD64]

2023-01-09 Thread Stefan Kanthak
LIU Hao wrote: >On 2023/1/9 20:20, Stefan Kanthak wrote: >> Hi, >> >> GCC (and other C compilers too) support the widening multiplication >> of i386/AMD64 processors, but DON'T support their narrowing division: >> >> > > QWORD-DWORD division would change

Widening multiplication, but no narrowing division [i386/AMD64]

2023-01-09 Thread Stefan Kanthak
ret .end JFTR: dependent on the magnitude of the numbers and the processor it MIGHT be better to omit comparison and branch: there's a trade-off between the latency of the (un-pipelined) division instruction and the latency of the conditional branch due to misprediction. Stefan Kanthak
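To make the terms of this thread concrete, here is a small C sketch of what "widening multiplication" and "narrowing division" mean for 32-bit operands. It is an illustration only, not the code from the post: the names widening_mul and narrowing_div are hypothetical, and whether a compiler turns the division into a single 64-by-32-bit divide instruction is exactly the point under discussion, so no claim about generated code is made here.

```c
#include <stdint.h>

/* Widening multiplication: 32 x 32 -> 64 bits, no overflow possible. */
uint64_t widening_mul(uint32_t a, uint32_t b)
{
    return (uint64_t) a * b;
}

/* Narrowing division: 64 / 32 -> 32-bit quotient and remainder.
   Assumption: the caller guarantees divisor != 0 and that the quotient
   fits in 32 bits, i.e. (dividend >> 32) < divisor. */
uint32_t narrowing_div(uint64_t dividend, uint32_t divisor, uint32_t *remainder)
{
    *remainder = (uint32_t) (dividend % divisor);
    return (uint32_t) (dividend / divisor);
}
```

The precondition in the comment mirrors the comparison mentioned in the snippet above: without it the quotient may not fit into the narrow result.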

EPIC optimiser failures (i386)

2023-01-09 Thread Stefan Kanthak
sub eax, DWORD PTR [esp+4] .endif seto ah setz al sub al, ah # al = ZF - OF .if 0 cbw cwde .else movsx eax, al .endif ret Stefan Kanthak

B^HDEAD code generation (AMD64)

2023-01-09 Thread Stefan Kanthak
ed to modify ECX! cmovne rdx, rax cmovne rax, rsi ret .L9: mov rax, rsi mov rdx, rdi .L1: ret .L14: mov r8, r9 xor r9d, r9d mov rcx, r8 jmp .L4 20 superfluous instructions of the total

B^HDEAD code generation (i386)

2023-01-09 Thread Stefan Kanthak
pop edi pop ebp ret .L9: mov ebx, edi # Ouch: GCC likes to play shell games! mov ecx, esi # mov edx, ebx # mov eax, ecx # pop ebx pop esi pop edi pop

[PATCH] get rid of POPCOUNTCST* macros in libgcc2.c

2022-04-16 Thread Stefan Kanthak via Gcc-patches
Hi @ll, the "magic" constants 0x55...55, 0x33...33, 0x0f...0f and 0x01...01 used in the popcountsi2() and popcountdi2() functions defined in libgcc2.c are currently generated iteratively via the 4 macros POPCOUNTCST, POPCOUNTCST8, POPCOUNTCST4 and POPCOUNTCST2 from 2 nibbles over 4 and 8 nibbles to

Re: On(c)e more: optimizer failure

2021-08-27 Thread Stefan Kanthak
Manuel López-Ibáñez wrote: > FWIW: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=24021 Thanks. So this bug may soon have a driver's license in some countries... One more for the road: $ cat wtf.c double wtf(double x) { return sqrt(x * x); // can the square ever be negative? } $ gcc -m64

Re: On(c)e more: optimizer failure

2021-08-23 Thread Stefan Kanthak
Gabriel Ravier wrote: > On 8/23/21 3:46 PM, Stefan Kanthak wrote: >> JFTR: do you consider your wild speculations to be on-topic here? > > I suppose I should apologize: I did not intend to make any accusations > here. No need to, I can stand a little heat. [...] > I

Re: On(c)e more: optimizer failure

2021-08-23 Thread Stefan Kanthak
Gabriel Ravier wrote: > On 8/22/21 11:22 PM, Stefan Kanthak wrote: [ 2bugzilla | !2bugzilla ] >> You (and everybody else) is free to use GCC bugzilla. >> Everybody and me is but also free NOT to use GCC bugzilla. >> >> Stefan > > Yes, you are free not

Re: On(c)e more: optimizer failure

2021-08-22 Thread Stefan Kanthak
Gabriel Ravier wrote: > On 8/21/21 10:19 PM, Stefan Kanthak wrote: >> Jakub Jelinek wrote: [...] >>> GCC doesn't do value range propagation of floating point values, not even >>> the special ones like NaNs, infinities, +/- zeros etc., and without that the &

Re: On(c)e more: optimizer failure

2021-08-21 Thread Stefan Kanthak
Jakub Jelinek wrote: > On Sat, Aug 21, 2021 at 09:40:16PM +0200, Stefan Kanthak wrote: >> > I believe your example doesn't take into account that the values can be NaN >> > which compares false in all situations. >> >> That's a misbelief! >> Please noti

Re: On(c)e more: optimizer failure

2021-08-21 Thread Stefan Kanthak
//godbolt.org/z/1ra7zcsnd Replace if (isnan(argx) || isnan(argy)) return argx + argy; with if ((argx != argx) || (argy != argy)) return argx + argy; then feed the changed snippet to compiler explorer again, with and without -ffast-math Stefan > --matt > > On Sat, Aug 21, 2021 a
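The substitution suggested above relies on the IEEE 754 rule that a NaN compares unequal to everything, including itself. A minimal sketch of the two spellings, assuming default floating-point semantics (with -ffast-math the compiler may assume NaNs never occur, which is why the post asks to try both settings); the function names are made up for illustration:

```c
#include <math.h>
#include <stdbool.h>

/* Spelling from repro.c: use the isnan() classification macro. */
bool any_nan_isnan(double argx, double argy)
{
    return isnan(argx) || isnan(argy);
}

/* Spelling proposed in the post: x != x is true only when x is a NaN. */
bool any_nan_compare(double argx, double argy)
{
    return (argx != argx) || (argy != argy);
}
```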

On(c)e more: optimizer failure

2021-08-21 Thread Stefan Kanthak
Hi, the following snippet is from the nextafter() function of --- repro.c --- #define Zero 0.0 double nextafter(double argx, double argy) { double z = argx; if (isnan(argx) || isnan(argy)) return argx + argy; if (argx == argy) return argx;

Re: 3rd deficiency (was: Superfluous branches due to insufficient flow analysis)

2021-08-14 Thread Stefan Kanthak
Gabriel Ravier wrote: Independent from the defunct flow analysis in the presence of NaNs, my example demonstrates another minor deficiency: know thy instruction set! See the comments in the assembly below. > On 8/13/21 8:58 PM, Stefan Kanthak wrote: >> Hi, >> >> compil

Re: Superfluous branches due to insufficient flow analysis

2021-08-14 Thread Stefan Kanthak
"Gabriel Ravier" wrote: Please don't FULL QUOTE! > On 8/13/21 8:58 PM, Stefan Kanthak wrote: >> Hi, >> >> compile the following naive implementation of nextafter() for AMD64: >> >> JFTR: ignore the aliasing casts, they don't matter here! >&g

Superfluous branches due to insufficient flow analysis

2021-08-13 Thread Stefan Kanthak
Hi, compile the following naive implementation of nextafter() for AMD64: JFTR: ignore the aliasing casts, they don't matter here! $ cat repro.c double nextafter(double from, double to) { if (to != to) return to; // to is NAN if (from != from) return from; //

Re: Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-07 Thread Stefan Kanthak
Joseph Myers wrote: > On Fri, 6 Aug 2021, Stefan Kanthak wrote: PLEASE DON'T STRIP ATTRIBUTION LINES: I did not write the following paragraph! >> > I don't know what the standard says about NaNs in this case, I seem to >> > remember that arithmetic instructions typically

Optimizer failure

2021-08-07 Thread Stefan Kanthak
Hi, for the function (really: ternary expressions) int dummy(int x) { #ifdef VARIANT x < 0 ? --x : x > 0 ? ++x : 0; #else x < 0 ? --x : x > 0 ? ++x : x; #endif } GCC 10.2.0 generates the following code targeting AMD64: testl %edi, %edi js .L0 leal

Re: Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-06 Thread Stefan Kanthak
Richard Biener wrote: > On August 6, 2021 4:32:48 PM GMT+02:00, Stefan Kanthak > wrote: >>Michael Matz wrote: >>> Btw, have you made speed measurements with your improvements? >> >>No. [...] >>If the constant happens to be present in L1 cache

Re: Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-06 Thread Stefan Kanthak
Gabriel Paubert wrote: > On Fri, Aug 06, 2021 at 02:43:34PM +0200, Stefan Kanthak wrote: >> Gabriel Paubert wrote: >> >> > Hi, >> > >> > On Thu, Aug 05, 2021 at 01:58:12PM +0200, Stefan Kanthak wrote: [...] >> >> The whole idea

Re: Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-06 Thread Stefan Kanthak
Michael Matz wrote: > Hello, > > On Fri, 6 Aug 2021, Stefan Kanthak wrote: > >> For -ffast-math, where the sign of -0.0 is not handled and the spurious >> invalid floating-point exception for |argument| >= 2**63 is acceptable, > > This claim would need to be p

Re: Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-06 Thread Stefan Kanthak
Gabriel Paubert wrote: > Hi, > > On Thu, Aug 05, 2021 at 01:58:12PM +0200, Stefan Kanthak wrote: >> Gabriel Paubert wrote: >> >> >> > On Thu, Aug 05, 2021 at 09:25:02AM +0200, Stefan Kanthak wrote: >> >> .intel_

Re: Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-05 Thread Stefan Kanthak
Gabriel Paubert wrote: > On Thu, Aug 05, 2021 at 09:25:02AM +0200, Stefan Kanthak wrote: >> Hi, >> >> targeting AMD64 alias x86_64 with -O3, GCC 10.2.0 generates the >> following code (13 instructions using 57 bytes, plus 4 quadwords >> using 32 bytes) fo

Suboptimal code generated for __buitlin_floor on AMD64 without SS4_4.1

2021-08-05 Thread Stefan Kanthak
Hi, targeting AMD64 alias x86_64 with -O3, GCC 10.2.0 generates the following code (19 instructions using 86 bytes, plus 6 quadwords using 48 bytes) for __builtin_floor() when -msse4.1 is NOT given: .text 0: f2 0f 10 15 10 00 00 00 movsd .LC1(%rip), %xmm2

Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-05 Thread Stefan Kanthak
Hi, targeting AMD64 alias x86_64 with -O3, GCC 10.2.0 generates the following code (13 instructions using 57 bytes, plus 4 quadwords using 32 bytes) for __builtin_trunc() when -msse4.1 is NOT given: .text 0: f2 0f 10 15 10 00 00 00 movsd .LC1(%rip), %xmm2

Suboptimal code generated for __buitlin_ceil on AMD64 without SS4_4.1

2021-08-05 Thread Stefan Kanthak
Hi, targeting AMD64 alias x86_64 with -O3, GCC 10.2.0 generates the following code (17 instructions using 78 bytes, plus 6 quadwords using 48 bytes) for __builtin_ceil() when -msse4.1 is NOT given: .text 0: f2 0f 10 15 10 00 00 00 movsd .LC1(%rip), %xmm2

Suboptimal code generated for __buitlin_rint on AMD64 without SS4_4.1

2021-08-05 Thread Stefan Kanthak
Hi, targeting AMD64 alias x86_64 with -O3, GCC 10.2.0 generates the following code (12 instructions using 51 bytes, plus 4 quadwords using 32 bytes) for __builtin_rint() when -msse4.1 is NOT given: .text 0: f2 0f 10 15 10 00 00 00 movsd .LC1(%rip), %xmm2

Re: Are some builtin functions (for example log() vs. sqrt()) more equal than others?

2021-07-30 Thread Stefan Kanthak
"Joseph Myers" wrote: > On Fri, 30 Jul 2021, Stefan Kanthak wrote: > >> Joseph Myers wrote: >> >> > None of these are valid constant expressions as defined by the standard >> > (constant expressions cannot involve evaluated function calls). >

Re: Are some builtin functions (for example log() vs. sqrt()) more equal than others?

2021-07-30 Thread Stefan Kanthak
Joseph Myers wrote: > None of these are valid constant expressions as defined by the standard > (constant expressions cannot involve evaluated function calls). That's why I ask specifically why GCC bugs on log(log(...)), but not on log(sqrt(...) ...)! GCC also accepts following initializers

Are some builtin functions (for example log() vs. sqrt()) more equal than others?

2021-07-30 Thread Stefan Kanthak
on log(sqrt(5.0) * 0.5 + 0.5)! NOT amused Stefan Kanthak

Optimiser failure for ternary foo == 0L ? NULL : bar;

2021-07-17 Thread Stefan Kanthak
6 90 xchg %ax,%ax 10: 31 c0 xor %eax,%eax 12: c3 ret not amused Stefan Kanthak

Re: [PATCH] Overflow-trapping integer arithmetic routines7code

2020-12-07 Thread Stefan Kanthak
Jeff Law wrote Wednesday, November 25, 2020 7:11 PM: > On 11/25/20 6:18 AM, Stefan Kanthak wrote: >> Jeff Law wrote: [...] >>> My inclination is to leave the overflow checking double-word multiplier >>> as-is. >> See but <https://gcc.gnu.org/piperm

Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function

2020-11-25 Thread Stefan Kanthak
Jeff Law wrote: [...] > By understanding how your proposed changes affect other processors, you > can write better changes that are more likely to get included. > Furthermore you can focus efforts on things that matter more in the real > world. DImode shifts in libgcc are _not_ useful to try

Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function

2020-11-25 Thread Stefan Kanthak
Jakub Jelinek wrote: > On Wed, Nov 25, 2020 at 09:22:53PM +0100, Stefan Kanthak wrote: >> > As Jakub has already indicated, your change will result in infinite >> > recursion on avr. I happened to have a cr16 handy and it looks like >> > it'd generate infinite r

Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function

2020-11-25 Thread Stefan Kanthak
Jeff Law wrote: > On 11/24/20 8:40 AM, Stefan Kanthak wrote: >> Andreas Schwab wrote: >> >>> On Nov 24 2020, Stefan Kanthak wrote: >>> >>>> 'nuff said >>> What's your point? >> Pinpoint deficiencies and bugs in GCC and libgcc, plus a c

Re: [PATCH] Overflow-trapping integer arithmetic routines7code

2020-11-25 Thread Stefan Kanthak
Jeff Law wrote: > On 11/10/20 10:21 AM, Stefan Kanthak wrote: > >>> So with all that in mind, I installed everything except the bits which >>> have the LIBGCC2_BAD_CODE ifdefs after testing on the various crosses. >>> If you could remove the ifdefs on the a

Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function

2020-11-24 Thread Stefan Kanthak
Andreas Schwab wrote: > On Nov 24 2020, Stefan Kanthak wrote: > >> 'nuff said > > What's your point? Pinpoint deficiencies and bugs in GCC and libgcc, plus a counter example to your "argument"! I recommend careful reading. Stefan

Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function

2020-11-24 Thread Stefan Kanthak
Andreas Schwab wrote 2020-11-11: > On Nov 10 2020, Stefan Kanthak wrote: > >> Eric Botcazou wrote: >> >>>> The implementation of the __ashlDI3(), __ashrDI3() and __lshrDI3() >>>> functions >>>> is rather bad, it yields bad machine cod

Re: [PATCH] Simplified construction of constants for __popcountSI2/__popcountDI2 in libgcc2.c

2020-11-20 Thread Stefan Kanthak
Jakub Jelinek wrote: > On Fri, Nov 20, 2020 at 11:08:41AM +0100, Stefan Kanthak wrote: >> The construction of the "magic" constants 0x55...55, 0x33...33, 0x0f...0f >> and 0x01...01 in __popcountSI2 and __popcountDI2 with macros is awkward; >> these constants can si

[PATCH] Simplified construction of constants for __popcountSI2/__popcountDI2 in libgcc2.c

2020-11-20 Thread Stefan Kanthak
The construction of the "magic" constants 0x55...55, 0x33...33, 0x0f...0f and 0x01...01 in __popcountSI2 and __popcountDI2 with macros is awkward; these constants can simply be written as ((UWtype) ~0 / 3), ((UWtype) ~0 / 5), ((UWtype) ~0 / 17) and ((UWtype) ~0 / 255) Stefan Kantha
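The division trick described above can be checked with a short sketch. This is not the libgcc2.c source: it fixes the word type to uint32_t instead of UWtype and uses a hypothetical function name, popcount32, purely to show that the four quotients are the usual population-count masks.

```c
#include <stdint.h>

/* Population count of a 32-bit word using masks derived by division,
   as proposed in the post (with uint32_t standing in for UWtype). */
unsigned popcount32(uint32_t x)
{
    const uint32_t m55 = (uint32_t) ~0 / 3;    /* 0x55555555 */
    const uint32_t m33 = (uint32_t) ~0 / 5;    /* 0x33333333 */
    const uint32_t m0f = (uint32_t) ~0 / 17;   /* 0x0f0f0f0f */
    const uint32_t m01 = (uint32_t) ~0 / 255;  /* 0x01010101 */

    x = x - ((x >> 1) & m55);            /* 2-bit sums */
    x = (x & m33) + ((x >> 2) & m33);    /* 4-bit sums */
    x = (x + (x >> 4)) & m0f;            /* 8-bit sums */
    return (x * m01) >> 24;              /* add the four bytes */
}
```

The same expressions work unchanged for a 64-bit word type, except that the final shift becomes 56.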

[libgcc2.c] Implementation of __bswapsi2()

2020-11-12 Thread Stefan Kanthak
return (v >> (31 & w)) | (v << (31 & -w)); } int __bswapsi2 (int u) // should better be unsigned __bswapsi2 (unsigned u)! { return __rotlsi3 (u & 0xff00ff00, 8) | __rotrsi3 (u & 0x00ff00ff, 8); } Stefan Kanthak PS: reimplementing __bswapdi2() is left
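A self-contained sketch of the rotate-based byte swap shown above, written with unsigned types as the comment in the post recommends. The helper names rotl32, rotr32 and bswap32 are stand-ins for illustration, not libgcc entry points.

```c
#include <stdint.h>

/* Rotate left/right by w bits; the masks keep both shift counts in 0..31. */
uint32_t rotl32(uint32_t v, unsigned w)
{
    return (v << (31 & w)) | (v >> (31 & -w));
}

uint32_t rotr32(uint32_t v, unsigned w)
{
    return (v >> (31 & w)) | (v << (31 & -w));
}

/* Byte swap: bytes 3 and 1 rotate left by 8 into positions 0 and 2,
   bytes 2 and 0 rotate right by 8 into positions 1 and 3. */
uint32_t bswap32(uint32_t u)
{
    return rotl32(u & 0xff00ff00u, 8) | rotr32(u & 0x00ff00ffu, 8);
}
```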

Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function

2020-11-11 Thread Stefan Kanthak
000ff is signed too -- and producing a negative value (or overflow) from the left-shift of a signed int, i.e. shifting into (or beyond) the sign bit, is undefined behaviour! JFTR: both -fsanitize=signed-integer-overflow and -fsanitize=undefined fail to catch this BUGBUGBUG, which surfaces on i386 and AMD64 with -O1 or -O0! Stefan Kanthak PS: even worse, -fsanitize=signed-integer-overflow fails to catch 1 << 31 or 128 << 24!
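The undefined behaviour described above comes from doing the shift in type int. A minimal sketch of the usual way around it, converting to an unsigned type before shifting; byte_to_top is a hypothetical helper, not part of libgcc:

```c
#include <stdint.h>

/* Move a byte into bits 24..31. The cast makes the shift unsigned, so a
   value such as 128 can reach bit 31 without shifting into an int's sign bit. */
uint32_t byte_to_top(uint8_t b)
{
    return (uint32_t) b << 24;
}
```

Writing 128 << 24 instead would shift a signed int into its sign bit where int is 32 bits wide, which is the case the post complains about.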

Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function

2020-11-10 Thread Stefan Kanthak
u]subDI3() functions ... which are but missing from libgcc.a Stefan Kanthak

[PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function

2020-11-10 Thread Stefan Kanthak via Gcc-patches
() function uses SIGNED instead of unsigned mask values; cf. __bswapdi2() Stefan Kanthak libgcc2.diff Description: Binary data

Re: [PATCH] Overflow-trapping integer arithmetic routines7code

2020-11-10 Thread Stefan Kanthak
Jeff Law wrote: > On 10/5/20 10:49 AM, Stefan Kanthak wrote: >> The implementation of the functions __absv?i2(), __addv?i3() etc. for >> trapping integer overflow provided in libgcc2.c is rather bad. >> Same for __cmp?i2() and __ucmp?i2() >> >> At least for AMD6

Re: [__mulvti3] register allocator plays shell game

2020-10-27 Thread Stefan Kanthak
Richard Biener wrote: > On Tue, Oct 27, 2020 at 12:01 AM Stefan Kanthak > wrote: >> >> Richard Biener wrote: >> >>> On Sun, Oct 25, 2020 at 8:37 PM Stefan Kanthak >>> wrote: >>>> >>>> Hi, >>>> >>>>

Re: [__mulvti3] register allocator plays shell game

2020-10-26 Thread Stefan Kanthak
Richard Biener wrote: > On Sun, Oct 25, 2020 at 8:37 PM Stefan Kanthak > wrote: >> >> Hi, >> >> for the AMD64 alias x86_64 platform and the __int128_t [DW]type, >> the first few lines of the __mulvDI3() function from libgcc2.c >> >

[__mulvti3] register allocator plays shell game

2020-10-25 Thread Stefan Kanthak
, 63 cmp r8, rsi jne __mulvti3+0x48+65-31 cmp r9, rcx jne __mulvti3+0xa0+65-31 mov rax, rdi imul rdx ret ... not amused Stefan Kanthak

[PATCH] Overflow-trapping integer arithmetic routines7code

2020-10-05 Thread Stefan Kanthak
The implementation of the functions __absv?i2(), __addv?i3() etc. for trapping integer overflow provided in libgcc2.c is rather bad. Same for __cmp?i2() and __ucmp?i2() At least for AMD64 and i386 processors GCC creates awful to horrible code for them: see

[Patch] Overflow-trapping integer arithmetic routines7code: bloated and slooooow

2020-10-05 Thread Stefan Kanthak
The implementation of the functions __absv?i2(), __addv?i3() etc. for trapping integer overflow provided in libgcc2.c is rather bad. Same for __cmp?i2() and __ucmp?i2() GCC creates awful to horrible code for them (at least for AMD64 and i386 processors): see

UB or !UB? Plus poor register allocation

2020-10-01 Thread Stefan Kanthak
The following source implements the __absv?i2() functions (see <https://gcc.gnu.org/onlinedocs/gccint/Integer-library-routines.html>) for 32-bit, 64-bit and 128-bit integers in 3 different ways: --- ub_or_!ub.c --- // Copyleft 2014-2020, Stefan Kanthak #ifdef __amd64__ __int128_t __a

Missed optimisation in __udivmoddi4 of libgcc2

2020-09-13 Thread Stefan Kanthak
libgcc2 provides "double-word" division as __udivmoddi4() The following part of its source | UWtype d0, d1, n0, n1, n2; | UWtype b, bm; ... | count_leading_zeros (bm, d1); | if (bm == 0) ... | else | { | UWtype m1, m0; | /* Normalize. */ | | b = W_TYPE_SIZE - bm;

Re: Peephole optimisation: isWhitespace()

2020-08-25 Thread Stefan Kanthak
I wrote: > "Richard Biener" wrote: [...] >> Whether or not the branch is predicted taken does not matter, what >> matters is that the continuation is not data dependent on the branch >> target computation and thus can execute in parallel to it. > > My benchmark shows that this doesn't

Re: Peephole optimisation: isWhitespace()

2020-08-24 Thread Stefan Kanthak
"Richard Biener" wrote: > On Mon, Aug 24, 2020 at 1:22 PM Stefan Kanthak > wrote: >> >> "Richard Biener" wrote: >> >> > On Mon, Aug 17, 2020 at 7:09 PM Stefan Kanthak >> > wrote: >> >> >> >> "Al

Re: Peephole optimisation: isWhitespace()

2020-08-24 Thread Stefan Kanthak
"Richard Biener" wrote: > On Mon, Aug 17, 2020 at 7:09 PM Stefan Kanthak > wrote: >> >> "Allan Sandfeld Jensen" wrote: >> >> > On Freitag, 14. August 2020 18:43:12 CEST Stefan Kanthak wrote: >> >> Hi @ll, >> >>

Re: Peephole optimisation: isWhitespace()

2020-08-17 Thread Stefan Kanthak
"Allan Sandfeld Jensen" wrote: > On Freitag, 14. August 2020 18:43:12 CEST Stefan Kanthak wrote: >> Hi @ll, >> >> in his ACM queue article <https://queue.acm.org/detail.cfm?id=3372264>, >> Matt Godbolt used the function >> >> | b

Re: Peephole optimisation: isWhitespace()

2020-08-17 Thread Stefan Kanthak
"Nathan Sidwell" > On 8/16/20 9:54 AM, Stefan Kanthak wrote: >> "Nathan Sidwell" wrote: [...] >>> Have you benchmarked it? >> >> Of course! Did you? [...] > you seem very angry about being asked for data. As much as you hallucinated

Re: Peephole optimisation: isWhitespace()

2020-08-16 Thread Stefan Kanthak
"Nathan Sidwell" wrote: > On 8/14/20 12:43 PM, Stefan Kanthak wrote: >> Hi @ll, >> >> in his ACM queue article <https://queue.acm.org/detail.cfm?id=3372264>, >> Matt Godbolt used the function >> >> | bool isWhitespace(char c)

Peephole optimisation: isWhitespace()

2020-08-14 Thread Stefan Kanthak
x, edx cmp ecx, 33 ; CF = c <= ' ' adc edx, edx ; edx = (c <= ' ') and eax, edx ret regards Stefan Kanthak
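The listing truncates the function under discussion, so the following is only a sketch under a stated assumption: that isWhitespace() tests for space, carriage return, line feed and tab. It contrasts the chained comparisons with a bit-mask lookup guarded by a range check, which is the shape the assembly fragment above hints at (compare against 33, then combine with a bit test). Both function names are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed original form: four comparisons. */
bool is_whitespace_compare(char c)
{
    return c == ' ' || c == '\r' || c == '\n' || c == '\t';
}

/* Bit-mask form: one range check plus one bit test in a 64-bit mask. */
bool is_whitespace_mask(char c)
{
    const uint64_t mask = (1ULL << ' ') | (1ULL << '\r')
                        | (1ULL << '\n') | (1ULL << '\t');
    unsigned char u = (unsigned char) c;

    return u <= ' ' && ((mask >> u) & 1);
}
```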

Almost an order of magnitude faster __udimodti4() for AMD64

2020-08-10 Thread Stefan Kanthak
Hi @ll, I don't use GCC, so I don't know whether there's a benchmark for __udivmodti4() and/or __udivmoddi4() for AMD64 and i386 processors. If you have one: get my "slow" __udivmodti4() from and run the benchmark, then my fast

Re: Bug in divmodhi4(), plus poor inperformant code

2018-12-06 Thread Stefan Kanthak
"Segher Boessenkool" wrote: > On Wed, Dec 05, 2018 at 02:19:14AM +0100, Stefan Kanthak wrote: >> "Paul Koning" wrote: >> >> > Yes, that's a rather nasty cut & paste error I made. >> >> I suspected that. >> Replacing >>

Re: Bug in divmodhi4(), plus poor inperformant code

2018-12-04 Thread Stefan Kanthak
ets do that. Moving 2 of 3 conditions from the loop is not an optimisation, but a necessity! In other words: why test 3 conditions in every pass of the loop when you need to test only 1 condition inside the loop, and the other 2 outside/before the loop? regards Stefan > On Dec 4, 2018, at 5:5

Bug in divmodhi4(), plus poor inperformant code

2018-12-04 Thread Stefan Kanthak
Hi @ll, libgcc's divmodhi4() function has an obvious bug; additionally it shows rather poor inperformant code: two of the three conditions tested in the first loop should clearly be moved outside the loop! divmodsi4() shows this inperformant code too! regards Stefan Kanthak --- divmodhi4.c

Poor code generation/optimisation in all versions of GCC x86-64 and x86-32

2018-11-05 Thread Stefan Kanthak
"mov rdx, rcx". I also wonder why a shld is created here: at least for "n += n;" I expect a more straightforward add rax, rax adc rdx, rdx regards Stefan Kanthak PS: of course GCC x86-32 exhibits the same flaws with int64_t!
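The add/adc pair expected in the post doubles a double-word value by adding the low halves and then adding the high halves plus the carry. A sketch of the same arithmetic in portable C, with the 128-bit value split into two 64-bit halves; the struct u128_parts and the function double_u128 are hypothetical names, and no claim is made about which instructions a given compiler emits for it.

```c
#include <stdint.h>

/* A 128-bit unsigned value as two 64-bit halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_parts;

/* n += n: add the low halves, then the high halves plus the carry out. */
u128_parts double_u128(u128_parts n)
{
    u128_parts r;

    r.lo = n.lo + n.lo;                   /* corresponds to add rax, rax */
    r.hi = n.hi + n.hi + (r.lo < n.lo);   /* corresponds to adc rdx, rdx */
    return r;
}
```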