[Bug middle-end/113921] Output register of an "asm volatile goto" is incorrectly clobbered/discarded

2024-02-14 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 --- Comment #5 from Linus Torvalds --- (In reply to Linus Torvalds from comment #2) > > So we could make our workaround option be something like > >config GCC_ASM_GOTO_WORKAROUND > def_bool y > depends on CC_IS_GCC && GCC_VERSION

[Bug middle-end/113921] Output register of an "asm volatile goto" is incorrectly clobbered/discarded

2024-02-14 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 --- Comment #3 from Linus Torvalds --- (In reply to Linus Torvalds from comment #2) > > So we could make our workaround option be something like > >config GCC_ASM_GOTO_WORKAROUND > def_bool y > depends on CC_IS_GCC && GCC_VERSION

[Bug middle-end/113921] Output register of an "asm volatile goto" is incorrectly clobbered/discarded

2024-02-14 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 --- Comment #2 from Linus Torvalds --- (In reply to Jakub Jelinek from comment #1) > Bisection points to r12-5301-g045206450386bcd774db3bde0c696828402361c6 > making the problem go away, Well, that certainly explains why I can't see the problem

[Bug rtl-optimization/111901] Apparently bogus CSE of inline asm with memory clobber

2023-10-20 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111901 --- Comment #3 from Linus Torvalds --- (In reply to Andrew Pinski from comment #1) > I suspect without an input, the cse will happen as there is no other writes > in the loop. Yes, it looks to me like the CSE simply didn't think of the memory c

[Bug c/111901] New: Apparently bogus CSE of inline asm with memory clobber

2023-10-20 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111901 Bug ID: 111901 Summary: Apparently bogus CSE of inline asm with memory clobber Product: gcc Version: 13.2.1 Status: UNCONFIRMED Severity: normal Priority: P3 Co

[Bug tree-optimization/108552] Linux i386 kernel 5.14 memory corruption for pre_compound_page() when gcov is enabled

2023-01-30 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552 --- Comment #47 from Linus Torvalds --- (In reply to Richard Biener from comment #45) > For user code > > volatile long long x; > void foo () { x++; } > > emitting inc + adc with memory operands is only "incorrect" in re-ordering > the subword

[Bug tree-optimization/108552] Linux i386 kernel 5.14 memory corruption for pre_compound_page() when gcov is enabled

2023-01-30 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552 --- Comment #43 from Linus Torvalds --- (In reply to Richard Biener from comment #42) > > I think if we want to avoid doing optimizations on gcov counters we should > make them volatile. Honestly, that sounds like the cleanest and safest opti

[Bug tree-optimization/108552] Linux i386 kernel 5.14 memory corruption for pre_compound_page() when gcov is enabled

2023-01-27 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552 --- Comment #32 from Linus Torvalds --- Btw, where does the -fprofile-update=single/atomic come from? The kernel just uses CFLAGS_GCOV:= -fprofile-arcs -ftest-coverage for this case. So I guess 'single' is just the default value?

[Bug tree-optimization/108552] Linux i386 kernel 5.14 memory corruption for pre_compound_page() when gcov is enabled

2023-01-27 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552 --- Comment #31 from Linus Torvalds --- (In reply to Richard Biener from comment #26) > > Now, in principle we should have applied store-motion and not only PRE which > would have avoided the issue, not tricking the RA into reloading the value

[Bug tree-optimization/108552] Linux i386 kernel 5.14 memory corruption for pre_compound_page() when gcov is enabled

2023-01-27 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552 --- Comment #30 from Linus Torvalds --- (In reply to Richard Biener from comment #26) > And yes, to IV optimization the gcov counter for the loop body is just > another IV candidate that can be used, and in this case it allows to elide > the oth

[Bug target/108552] Linux i386 kernel 5.14 memory corruption for pre_compound_page() when gcov is enabled

2023-01-26 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552 --- Comment #12 from Linus Torvalds --- So it might be worth pointing explicitly to Vlastimil's email at https://lore.kernel.org/all/2b857e20-5e3a-13ec-a0b0-1f69d2d04...@suse.cz/ which has annotated objdump output and seems to point to the a

[Bug target/106471] Strange code generation for __builtin_ctzl()

2022-07-28 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106471 --- Comment #6 from Linus Torvalds --- Ahh, crossed comments. (In reply to Andrew Pinski from comment #3) > The xor is due to X86_TUNE_AVOID_FALSE_DEP_FOR_BMI setting: > > /* X86_TUNE_AVOID_FALSE_DEP_FOR_BMI: Avoid false dependency >for bi

[Bug target/106471] Strange code generation for __builtin_ctzl()

2022-07-28 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106471 --- Comment #5 from Linus Torvalds --- (In reply to Andrew Pinski from comment #2) > The xor is needed because of an errata in some Intel cores. The only errata I'm aware of is that tzcnt can act as tzcnt even when cpuid doesn't enumerate it (s

[Bug c/106471] Strange code generation for __builtin_ctzl()

2022-07-28 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106471 --- Comment #1 from Linus Torvalds --- Created attachment 53379 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53379&action=edit Silly test-case as an attachment too I expected just rep bsfq %rdi, %rax ret from this, but

[Bug c/106471] New: Strange code generation for __builtin_ctzl()

2022-07-28 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106471 Bug ID: 106471 Summary: Strange code generation for __builtin_ctzl() Product: gcc Version: 12.1.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c

[Bug target/105930] [12/13 Regression] Excessive stack spill generation on 32-bit x86

2022-07-10 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930 --- Comment #28 from Linus Torvalds --- (In reply to Roger Sayle from comment #27) > This should now be fixed on both mainline and the GCC 12 release branch. Thanks everybody. Looks like the xchg optimization isn't in the gcc-12 release branch

[Bug target/105930] [12/13 Regression] Excessive stack spill generation on 32-bit x86

2022-06-24 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930 --- Comment #24 from Linus Torvalds --- (In reply to Linus Torvalds from comment #23) > > And this now brings back my memory of the earlier similar discussion - it > wasn't about DImode code generation, it was about bitfield code generation > b

[Bug target/105930] [12/13 Regression] Excessive stack spill generation on 32-bit x86

2022-06-24 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930 --- Comment #23 from Linus Torvalds --- (In reply to Jakub Jelinek from comment #22) > > If the wider registers are narrowed before register allocation, it is just > a pair like (reg:SI 123) (reg:SI 256) and it can be allowed anywhere. That wa

[Bug target/105930] [12/13 Regression] Excessive stack spill generation on 32-bit x86

2022-06-24 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930 --- Comment #21 from Linus Torvalds --- (In reply to CVS Commits from comment #20) > > One might think > that splitting early gives the register allocator more freedom to > use available registers, but in practice the constraint

[Bug target/105930] [12/13 Regression] Excessive stack spill generation on 32-bit x86

2022-06-13 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930 --- Comment #14 from Linus Torvalds --- (In reply to Samuel Neves from comment #13) > Something simple like this -- https://godbolt.org/z/61orYdjK7 -- already > exhibits the effect. Yup. That's a much better test-case. I think you should atta

[Bug target/105930] [12/13 Regression] Excessive stack spill generation on 32-bit x86

2022-06-13 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930 --- Comment #12 from Linus Torvalds --- (In reply to Jakub Jelinek from comment #11) > Anyway, I think we need to understand what makes it spill that much more, > and unfortunately the testcase is too large to find that out easily, I think > we

[Bug target/105930] [12/13 Regression] Excessive stack spill generation on 32-bit x86

2022-06-12 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930 --- Comment #10 from Linus Torvalds --- (In reply to Roger Sayle from comment #7) > Investigating. Adding -mno-stv the stack size reduces from 2612 to 428 (and > on godbolt the number of assembler lines reduces from 6952 to 6203). So now that

[Bug target/105930] [12/13 Regression] Excessive stack spill generation on 32-bit x86

2022-06-12 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930 --- Comment #9 from Linus Torvalds --- Looks like STV is "scalar to vector" and it should have been disabled automatically by the -mno-avx flag anyway. And the excessive stack usage was perhaps due to GCC preparing all those stack slots for int

[Bug target/105930] [12/13 Regression] Excessive stack spill generation on 32-bit x86

2022-06-12 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930 --- Comment #8 from Linus Torvalds --- (In reply to Roger Sayle from comment #7) > Investigating. Adding -mno-stv the stack size reduces from 2612 to 428 (and > on godbolt the number of assembler lines reduces from 6952 to 6203). Thanks. Using

[Bug target/105930] [12/13 Regression] Excessive stack spill generation on 32-bit x86

2022-06-11 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930 --- Comment #5 from Linus Torvalds --- (In reply to Linus Torvalds from comment #4) > > I'm not proud of that hacky thing, but since gcc documentation is written > in sanskrit, and mere mortals can't figure it out, it's the best I could do. A

[Bug target/105930] [12/13 Regression] Excessive stack spill generation on 32-bit x86

2022-06-11 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930 --- Comment #4 from Linus Torvalds --- So hey, since you guys use git now, I thought I might as well just bisect this. Now, I have no idea what the best and most efficient way is to generate only "cc1", so my bisection run was this unholy mess

[Bug target/105930] [12/13 Regression] Excessive stack spill generation on 32-bit x86

2022-06-11 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930 --- Comment #3 from Linus Torvalds --- Created attachment 53123 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53123&action=edit Mindless revert that fixes things for me

[Bug target/105930] Excessive stack spill generation on 32-bit x86

2022-06-11 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930 --- Comment #1 from Linus Torvalds --- Side note: it might be best to clarify that this is a regression specific to gcc-12. Gcc 11.3 doesn't have the problem, and generates code for this same test-case with a stack frame of only 428 bytes. That

[Bug c/105930] New: Excessive stack spill generation on 32-bit x86

2022-06-11 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930 Bug ID: 105930 Summary: Excessive stack spill generation on 32-bit x86 Product: gcc Version: 12.1.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component:

[Bug tree-optimization/100363] gcc generating wider load/store than warranted at -O3

2021-05-03 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363 --- Comment #14 from Linus Torvalds --- (In reply to Vineet Gupta from comment #13) > Sorry the workaround proposed by Alexander doesn't seem to cure it (patch > attached), outcome is the same Vineet - it's not the ldd/std that is necessarily b

[Bug tree-optimization/100363] gcc generating wider load/store than warranted at -O3

2021-05-03 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363 --- Comment #11 from Linus Torvalds --- (In reply to Linus Torvalds from comment #10) > > This particular code comes > from some old version of zlib, and I can't test because I don't have the ARC > background to make any sense of the gene

[Bug tree-optimization/100363] gcc generating wider load/store than warranted at -O3

2021-05-03 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363 --- Comment #10 from Linus Torvalds --- (In reply to Richard Biener from comment #9) > > Note alignment has nothing to do with strict-aliasing (-fno-strict-aliasing > you mean btw). I obviously meant -fno-strict-aliasing, yes. But I think it'

[Bug tree-optimization/100363] gcc generating wider load/store than warranted at -O3

2021-05-01 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363 --- Comment #8 from Linus Torvalds --- (In reply to Alexander Monakov from comment #7) > > Most likely the issue is that sout/sfrom are misaligned at runtime, while > the vectorized code somewhere relies on them being sufficiently aligned for >

[Bug middle-end/100363] gcc generating wider load/store than warranted at -O3

2021-04-30 Thread torvalds--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363 Linus Torvalds changed: What|Removed |Added CC||torvalds@linux-foundation.o