[Bug tree-optimization/98813] loop is sub-optimized if index is unsigned int with offset

2021-01-25 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98813 --- Comment #4 from Jiu Fu Guo --- Thanks, Richard! One interesting thing: the code below is vectorized: void foo (const double *__restrict__ A, const double *__restrict__ B, double *__restrict__ C, int n, int k, int m) { if (n > 0 && m > 0

[Bug tree-optimization/98813] loop is sub-optimized if index is unsigned int with offset

2021-01-24 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98813 --- Comment #2 from Jiu Fu Guo --- For the code: for (unsigned int k = 0; k < BS; k++) { s += A[k] * B[k]; } PR48052 handles this, and for this code the additional run-time check does not seem to be required. If there is an offset in the code:

[Bug tree-optimization/98813] loop is sub-optimized if index is unsigned int with offset

2021-01-24 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98813 --- Comment #1 from Jiu Fu Guo --- Since the run-time check has an additional cost, we see a benefit only if the upper bound `m` is large; if the upper bound is small (e.g. < 12), the vectorized code (from clang) is slower than the un-vectorized binary.
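
For readers of this index, here is a minimal sketch of the loop shape the summary describes. It is not the reporter's test case: the function names and signatures are hypothetical, and the sketch assumes only the C rule that unsigned arithmetic wraps. Because `offset + i` is computed in `unsigned int`, a compiler cannot assume the accessed addresses increase monotonically, which is one reason a run-time check may be needed before vectorizing. Whether a given compiler version vectorizes either form is not something this sketch establishes.

```c
#include <stddef.h>

/* Hypothetical kernel: unsigned int index plus an offset.
   offset + i is evaluated in unsigned int and may wrap around,
   so the accesses are not provably contiguous at compile time. */
double dot_unsigned_offset(const double *a, const double *b,
                           unsigned int offset, unsigned int m)
{
    double s = 0.0;
    for (unsigned int i = 0; i < m; i++)
        s += a[offset + i] * b[offset + i];
    return s;
}

/* Same computation with a size_t index, shown for comparison only.
   For in-bounds inputs both functions return the same value. */
double dot_wide_offset(const double *a, const double *b,
                       size_t offset, size_t m)
{
    double s = 0.0;
    for (size_t i = 0; i < m; i++)
        s += a[offset + i] * b[offset + i];
    return s;
}
```

For small trip counts, the cost of any run-time check is amortized over few iterations, which matches the observation in the comment above about small upper bounds.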

[Bug tree-optimization/98813] New: loop is sub-optimized if index is unsigned int with offset

2021-01-24 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98813 Bug ID: 98813 Summary: loop is sub-optimized if index is unsigned int with offset Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal

[Bug tree-optimization/88767] 'unroll and jam' not optimizing some loops

2021-01-07 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767 --- Comment #15 from Jiu Fu Guo --- (In reply to Richard Biener from comment #14) > > I've only quickly tried to understand what you are proposing but I think > this is out-of scope of our "separate" distribution / interchange / >

[Bug tree-optimization/88767] 'unroll and jam' not optimizing some loops

2020-12-16 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767 --- Comment #13 from Jiu Fu Guo --- Hi Richard, Looking at the changed code in comment 9, there seems to be another opportunity to improve performance: improving the locality of accesses to array A. Unroll and jam loop1 into loop4 (or unroll

[Bug tree-optimization/88767] 'unroll and jam' not optimizing some loops

2020-12-15 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767 --- Comment #11 from Jiu Fu Guo --- The patch (PR98137) also helps a lot with the code in comment 9, since vectorization then happens.

[Bug tree-optimization/88767] 'unroll and jam' not optimizing some loops

2020-12-15 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767 Jiu Fu Guo changed: What|Removed |Added CC||guojiufu at gcc dot gnu.org --- Comment

[Bug tree-optimization/97901] ICE at -Os: verify_gimple failed

2020-11-19 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97901 --- Comment #4 from Jiu Fu Guo --- Hi Richard, thank you for handling this!

[Bug rtl-optimization/66706] Redundant bitmask instruction on x >> (n & 32)

2020-10-20 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66706 Bug 66706 depends on bug 66552, which changed state. Bug 66552 Summary: Missed optimization when shift amount is result of signed modulus https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66552 What|Removed |Added

[Bug rtl-optimization/66552] Missed optimization when shift amount is result of signed modulus

2020-10-20 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66552 Jiu Fu Guo changed: What|Removed |Added Status|NEW |RESOLVED CC|
