[Bug tree-optimization/114322] New: [14 Regression] SCEV analysis failed for bases like A[(i+x)*stride] since r14-9193-ga0b1798042d033

2024-03-13 Thread hliu at amperecomputing dot com via Gcc-bugs
: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hliu at amperecomputing dot com Target Milestone: --- Compile the following case with: gcc simp.c -Ofast -mcpu=neoverse-n1 -S

[Bug testsuite/113446] [14 Regression] gcc.dg/tree-ssa/scev-16.c FAILs

2024-01-18 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113446 --- Comment #6 from Hao Liu --- Hi Jakub, That's great. Thanks for the fix.

[Bug target/110625] [14 Regression][AArch64] Vect: SLP fails to vectorize a loop as the reduction_latency calculated by new costs is too large

2023-12-30 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625 --- Comment #26 from Hao Liu --- But for now, the patch should fix the regression. (In reply to Tamar Christina from comment #25) > Is still pretty inefficient due to all the extends. If we generate better > code here this may tip the scale back

[Bug target/113089] New: [14 Regression][aarch64] ICE in process_uses_of_deleted_def, at rtl-ssa/changes.cc:252 since r14-6605-gc0911c6b357ba9

2023-12-19 Thread hliu at amperecomputing dot com via Gcc-bugs
Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: hliu at amperecomputing dot com Target Milestone: --- SPEC2017 525.x264 build failure. Options are: -O3 -mcpu

[Bug tree-optimization/112774] New: Vectorize the loop by inferring nonwrapping information from arrays

2023-11-30 Thread hliu at amperecomputing dot com via Gcc-bugs
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hliu at amperecomputing dot com Target Milestone: --- This case extracted from another benchmark and it is simpler than the case in PR101450, as it has the additional

[Bug target/110625] [AArch64] Vect: SLP fails to vectorize a loop as the reduction_latency calculated by new costs is too large

2023-08-01 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625 --- Comment #19 from Hao Liu --- > Hi, here's the reduced case Hi Tamar, thanks for the case. I've modified it to reproduce the ICE without LTO and have updated the patch.

[Bug target/110625] [AArch64] Vect: SLP fails to vectorize a loop as the reduction_latency calculated by new costs is too large

2023-08-01 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625 --- Comment #17 from Hao Liu --- > Thanks! I can reduce a testcase for you if you want :) That will be very helpful. Thanks.

[Bug target/110625] [AArch64] Vect: SLP fails to vectorize a loop as the reduction_latency calculated by new costs is too large

2023-08-01 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625 --- Comment #15 from Hao Liu --- Ah, I see. I've sent out a quick fix patch for code review. I'll investigate more about this and find out the root cause.

[Bug target/110625] [AArch64] Vect: SLP fails to vectorize a loop as the reduction_latency calculated by new costs is too large

2023-07-30 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625 --- Comment #11 from Hao Liu --- Hi Richard, That's great! Glad to hear the status. Waiting for the patches to be ready and upstreamed to trunk.

[Bug target/110625] [AArch64] Vect: SLP fails to vectorize a loop as the reduction_latency calculated by new costs is too large

2023-07-19 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625 --- Comment #8 from Hao Liu --- Thanks for the explanation. Understood the root cause and that's reasonable. So, do you have plan to fix this (i.e. to separate the FP and integer types)? I want to enable the new costs for Ampere1, which is sim

[Bug target/110625] [AArch64] Vect: SLP fails to vectorize a loop as the reduction_latency calculated by new costs is too large

2023-07-18 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625 --- Comment #6 from Hao Liu --- Thanks for the confirmation about the reduction latency. I'll create a simple patch to fix this. > Discounting the loads, we do have 15 general operations. That's true, and there are indeed 8 general operations

[Bug target/110625] [AArch64] Vect: SLP fails to vectorize a loop as the reduction_latency calculated by new costs is too large

2023-07-14 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625 --- Comment #3 from Hao Liu --- Sorry, it seems this case cannot be fixed by only adjusting the calculation of "reduction latency". Even if it becomes smaller, the case still cannot be vectorized as the "general operations" count is still too la

[Bug target/110649] [14 Regression] 25% sphinx3 spec2006 regression on Ice Lake and zen between g:acaa441a98bebc52 (2023-07-06 11:36) and g:55900189ab517906 (2023-07-07 00:23)

2023-07-14 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110649 --- Comment #2 from Hao Liu --- Hi, I bisected the following 3 commits (sequential): [v3] 3a61ca1b925 - Improve profile updates after loop-ch and cunroll (2023-07-06) [v2] d4c2e34deef - Improve scale_loop_profile (2023-07-06) [v1] 224fd

[Bug target/110625] [AArch64] Vect: SLP fails to vectorize a loop as the reduction_latency calculated by new costs is too large

2023-07-11 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625 --- Comment #2 from Hao Liu --- To my understanding, "reduction latency" is the least number of cycles needed to do the reduction calculation for 1 iteration of loop. It is calculated by the extra instruction issue-info of the new cost models i

[Bug target/110625] New: [AArch64] Vect: SLP fails to vectorize a loop as the reduction_latency calculated by new costs is too large

2023-07-11 Thread hliu at amperecomputing dot com via Gcc-bugs
: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: hliu at amperecomputing dot com Target Milestone: --- This problem causes a performance regression in SPEC2017 538.imagick. For the

[Bug tree-optimization/110474] Vect: the epilog vect loop should have small VF if the loop is unrolled during vectorization

2023-07-05 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110474 Hao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug tree-optimization/110531] Vect: slp_done_for_suggested_uf is not initialized in tree-vect-loop.cc

2023-07-04 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110531 Hao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug tree-optimization/110531] Vect: slp_done_for_suggested_uf is not initialized in tree-vect-loop.cc

2023-07-04 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110531 --- Comment #10 from Hao Liu --- > foo is just an example for not getting inlined, the point here is extra cost > paid. My point is that the case is different from the original case in tree-vect-loop.cc. For example, change the case as follow

[Bug tree-optimization/110531] Vect: slp_done_for_suggested_uf is not initialized in tree-vect-loop.cc

2023-07-04 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110531 --- Comment #7 from Hao Liu --- > int foo() { > bool a = true; > bool b; > if (a || b) > return 1; > b = true; > return 0; > } > > still has the warning, it looks something can be improved (guess we prefer > not to emit warning).

[Bug tree-optimization/110531] Vect: slp_done_for_suggested_uf is not initialized in tree-vect-loop.cc

2023-07-03 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110531 --- Comment #5 from Hao Liu --- BTW, there is probably no warning because the original code is too complicated and not inlined. Compile the simple case by "g++ -O3 -S -Wall hello.c": int foo(bool a) { bool b; if (a || b) return 1;

[Bug tree-optimization/110531] Vect: slp_done_for_suggested_uf is not initialized in tree-vect-loop.cc

2023-07-03 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110531 --- Comment #4 from Hao Liu --- > IMHO, the initialization with false is unnecessary and very likely it isn't > able to get optimized, it seems worse from this point of view. Sorry. I don't think so. See more at https://www.oreilly.com/library

[Bug tree-optimization/110531] Vect: slp_done_for_suggested_uf is not initialized in tree-vect-loop.cc

2023-07-03 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110531 --- Comment #2 from Hao Liu --- > Is the warning from some static analyzer? No. I just find it maybe a bug while looking at the code. > slp should be true always (always do analyze slp), it doesn't care what's in > slp_done_for_suggested_uf.

[Bug tree-optimization/110531] New: Vect: slp_done_for_suggested_uf is not initialized in tree-vect-loop.cc

2023-07-03 Thread hliu at amperecomputing dot com via Gcc-bugs
: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hliu at amperecomputing dot com Target Milestone: --- This seems an obvious bug in tree-vect-loop.cc: (1) This var is declared (but not initialized) and used in

[Bug tree-optimization/110474] New: Vect: the epilog vect loop should have small VF if the loop is unrolled during vectorization

2023-06-28 Thread hliu at amperecomputing dot com via Gcc-bugs
Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hliu at amperecomputing dot com Target Milestone: --- Hi, I'm trying to use tune loop unrolling during vectorization (see more: tree

[Bug tree-optimization/110449] Vect: use a small step to calculate the loop induction if the loop is unrolled during loop vectorization

2023-06-28 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110449 --- Comment #2 from Hao Liu --- That looks better than the currently generated code (it saves one "MOV" instruction). Yes, it has the loop-carried dependency advantage. But it still uses one more register for "8*step" (There may be a register pr

[Bug tree-optimization/110449] New: Vect: use a small step to calculate the loop induction if the loop is unrolled during loop vectorization

2023-06-28 Thread hliu at amperecomputing dot com via Gcc-bugs
: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hliu at amperecomputing dot com Target Milestone: --- This is inspired by clang. Compile the following case with "-mcpu=neover

[Bug tree-optimization/98598] New: Missed opportunity to optimize dependent loads in loops

2021-01-08 Thread hliu at amperecomputing dot com via Gcc-bugs
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hliu at amperecomputing dot com Target Milestone: --- As we know, dependent loads are not friendly to cache. Especially when in nested loops, dependent loads such as pa->pb-

[Bug bootstrap/98318] libcody breaks DragonFly bootstrap

2020-12-22 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98318 --- Comment #8 from Hao Liu --- Hi Nathan, The problem is related to using another make binary, which is 4.2.0 and built by ourselves. Maybe there is a strange bug. Anyway, after using the system installed make (which is 4.2.1 and under /usr/bin/

[Bug bootstrap/98318] libcody breaks DragonFly bootstrap

2020-12-22 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98318 --- Comment #7 from Hao Liu --- I found that: 1. "make -j1" can pass, but "make -j8" always fails. It seems something wrong with parallel build 2. When "make -j8" failed, if I try "make -j8" again, it can pass. > What happens if you cd into

[Bug bootstrap/98318] libcody breaks DragonFly bootstrap

2020-12-21 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98318 --- Comment #5 from Hao Liu --- Hi Nathan, We can still reproduce this problem on CentOS 7 (X86) and CentOS 8.2 (AArch64). We use last GCC version of yesterday:108beb75da The configure and build commands are (Bash is used): $ ../gcc/configure

[Bug bootstrap/98318] libcody breaks DragonFly bootstrap

2020-12-21 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98318 Hao Liu changed: What|Removed |Added CC||hliu at amperecomputing dot com --- Comment

[Bug ipa/92471] New: [ICE] segmentation fault in ipa-profile.c ipa_get_cs_argument_count (args=0x0)

2019-11-12 Thread hliu at amperecomputing dot com
: normal Priority: P3 Component: ipa Assignee: unassigned at gcc dot gnu.org Reporter: hliu at amperecomputing dot com CC: marxin at gcc dot gnu.org Target Milestone: --- GCC failed to compile spec2017 523.xalancbmk_r. The command line and

[Bug tree-optimization/91573] Vectorization failure for a loop to do multiply-add because SLP loads unnecessarily require permutation

2019-08-28 Thread hliu at amperecomputing dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91573 --- Comment #5 from Hao Liu --- Great. It seems really a SLP issue. I've learnt a lot about vectorization, dump info and -march. Thanks for your help.

[Bug tree-optimization/91573] New: Vectorization failure for a loop to do multiply-add

2019-08-28 Thread hliu at amperecomputing dot com
Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hliu at amperecomputing dot com Target Milestone: --- The following code cannot be vectorized (compiling with gcc -O3): === begin code === char src[512]; char dst[512]; #define WIDTH 8 void foo(int

[Bug ipa/91374] [Missed optimization] Versioning opportunities to improve performance

2019-08-07 Thread hliu at amperecomputing dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91374 --- Comment #2 from Hao Liu --- (In reply to Richard Biener from comment #1) > So you ask for main to be converted to > > if (idx == 0) >foo_32_16 (); > else /* idx == 1 */ >foo_16_8 (); > > correct? It sounds like an interesting id

[Bug tree-optimization/91374] New: [Missed optimization] Versioning opportunities to improve performance

2019-08-05 Thread hliu at amperecomputing dot com
: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hliu at amperecomputing dot com Target Milestone: --- Consider the following code === begin code === #define LENGTH 512 #define STRIDE 32 char src[LENGTH]; char

[Bug tree-optimization/88492] SLP optimization generates ugly code

2019-07-12 Thread hliu at amperecomputing dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88492 Hao Liu changed: What|Removed |Added CC||hliu at amperecomputing dot com --- Comment