https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98813
--- Comment #4 from Jiu Fu Guo ---
Thanks, Richard!
One interesting thing: the code below is vectorized:
void
foo (const double *__restrict__ A, const double *__restrict__ B,
     double *__restrict__ C, int n, int k, int m)
{
  if (n > 0 && m > 0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98813
--- Comment #2 from Jiu Fu Guo ---
For code:
for (unsigned int k = 0; k < BS; k++)
  {
    s += A[k] * B[k];
  }
PR48052 handles this case, and for this code the additional run-time check does
not seem to be required.
If there is an offset in the code:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98813
--- Comment #1 from Jiu Fu Guo ---
Because the run-time check has its own cost, the benefit only shows when the
upper bound `m` is large; when it is small (e.g. < 12), the vectorized code
(from clang) performs worse than the non-vectorized binary.
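The trade-off in comment #1 can be sketched in C. This is a minimal, hypothetical example, not the code from the bug report: it assumes a dot product whose `unsigned int` index carries an offset `off`, and the names `dot_offset_scalar` and `dot_offset_versioned` are made up for illustration. Because `i + off` is computed modulo 2^32, the address of `A[i + off]` is not guaranteed to advance by one element per iteration. The sketch therefore versions the loop on a single run-time "no wraparound" test, which is a fixed cost per call: a long loop amortizes it, a short one may not.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical original loop: the index i + off is 32-bit unsigned and
   may wrap past UINT_MAX, so &A[i + off] is not a simple stride. */
double dot_offset_scalar(const double *A, const double *B,
                         unsigned int n, unsigned int off)
{
    double s = 0.0;
    for (unsigned int i = 0; i < n; i++)
        s += A[i + off] * B[i];
    return s;
}

/* Loop versioning sketch: one run-time test proves that i + off cannot
   wrap for any i < n; the fast path then uses a unit-stride pointer. */
double dot_offset_versioned(const double *A, const double *B,
                            unsigned int n, unsigned int off)
{
    if (n == 0)
        return 0.0;
    if (off <= UINT_MAX - (n - 1)) {
        /* Largest index off + n - 1 fits in unsigned int: no wraparound. */
        assert(off + (n - 1) >= off);
        const double *a = A + off;
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += a[i] * B[i];
        return s;
    }
    /* Fallback keeps the original wrapping semantics. */
    return dot_offset_scalar(A, B, n, off);
}
```

The fast path sums in the same order as the scalar loop, so both return the same value. Whether a compiler then vectorizes this floating-point reduction typically also depends on it being allowed to reassociate (for example with `-ffast-math`).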
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98813
Bug ID: 98813
Summary: loop is sub-optimized if index is unsigned int with
offset
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767
--- Comment #15 from Jiu Fu Guo ---
(In reply to Richard Biener from comment #14)
>
> I've only quickly tried to understand what you are proposing but I think
> this is out-of scope of our "separate" distribution / interchange /
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767
--- Comment #13 from Jiu Fu Guo ---
Hi Richard,
Checking the changed code in comment 9, there seems to be another opportunity
to improve performance: improving the locality of accesses to array A.
Unroll and jam loop1 into loop4 (or unroll
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767
--- Comment #11 from Jiu Fu Guo ---
The patch (PR98137) also helps a lot with the code in comment 9, because the
loop then gets vectorized.
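The unroll-and-jam idea from comment #13 can be illustrated on a generic loop nest. The nest below is hypothetical (the loops from comment 9 are not reproduced in these snippets, and `nest_plain` and `nest_unroll_jam` are illustrative names): the outer loop is unrolled by two and the two copies are jammed into one inner loop, so each `A[j]` that is loaded feeds two outer iterations instead of one.

```c
/* Hypothetical original nest: every outer iteration streams all of A. */
void nest_plain(double *C, const double *A, const double *B,
                unsigned int ni, unsigned int nj)
{
    for (unsigned int i = 0; i < ni; i++)
        for (unsigned int j = 0; j < nj; j++)
            C[i] += A[j] * B[i * nj + j];
}

/* Unroll-and-jam by 2: one pass over A serves rows i and i + 1. */
void nest_unroll_jam(double *C, const double *A, const double *B,
                     unsigned int ni, unsigned int nj)
{
    unsigned int i = 0;
    for (; i + 1 < ni; i += 2)
        for (unsigned int j = 0; j < nj; j++) {
            double a = A[j];                 /* loaded once, used twice */
            C[i]     += a * B[i * nj + j];
            C[i + 1] += a * B[(i + 1) * nj + j];
        }
    for (; i < ni; i++)                      /* remainder when ni is odd */
        for (unsigned int j = 0; j < nj; j++)
            C[i] += A[j] * B[i * nj + j];
}
```

Each `C[i]` still accumulates its terms in the same `j` order, so the result matches the original nest exactly; the transformation assumes `C` does not alias `A` or `B`.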
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767
Jiu Fu Guo changed:
What|Removed |Added
CC||guojiufu at gcc dot gnu.org
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97901
--- Comment #4 from Jiu Fu Guo ---
Hi Richard, thank you for handling this!
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66706
Bug 66706 depends on bug 66552, which changed state.
Bug 66552 Summary: Missed optimization when shift amount is result of signed
modulus
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66552
What|Removed |Added
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66552
Jiu Fu Guo changed:
What|Removed |Added
Status|NEW |RESOLVED
CC|