[Bug tree-optimization/122793] [15/16 regression] ffmpeg miscompiled with -O2 -march=x86-64-v2 since r15-4114-g8157f3f2d211bf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122793
--- Comment #16 from GCC Commits ---
The master branch has been updated by Richard Biener :
https://gcc.gnu.org/g:ece83c6a5aa7ac1cc68d3941797a25601293f310
commit r16-6578-gece83c6a5aa7ac1cc68d3941797a25601293f310
Author: Richard Sandiford
Date: Fri Dec 12 17:40:54 2025 +
vect: Generalise vect_add_slp_permutation [PR122793]
The problem seems to be with a packing permutation:
op0[4] op0[5] op0[6] op0[7]
and with the identity_offset parameter to vect_add_slp_permutation.
Both the repeating_p and !repeating_p paths correctly realise that this
permutation reduces to an identity. But the !repeating_p path ends up
with first_node and second_node both set to the second VEC_PERM_EXPR
operand (since that path works elementwise, and since no elements are
taken from the first input). Therefore, the call:
vect_add_slp_permutation (vinfo, gsi, node, first_def,
second_def, mask_vec, mask[0]);
works regardless of whether vect_add_slp_permutation picks first_def
or second_def. In that sense, the parameters to
vect_add_slp_permutation are already "canonical".
The repeating_p path instead passes vector 2N as first_def and
vector 2N+1 as second_def, with mask[0] indicating the position
of the identity within the concatenation of first_def and second_def.
However, vect_add_slp_permutation doesn't expect this and instead
ignores the identity_offset parameter.
PR tree-optimization/122793
* tree-vect-slp.cc (vect_add_slp_permutation): Document the existing
identity_offset parameter. Handle identities that take from the second
input rather than the first.
* gcc.dg/vect/vect-pr122793.c: New testcase.
Co-authored-by: Richard Biener
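As a rough illustration of the identity_offset convention described in the commit message, the scalar C sketch below models an identity taken from the concatenation of two equally sized inputs. It is not GCC code: the function name, the parameter names and the use of plain int arrays are assumptions made only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model, not GCC code.  FIRST and SECOND each hold
   N lanes.  Copy the N contiguous lanes that start at IDENTITY_OFFSET
   in the concatenation { FIRST, SECOND } into OUT.  */
static void
identity_from_concat (const int *first, const int *second, size_t n,
                      size_t identity_offset, int *out)
{
  /* The N selected lanes must fit inside the 2 * N concatenated lanes.  */
  assert (identity_offset <= n);
  for (size_t i = 0; i < n; i++)
    {
      size_t lane = identity_offset + i;
      out[i] = lane < n ? first[lane] : second[lane - n];
    }
}
```

With n = 4, an offset of 0 reproduces first and an offset of 4 reproduces second. A helper that ignored the offset and always returned first would therefore be right only in the first case, which is the situation the commit message describes for the repeating_p path.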
[Bug tree-optimization/122793] [15/16 regression] ffmpeg miscompiled with -O2 -march=x86-64-v2 since r15-4114-g8157f3f2d211bf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122793
--- Comment #15 from Richard Sandiford ---
(In reply to Richard Biener from comment #14)
> (In reply to Richard Biener from comment #13)
> OK, so this miscompiles
>
> void __attribute__((noipa)) test_hi (v4si *dst, v8si src)
> {
> (*dst)[0] = src[4];
> (*dst)[1] = src[5];
> (*dst)[2] = src[6];
> (*dst)[3] = src[7];
> }
>
> because for highpart/lowpart extraction we pass first_def == second_def
> but the can_div_trunc computes to use the second_def and thinks the offset
> is accounted for - it divides 4 (the offset) by 4 (nunits of the result).
> In this case it should divide by 8 (nunits of first_def) I think.
>
> Adjusting accordingly and re-testing.
Oops, yes. Thanks for taking care of it.
[Bug tree-optimization/122793] [15/16 regression] ffmpeg miscompiled with -O2 -march=x86-64-v2 since r15-4114-g8157f3f2d211bf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122793
--- Comment #14 from Richard Biener ---
(In reply to Richard Biener from comment #13)
> (In reply to Richard Biener from comment #12)
> > I'm testing the patch, it looks reasonable. I'll note the initial support
> > was for very limited cases of lowpart or concat but as written the checks
> > would cover more cases, so it's somewhat a bad design.
>
> The patch causes
>
> +FAIL: gcc.dg/vect/bb-slp-pr101668.c -flto -ffat-lto-objects execution test
> +FAIL: gcc.dg/vect/bb-slp-pr101668.c execution test
> +FAIL: gcc.dg/vect/slp-28.c -flto -ffat-lto-objects execution test
> +FAIL: gcc.dg/vect/slp-28.c execution test
> +FAIL: gcc.dg/vect/slp-45.c execution test
>
> (and a few more), will investigate another day.
OK, so this miscompiles
void __attribute__((noipa)) test_hi (v4si *dst, v8si src)
{
(*dst)[0] = src[4];
(*dst)[1] = src[5];
(*dst)[2] = src[6];
(*dst)[3] = src[7];
}
because for highpart/lowpart extraction we pass first_def == second_def
but the can_div_trunc computes to use the second_def and thinks the offset
is accounted for - it divides 4 (the offset) by 4 (nunits of the result).
In this case it should divide by 8 (nunits of first_def) I think.
Adjusting accordingly and re-testing.
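The division described above can be checked with a small scalar model. This is only a sketch under assumptions: the struct and function names are hypothetical, and plain unsigned arithmetic stands in for the poly_int division that GCC performs.

```c
#include <assert.h>

/* Hypothetical model of the arithmetic discussed above, not GCC code.
   Truncating division of a lane offset by a lane count gives the index
   of the input vector and the lane offset that remains inside it.  */
struct lane_split
{
  unsigned vec_index;
  unsigned lane_offset;
};

static struct lane_split
split_lane_offset (unsigned offset, unsigned nunits)
{
  assert (nunits != 0);
  struct lane_split s = { offset / nunits, offset % nunits };
  return s;
}
```

For test_hi, split_lane_offset (4, 4) yields vec_index 1 and lane_offset 0, which reads as "take the second input from its start". Because first_def == second_def is the same 8-lane vector, that selects lanes 0..3 instead of 4..7. Dividing by the 8 lanes of first_def, split_lane_offset (4, 8) yields vec_index 0 and lane_offset 4, which matches src[4]..src[7].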
[Bug tree-optimization/122793] [15/16 regression] ffmpeg miscompiled with -O2 -march=x86-64-v2 since r15-4114-g8157f3f2d211bf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122793
--- Comment #13 from Richard Biener ---
(In reply to Richard Biener from comment #12)
> I'm testing the patch, it looks reasonable. I'll note the initial support
> was for very limited cases of lowpart or concat but as written the checks
> would cover more cases, so it's somewhat a bad design.
The patch causes
+FAIL: gcc.dg/vect/bb-slp-pr101668.c -flto -ffat-lto-objects execution test
+FAIL: gcc.dg/vect/bb-slp-pr101668.c execution test
+FAIL: gcc.dg/vect/slp-28.c -flto -ffat-lto-objects execution test
+FAIL: gcc.dg/vect/slp-28.c execution test
+FAIL: gcc.dg/vect/slp-45.c execution test
(and a few more), will investigate another day.
[Bug tree-optimization/122793] [15/16 regression] ffmpeg miscompiled with -O2 -march=x86-64-v2 since r15-4114-g8157f3f2d211bf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122793
Richard Biener changed:
What|Removed |Added
Status|NEW |ASSIGNED
Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
--- Comment #12 from Richard Biener ---
I'm testing the patch, it looks reasonable. I'll note the initial support
was for very limited cases of lowpart or concat but as written the checks
would cover more cases, so it's somewhat a bad design.
[Bug tree-optimization/122793] [15/16 regression] ffmpeg miscompiled with -O2 -march=x86-64-v2 since r15-4114-g8157f3f2d211bf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122793
--- Comment #11 from Richard Sandiford ---
Created attachment 63045
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=63045&action=edit
candidate patch
(In reply to Tamar Christina from comment #10)
> (In reply to Jakub Jelinek from comment #7)
> > Seems it is the pack_p case,
> > Commenting out
> > /* Check whether the input has twice as many lanes per vector. */
> > else if (children.length () == 1
> > && known_eq (SLP_TREE_LANES (child) * nunits,
> > SLP_TREE_LANES (node) * op_nunits * 2))
> > pack_p = true;
> > makes the #c5 testcase pass, while commenting out
> > /* Check whether the output has N times as many lanes per vector. */
> > else if (constant_multiple_p (SLP_TREE_LANES (node) * op_nunits,
> > SLP_TREE_LANES (child) * nunits,
> > &this_unpack_factor)
> > && (i == 0 || unpack_factor == this_unpack_factor))
> > unpack_factor = this_unpack_factor;
> > instead doesn't fix it.
>
> Yes, the change in r15-4114-g8157f3f2d211bf has a bug in that repeating_p is
> initialized to true, but after this change it's only set to false when
> !pack_p && !widen.
>
> So repeating_p stays true even when the vector isn't repeating, but
> repeating_p takes precedence over pack_p.
>
> As a result it thinks that this pack operation replicates as a sequence from
> the wider vector:
>
> (rr) p debug (node)
> perm.c:18:1: note: node 0x580a950 (max_nunits=1, refcnt=1) vector(16)
> unsigned char
> perm.c:18:1: note: op: VEC_PERM_EXPR
> perm.c:18:1: note: stmt 0 _8 = MEM[(unsigned char *)s_25 + 5B];
> perm.c:18:1: note: stmt 1 _9 = MEM[(unsigned char *)s_25 + 6B];
> perm.c:18:1: note: stmt 2 _13 = MEM[(unsigned char *)s_25 + 7B];
> perm.c:18:1: note: stmt 3 _13 = MEM[(unsigned char *)s_25 + 7B];
> perm.c:18:1: note: stmt 4 _13 = MEM[(unsigned char *)s_25 + 7B];
> perm.c:18:1: note: stmt 5 _13 = MEM[(unsigned char *)s_25 + 7B];
> perm.c:18:1: note: stmt 6 _13 = MEM[(unsigned char *)s_25 + 7B];
> perm.c:18:1: note: stmt 7 _13 = MEM[(unsigned char *)s_25 + 7B];
> perm.c:18:1: note: lane permutation { 0[7] 0[8] 0[9] 0[9] 0[9] 0[9]
> 0[9] 0[9] }
> perm.c:18:1: note: children 0x580a7a0
> $1 = void
> (rr) p debug (child)
> perm.c:18:1: note: node 0x580a7a0 (max_nunits=16, refcnt=4) vector(16)
> unsigned char
> perm.c:18:1: note: op template: _6 = MEM[(unsigned char *)s_25 + -2B];
> perm.c:18:1: note: stmt 0 _6 = MEM[(unsigned char *)s_25 + -2B];
> perm.c:18:1: note: stmt 1 ---
> perm.c:18:1: note: stmt 2 ---
> perm.c:18:1: note: stmt 3 ---
> perm.c:18:1: note: stmt 4 ---
> perm.c:18:1: note: stmt 5 ---
> perm.c:18:1: note: stmt 6 _12 = MEM[(unsigned char *)s_25 + 4B];
> perm.c:18:1: note: stmt 7 _8 = MEM[(unsigned char *)s_25 + 5B];
> perm.c:18:1: note: stmt 8 _9 = MEM[(unsigned char *)s_25 + 6B];
> perm.c:18:1: note: stmt 9 _13 = MEM[(unsigned char *)s_25 + 7B];
> perm.c:18:1: note: stmt 10 _21 = MEM[(unsigned char *)s_25 + 8B];
> perm.c:18:1: note: stmt 11 _29 = MEM[(unsigned char *)s_25 + 9B];
> perm.c:18:1: note: stmt 12 ---
> perm.c:18:1: note: stmt 13 ---
> perm.c:18:1: note: stmt 14 ---
> perm.c:18:1: note: stmt 15 ---
>
> The code and the comments indicate to me that it was intended to support
> repeated unpacks, but not repeated packs.
>
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index a5cd596fd28..7104835eb5a 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -10242,7 +10242,10 @@ vectorizable_slp_permutation_1 (vec_info *vinfo,
> gimple_stmt_iterator *gsi,
> if (children.length () == 1
> && known_eq (SLP_TREE_LANES (child) * nunits,
> SLP_TREE_LANES (node) * op_nunits * 2))
> - pack_p = true;
> + {
> + pack_p = true;
> + repeating_p = false;
> + }
> /* Check whether the output has N times as many lanes per vector. */
> else if (constant_multiple_p (SLP_TREE_LANES (node) * op_nunits,
> SLP_TREE_LANES (child) * nunits,
>
> fixes it, since if we're packing, we're not repeating the original vector.
> Testing the above change.
That doesn't look right. !repeating_p is the general non-VLA case, so it
doesn't need to handle packs differently from other types of permutation. That
is, setting repeating_p to false whenever pack_p is true should be equivalent
to setting repeating_p to false and removing the pack_p code entirely. Both
would have the effect of removing the VLA support for the pack case.
The problem seems to be with a packing permutation:
op0[4] op0[5] op0[6] op0[7]
and with the identity_offset parameter to vect_add_slp_permutation (added in
g:25f831eab368d1bbec4dc67bf058cb7cf6b721ee).
Bot
[Bug tree-optimization/122793] [15/16 regression] ffmpeg miscompiled with -O2 -march=x86-64-v2 since r15-4114-g8157f3f2d211bf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122793
--- Comment #10 from Tamar Christina ---
(In reply to Jakub Jelinek from comment #7)
> Seems it is the pack_p case,
> Commenting out
> /* Check whether the input has twice as many lanes per vector. */
> else if (children.length () == 1
>&& known_eq (SLP_TREE_LANES (child) * nunits,
> SLP_TREE_LANES (node) * op_nunits * 2))
> pack_p = true;
> makes the #c5 testcase pass, while commenting out
> /* Check whether the output has N times as many lanes per vector. */
> else if (constant_multiple_p (SLP_TREE_LANES (node) * op_nunits,
> SLP_TREE_LANES (child) * nunits,
> &this_unpack_factor)
>&& (i == 0 || unpack_factor == this_unpack_factor))
> unpack_factor = this_unpack_factor;
> instead doesn't fix it.
Yes, the change in r15-4114-g8157f3f2d211bf has a bug in that repeating_p is
initialized to true, but after this change it's only set to false when !pack_p
&& !widen.
So repeating_p stays true even when the vector isn't repeating, but repeating_p
takes precedence over pack_p.
As a result it thinks that this pack operation replicates as a sequence from
the wider vector:
(rr) p debug (node)
perm.c:18:1: note: node 0x580a950 (max_nunits=1, refcnt=1) vector(16) unsigned
char
perm.c:18:1: note: op: VEC_PERM_EXPR
perm.c:18:1: note: stmt 0 _8 = MEM[(unsigned char *)s_25 + 5B];
perm.c:18:1: note: stmt 1 _9 = MEM[(unsigned char *)s_25 + 6B];
perm.c:18:1: note: stmt 2 _13 = MEM[(unsigned char *)s_25 + 7B];
perm.c:18:1: note: stmt 3 _13 = MEM[(unsigned char *)s_25 + 7B];
perm.c:18:1: note: stmt 4 _13 = MEM[(unsigned char *)s_25 + 7B];
perm.c:18:1: note: stmt 5 _13 = MEM[(unsigned char *)s_25 + 7B];
perm.c:18:1: note: stmt 6 _13 = MEM[(unsigned char *)s_25 + 7B];
perm.c:18:1: note: stmt 7 _13 = MEM[(unsigned char *)s_25 + 7B];
perm.c:18:1: note: lane permutation { 0[7] 0[8] 0[9] 0[9] 0[9] 0[9] 0[9]
0[9] }
perm.c:18:1: note: children 0x580a7a0
$1 = void
(rr) p debug (child)
perm.c:18:1: note: node 0x580a7a0 (max_nunits=16, refcnt=4) vector(16) unsigned
char
perm.c:18:1: note: op template: _6 = MEM[(unsigned char *)s_25 + -2B];
perm.c:18:1: note: stmt 0 _6 = MEM[(unsigned char *)s_25 + -2B];
perm.c:18:1: note: stmt 1 ---
perm.c:18:1: note: stmt 2 ---
perm.c:18:1: note: stmt 3 ---
perm.c:18:1: note: stmt 4 ---
perm.c:18:1: note: stmt 5 ---
perm.c:18:1: note: stmt 6 _12 = MEM[(unsigned char *)s_25 + 4B];
perm.c:18:1: note: stmt 7 _8 = MEM[(unsigned char *)s_25 + 5B];
perm.c:18:1: note: stmt 8 _9 = MEM[(unsigned char *)s_25 + 6B];
perm.c:18:1: note: stmt 9 _13 = MEM[(unsigned char *)s_25 + 7B];
perm.c:18:1: note: stmt 10 _21 = MEM[(unsigned char *)s_25 + 8B];
perm.c:18:1: note: stmt 11 _29 = MEM[(unsigned char *)s_25 + 9B];
perm.c:18:1: note: stmt 12 ---
perm.c:18:1: note: stmt 13 ---
perm.c:18:1: note: stmt 14 ---
perm.c:18:1: note: stmt 15 ---
The code and the comments indicate to me that it was intended to support
repeated unpacks, but not repeated packs.
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index a5cd596fd28..7104835eb5a 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -10242,7 +10242,10 @@ vectorizable_slp_permutation_1 (vec_info *vinfo,
gimple_stmt_iterator *gsi,
if (children.length () == 1
&& known_eq (SLP_TREE_LANES (child) * nunits,
SLP_TREE_LANES (node) * op_nunits * 2))
- pack_p = true;
+ {
+ pack_p = true;
+ repeating_p = false;
+ }
/* Check whether the output has N times as many lanes per vector. */
else if (constant_multiple_p (SLP_TREE_LANES (node) * op_nunits,
SLP_TREE_LANES (child) * nunits,
fixes it, since if we're packing, we're not repeating the original vector.
Testing the above change.
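To see which branch the dumped node takes, the two quoted conditions can be evaluated with the numbers from the dump: the node has 8 lanes, the child has 16 lanes, and both are vector(16) unsigned char. The sketch below is a scalar stand-in under assumptions: the helper names are hypothetical, the parameter names only mirror the quoted code, and plain unsigned arithmetic replaces known_eq and constant_multiple_p.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical scalar stand-in for the pack check quoted in this thread;
   the real code compares poly_ints with known_eq.  */
static bool
is_pack (unsigned n_children, unsigned child_lanes, unsigned nunits,
         unsigned node_lanes, unsigned op_nunits)
{
  return (n_children == 1
          && child_lanes * nunits == node_lanes * op_nunits * 2);
}

/* Hypothetical stand-in for the unpack check: return true if
   NODE_LANES * OP_NUNITS is an exact multiple of CHILD_LANES * NUNITS,
   storing the multiple in *FACTOR.  */
static bool
unpack_multiple (unsigned node_lanes, unsigned op_nunits,
                 unsigned child_lanes, unsigned nunits, unsigned *factor)
{
  unsigned num = node_lanes * op_nunits;
  unsigned den = child_lanes * nunits;
  assert (den != 0);
  if (num % den != 0)
    return false;
  *factor = num / den;
  return true;
}
```

With those values, is_pack (1, 16, 16, 8, 16) holds because 16 * 16 equals 8 * 16 * 2, while the unpack test fails because 8 * 16 is not a multiple of 16 * 16. That is consistent with comment #7, where disabling the pack_p branch made the testcase pass.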
[Bug tree-optimization/122793] [15/16 regression] ffmpeg miscompiled with -O2 -march=x86-64-v2 since r15-4114-g8157f3f2d211bf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122793
--- Comment #9 from Jakub Jelinek ---
Even
static void
foo (unsigned char *d, unsigned char *s, int e, int f)
{
for (int i = 0; i < 4; i++)
{
d[0] = s[-2];
d[5] = (s[5] + s[6]) * 2 - (s[4] + s[7]);
d[6] = (s[6] + s[7]) * 2 - (s[5] + s[8]);
d[7] = (s[7] + s[8]) * 2 - (s[6] + s[9]);
d += e;
s += f;
}
}
unsigned char s[128] = { 2 }, d[128];
int
main ()
{
foo (d, s + 2, 16, 16);
if (d[5] != 0)
__builtin_abort ();
}
And the r15-4113 to r15-4114 difference in vect dump is then:
--- pr122793.c.180t.vect.r15-4113 2025-12-05 14:53:36.898953828 -0500
+++ pr122793.c.180t.vect.r15-4114 2025-12-05 14:53:43.572020773 -0500
@@ -193,12 +193,12 @@ int main ()
_118 = VEC_PERM_EXPR <_106, vect__6.12_48, { 0, 1, 2, 6 }>;
_119 = VEC_PERM_EXPR <_108, vect__6.16_20, { 1, 2, 6, 7 }>;
_120 = VEC_PERM_EXPR ;
- _124 = VEC_PERM_EXPR ;
- _125 = VEC_PERM_EXPR ;
- _126 = VEC_PERM_EXPR ;
- _97 = VEC_PERM_EXPR ;
- _98 = VEC_PERM_EXPR ;
- _99 = VEC_PERM_EXPR ;
+ _124 = VEC_PERM_EXPR ;
+ _125 = VEC_PERM_EXPR ;
+ _126 = VEC_PERM_EXPR ;
+ _97 = VEC_PERM_EXPR ;
+ _98 = VEC_PERM_EXPR ;
+ _99 = VEC_PERM_EXPR ;
_70 = VEC_PERM_EXPR ;
_71 = VEC_PERM_EXPR ;
_72 = VEC_PERM_EXPR ;
so clearly the same permutations and everything else, except the first two
arguments of some of the permutations are messed up.
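Why wrong input operands matter even when the selector is unchanged can be shown with a scalar model of a two-input permute, where each selector element indexes the concatenation of both inputs. The function name, the int element type and the sample values are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of a two-input permute on N-lane vectors: OUT[i] is lane
   SEL[i] of the concatenation { A, B }, with the index taken modulo
   2 * N.  */
static void
vec_perm (const int *a, const int *b, const unsigned *sel, size_t n,
          int *out)
{
  assert (n > 0);
  for (size_t i = 0; i < n; i++)
    {
      size_t lane = sel[i] % (2 * n);
      out[i] = lane < n ? a[lane] : b[lane - n];
    }
}
```

With a = { 10, 11, 12, 13 }, b = { 20, 21, 22, 23 } and the selector { 0, 1, 2, 6 } from the dump, the result is { 10, 11, 12, 22 }. Swapping the two inputs while keeping the same selector gives { 20, 21, 22, 12 }, so a permute with the right mask but the wrong operands still produces wrong data.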
[Bug tree-optimization/122793] [15/16 regression] ffmpeg miscompiled with -O2 -march=x86-64-v2 since r15-4114-g8157f3f2d211bf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122793
--- Comment #8 from Jakub Jelinek ---
Slightly further simplified, -O2 -msse4 fails, -O2 works:
static void
foo (unsigned char *d, unsigned char *s, int e, int f)
{
for (int i = 0; i < 4; i++)
{
d[0] = s[-2];
d[5] = (s[5] + s[6]) * 2 - (s[4] + s[7]) + s[3] + s[8];
d[6] = (s[6] + s[7]) * 2 - (s[5] + s[8]) + s[4] + s[9];
d[7] = (s[7] + s[8]) * 2 - (s[6] + s[9]) + s[5] + s[10];
d += e;
s += f;
}
}
unsigned char s[128] = { 2 }, d[128];
int
main ()
{
foo (d, s + 2, 16, 16);
if (d[5] != 0)
__builtin_abort ();
}
[Bug tree-optimization/122793] [15/16 regression] ffmpeg miscompiled with -O2 -march=x86-64-v2 since r15-4114-g8157f3f2d211bf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122793
Jakub Jelinek changed:
What|Removed |Added
CC||tnfchris at gcc dot gnu.org
--- Comment #7 from Jakub Jelinek ---
Seems it is the pack_p case,
Commenting out
/* Check whether the input has twice as many lanes per vector. */
else if (children.length () == 1
&& known_eq (SLP_TREE_LANES (child) * nunits,
SLP_TREE_LANES (node) * op_nunits * 2))
pack_p = true;
makes the #c5 testcase pass, while commenting out
/* Check whether the output has N times as many lanes per vector. */
else if (constant_multiple_p (SLP_TREE_LANES (node) * op_nunits,
SLP_TREE_LANES (child) * nunits,
&this_unpack_factor)
&& (i == 0 || unpack_factor == this_unpack_factor))
unpack_factor = this_unpack_factor;
instead doesn't fix it.
[Bug tree-optimization/122793] [15/16 regression] ffmpeg miscompiled with -O2 -march=x86-64-v2 since r15-4114-g8157f3f2d211bf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122793
Jakub Jelinek changed:
What|Removed |Added
CC||jakub at gcc dot gnu.org
--- Comment #6 from Jakub Jelinek ---
But it clearly does.
On #c5, no changes in ifcvt dump, in vect dump (r15-4113 to r15-4114):
- vector(4) unsigned char _108;
- vector(4) unsigned char _109;
- vector(4) unsigned char _110;
...
# vectp_src.5_93 = PHI ...
vect__6.7_91 = MEM [(unsigned char *)vectp_src.5_93];
vectp_src.5_90 = vectp_src.5_93 + 4;
vect__6.8_89 = MEM [(unsigned char *)vectp_src.5_90];
vectp_src.5_88 = vectp_src.5_93 + 8;
vect__6.9_87 = MEM [(unsigned char *)vectp_src.5_88];
vectp_src.5_86 = vectp_src.5_93 + 12;
vect__6.10_85 = MEM [(unsigned char *)vectp_src.5_86];
vectp_src.5_84 = vectp_src.5_93 + 16;
vect__6.11_83 = MEM [(unsigned char *)vectp_src.5_84];
vectp_src.5_82 = vectp_src.5_93 + 20;
vect__6.12_81 = MEM [(unsigned char *)vectp_src.5_82];
vectp_src.5_80 = vectp_src.5_93 + 24;
vect__6.13_79 = MEM [(unsigned char *)vectp_src.5_80];
vectp_src.5_78 = vectp_src.5_93 + 28;
vect__6.14_77 = MEM [(unsigned char *)vectp_src.5_78];
vectp_src.5_76 = vectp_src.5_93 + 32;
vect__6.15_75 = MEM [(unsigned char *)vectp_src.5_76];
vectp_src.5_74 = vectp_src.5_93 + 36;
vect__6.16_73 = MEM [(unsigned char *)vectp_src.5_74];
vectp_src.5_72 = vectp_src.5_93 + 40;
vect__6.17_71 = MEM [(unsigned char *)vectp_src.5_72];
vectp_src.5_70 = vectp_src.5_93 + 44;
vect__6.18_69 = MEM [(unsigned char *)vectp_src.5_70];
vectp_src.5_68 = vectp_src.5_93 + 48;
vect__6.19_67 = MEM [(unsigned char *)vectp_src.5_68];
vectp_src.5_66 = vectp_src.5_93 + 52;
vect__6.20_65 = MEM [(unsigned char *)vectp_src.5_66];
vectp_src.5_64 = vectp_src.5_93 + 56;
vect__6.21_63 = MEM [(unsigned char *)vectp_src.5_64];
vectp_src.5_62 = vectp_src.5_93 + 60;
...
- _156 = VEC_PERM_EXPR ;
- _157 = VEC_PERM_EXPR ;
- _158 = VEC_PERM_EXPR ;
+ _156 = VEC_PERM_EXPR ;
+ _157 = VEC_PERM_EXPR ;
+ _158 = VEC_PERM_EXPR ;
...
- _132 = VEC_PERM_EXPR ;
- _133 = VEC_PERM_EXPR ;
- _134 = VEC_PERM_EXPR ;
+ _132 = VEC_PERM_EXPR ;
+ _133 = VEC_PERM_EXPR ;
+ _134 = VEC_PERM_EXPR ;
...
- _108 = VEC_PERM_EXPR ;
- _109 = VEC_PERM_EXPR ;
- _110 = VEC_PERM_EXPR ;
...
- vect__14.23_111 = _21 + _108;
- vect__14.23_112 = _18 + _109;
- vect__14.23_113 = _16 + _110;
+ vect__14.23_111 = _21 + _156;
+ vect__14.23_112 = _18 + _157;
+ vect__14.23_113 = _16 + _158;
[Bug tree-optimization/122793] [15/16 regression] ffmpeg miscompiled with -O2 -march=x86-64-v2 since r15-4114-g8157f3f2d211bf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122793
Kacper Michajłow changed:
What|Removed |Added
CC||kasper93 at gmail dot com
--- Comment #5 from Kacper Michajłow ---
Smaller reproducer, in case it would be useful.
```
static void foo(unsigned char *dst, unsigned char *src,
int dstStride, int srcStride) {
for (int i = 0; i < 4; i++) {
dst[0] = src[-2];
dst[5] = (src[5] + src[6]) * 2 - (src[4] + src[7]) * 5 + src[3] + src[8];
dst[6] = (src[6] + src[7]) * 2 - (src[5] + src[8]) * 5 + src[4] + src[9];
dst[7] = (src[7] + src[8]) * 2 - (src[6] + src[9]) * 5 + src[5] + src[0];
dst += dstStride;
src += srcStride;
}
}
unsigned char src[128] = {2};
unsigned char dst[128];
int main() {
foo(dst, src + 2, 16, 16);
if (dst[5] != 0)
__builtin_abort();
}
```
[Bug tree-optimization/122793] [15/16 regression] ffmpeg miscompiled with -O2 -march=x86-64-v2 since r15-4114-g8157f3f2d211bf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122793
Richard Biener changed:
What|Removed |Added
CC||rsandifo at gcc dot gnu.org
--- Comment #4 from Richard Biener ---
(In reply to Sam James from comment #3)
> r15-4114-g8157f3f2d211bf
Of course this should not have affected x86_64 ...?
[Bug tree-optimization/122793] [15/16 regression] ffmpeg miscompiled with -O2 -march=x86-64-v2 since r15-4114-g8157f3f2d211bf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122793
Sam James changed:
What|Removed |Added
Keywords|needs-bisection |
Summary|[15/16 regression] ffmpeg |[15/16 regression] ffmpeg
|miscompiled with -O2|miscompiled with -O2
|-march=x86-64-v2|-march=x86-64-v2 since
||r15-4114-g8157f3f2d211bf
--- Comment #3 from Sam James ---
r15-4114-g8157f3f2d211bf
[Bug tree-optimization/122793] [15/16 regression] ffmpeg miscompiled with -O2 -march=x86-64-v2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122793
Richard Biener changed:
What|Removed |Added
Priority|P3 |P2
CC||rguenth at gcc dot gnu.org
[Bug tree-optimization/122793] [15/16 regression] ffmpeg miscompiled with -O2 -march=x86-64-v2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122793
Andrew Pinski changed:
What|Removed |Added
Last reconfirmed||2025-11-21
Status|UNCONFIRMED |NEW
Ever confirmed|0 |1
--- Comment #2 from Andrew Pinski ---
Confirmed. With this testcase.
Note the cast `pixel *` is NOT going to cause a strict aliasing issue.
[Bug tree-optimization/122793] [15/16 regression] ffmpeg miscompiled with -O2 -march=x86-64-v2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122793
Andrew Pinski changed:
What|Removed |Added
Target Milestone|16.0|15.3
[Bug tree-optimization/122793] [15/16 regression] ffmpeg miscompiled with -O2 -march=x86-64-v2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122793
--- Comment #1 from Andrew Pinski ---
So there might be 2 different issues with respect to ffmpeg.
The one which is now bisected to a commit during GCC 16 is filed as PR 122797.
This one looks different and looks older.
[Bug tree-optimization/122793] [15/16 regression] ffmpeg miscompiled with -O2 -march=x86-64-v2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122793
Sam James changed:
What|Removed |Added
Target Milestone|--- |16.0
