at gcc dot gnu.org |ubizjak at gmail dot com
Last reconfirmed||2022-02-09
Ever confirmed|0 |1
--- Comment #1 from Uroš Bizjak ---
The testcase is quite creative with casts, creating:
(gdb) p debug_rtx ( operands[3])
(subreg:DI
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104445
--- Comment #7 from Uroš Bizjak ---
(In reply to Richard Biener from comment #6)
> We are missing vec_extractv2sisi or vec_extractv8qiv4qi, with -mno-mmx -mavx.
> It seems we have addv2si3 available though.
vec_extractv2sisi is available in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104445
--- Comment #5 from Uroš Bizjak ---
We do have:
(define_expand "vec_extractv4qiqi"
[(match_operand:QI 0 "register_operand")
(match_operand:V4QI 1 "register_operand")
(match_operand 2 "const_int_operand")]
"TARGET_SSE4_1"
{
|--- |FIXED
Target Milestone|12.0|11.4
Host|x86_64-linux-gnu|i386-linux-gnu
Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com
--- Comment #7 from Uroš Bizjak ---
Fixed for gcc-11.4+
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104362
--- Comment #4 from Uroš Bizjak ---
Or simply:
--cut here--
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index ad5a5caa413..dd5584fb8ed 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -7400,7 +7400,8 @@
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104362
--- Comment #3 from Uroš Bizjak ---
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index ad5a5caa413..a61a5390127 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -7403,6 +7403,10 @@ find_drap_reg (void)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104151
--- Comment #12 from Uroš Bizjak ---
(In reply to Uroš Bizjak from comment #10)
> (In reply to Hongtao.liu from comment #4)
> > Also there's separate issue, codegen for below is not optimal
> > gimple:
> > _11 = VIEW_CONVERT_EXPR(a_3(D))
> >
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ubizjak at gmail dot com
Target Milestone: ---
Created attachment 52318
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52318=edit
Prototype patch
I was looking
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104151
--- Comment #10 from Uroš Bizjak ---
(In reply to Hongtao.liu from comment #4)
> Also there's separate issue, codegen for below is not optimal
> gimple:
> _11 = VIEW_CONVERT_EXPR(a_3(D))
> asm:
> mov QWORD PTR [rsp-24], rdi
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104054
--- Comment #8 from Uroš Bizjak ---
Without debug instructions, the compiler is able to rename insns to:
65: di:DI=si:DI
66: dx:DI=r11:DI
74: cx:QI=0x1
REG_EQUAL 0x1
41: L41:
42: NOTE_INSN_BASIC_BLOCK 6
43:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104054
--- Comment #7 from Uroš Bizjak ---
For some reason the pass does not detect usage of Register si in (insn 55):
(debug_insn 55 54 56 6 (var_location:TI b (reg/v:TI 4 si [orig:86 b ] [86])) -1
(nil))
Register ax (1):
Register dx (1):
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104054
--- Comment #5 from Uroš Bizjak ---
Could be a red herring, but in _.rnreg dump:
Register r9 (1): 75 [GENERAL_REGS] 18 [ALL_REGS] 97 [GENERAL_REGS]
Register r10 (1): 76 [GENERAL_REGS] 18 [ALL_REGS] 23 [GENERAL_REGS]
...
Register di (1): 55
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104054
Uroš Bizjak changed:
What|Removed |Added
Keywords|wrong-code |
--- Comment #4 from Uroš Bizjak ---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104054
--- Comment #3 from Uroš Bizjak ---
The first difference is in rnreg pass, w/o -g:
28: L28:
29: NOTE_INSN_BASIC_BLOCK 4
30: [`i']=0
63: di:DI=r9:DI <--- here
64: dx:DI=r10:DI
9: r8:HI=0x5
REG_EQUAL 0x5
98:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104003
Uroš Bizjak changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104001
--- Comment #4 from Uroš Bizjak ---
(In reply to Hongtao.liu from comment #2)
> I'm testing
>
> 1 file changed, 3 insertions(+), 3 deletions(-)
> gcc/config/i386/i386.md | 6 +++---
>
> modified gcc/config/i386/i386.md
> @@ -10455,7 +10455,7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104003
--- Comment #2 from Uroš Bizjak ---
(define_insn "*xop_pcmov_"
- [(set (match_operand:VI_32 0 "register_operand" "=x")
-(if_then_else:VI_32
- (match_operand:VI_32 3 "register_operand" "x")
- (match_operand:VI_32 1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104003
Uroš Bizjak changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103935
Uroš Bizjak changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103997
Uroš Bizjak changed:
What|Removed |Added
Target||x86
Keywords|
Priority: P3
Component: regression
Assignee: unassigned at gcc dot gnu.org
Reporter: ubizjak at gmail dot com
Target Milestone: ---
Recent patch introduced following testsuite FAILs:
FAIL: gcc.target/i386/pr88531-1b.c scan-assembler-times vgatherqpd 4
FAIL: gcc.target
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100637
Uroš Bizjak changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88670
Bug 88670 depends on bug 103948, which changed state.
Bug 103948 Summary: Vectorizer does not use vec_cmpMN without vcondMN pattern
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103948
What|Removed |Added
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 103948, which changed state.
Bug 103948 Summary: Vectorizer does not use vec_cmpMN without vcondMN pattern
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103948
What|Removed |Added
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103948
Uroš Bizjak changed:
What|Removed |Added
Target Milestone|--- |12.0
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103948
--- Comment #7 from Uroš Bizjak ---
(In reply to Uroš Bizjak from comment #6)
> I'll try your proposed patch from Comment #5 later today and report here.
Yes, the patch works for me.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103948
--- Comment #6 from Uroš Bizjak ---
(In reply to Richard Biener from comment #5)
> I guess that tree-vect-generic.c is not up-to-date with gimple-isel.cc. We
> should probably somehow factor out relevant pieces.
>
> Note vector lowering will
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103935
--- Comment #3 from Uroš Bizjak ---
(In reply to Richard Biener from comment #2)
> no longer xfailed. I suggest to re-add the { xfail *-*-* } to the
> profitability check.
You mean xfail for non-x86 targets?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103948
--- Comment #4 from Uroš Bizjak ---
(In reply to Uroš Bizjak from comment #3)
> diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
> index 78e388d82f6..871366f3b7e 100644
> --- a/gcc/optabs-tree.c
> +++ b/gcc/optabs-tree.c
> @@ -502,6 +502,9 @@
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103948
--- Comment #3 from Uroš Bizjak ---
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 78e388d82f6..871366f3b7e 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -502,6 +502,9 @@ expand_vec_cond_expr_p (tree value_type, tree
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103948
--- Comment #2 from Uroš Bizjak ---
Created attachment 52146
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52146=edit
The complete testcase
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103948
--- Comment #1 from Uroš Bizjak ---
Created attachment 52145
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52145=edit
Patch that illustrates the problem on x86 target
This patch should vectorize all integer relational operations with
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ubizjak at gmail dot com
Target Milestone: ---
I was trying to add v2qi vec_cmpv2qiv2qi pattern to x86:
(define_expand "vec_cmpv2qiv2qi"
[(set (match_oper
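A vec_cmpMN pattern produces a mask vector whose lanes are all-ones where the comparison holds and all-zeros where it does not. The same semantics can be observed from C through GCC's generic vector extensions. The sketch below is a minimal illustration that assumes a GCC- or Clang-compatible compiler. The type and function names are hypothetical, and signed char lanes are used so that the comparison result type matches the operand type. It also shows the usual way such a mask is consumed, a lane-wise select:

```c
/* Two signed 8-bit lanes, i.e. the V2QI mode discussed above.  */
typedef signed char v2qs __attribute__ ((__vector_size__ (2)));

/* Lane-wise a > b: each lane becomes -1 (all ones) if true, 0 if false.  */
static v2qs cmpgt_mask (v2qs a, v2qs b)
{
  return a > b;
}

/* Consume the mask as a per-lane select, giving a lane-wise signed max.  */
static v2qs select_max (v2qs a, v2qs b)
{
  v2qs m = cmpgt_mask (a, b);
  return (a & m) | (b & ~m);
}
```

With a = { 5, -3 } and b = { 2, 7 }, the mask is { -1, 0 } and the select yields { 5, 7 }.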
Assignee: unassigned at gcc dot gnu.org
Reporter: ubizjak at gmail dot com
Target Milestone: ---
Following testcase:
unsigned char ur[16], ua[16], ub[16];
void avgu_v2qi (void)
{
int i;
for (i = 0; i < 2; i++)
ur[i] = (ua[i] + ub[i] + 1) >> 1;
}
does not
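The loop body `(ua[i] + ub[i] + 1) >> 1` is the unsigned rounding average that x86 provides as PAVGB. Because both operands are promoted to `int`, the 9-bit intermediate sum cannot overflow, and the shifted result always fits back into an `unsigned char`. The following is a scalar reference sketch that could serve as an oracle when checking a vectorized version. The function names are hypothetical:

```c
#include <stddef.h>

/* Scalar reference for the rounding average in avgu_v2qi.  The operands
   are promoted to int, so a + b + 1 is at most 511 and cannot overflow;
   after the shift the result is at most 255.  */
static unsigned char avgu_ref (unsigned char a, unsigned char b)
{
  return (unsigned char) ((a + b + 1) >> 1);
}

/* Apply the reference element-wise over n elements.  */
static void avgu_ref_n (unsigned char *r, const unsigned char *a,
                        const unsigned char *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    r[i] = avgu_ref (a[i], b[i]);
}
```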
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103935
--- Comment #1 from Uroš Bizjak ---
As said in the patch submission:
I have changed scan-tree-dump patterns in g++.dg/vect/slp-pr98855.cc
to check that no SLP vectorization was performed. The existing
scan-tree-dump-times was too fragile,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103928
--- Comment #12 from Uroš Bizjak ---
(In reply to Manuel Lauss from comment #10)
> So it was either fixed in trunk in the last 20 hours, or pgo build broke
> gcc, or "-mno-xop" fixed it.
The fix for PR103905 was pushed to the master in the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103928
--- Comment #11 from Uroš Bizjak ---
(In reply to Martin Liška from comment #8)
> > No, bdver4 does not include XOP.
>
> Ohh, didn't know that...
Sorry, I was wrong:
{"bdver4", PROCESSOR_BDVER4, CPU_BDVER4,
PTA_64BIT | PTA_MMX |
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103928
--- Comment #7 from Uroš Bizjak ---
(In reply to Martin Liška from comment #6)
> Then you may be affected by PR103905 which is fixed on the current master.
> Please pull to tip of master branch.
No, bdver4 does not include XOP.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94440
--- Comment #21 from Uroš Bizjak ---
Fixed?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103915
Uroš Bizjak changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92860
Bug 92860 depends on bug 103905, which changed state.
Bug 103905 Summary: [12 Regression] Miscompiled i386-expand.c with
-march=bdver1 and -O3 since r12-1789-g836328b2c99f5b8d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103905
What
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103905
Uroš Bizjak changed:
What|Removed |Added
Resolution|--- |FIXED
Status|ASSIGNED
at gcc dot gnu.org |ubizjak at gmail dot com
--- Comment #2 from Uroš Bizjak ---
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index fc8ec5e4d49..96d85a54e10 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -2752,7 +2752,7 @@
""
"#"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103905
Uroš Bizjak changed:
What|Removed |Added
Attachment #52120|0 |1
is obsolete|
||
Status|NEW |ASSIGNED
Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com
--- Comment #8 from Uroš Bizjak ---
Created attachment 52127
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52127=edit
Proposed pa
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103905
--- Comment #6 from Uroš Bizjak ---
@Jakub: It looks like the problem is in expand_vec_perm_pshufb, where the permutation
vector is recalculated for partial vectors:
if (vmode == V4QImode
|| vmode == V8QImode)
{
rtx m128 = GEN_INT
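For reference while reading expand_vec_perm_pshufb, recall how the 128-bit SSSE3 PSHUFB instruction works. For each destination byte i, it takes source byte `ctrl[i] & 15`, or it writes zero when bit 7 of `ctrl[i]` is set. The model below is a scalar sketch of that documented behaviour only. It is not GCC's code, and the function name is hypothetical:

```c
#include <stdint.h>

/* Scalar model of 128-bit PSHUFB: if bit 7 of ctrl[i] is set the result
   byte is 0, otherwise it is src[ctrl[i] & 0x0f].  dst must not alias
   src or ctrl.  */
static void pshufb_model (uint8_t dst[16], const uint8_t src[16],
                          const uint8_t ctrl[16])
{
  for (int i = 0; i < 16; i++)
    dst[i] = (ctrl[i] & 0x80) ? 0 : src[ctrl[i] & 0x0f];
}
```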
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103905
--- Comment #4 from Uroš Bizjak ---
Created attachment 52123
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52123=edit
Patch that disables XOP permute with partial vectors
Please try this patch.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103905
--- Comment #3 from Uroš Bizjak ---
(In reply to Martin Liška from comment #1)
> Created attachment 52120 [details]
> Isolated test-case
>
> Isolated test-case where only the miscompiled function
> ix86_expand_vec_extract_even_odd uses -O3.
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103905
--- Comment #2 from Uroš Bizjak ---
The referred patch adds:
+;; Pack/unpack vector modes
+(define_mode_attr mmxpackmode
+ [(V4HI "V8QI") (V2SI "V4HI")])
+
+(define_expand "vec_pack_trunc_"
+ [(match_operand: 0 "register_operand")
+
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103900
Uroš Bizjak changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861
--- Comment #7 from Uroš Bizjak ---
(In reply to Richard Biener from comment #6)
> Not fully fixed I guess?
Not yet. I have a bunch of follow-up patches for various operations.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103900
Uroš Bizjak changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103900
--- Comment #6 from Uroš Bizjak ---
(In reply to Martin Liška from comment #5)
> No, it still crashes with the current master (g:fbb592407c9):
Ah, the compiler is blindly trying to generate V2QI XOR due to the missing
one_cmplv2qi2 pattern. I have
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103900
--- Comment #2 from Uroš Bizjak ---
Looks fixed, does not ICE for me with:
GNU C17 (GCC) version 12.0.0 20220104 (experimental) [master
r12-6200-g62c8b21d48a] (x86_64-pc-linux-gnu)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103894
Uroš Bizjak changed:
What|Removed |Added
Resolution|--- |FIXED
Status|ASSIGNED
at gcc dot gnu.org |ubizjak at gmail dot com
--- Comment #2 from Uroš Bizjak ---
Created attachment 52111
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52111=edit
Proposed patch
Patch in testing.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861
--- Comment #3 from Uroš Bizjak ---
The patched compiler compiles the testcase from Comment #0 on x86_64 with -O2
to:
plus:
movl %edi, %edx
movl %esi, %eax
addb %sil, %dl
addb %ah, %dh
movl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861
--- Comment #2 from Uroš Bizjak ---
Created attachment 52087
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52087=edit
Prototype patch to vectorize with v2qi vectors
Patch that implements V2QI moves, logic and basic arithmetic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861
--- Comment #1 from Uroš Bizjak ---
Also:
char r[2], a[2], b[2];
void foo (void)
{
int i;
for (i = 0; i < 2; i++)
r[i] = a[i] + b[i];
}
Assignee: unassigned at gcc dot gnu.org
Reporter: ubizjak at gmail dot com
Target Milestone: ---
Following testcase:
typedef char __v2qi __attribute__ ((__vector_size__ (2)));
__v2qi plus (__v2qi a, __v2qi b) { return a + b; };
should be vectorized.
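In the `plus` testcase, `+` on two `__v2qi` values adds the two byte lanes independently. There is no carry from the low lane into the high lane. The sketch below pins down the lane-wise results a V2QI expansion must reproduce. It assumes GCC vector extensions and, unlike the original testcase, uses unsigned char lanes so that wrap-around is well defined. The names are hypothetical. For contrast, it also adds the same two bytes as one 16-bit integer. On a little-endian target such as x86, that lets the carry out of the low byte leak into the high byte:

```c
#include <stdint.h>
#include <string.h>

/* Same shape as __v2qi, but with unsigned lanes so wrap-around is defined.  */
typedef unsigned char v2uqi __attribute__ ((__vector_size__ (2)));

/* Lane-wise addition: each 8-bit lane wraps modulo 256 on its own.  */
static v2uqi plus_u (v2uqi a, v2uqi b)
{
  return a + b;
}

/* Contrast: treat both bytes as one 16-bit integer and add once.
   The carry out of the low byte propagates into the high byte.  */
static uint16_t plus_as_u16 (v2uqi a, v2uqi b)
{
  uint16_t x, y;
  memcpy (&x, &a, sizeof x);
  memcpy (&y, &b, sizeof y);
  return (uint16_t) (x + y);
}
```

For { 250, 1 } + { 10, 2 }, the lane-wise result is { 4, 3 }. The single 16-bit add gives 0x0404 on x86, so the high byte is 4 rather than 3.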
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103842
--- Comment #6 from Uroš Bizjak ---
(In reply to Jakub Jelinek from comment #5)
> Created attachment 52068 [details]
> gcc12-pr103842.patch
>
> Untested fix.
The patch is OK.
Thanks,
Uros.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103797
--- Comment #17 from Uroš Bizjak ---
(In reply to hubicka from comment #16)
> > >
> > > It could be done, but I was under impression that the sequence to load
> > > 1.0f
> > > into topmost elements nullifies the benefit of operation to divide
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103797
--- Comment #14 from Uroš Bizjak ---
(In reply to Uroš Bizjak from comment #13)
> Created attachment 52051 [details]
> Patch that implements v2sf division
This patch also enables vectorization of the testcase from Comment #7. Using
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103797
--- Comment #13 from Uroš Bizjak ---
Created attachment 52051
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52051=edit
Patch that implements v2sf division
Please try the attached patch, for the following testcase:
--cut here--
float
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103797
--- Comment #12 from Uroš Bizjak ---
(In reply to Jakub Jelinek from comment #10)
> At least on your short testcase clang doesn't use divps either.
> We do support mulv2sf3, addv2sf3 etc. but not divv2sf3 I bet because with
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103772
Uroš Bizjak changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103772
Uroš Bizjak changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103750
--- Comment #9 from Uroš Bizjak ---
(In reply to Thiago Macieira from comment #0)
> Testcase:
...
> The assembly for this produces:
>
> vmovdqu16 (%rdi), %ymm1
> vmovdqu16 32(%rdi), %ymm2
> vpcmpuw $0,
: target
Assignee: unassigned at gcc dot gnu.org
Reporter: ubizjak at gmail dot com
Target Milestone: ---
(Cloned from PR103571#18)
Following testcase:
--cut here--
typedef _Float16 __v16hf __attribute__ ((__vector_size__ (32)));
__v16hf foo (_Float16 x)
{
return (__v16hf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102812
Uroš Bizjak changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571
Uroš Bizjak changed:
What|Removed |Added
Resolution|--- |FIXED
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571
--- Comment #28 from Uroš Bizjak ---
(In reply to Hongtao.liu from comment #18)
> codegen for foo1/foo2 is suboptimal under -mavx2, i guess we can have
> vec_setv16hf_0 and with vpblendw.
True, some opportunities are missing from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571
--- Comment #27 from Uroš Bizjak ---
(In reply to Hongtao.liu from comment #17)
> (In reply to Hongtao.liu from comment #16)
> > There're already testcases for vec_extract/vec_set/vec_duplicate, but those
> > testcases are written under
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571
--- Comment #25 from Uroš Bizjak ---
(In reply to Hongtao.liu from comment #22)
> Yes, besides TARGET_VECTOR_MODE_SUPPORTED_P, other part in the attached
> patch looks fine, the condition should be binded to real instructions but
> not mode.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571
Uroš Bizjak changed:
What|Removed |Added
Attachment #51950|0 |1
is obsolete|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571
Uroš Bizjak changed:
What|Removed |Added
CC||rguenth at gcc dot gnu.org
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571
Uroš Bizjak changed:
What|Removed |Added
Attachment #51948|0 |1
is obsolete|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571
--- Comment #13 from Uroš Bizjak ---
(In reply to Uroš Bizjak from comment #12)
> Hongtao, can you please review the patch and perhaps test it a bit more?
This part is missing from ix86_expand_vector_set_var:
--cut here
@@ -15912,7 +15921,8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571
--- Comment #12 from Uroš Bizjak ---
(In reply to Hongtao.liu from comment #10)
> Sure.
Please find attached the complete patch that enables HF vector modes in Comment
#11. The patch survives bootstrap and regression test and works OK for the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571
Uroš Bizjak changed:
What|Removed |Added
Attachment #51941|0 |1
is obsolete|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571
--- Comment #9 from Uroš Bizjak ---
(In reply to Hongtao.liu from comment #8)
> (In reply to Uroš Bizjak from comment #6)
> > (In reply to Hongtao.liu from comment #5)
> >
> > > There're several places in i386-expand.c which assume
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571
--- Comment #7 from Uroš Bizjak ---
Created attachment 51941
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51941=edit
Proposed patch
The patch moves V2HF+V4HF+V8HF/V16HF/V32HF to
VALID_SSE2/AVX256/AVX512F_REG_MODE.
Also, introduces
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571
--- Comment #6 from Uroš Bizjak ---
(In reply to Hongtao.liu from comment #5)
> There're several places in i386-expand.c which assume TARGET_AVX512FP16 for
> case V8HF/V16HF/V32HF, if we want to put V8HF/V16HF/V32HF in
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571
--- Comment #4 from Uroš Bizjak ---
(In reply to Hongyu Wang from comment #3)
> So we may need to support V8HFmode in VALID_SSE2_REG_MODE if we don't want
> to modify those function_args and function_value stuff.
We have V8HFmode moves for
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: ubizjak at gmail dot com
Target Milestone: ---
Following testcase:
--cut here--
typedef _Float16 v2hf __attribute__((vector_size(4)));
typedef _Float16 v4hf __attribute__((vector_size(8)));
typedef _Float16 v8hf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102881
--- Comment #4 from Uroš Bizjak ---
> The master branch has been updated by Uros Bizjak :
Oops, wrong PR number...
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #26 from Uroš Bizjak ---
The testcase now compiles with -O2 -mf16c to:
vpxor %xmm2, %xmm2, %xmm2
vpblendw $1, %xmm0, %xmm2, %xmm0
vpblendw $1, %xmm1, %xmm2, %xmm1
vcvtph2ps
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #21 from Uroš Bizjak ---
(In reply to Hongtao.liu from comment #20)
> (In reply to Uroš Bizjak from comment #18)
> > (In reply to Uroš Bizjak from comment #17)
> > > (In reply to Hongtao.liu from comment #16)
> > >
> > > >
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #18 from Uroš Bizjak ---
(In reply to Uroš Bizjak from comment #17)
> (In reply to Hongtao.liu from comment #16)
>
> > ix86_expand_vector_set is mainly used by vec_set_optab which exactly takes
> > target as both input and output,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #17 from Uroš Bizjak ---
(In reply to Hongtao.liu from comment #16)
> ix86_expand_vector_set is mainly used by vec_set_optab which exactly takes
> target as both input and output, it seems we can't create a new target for
> that.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #15 from Uroš Bizjak ---
(In reply to Hongtao.liu from comment #14)
> (In reply to Uroš Bizjak from comment #13)
> > (In reply to Hongtao.liu from comment #12)
> > > >
> > > > Just noticed that for some reason two VPXORs are
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #13 from Uroš Bizjak ---
(In reply to Hongtao.liu from comment #12)
> >
> > Just noticed that for some reason two VPXORs are emitted. One should be
> > enough for both VPINSRW insns.
>
> With new alternative in your attached
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103439
--- Comment #3 from Uroš Bizjak ---
(In reply to rguent...@suse.de from comment #2)
> On Fri, 26 Nov 2021, ubizjak at gmail dot com wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103439
> >
> > --- Com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #10 from Uroš Bizjak ---
(In reply to Uroš Bizjak from comment #7)
> compiles with unpatched gcc -O2 -mf16c to:
>
> vmovss %xmm0, %xmm0, %xmm2 # 27[c=4 l=4] *movhf_internal/3
> pextrw $0, %xmm1, -4(%rsp)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103439
--- Comment #1 from Uroš Bizjak ---
(In reply to Richard Biener from comment #0)
> I'm not sure if there are valid cases where we have a mix of a direct
> RTL pattern and manual expansion, so where the { } part falls thru.
Yes, we have quite
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #8 from Uroš Bizjak ---
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 68606e57e60..a2ebaa5ac63 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2528,12 +2528,12 @@
case TYPE_SSELOG:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #7 from Uroš Bizjak ---
The improvement with the patch from Comment #6:
The testcase:
_Float16 test (_Float16 a, _Float16 b)
{
return a + b;
}
compiles with unpatched gcc -O2 -mf16c to:
vmovss %xmm0, %xmm0, %xmm2 # 27
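With -mf16c but without AVX512-FP16, HFmode arithmetic is done by widening the operands to SFmode, operating in single precision, and narrowing the result back. F16C's vcvtph2ps and vcvtps2ph provide the packed conversions. The sketch below shows the widening step on raw IEEE 754 binary16 bit patterns. It does not depend on compiler `_Float16` support, and the function name is hypothetical. Every binary16 value is exactly representable in binary32, so this step never rounds:

```c
#include <stdint.h>
#include <string.h>

/* Widen an IEEE 754 binary16 bit pattern to float (binary32).  Exact:
   binary32 has more exponent range and mantissa bits than binary16.  */
static float half_bits_to_float (uint16_t h)
{
  uint32_t sign = (uint32_t) (h & 0x8000u) << 16;
  uint32_t exp = (h >> 10) & 0x1fu;
  uint32_t mant = h & 0x3ffu;
  uint32_t bits;

  if (exp == 0)
    {
      if (mant == 0)
        bits = sign;                              /* +0 or -0 */
      else
        {
          /* Subnormal: shift until the implicit leading 1 appears,
             adjusting the rebiased exponent for each shift.  */
          exp = 127 - 15 + 1;
          while ((mant & 0x400u) == 0)
            {
              mant <<= 1;
              exp--;
            }
          mant &= 0x3ffu;
          bits = sign | (exp << 23) | (mant << 13);
        }
    }
  else if (exp == 0x1f)
    bits = sign | 0x7f800000u | (mant << 13);     /* Inf or NaN */
  else
    bits = sign | ((exp + 127 - 15) << 23) | (mant << 13);  /* normal */

  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}
```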
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #6 from Uroš Bizjak ---
Created attachment 51879
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51879=edit
Improve HI/HFmode scalar insert
The attached patch further improves HFmode -> SFmode conversion. HFmode values
are
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103406
--- Comment #2 from Uroš Bizjak ---
gcc/libgcc/config/i386/sfp-machine.h says:
/* Here is something Intel misdesigned: the specs don't define
the case where we have two NaNs with same mantissas, but
different sign. Different operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103074
--- Comment #4 from Uroš Bizjak ---
(In reply to Jakub Jelinek from comment #3)
> Ah, actually what I see is that sched1 swaps the order of:
> (insn 22 21 23 4 (parallel [
> (set (reg:SI 88)
> (ashiftrt:SI (reg/v:SI