; Zamyatin, Igor
Subject: [PATCH] disable use_vector_fp_converts for m_CORE_ALL
For the following testcase 1.c, on westmere and sandybridge, performance with
the option -mtune=^use_vector_fp_converts is better (improves from 3.46s to
2.83s). It means cvtss2sd is often better than
unpcklps
: Thursday, September 12, 2013 2:51 AM
To: GCC Patches
Cc: David Li; Zamyatin, Igor
Subject: [PATCH] disable use_vector_fp_converts for m_CORE_ALL
For the following testcase 1.c, on westmere and sandybridge, performance
with the option -mtune=^use_vector_fp_converts is better (improves from
Hi Wei Mi,
Have you checked in your patch?
--
H.J.
No, I haven't. Honza wants me to wait for his testing on AMD hardware.
http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01603.html
I only wanted to separate it from the changes in generic so the regular testers
can pick it up
Ccing Uros. Changes in i386.md could be related to the fix for PR57954.
Thanks,
Igor
-----Original Message-----
From: Wei Mi [mailto:w...@google.com]
Sent: Thursday, September 12, 2013 2:51 AM
To: GCC Patches
Cc: David Li; Zamyatin, Igor
Subject: [PATCH] disable use_vector_fp_converts
Ping.
-----Original Message-----
From: Wei Mi [mailto:w...@google.com]
Sent: Thursday, September 12, 2013 2:51 AM
To: GCC Patches
Cc: David Li; Zamyatin, Igor
Subject: [PATCH] disable use_vector_fp_converts for m_CORE_ALL
For the following testcase 1.c, on westmere and sandybridge,
performance with the option -mtune=^use_vector_fp_converts is better
(improves from 3.46s to 2.83s). It means cvtss2sd is often better than
unpcklps+cvtps2pd on recent x86 platforms.
1.c:
float total = 0.2;
int k = 5;
int main() {
int
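The two instruction sequences compared above can be sketched in C. This is not the 1.c testcase from the mail (which is cut off here); it is a minimal illustration, and the function names widen_scalar, widen_vector and conversions_agree are hypothetical. It assumes an x86 target with SSE2 and falls back to a plain cast elsewhere.

```c
#include <assert.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar form: a plain float-to-double cast. On x86 with SSE math this is
   typically emitted as cvtss2sd when the vector-convert tuning is off. */
static double widen_scalar(float x)
{
    return (double)x;
}

/* Vector form, spelled out with intrinsics: duplicate the low float
   (unpcklps), convert the two low floats to doubles (cvtps2pd), and
   keep element 0. Falls back to the cast on targets without SSE2. */
static double widen_vector(float x)
{
#if defined(__SSE2__)
    __m128 v = _mm_set_ss(x);       /* [x, 0, 0, 0] */
    v = _mm_unpacklo_ps(v, v);      /* [x, x, 0, 0] */
    __m128d d = _mm_cvtps_pd(v);    /* [(double)x, (double)x] */
    return _mm_cvtsd_f64(d);        /* low element */
#else
    return (double)x;
#endif
}

/* Hypothetical helper: both sequences should yield the same value,
   since float-to-double widening is exact. */
static int conversions_agree(float x)
{
    return widen_scalar(x) == widen_vector(x);
}
```

To see which sequence GCC picks for the plain cast, one could compile widen_scalar with -O2 -mfpmath=sse -S, once with -mtune-ctrl=use_vector_fp_converts and once with -mtune-ctrl=^use_vector_fp_converts, and compare the assembly. The exact output depends on the GCC version and the selected -march.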
On Tue, Oct 1, 2013 at 3:50 PM, Jan Hubicka hubi...@ucw.cz wrote:
Hi Wei Mi,
Have you checked in your patch?
--
H.J.
No, I haven't. Honza wants me to wait for his testing on AMD hardware.
http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01603.html
I only wanted to separate it from the
http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00884.html
This patch seems reasonable (in fact, I have pretty much the same in my tree).
use_vector_fp_converts is actually trying to solve the same problem in AMD
hardware - you need to type the whole register when converting.
So it may work well
or SUB instructions,
because ADD and SUB overwrite all flags, whereas INC and DEC do not,
therefore
creating false dependencies on earlier instructions that set the flags.
Other change dropped is use_vector_fp_converts that seems to improve
Core performance.
I did not see
diff --git a/gcc/testsuite/gcc.target/i386/pr101900-1.c
> b/gcc/testsuite/gcc.target/i386/pr101900-1.c
> new file mode 100644
> index 000..0a45f8e340a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr101900-1.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/*
gcc.target/i386/pr101900-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake -mfpmath=sse
-mtune-ctrl=use_vector_fp_converts" } */
+
+extern float f;
+extern double d;
+extern int i;
+
+void
+foo (void)
+{
+ d = f;
+ f = i;
+}
+
+/* { dg-final { scan-assem
> -----Original Message-----
> From: Uros Bizjak
> Sent: Thursday, September 16, 2021 2:28 PM
> To: Cui, Lili
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ; H. J. Lu
>
> Subject: Re: [PATCH 3/4] [PATCH 3/4] x86: Properly handle
> USE_VECTOR_FP_CONVERTS/USE_VECTOR_CONV
Re: [PATCH 3/4] [PATCH 3/4] x86: Properly handle
> > USE_VECTOR_FP_CONVERTS/USE_VECTOR_CONVERTS
> >
> > On Wed, Sep 15, 2021 at 10:10 AM wrote:
> > >
> > > From: "H.J. Lu"
> > >
> > > Check TARGET_USE_VECTOR_FP_CONVERTS or
> >
/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -386,6 +386,10 @@ DEF_TUNE (X86_TUNE_USE_VECTOR_FP_CONVERTS,
use_vector_fp_converts,
from integer to FP. */
DEF_TUNE (X86_TUNE_USE_VECTOR_CONVERTS, use_vector_converts, m_AMDFAM10)
+/* X86_TUNE_SLOW_SHUFB: Indicates tunings with slow
rly handle USE_VECTOR_FP_CONVERTS/USE_VECTOR_CONVERTS
x86: Add TARGET_SSE_PARTIAL_REG_[FP_]CONVERTS_DEPENDENCY
gcc/common/config/i386/i386-common.c | 2 +-
gcc/config/i386/i386-features.c | 23 +++-
gcc/config/i386/i386-options.c| 2 +-
gcc/config/i
.
Other change dropped is use_vector_fp_converts that seems to improve
Core performance.
I benchmarked the patch on SPEC2k and earlier it was benchmarked on 2k6
and the performance difference seems in noise. It causes about 0.3% code
size reduction. Main motivation for the patch is to drop some
not, therefore
creating false dependencies on earlier instructions that set the flags.
Other change dropped is use_vector_fp_converts that seems to improve
Core performance.
I did not see this in your patch, but Wei has this tuning in this patch:
http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00884.html
(X86_TUNE_USE_VECTOR_FP_CONVERTS, use_vector_fp_converts)
+DEF_TUNE (X86_TUNE_USE_VECTOR_CONVERTS, use_vector_converts)
+DEF_TUNE (X86_TUNE_FUSE_CMP_AND_BRANCH, fuse_cmp_and_branch)
+DEF_TUNE (X86_TUNE_OPT_AGU, opt_agu)
+DEF_TUNE (X86_TUNE_VECTORIZE_DOUBLE, vectorize_double)
+DEF_TUNE
On Fri, Sep 17, 2021 at 08:35:57AM +0200, Uros Bizjak via Gcc-patches wrote:
> > > On Wed, Sep 15, 2021 at 10:10 AM wrote:
> > > >
> > > > From: "H.J. Lu"
> > > >
> > > > Check TARGET_USE_VECTOR_FP_CONVERTS or
> > > TARGET_USE_VECTOR_CONVERTS when
> > > > handling avx_partial_xmm_update
On Sat, Sep 18, 2021 at 7:50 AM Jakub Jelinek via Gcc-patches
wrote:
>
> On Fri, Sep 17, 2021 at 08:35:57AM +0200, Uros Bizjak via Gcc-patches wrote:
> > > > On Wed, Sep 15, 2021 at 10:10 AM wrote:
> > > > >
> > > > > From: "H.J. Lu"
> > > > >
> > > > > Check TARGET_USE_VECTOR_FP_CONVERTS or
>
(X86_TUNE_USE_VECTOR_FP_CONVERTS,
use_vector_fp_converts,
/* X86_TUNE_USE_VECTOR_CONVERTS: Prefer vector packed SSE conversion
from integer to FP. */
DEF_TUNE (X86_TUNE_USE_VECTOR_CONVERTS, use_vector_converts, m_AMDFAM10)
-/* X86_TUNE_FUSE_CMP_AND_BRANCH: Fuse a compare or test instruction
/x86-tune.def b/gcc/config/i386/x86-tune.def
index 4ae5f70..3d395b0 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -193,10 +193,24 @@ DEF_TUNE (X86_TUNE_USE_VECTOR_FP_CONVERTS,
use_vector_fp_converts,
/* X86_TUNE_USE_VECTOR_CONVERTS: Prefer vector packed SSE
(X86_TUNE_USE_VECTOR_FP_CONVERTS, use_vector_fp_converts,
m_CORE_ALL | m_AMDFAM10 | m_GENERIC)
+
/* X86_TUNE_USE_VECTOR_CONVERTS: Prefer vector packed SSE conversion
from integer to FP. */
DEF_TUNE (X86_TUNE_USE_VECTOR_CONVERTS, use_vector_converts, m_AMDFAM10)
+
/* X86_TUNE_FUSE_CMP_AND_BRANCH
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 4ae5f70..3d395b0 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -193,10 +193,24 @@ DEF_TUNE (X86_TUNE_USE_VECTOR_FP_CONVERTS,
use_vector_fp_converts,
/* X86_TUNE_USE_VECTOR_CONVERTS
(X86_TUNE_SLOW_IMUL_IMM8, slow_imul_imm8)
-DEF_TUNE (X86_TUNE_MOVE_M1_VIA_OR, move_m1_via_or)
-DEF_TUNE (X86_TUNE_NOT_UNPAIRABLE, not_unpairable)
-DEF_TUNE (X86_TUNE_NOT_VECTORMODE, not_vectormode)
-DEF_TUNE (X86_TUNE_USE_VECTOR_FP_CONVERTS, use_vector_fp_converts)
-DEF_TUNE
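The three-argument DEF_TUNE entries quoted in the diffs above pair each tuning flag with a mask of processors for which it is enabled. The following is a minimal sketch of that table pattern, not GCC's actual code: the M_* bits, the TUNE_TABLE macro and the helper functions are hypothetical stand-ins. The mask shown for use_vector_fp_converts assumes the proposed change (m_CORE_ALL dropped) has been applied.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical processor bits; GCC's real m_* masks are defined elsewhere. */
enum { M_CORE_ALL = 1 << 0, M_AMDFAM10 = 1 << 1, M_GENERIC = 1 << 2 };

/* Stand-in for x86-tune.def: one DEF_TUNE entry per tuning flag.
   The first mask assumes m_CORE_ALL has been removed, as proposed. */
#define TUNE_TABLE \
    DEF_TUNE(TUNE_USE_VECTOR_FP_CONVERTS, "use_vector_fp_converts", \
             M_AMDFAM10 | M_GENERIC) \
    DEF_TUNE(TUNE_USE_VECTOR_CONVERTS, "use_vector_converts", M_AMDFAM10)

/* First expansion: an enum of flag indices. */
#define DEF_TUNE(id, name, mask) id,
enum tune_index { TUNE_TABLE TUNE_LAST };
#undef DEF_TUNE

/* Second expansion: the option names, in the same order. */
#define DEF_TUNE(id, name, mask) name,
static const char *const tune_names[] = { TUNE_TABLE };
#undef DEF_TUNE

/* Third expansion: the per-flag processor masks. */
#define DEF_TUNE(id, name, mask) (mask),
static const unsigned tune_masks[] = { TUNE_TABLE };
#undef DEF_TUNE

/* A flag is on when the selected processor's bit is in its mask. */
static int tune_enabled(enum tune_index t, unsigned cpu_bit)
{
    return (tune_masks[t] & cpu_bit) != 0;
}

/* Map an option name back to its index, or -1 if unknown. */
static int tune_lookup(const char *name)
{
    for (int i = 0; i < TUNE_LAST; i++)
        if (strcmp(tune_names[i], name) == 0)
            return i;
    return -1;
}
```

With this layout, disabling a tuning for one processor family is a one-line change to the mask in the table, which is the shape of the x86-tune.def edits discussed in this thread.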