https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88494
--- Comment #8 from vekumar at gcc dot gnu.org ---
I tested mdbx before and after the revision Richard pointed out.
On my Ryzen box there is a ~4% regression.
Although "vblendvps" is a fast-path instruction and can execute in pipe 0/1, it
competes w
--- Comment #7 from rguenther at suse dot de ---
On Fri, 1 Feb 2019, peter at cordes dot ca wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88494
>
> --- Comment #5 from Peter Cordes ---
> IF ( xij.GT.+HALf ) xij = xij - P
--- Comment #6 from Peter Cordes ---
Oops, these were SD not SS. Getting sleepy >.<. Still, my optimization
suggestion for doing both compares in one masked SUB of +-PBCx applies equally.
And I think my testing with VBLENDVPS should apply equa
--- Comment #5 from Peter Cordes ---
IF ( xij.GT.+HALf ) xij = xij - PBCx
IF ( xij.LT.-HALf ) xij = xij + PBCx
For code like this, *if we can prove only one of the IF() conditions will be
true*, we can implement it
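Comment #5 is cut off before it shows the implementation, so the following is only a hypothetical scalar C sketch of the idea from comments #5 and #6 (one selected +-PBCx adjustment applied by a single subtract, instead of two dependent conditional updates), not Peter's actual code. It assumes PBCx <= 2*HALf (for example HALf = PBCx/2, which the excerpt does not confirm). Under that assumption, once the first IF fires the value is already above -HALf, so the second IF can never fire and the two forms agree. The function names are made up for illustration; doubles are used because comment #6 notes the real code is SD, not SS.

```c
/* Two conditional updates, mirroring the Fortran IF statements. */
double wrap_two_ifs(double xij, double half, double pbcx)
{
    if (xij > half)
        xij = xij - pbcx;
    if (xij < -half)
        xij = xij + pbcx;
    return xij;
}

/* One selected adjustment (+pbcx, -pbcx or 0), then a single subtract.
   ASSUMPTION: pbcx <= 2 * half, so at most one of the two tests can be
   true and the result matches wrap_two_ifs. In SIMD form the two
   compares would build the adjustment vector and one SUB applies it. */
double wrap_one_sub(double xij, double half, double pbcx)
{
    double adj = (xij > half) ? pbcx : (xij < -half) ? -pbcx : 0.0;
    return xij - adj;
}
```

The point of the second form is that the two compares are independent of each other (both read the original xij), so only one FP add/sub sits on the critical path instead of two chained ones.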
--- Comment #4 from Peter Cordes ---
I suspect dep-chains are the problem, and branching to skip work is a Good
Thing when it's predictable.
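To make the trade-off in comment #4 concrete, here is a hypothetical C sketch of the same wrap written two ways (names invented for illustration, not taken from mdbx). The first uses control flow: if the compiler keeps it as branches and they predict well, the CPU speculates past the compares and the add/sub is simply skipped. The second computes the adjusted values unconditionally and selects, which is the shape an if-converter or vectorizer produces (compare + blend), so each result data-depends on the compare and the arithmetic. Both compute identical results; which instructions a given GCC version actually emits is not fixed by the source, so treat this as an illustration of the two strategies only.

```c
#include <stddef.h>

/* Control-flow version: if these stay as predictable branches, the
   subtract/add is skipped and is not on the data dependency chain. */
void wrap_branchy(double *x, size_t n, double half, double pbcx)
{
    for (size_t i = 0; i < n; ++i) {
        double v = x[i];
        if (v > half)
            v -= pbcx;
        if (v < -half)
            v += pbcx;
        x[i] = v;
    }
}

/* Select version: both candidates are computed every time and the
   comparison picks one, so the output always waits on compare, the
   arithmetic and the select (typically a blend when vectorized). */
void wrap_select(double *x, size_t n, double half, double pbcx)
{
    for (size_t i = 0; i < n; ++i) {
        double v = x[i];
        double down = v - pbcx;
        v = (v > half) ? down : v;
        double up = v + pbcx;
        v = (v < -half) ? up : v;
        x[i] = v;
    }
}
```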
(In reply to Richard Biener from comment #2)
> On Skylake it's better (1uop, 1 cycle latency) while on R
Jakub Jelinek changed:
What     |Removed |Added
CC       |        |vekumar at gcc dot gnu.org
--- Comment #
Richard Biener changed:
What     |Removed |Added
CC       |        |hjl.tools at gmail dot com
--- Comment
Richard Biener changed:
What             |Removed     |Added
Status           |UNCONFIRMED |NEW
Last reconfirmed |
Richard Biener changed:
What     |Removed |Added
Keywords |        |missed-optimization
Target   |