https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87047
            Bug ID: 87047
           Summary: gcc 7 & 8 - performance regression because of
                    if-conversion
           Product: gcc
           Version: 7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: already5chosen at yahoo dot com
  Target Milestone: ---

Created attachment 44570
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44570&action=edit
Demonstrates the performance regression caused by if-conversion

There is a very significant performance regression from gcc 6.x to 7.x and
8.x, caused by if-conversion of a predictable branch.

Compilation flags: -O2 -Wall
Target: x86-64 (my test machine is Ivy Bridge)

It is possible that the problem is specific to the x86-64 target. I tested
the aarch64 target (by inspecting the compiler output) and it looks o.k.

The problem occurs here:

  if ((i & 15)==0) {
    const uint64_t PROD_ONE = (uint64_t)(1) << 19;
    uint64_t prod = umulh(invRange, range);
    invRange = umulh(invRange, (PROD_ONE*2-1-prod)<<44)<<1;
  }

The condition has low probability and is easily predicted by the branch
predictor, while the code inside the if has relatively high latency. gcc,
starting from 7.x and up to the latest, is convinced that always executing
the body of the if is a bright idea. Measurements on my real-world code do
not agree: they show a 30% slowdown. I am sure that on artificial sequences
I could demonstrate a slowdown of 100% or more.

What is special about this case is that the compiler is VERY confident in
its stupid decision. It does not change its mind even when I replace

  if ((i & 15)==0) {

with

  if (__builtin_expect((i & 15)==0, 0)) {

I found only two ways of forcing sane code generation:
1. Compile with -fno-if-conversion.
2. Put an empty asm statement in the body:
     if ((i & 15)==0) { asm volatile(""); ... }
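
For context, here is a minimal self-contained sketch of the pattern (the
real reproducer is in the attachment; the driver loop and the umulh()
definition built on unsigned __int128 below are my reconstruction, not the
exact attached code):

  #include <stdint.h>

  /* High 64 bits of a 64x64-bit unsigned multiply. */
  static inline uint64_t umulh(uint64_t a, uint64_t b)
  {
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
  }

  uint64_t foo(uint64_t invRange, uint64_t range, int n)
  {
    for (int i = 0; i < n; ++i) {
      /* Taken once per 16 iterations, so the branch predictor handles it
         well; gcc 7/8 nevertheless if-converts the high-latency multiply
         chain into the always-executed path. */
      if ((i & 15)==0) {
        const uint64_t PROD_ONE = (uint64_t)(1) << 19;
        uint64_t prod = umulh(invRange, range);
        invRange = umulh(invRange, (PROD_ONE*2-1-prod)<<44)<<1;
      }
      range += invRange >> 21;  /* placeholder for the real per-iteration work */
    }
    return range;
  }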