https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110104
--- Comment #5 from Mason ---
FWIW, trunk (GCC 14) compiles testcase3 to the same code as the other
testcases, while remaining portable across all architectures:
$ gcc-trunk -O3 -march=bdver3 testcase3.c
typedef unsigned long long u64;
typede
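(The testcase3 source is cut off above. As a hedged illustration of what a fully portable carry chain can look like in plain C — the function name and body below are my own sketch, not the testcase from the report — the carry can be recovered with unsigned compares, a pattern recent GCC lowers to an add/adc chain on x86-64:)

```c
typedef unsigned long long u64;

/* Hypothetical sketch: 4-limb addition in portable C, no intrinsics.
 * carry is recomputed from unsigned wraparound after each limb. */
static void add4_portable(u64 dst[4], const u64 a[4], const u64 b[4])
{
    unsigned carry = 0;
    for (int i = 0; i < 4; i++) {
        u64 t = a[i] + carry;
        carry = t < carry;       /* carry out of a[i] + carry_in */
        dst[i] = t + b[i];
        carry += dst[i] < t;     /* carry out of the second add */
    }
}
```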
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110104
--- Comment #4 from Mason ---
I confirm that trunk now emits the same code for testcase1 and testcase2.
Thanks Jakub and Roger, great work!
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110104
--- Comment #2 from Mason ---
You meant PR79173 ;)
Latest update:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621554.html
I didn't see my testcase specifically in Jakub's patch,
but I'll test trunk on godbolt when/if the patch lands.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105617
--- Comment #20 from Mason ---
Doh! You're right.
I come from a background where overlapping/aliasing inputs are heresy,
so I was blindsided :(
This would be the optimal code, right?
add4i:
# rdi = dst, rsi = a, rdx = b
movq 0(%rdx
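(The asm listing is truncated above. To make the intent concrete, here is a hedged C rendering of what add4i presumably computes — dst = a + b over four 64-bit limbs, safe even when dst aliases a or b, which is the point of the comment. The function and parameter names follow the comment; the body is my sketch, using GCC/Clang's __builtin_add_overflow:)

```c
typedef unsigned long long u64;

/* Sketch of add4i: dst[0..3] = a[0..3] + b[0..3] with carry propagation.
 * Each iteration reads a[i] and b[i] before writing dst[i], so exact
 * aliasing (dst == a or dst == b) is handled correctly. */
static void add4i(u64 *dst, const u64 *a, const u64 *b)
{
    unsigned char c = 0;
    for (int i = 0; i < 4; i++) {
        u64 t;
        unsigned char c0 = __builtin_add_overflow(a[i], b[i], &t);
        unsigned char c1 = __builtin_add_overflow(t, (u64)c, &t);
        c = c0 | c1;   /* at most one of the two adds can carry */
        dst[i] = t;
    }
}
```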
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102974
--- Comment #16 from Mason ---
For the record, the example I provided was intended to show that, with some
help, GCC can generate good code for bigint multiplication. In this situation,
"help" means a short asm template.
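(The template itself isn't reproduced in this excerpt. As a hedged example of the kind of short asm template meant here — names and exact form are my own, not from the bug — a 64x64->128 high-half multiply can be expressed with one mulq on x86-64, with a portable __int128 fallback elsewhere:)

```c
typedef unsigned long long u64;

/* Illustrative short asm template: high 64 bits of a 64x64 product.
 * x86-64 mulq computes rdx:rax = rax * src; the fallback is the
 * portable equivalent via unsigned __int128. */
static u64 mul_64x64_hi(u64 a, u64 b)
{
#if defined(__x86_64__)
    u64 hi;
    __asm__("mulq %2" : "=d"(hi), "+a"(a) : "rm"(b) : "cc");
    return hi;
#else
    return (u64)(((unsigned __int128)a * b) >> 64);
#endif
}
```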
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102974
--- Comment #12 from Mason ---
Actually, in this case, we don't need to propagate the carry over 3 limbs.
typedef unsigned int u32;
typedef unsigned long long u64;
/* u32 acc[2], a[1], b[1] */
static void mul_add_32x32(u32 *acc, const u32 *a,
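(The body of the function is cut off above. A hedged reconstruction of the idea — a 32x32->64 multiply accumulated into a two-limb result, so no carry has to propagate into a third limb; this assumes, as the surrounding usage in the bug apparently guarantees, that the final sum fits in two limbs. The suffix on the name is mine:)

```c
typedef unsigned int u32;
typedef unsigned long long u64;

/* Sketch: acc[1]:acc[0] += a[0] * b[0], carry confined to two limbs. */
static void mul_add_32x32_2(u32 *acc, const u32 *a, const u32 *b)
{
    u64 t = (u64)a[0] * b[0] + acc[0];
    acc[0] = (u32)t;
    acc[1] = (u32)((t >> 32) + acc[1]); /* assumes no carry out of acc[1] */
}
```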
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102974
--- Comment #11 from Mason ---
Here's umul_least_64() rewritten as mul_64x64_128() in C:
typedef unsigned int u32;
typedef unsigned long long u64;
/* u32 acc[3], a[1], b[1] */
static void mul_add_32x32(u32 *acc, const u32 *a, const u32 *b)
{
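(The function body is cut off above. A hedged reconstruction of the three-limb variant the comment describes — acc[2]:acc[1]:acc[0] += a[0] * b[0], with the carry rippled into acc[2]; the suffix on the name and the exact body are my own, and the final carry into acc[2] is assumed not to overflow, as in the bug's usage:)

```c
typedef unsigned int u32;
typedef unsigned long long u64;

/* Sketch: 32x32->64 multiply accumulated over three 32-bit limbs. */
static void mul_add_32x32_3(u32 *acc, const u32 *a, const u32 *b)
{
    u64 t = (u64)a[0] * b[0] + acc[0];
    acc[0] = (u32)t;
    t = (t >> 32) + acc[1];
    acc[1] = (u32)t;
    acc[2] += (u32)(t >> 32);  /* ripple the last carry into limb 2 */
}
```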
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110104
Bug ID: 110104
Summary: gcc produces sub-optimal code for _addcarry_u64 chain
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Compo
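(The header is truncated above, and the report's own testcases are not reproduced in this excerpt. As a hedged illustration of the kind of _addcarry_u64 chain the summary refers to — the function name is mine — the intrinsic threads a carry flag through consecutive limb additions; a portable fallback is included so the sketch compiles off x86-64:)

```c
typedef unsigned long long u64;

#if defined(__x86_64__)
#include <x86intrin.h>  /* _addcarry_u64 */
#endif

/* Sketch: 4-limb addition via an _addcarry_u64 chain. */
static void add4_intrin(u64 dst[4], const u64 a[4], const u64 b[4])
{
#if defined(__x86_64__)
    unsigned char c;
    c = _addcarry_u64(0, a[0], b[0], &dst[0]);
    c = _addcarry_u64(c, a[1], b[1], &dst[1]);
    c = _addcarry_u64(c, a[2], b[2], &dst[2]);
    (void)_addcarry_u64(c, a[3], b[3], &dst[3]);
#else
    /* portable equivalent via a 128-bit accumulator */
    unsigned __int128 s = 0;
    for (int i = 0; i < 4; i++) {
        s += (unsigned __int128)a[i] + b[i];
        dst[i] = (u64)s;
        s >>= 64;
    }
#endif
}
```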
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105617
--- Comment #18 from Mason ---
Hello Michael_S,
As far as I can see, massaging the source helps GCC generate optimal code
(in terms of instruction count; I'm not convinced about the scheduling).
#include
typedef unsigned long long u64;
void add4i(u64
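(The massaged source is cut off above. One way such massaging can look — this is my own sketch, not necessarily the form Mason posted — is to let an unsigned __int128 accumulator carry the chain, which GCC lowers to an add/adc sequence on x86-64:)

```c
typedef unsigned long long u64;

/* Sketch: 4-limb addition with the carry held in a 128-bit accumulator. */
static void add4i_massaged(u64 *dst, const u64 *a, const u64 *b)
{
    unsigned __int128 s = 0;
    for (int i = 0; i < 4; i++) {
        s += (unsigned __int128)a[i] + b[i];
        dst[i] = (u64)s;  /* low 64 bits become the output limb */
        s >>= 64;         /* high bit is the carry into the next limb */
    }
}
```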