https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109156
Bug ID: 109156
Summary: Support Absolute Difference detection in GCC
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: tnfchris at gcc dot gnu.org
Target Milestone: ---
Target: aarch64*

Today we support Sum of Absolute Differences (SAD):

#include <stdlib.h>

#define TYPE_IN signed char
#define TYPE_OUT signed int

TYPE_OUT SABD_example (int n, TYPE_IN *restrict a, TYPE_IN *restrict b)
{
  TYPE_OUT out = 0;
  for (int i = 0; i < n; i++)
    out += abs(b[i] - a[i]);
  return out;
}

which is implemented through the SAD_EXPR tree code.

The goal is to also support absolute difference (ABD) and widening absolute difference (ABDL), i.e. the same computation without the reduction:

#include <stdlib.h>

#define TYPE_IN signed int
#define TYPE_OUT signed int

void ABD_example (int n, TYPE_IN *restrict a, TYPE_IN *restrict b, TYPE_OUT *restrict out)
{
  for (int i = 0; i < n; i++)
    out[i] = abs(b[i] - a[i]);
}

This shares about 90% of the work with vect_recog_sad_pattern, with one difference: SAD matching starts at a reduction, whereas ABD matching starts at an optional cast, or otherwise at the abs.

There are two ways we're thinking of implementing ABD (ABDL we can't do at the moment, because we can't express widening operations in IFNs):

1. Refactor the SAD detection code so that the part that detects ABD is split out and can be reused by a new pattern. The downside is that we may do duplicate work in patterns that both match the same statements. In general this shouldn't happen often, because SAD stops at the +, so it should be fine as long as ABD is matched after SAD, though cast_forwardprop may make this hard to do.

2. It looks like all targets that implement SAD do so with an instruction that performs ABD followed by a reduction, so it looks like no target has an instruction with exactly SAD's semantics.
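To make the decomposition in point 2 concrete, here is a scalar reference sketch for 8-bit signed inputs as in SABD_example. It assumes, for illustration, that ABD returns a result of the input width that has to be read as unsigned, that ABDL returns a result of twice the input width, and that SAD is ABD followed by a widening sum. The function names abd_s8, abdl_s8, sad_via_abd and sad_direct are hypothetical; they are not GCC internals or target intrinsics.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference functions; names are illustrative only. */

/* ABD on signed 8-bit values: |a - b| can be as large as 255
   (e.g. a = 127, b = -128), so the magnitude only fits if the
   same-width 8-bit result is read as unsigned. */
static inline uint8_t abd_s8 (int8_t a, int8_t b)
{
  int d = (int) a - (int) b;   /* exact in int, cannot overflow */
  return (uint8_t) abs (d);
}

/* ABDL: widening absolute difference, result is twice the input width. */
static inline uint16_t abdl_s8 (int8_t a, int8_t b)
{
  return (uint16_t) abs ((int) a - (int) b);
}

/* SAD written as ABD followed by a plain widening sum. */
int32_t sad_via_abd (int n, const int8_t *a, const int8_t *b)
{
  int32_t acc = 0;
  for (int i = 0; i < n; i++)
    acc += abd_s8 (b[i], a[i]);   /* unsigned 8-bit ABD is zero-extended */
  return acc;
}

/* The SAD loop from SABD_example, for comparison. */
int32_t sad_direct (int n, const int8_t *a, const int8_t *b)
{
  int32_t out = 0;
  for (int i = 0; i < n; i++)
    out += abs (b[i] - a[i]);   /* operands promoted to int before subtracting */
  return out;
}
```

Both SAD functions compute the same value for any input; the only thing that distinguishes "SAD" from "ABD" here is whether the per-element results are summed or stored.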
This raises the question of why the detection wasn't based on ABD in the first place, leaving the reduction explicit in the vectorizer. So the question is: should we create a completely new standalone pattern for ABD, or should we make ABD the thing being detected and change the SAD_EXPR detection to recognize ABD + reduction? Removing SAD completely in favor of ABD + reduction would mean that the hand-optimized SAD implementations in the targets need updating, so I'm in favor of still emitting SAD.
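As a rough sketch of the second option, the C below models vector registers as plain arrays: the pattern only identifies a lane-wise ABD, and the widening partial reduction plus the final horizontal sum are left explicit, as the vectorizer would emit them. VF, ACC_LANES and all function names are hypothetical, and the choice of which accumulator lane receives which difference is an arbitrary assumption; this is not how GCC's vectorizer is actually implemented.

```c
#include <stdint.h>
#include <stdlib.h>

#define VF 16        /* hypothetical: 16 x int8 lanes per input vector */
#define ACC_LANES 4  /* hypothetical: 4 x int32 accumulator lanes */

/* Step 1: lane-wise ABD on one vector's worth of input. */
static void abd_vec (const int8_t *a, const int8_t *b, uint8_t out[VF])
{
  for (int l = 0; l < VF; l++)
    out[l] = (uint8_t) abs ((int) b[l] - (int) a[l]);
}

/* Step 2: widening partial reduction into the accumulator.
   The lane mapping (l % ACC_LANES) is an arbitrary choice;
   only the final sum is observable. */
static void widen_accumulate (const uint8_t d[VF], int32_t acc[ACC_LANES])
{
  for (int l = 0; l < VF; l++)
    acc[l % ACC_LANES] += d[l];
}

/* Hypothetical lowering of the SAD loop as ABD + explicit reduction. */
int32_t sad_abd_plus_reduction (int n, const int8_t *a, const int8_t *b)
{
  int32_t acc[ACC_LANES] = {0};
  uint8_t d[VF];
  int i = 0;

  /* Vector body: full vectors only. */
  for (; i + VF <= n; i += VF)
    {
      abd_vec (a + i, b + i, d);
      widen_accumulate (d, acc);
    }

  /* Step 3: horizontal reduction of the accumulator lanes after the loop. */
  int32_t sum = 0;
  for (int l = 0; l < ACC_LANES; l++)
    sum += acc[l];

  /* Scalar epilogue for the remaining n % VF elements. */
  for (; i < n; i++)
    sum += abs (b[i] - a[i]);

  return sum;
}
```

Because integer addition is associative and commutative, any assignment of ABD lanes to accumulator lanes yields the same final result as long as the 32-bit accumulator does not overflow, which is what allows the reduction to be kept separate from the ABD pattern.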