https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97043
Bug ID: 97043
Summary: latent wrong-code with SLP vectorization
Product: gcc
Version: 10.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---

There's a latent wrong-code bug on the branches, visible with the gcc.dg/vect/pr81410.c testcase. On trunk the issue is avoided by delaying the optimization and validation of SLP permutations until the vectorization factor is final, whilst on the branches we throw away the permutation prematurely via

    if (!this_load_permuted
        /* The load requires permutation when unrolling exposes
           a gap either because the group is larger than the SLP
           group-size or because there is a gap between the groups.  */
        && (known_eq (unrolling_factor, 1U)
            || (group_size == DR_GROUP_SIZE (first_stmt_info)
                && DR_GROUP_GAP (first_stmt_info) == 0)))
      {
        SLP_TREE_LOAD_PERMUTATION (load_node).release ();

because unrolling_factor is 1 at that point but is later raised to 2 due to hybrid SLP/non-SLP vectorization. This causes us to run into the gap adjustment code added by the PR81410 fix:

    /* With SLP permutation we load the gaps as well, without
       we need to skip the gaps after we manage to fully load
       all elements.  group_gap_adj is DR_GROUP_SIZE here.  */
    group_elt += nunits;
    if (maybe_ne (group_gap_adj, 0U)
        && !slp_perm
        && known_eq (group_elt, group_size - group_gap_adj))
      {
        poly_wide_int bump_val
          = (wi::to_wide (TYPE_SIZE_UNIT (elem_type))
             * group_gap_adj);
        tree bump = wide_int_to_tree (sizetype, bump_val);
        dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi,
                                       stmt_info, bump);
        group_elt = 0;
      }

which notes that we do not load the gaps. Now, alignment analysis correctly determines the load of x[] to be aligned because the VF is 2:

    poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
    step_preserves_misalignment_p
      = multiple_p (DR_STEP_ALIGNMENT (dr_info->dr) * vf, vect_align_c);

but that assumes contiguous aligned loads.
The VMAT_CONTIGUOUS-with-gap-without-permutation vectorization, on the other hand, assumes that each individual instance of the group is aligned, which it is not. Fixing this up there would also require checking for unaligned access support at analysis time, as well as possible cost adjustments.