[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
https://github.com/banach-space closed https://github.com/llvm/llvm-project/pull/188190 ___ cfe-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
banach-space wrote:
Closing in favour of https://github.com/llvm/llvm-project/pull/195602 and
other patches that will follow. Thank you for splitting this, @yairbenavraham 🙏🏻
https://github.com/llvm/llvm-project/pull/188190
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
yairbenavraham wrote:
> Thanks! I have a high-level ask - could you split this PR into multiple PRs?
>
> Basically, this is implementing quite a few variants in one PR:
>
> * `BI__builtin_neon_vfmaq_v`
> * `BI__builtin_neon_vfmaq_lane_v`
> * `BI__builtin_neon_vfma_laneq_v`
> * `BI__builtin_neon_vfmaq_laneq_v`
> * `BI__builtin_neon_vfmad_laneq_f64`
>
> The implementations are different, some tests are missing (e.g. for
> `BI__builtin_neon_vfmaq_v`) and the output generated _with_ and _without_
> `-fclangir` do not match. This will require taking a closer look, hence kind
> request to split this. Could you start with `BI__builtin_neon_vfmaq_v`?
>
> Thanks for working on this! I am leaving more comments inline.

https://github.com/llvm/llvm-project/pull/195602
https://github.com/llvm/llvm-project/pull/188190
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
https://github.com/banach-space commented:
Thanks! I have a high-level ask - could you split this PR into multiple PRs?

Basically, this is implementing quite a few variants in one PR:

* `BI__builtin_neon_vfmaq_v`
* `BI__builtin_neon_vfmaq_lane_v`
* `BI__builtin_neon_vfma_laneq_v`
* `BI__builtin_neon_vfmaq_laneq_v`
* `BI__builtin_neon_vfmad_laneq_f64`

The implementations are different, some tests are missing (e.g. for
`BI__builtin_neon_vfmaq_v`) and the output generated _with_ and _without_
`-fclangir` do not match. This will require taking a closer look, hence kind
request to split this. Could you start with `BI__builtin_neon_vfmaq_v`?

Thanks for working on this! I am leaving more comments inline.
https://github.com/llvm/llvm-project/pull/188190
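For orientation, the vector lane variants named in the review compute `acc[i] + b[i] * v[lane]` and differ mainly in the widths of the result vector and of the vector that supplies the lane. The following is a minimal reference model in portable C++, not Clang or CIR code; `fmaLaneModel` and its parameters are hypothetical names introduced only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical reference model of the vfma lane family (not Clang/CIR code).
// N is the element count of the accumulator and multiplicand vectors,
// M is the element count of the vector that supplies the selected lane.
// For float32: vfma_lane (N=2, M=2), vfmaq_lane (N=4, M=2),
//              vfma_laneq (N=2, M=4), vfmaq_laneq (N=4, M=4).
template <std::size_t N, std::size_t M>
std::array<float, N> fmaLaneModel(const std::array<float, N> &acc,
                                  const std::array<float, N> &b,
                                  const std::array<float, M> &v,
                                  std::size_t lane) {
  assert(lane < M && "lane index must be within the lane-source vector");
  std::array<float, N> result{};
  for (std::size_t i = 0; i < N; ++i)
    result[i] = std::fma(b[i], v[lane], acc[i]); // b[i] * v[lane] + acc[i]
  return result;
}
```

Calling `fmaLaneModel` with a four-element accumulator and a two-element lane source corresponds to the `vfmaq_lane` shape; the scalar forms reduce to a single `std::fma` on one extracted lane.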
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
@@ -912,6 +599,157 @@ static cir::VectorType
getSVEVectorForElementType(CIRGenModule &cgm,
return cir::VectorType::get(eltTy, numElts, /*is_scalable=*/true);
}
+//===--===//
+// NEON helpers
+//===--===//
+/// Return true if BuiltinID is an overloaded Neon intrinsic with an extra
+/// argument that specifies the vector type. The additional argument is meant
+/// for Sema checking (see `CheckNeonBuiltinFunctionCall`) and this function
+/// should be kept consistent with the logic in Sema.
+/// TODO: Make this return false for SISD builtins.
+/// TODO(cir): Share this with ARM.cpp
+static bool hasExtraNeonArgument(unsigned builtinID) {
+ // Required by the headers included below, but not in this particular
+ // function.
+ [[maybe_unused]] int PtrArgNum = -1;
+ [[maybe_unused]] bool HasConstPtr = false;
+
+ // The mask encodes the type. We don't care about the actual value. Instead,
+ // we just check whether it's been set.
+ uint64_t mask = 0;
+ switch (builtinID) {
+#define GET_NEON_OVERLOAD_CHECK
+#include "clang/Basic/arm_fp16.inc"
+#include "clang/Basic/arm_neon.inc"
+#undef GET_NEON_OVERLOAD_CHECK
+ // Non-neon builtins for controlling VFP that take extra argument for
+ // discriminating the type.
+ case ARM::BI__builtin_arm_vcvtr_f:
+ case ARM::BI__builtin_arm_vcvtr_d:
+mask = 1;
+ }
+ switch (builtinID) {
+ default:
+break;
+ }
+
+ return mask != 0;
+}
+
+// TODO(cir): Remove `cgm` from the list of arguments once all NYI(s) are gone.
+template <typename Operation>
+static mlir::Value
+emitNeonCallToOp(CIRGenModule &cgm, CIRGenBuilderTy &builder,
+ llvm::SmallVector<mlir::Type> argTypes,
+ llvm::SmallVectorImpl<mlir::Value> &args,
+ std::optional<llvm::StringRef> intrinsicName,
+ mlir::Type funcResTy, mlir::Location loc,
+ bool isConstrainedFPIntrinsic = false, unsigned shift = 0,
+ bool rightshift = false) {
+ // TODO(cir): Consider removing the following unreachable when we have
+ // emitConstrainedFPCall feature implemented
+ assert(!cir::MissingFeatures::emitConstrainedFPCall());
+ if (isConstrainedFPIntrinsic)
+cgm.errorNYI(loc, std::string("constrained FP intrinsic"));
+
+ for (unsigned j = 0; j < argTypes.size(); ++j) {
+if (isConstrainedFPIntrinsic) {
+ assert(!cir::MissingFeatures::emitConstrainedFPCall());
+}
+if (shift > 0 && shift == j) {
+ cgm.errorNYI(loc, std::string("intrinsic requiring a shift Op"));
+} else {
+ args[j] = builder.createBitcast(args[j], argTypes[j]);
+}
+ }
+ if (isConstrainedFPIntrinsic) {
+assert(!cir::MissingFeatures::emitConstrainedFPCall());
+return nullptr;
+ }
+ if constexpr (std::is_same_v<Operation, cir::LLVMIntrinsicCallOp>) {
+return Operation::create(builder, loc,
+ builder.getStringAttr(intrinsicName.value()),
+ funcResTy, args)
+.getResult();
+ } else {
+return Operation::create(builder, loc, funcResTy, args).getResult();
+ }
+}
+
+// TODO(cir): Remove `cgm` from the list of arguments once all NYI(s) are gone.
+static mlir::Value emitNeonCall(CIRGenModule &cgm, CIRGenBuilderTy &builder,
+llvm::SmallVector<mlir::Type> argTypes,
+llvm::SmallVectorImpl<mlir::Value> &args,
+llvm::StringRef intrinsicName,
+mlir::Type funcResTy, mlir::Location loc,
+bool isConstrainedFPIntrinsic = false,
+unsigned shift = 0, bool rightshift = false) {
+ return emitNeonCallToOp<cir::LLVMIntrinsicCallOp>(
+ cgm, builder, std::move(argTypes), args, intrinsicName, funcResTy, loc,
+ isConstrainedFPIntrinsic, shift, rightshift);
+}
+
+static mlir::Value emitCommonNeonSISDBuiltinExpr(
+CIRGenFunction &cgf, const ARMVectorIntrinsicInfo &info,
+llvm::SmallVectorImpl<mlir::Value> &ops, const CallExpr *expr) {
+ assert(info.LLVMIntrinsic && "Generic code assumes a valid intrinsic");
+
+ switch (info.BuiltinID) {
+ case NEON::BI__builtin_neon_vcled_s64:
+ case NEON::BI__builtin_neon_vcled_u64:
+ case NEON::BI__builtin_neon_vcles_f32:
+ case NEON::BI__builtin_neon_vcled_f64:
+ case NEON::BI__builtin_neon_vcltd_s64:
+ case NEON::BI__builtin_neon_vcltd_u64:
+ case NEON::BI__builtin_neon_vclts_f32:
+ case NEON::BI__builtin_neon_vcltd_f64:
+ case NEON::BI__builtin_neon_vcales_f32:
+ case NEON::BI__builtin_neon_vcaled_f64:
+ case NEON::BI__builtin_neon_vcalts_f32:
+ case NEON::BI__builtin_neon_vcaltd_f64:
+cgf.cgm.errorNYI(expr->getSourceRange(),
+ std::string("unimplemented AArch64 builtin call: ") +
+ cgf.getContext().BuiltinInfo.getName(info.BuiltinID));
+break;
+ }
+
+ llvm::StringRef llvmIntrName = getLLVMIntrNameNoPrefix(
+ static_cast<llvm::Intrinsic::ID>(info.LLVMIntrinsic));
+ mlir::L
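The `emitNeonCallToOp` helper quoted above picks between two ways of creating an operation at compile time, depending on the operation type it is instantiated with. A minimal sketch of that `if constexpr` dispatch pattern follows, assuming two hypothetical op types (`NamedIntrinsicOp` and `DirectOp`) that merely return a description string; none of these names come from CIR.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <type_traits>

// Hypothetical op that needs an intrinsic name in addition to its arguments.
struct NamedIntrinsicOp {
  static std::string create(const std::string &name, int numArgs) {
    return "intrinsic " + name + "/" + std::to_string(numArgs);
  }
};

// Hypothetical op that is created from its arguments alone.
struct DirectOp {
  static std::string create(int numArgs) {
    return "direct/" + std::to_string(numArgs);
  }
};

// Only the branch matching Operation is instantiated, so DirectOp never
// needs a create() overload that takes a name, and the optional name is
// only unwrapped when it is actually required.
template <typename Operation>
std::string emitCallToOp(std::optional<std::string> intrinsicName,
                         int numArgs) {
  if constexpr (std::is_same_v<Operation, NamedIntrinsicOp>) {
    assert(intrinsicName.has_value() && "named op requires an intrinsic name");
    return Operation::create(intrinsicName.value(), numArgs);
  } else {
    return Operation::create(numArgs);
  }
}
```

A thin wrapper that always instantiates the named variant, as `emitNeonCall` does for the real helper, keeps the common call sites short.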
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
https://github.com/banach-space edited https://github.com/llvm/llvm-project/pull/188190
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
yairbenavraham wrote:
> @yairbenavraham Please note, the intrinsic group that is allocated to you is
> this:
>
> * https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#fused-multiply-accumulate
>
> There is neither `f16` nor `Poly128` in that group and hence no need to
> concern yourself with those for now.
>
> Please, could you refactor this so that only
> https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#fused-multiply-accumulate-2
> are covered? Thanks!

@banach-space Does the [latest commit](https://github.com/llvm/llvm-project/pull/188190/changes/0e70dd2769773af3f04d1492c7246749f338fde7#diff-d019af31ff74359b640cf418ee0d62abb1761bf3f63089697d8d624fcc16e999) refactor the `f16` and `Poly128`?
https://github.com/llvm/llvm-project/pull/188190
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
banach-space wrote:
> @banach-space I’m wondering, not sure, about the scalar `f16` CIR coverage
> here.
> The moved scalar file covers the LLVM path, but enabling the full `-fclangir`
> run for all scalar forms still hits an unrelated `Poly128` CIR NYI before the
> file completes. I therefore kept CIR/CIRLLVM coverage for the supported
> scalar `f32`/`f64` forms, but not for scalar `f16`.
> Shall I keep that split in this PR, or should I try to address/work around
> the unrelated `Poly128` CIR failure as follow-up so the scalar `f16` forms
> can also be verified through CIR?

@yairbenavraham Please note, the intrinsic group that is allocated to you is this:

* https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#fused-multiply-accumulate

There is neither `f16` nor `Poly128` in that group and hence no need to
concern yourself with those for now.

Please, could you refactor this so that only
https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#fused-multiply-accumulate-2
are covered? Thanks!
https://github.com/llvm/llvm-project/pull/188190
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
yairbenavraham wrote:
@banach-space I’m wondering, not sure, about the scalar `f16` CIR coverage here.
The moved scalar file covers the LLVM path, but enabling the full `-fclangir`
run for all scalar forms still hits an unrelated `Poly128` CIR NYI before the
file completes. I therefore kept CIR/CIRLLVM coverage for the supported
scalar `f32`/`f64` forms, but not for scalar `f16`.
Shall I keep that split in this PR, or should I try to address/work around
the unrelated `Poly128` CIR failure as follow-up so the scalar `f16` forms
can also be verified through CIR?
https://github.com/llvm/llvm-project/pull/188190
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
@@ -554,6 +556,77 @@ emitCallMaybeConstrainedBuiltin(CIRGenBuilderTy &builder,
mlir::Location loc,
return builder.emitIntrinsicCallOp(loc, intrName, retTy, ops);
}
+static mlir::Value emitVectorFmaLaneSource(CIRGenBuilderTy &builder,
+ mlir::Location loc,
+ const CallExpr *expr,
+ ASTContext &ctx,
+ mlir::Value laneSource,
+ cir::VectorType ty,
+ cir::VectorType sourceTy) {
+ if (laneSource.getType() != sourceTy)
+laneSource = builder.createBitcast(loc, laneSource, sourceTy);
+
+ auto vecTy = mlir::cast<cir::VectorType>(ty);
+ int64_t lane = expr->getArg(3)->EvaluateKnownConstInt(ctx).getSExtValue();
+ llvm::SmallVector mask(vecTy.getSize(), lane);
+ return builder.createVecShuffle(loc, laneSource, mask);
+}
+
+static mlir::Value emitVectorFmaBuiltin(CIRGenFunction &cgf,
+mlir::Location loc,
+llvm::SmallVectorImpl<mlir::Value>
 &ops,
+const CallExpr *expr) {
+ cir::VectorType ty =
 mlir::cast<cir::VectorType>(cgf.convertType(expr->getType()));
+ if (ops[0].getType() != ty)
+ops[0] = cgf.getBuilder().createBitcast(loc, ops[0], ty);
+ if (ops[1].getType() != ty)
+ops[1] = cgf.getBuilder().createBitcast(loc, ops[1], ty);
+ if (ops[2].getType() != ty)
+ops[2] = cgf.getBuilder().createBitcast(loc, ops[2], ty);
+ std::rotate(ops.begin(), ops.begin() + 1, ops.end());
+ return emitCallMaybeConstrainedBuiltin(cgf.getBuilder(), loc, "fma", ty,
ops);
+}
+
+static mlir::Value emitVectorFmaLaneBuiltin(CIRGenFunction &cgf,
+unsigned builtinID,
+NeonTypeFlags type,
+mlir::Location loc,
+const CallExpr *expr,
+llvm::SmallVectorImpl<mlir::Value>
 &ops) {
+ cir::VectorType ty = getNeonType(&cgf, type, loc);
+ if (!ty)
+return nullptr;
+
+ auto vecTy = mlir::cast<cir::VectorType>(ty);
+ cir::VectorType sourceTy = ty;
+ unsigned vectorFmaBuiltin = NEON::BI__builtin_neon_vfma_v;
+
+ switch (builtinID) {
+ case NEON::BI__builtin_neon_vfmaq_lane_v:
+sourceTy = cir::VectorType::get(vecTy.getElementType(), vecTy.getSize() /
2);
+vectorFmaBuiltin = NEON::BI__builtin_neon_vfmaq_v;
+break;
+ case NEON::BI__builtin_neon_vfma_laneq_v:
+sourceTy = cir::VectorType::get(vecTy.getElementType(), vecTy.getSize() *
2);
+break;
+ case NEON::BI__builtin_neon_vfmaq_laneq_v:
+vectorFmaBuiltin = NEON::BI__builtin_neon_vfmaq_v;
+break;
+ case NEON::BI__builtin_neon_vfma_lane_v:
+break;
+ default:
+llvm_unreachable("unexpected vfma lane builtin");
+ }
+
+ llvm::SmallVector fmaOps(ops.begin(), ops.end() - 1);
+ fmaOps[2] = emitVectorFmaLaneSource(cgf.getBuilder(), loc, expr,
+ cgf.getContext(), ops[2], ty, sourceTy);
+ const ARMVectorIntrinsicInfo *info = findARMVectorIntrinsicInMap(
+ AArch64SIMDIntrinsicMap, vectorFmaBuiltin,
+ aarch64SIMDIntrinsicsProvenSorted);
+ return emitCommonNeonBuiltinExpr(cgf, *info, fmaOps, expr);
+}
banach-space wrote:
We didn't need these extra helpers in either the incubator or the
original code-gen.
https://github.com/llvm/llvm-project/pull/188190
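The `std::rotate` call in `emitVectorFmaBuiltin` above reorders the builtin's operands: the NEON builtin takes the accumulator first, while an `fma(x, y, z)` call computes `x * y + z` and so wants the accumulator last. A minimal sketch of that reordering on plain doubles, assuming a hypothetical helper name `fmaFromNeonOperands` (this is not CIR code):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for the builtin's operand list (not CIR code).
// Input order follows the NEON builtin: {accumulator, multiplicand, multiplier}.
double fmaFromNeonOperands(std::vector<double> ops) {
  assert(ops.size() == 3 && "expected accumulator, multiplicand, multiplier");
  // Rotate left by one element: {acc, b, c} becomes {b, c, acc}.
  std::rotate(ops.begin(), ops.begin() + 1, ops.end());
  // std::fma(x, y, z) computes x * y + z with a single rounding.
  return std::fma(ops[0], ops[1], ops[2]);
}
```

With operands `{10.0, 2.0, 3.0}` the rotation yields `{2.0, 3.0, 10.0}`, so the call evaluates `2.0 * 3.0 + 10.0`.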
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
@@ -2313,11 +2386,15 @@ CIRGenFunction::emitAArch64BuiltinExpr(unsigned
builtinID, const CallExpr *expr,
getContext().BuiltinInfo.getName(builtinID));
return mlir::Value{};
case NEON::BI__builtin_neon_vabd_v:
- case NEON::BI__builtin_neon_vabdq_v:
+ case NEON::BI__builtin_neon_vabdq_v: {
+cir::VectorType ty = getNeonType(this, type, loc);
+if (!ty)
+ return nullptr;
intrName = usgn ? "aarch64.neon.uabd" : "aarch64.neon.sabd";
if (cir::isFPOrVectorOfFPType(ty))
intrName = "aarch64.neon.fabd";
return emitNeonCall(cgm, builder, {ty, ty}, ops, intrName, ty, loc);
+ }
banach-space wrote:
Unrelated changes, please revert.
https://github.com/llvm/llvm-project/pull/188190
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
@@ -420,6 +420,8 @@ static mlir::Value emitCommonNeonBuiltinExpr(
 case NEON::BI__builtin_neon_vld4q_lane_v:
 case NEON::BI__builtin_neon_vmovl_v:
 case NEON::BI__builtin_neon_vmovn_v:
+ case NEON::BI__builtin_neon_vbsl_v:
+ case NEON::BI__builtin_neon_vbslq_v:
banach-space wrote:
Unrelated change, please revert.
https://github.com/llvm/llvm-project/pull/188190
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
@@ -807,6 +880,9 @@ static mlir::Value emitCommonNeonSISDBuiltinExpr(
return emitNeonCall(cgf.cgm, cgf.getBuilder(),
{cgf.convertType(expr->getArg(0)->getType())}, ops,
llvmIntrName, cgf.convertType(expr->getType()), loc);
+ case NEON::BI__builtin_neon_vfma_v:
+ case NEON::BI__builtin_neon_vfmaq_v:
+return emitVectorFmaBuiltin(cgf, loc, ops, expr);
banach-space wrote:
As is the case with [the original implementation from
ARM.cpp](https://github.com/llvm/llvm-project/blob/8d7823ea8f40cf5df1c623018bf9c0a308fa4a36/clang/lib/CodeGen/TargetBuiltins/ARM.cpp?plain=1#L1118-L1175),
I would expect all intrinsics implemented in this method to lower directly via
`emitNeonCall`. Why diverge?
https://github.com/llvm/llvm-project/pull/188190
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
https://github.com/yairbenavraham updated
https://github.com/llvm/llvm-project/pull/188190
>From 4ceade9630502af988e42a046e7568b3a71e96f5 Mon Sep 17 00:00:00 2001
From: Yair Ben Avraham
Date: Wed, 25 Mar 2026 12:08:26 +0200
Subject: [PATCH 1/5] [CIR][AArch64] Lower vfma lane builtins
Lower the AArch64 vfma lane and laneq builtins in CIR codegen.
This adds handling for the vector and scalar vfma lane forms,
including the vfmaq_laneq_v family called out in the issue, and
keeps the CIR builtin structure aligned with the existing AArch64
builtin lowering pattern while preserving the original case order.
The scalar lane forms are dispatched before getNeonType() so the
f16 cases do not fall through the unsupported Poly128 path during
ClangIR lowering.
---
.../lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp | 66 ++-
1 file changed, 50 insertions(+), 16 deletions(-)
diff --git a/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
b/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
index a3488bfcc3dec..c972e9e12c430 100644
--- a/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
@@ -139,11 +139,10 @@ static cir::VectorType getNeonType(CIRGenFunction *cgf,
NeonTypeFlags typeFlags,
cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: BFloat16"));
[[fallthrough]];
case NeonTypeFlags::Float16:
-if (hasLegalHalfType)
+if (!hasLegalHalfType)
cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: Float16"));
-else
- cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: Float16"));
-[[fallthrough]];
+return cir::VectorType::get(cgf->getCIRGenModule().fP16Ty,
+v1Ty ? 1 : (4 << isQuad));
case NeonTypeFlags::Int32:
return cir::VectorType::get(typeFlags.isUnsigned() ? cgf->uInt32Ty
: cgf->sInt32Ty,
@@ -628,11 +627,6 @@ static bool hasExtraNeonArgument(unsigned builtinID) {
case ARM::BI__builtin_arm_vcvtr_d:
mask = 1;
}
- switch (builtinID) {
- default:
-break;
- }
-
return mask != 0;
}
@@ -2186,6 +2180,23 @@ CIRGenFunction::emitAArch64BuiltinExpr(unsigned
builtinID, const CallExpr *expr,
return mlir::Value{};
}
+ switch (builtinID) {
+ case NEON::BI__builtin_neon_vfmah_lane_f16:
+ case NEON::BI__builtin_neon_vfmas_lane_f32:
+ case NEON::BI__builtin_neon_vfmah_laneq_f16:
+ case NEON::BI__builtin_neon_vfmas_laneq_f32:
+ case NEON::BI__builtin_neon_vfmad_lane_f64:
+ case NEON::BI__builtin_neon_vfmad_laneq_f64: {
+mlir::Value lane = cir::VecExtractOp::create(builder, loc, ops[2], ops[3]);
+mlir::Type scalarTy = convertType(expr->getType());
+llvm::SmallVector fmaOps = {ops[1], lane, ops[0]};
+return emitCallMaybeConstrainedBuiltin(builder, loc, "fma", scalarTy,
+ fmaOps);
+ }
+ default:
+break;
+ }
+
cir::VectorType ty = getNeonType(this, type, loc);
if (!ty)
return nullptr;
@@ -2200,13 +2211,36 @@ CIRGenFunction::emitAArch64BuiltinExpr(unsigned
builtinID, const CallExpr *expr,
case NEON::BI__builtin_neon_vfma_lane_v:
case NEON::BI__builtin_neon_vfmaq_lane_v:
case NEON::BI__builtin_neon_vfma_laneq_v:
- case NEON::BI__builtin_neon_vfmaq_laneq_v:
- case NEON::BI__builtin_neon_vfmah_lane_f16:
- case NEON::BI__builtin_neon_vfmas_lane_f32:
- case NEON::BI__builtin_neon_vfmah_laneq_f16:
- case NEON::BI__builtin_neon_vfmas_laneq_f32:
- case NEON::BI__builtin_neon_vfmad_lane_f64:
- case NEON::BI__builtin_neon_vfmad_laneq_f64:
+ case NEON::BI__builtin_neon_vfmaq_laneq_v: {
+mlir::Value addend = ops[0];
+mlir::Value multiplicand = ops[1];
+mlir::Value laneSource = ops[2];
+auto vecTy = mlir::cast<cir::VectorType>(ty);
+auto elemTy = vecTy.getElementType();
+auto numElts = vecTy.getSize();
+
+if (addend.getType() != ty)
+ addend = builder.createBitcast(loc, addend, ty);
+if (multiplicand.getType() != ty)
+ multiplicand = builder.createBitcast(loc, multiplicand, ty);
+
+cir::VectorType sourceTy = ty;
+if (builtinID == NEON::BI__builtin_neon_vfmaq_lane_v)
+ sourceTy = cir::VectorType::get(elemTy, numElts / 2);
+else if (builtinID == NEON::BI__builtin_neon_vfma_laneq_v)
+ sourceTy = cir::VectorType::get(elemTy, numElts * 2);
+
+if (laneSource.getType() != sourceTy)
+ laneSource = builder.createBitcast(loc, laneSource, sourceTy);
+
+int64_t lane =
+expr->getArg(3)->EvaluateKnownConstInt(getContext()).getSExtValue();
+llvm::SmallVector mask(numElts, lane);
+mlir::Value splat = builder.createVecShuffle(loc, laneSource, mask);
+
+llvm::SmallVector fmaOps = {multiplicand, splat, addend};
+return emitCallMaybeConstrainedBuiltin(builder, loc, "fma", ty, fmaOps);
+ }
case NEON::BI__builtin_neon_vmull_v:
case NEON::BI__builtin_neon_vmax_v:
case NEON::BI__builtin_neon_vmaxq_v:
>From ae6b618b696899275c9009
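The vector lane case in the patch above does two things before emitting the `fma` call: it picks the element count of the lane-source vector (half the result width for `vfmaq_lane`, double for `vfma_laneq`, unchanged otherwise) and it broadcasts the selected lane with a shuffle whose mask repeats the lane index. A minimal model of both steps on `std::vector`, assuming hypothetical names (`FmaLaneKind`, `laneSourceElts`, `splatMask`, `shuffleModel`) that do not exist in Clang:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tags for the four vector builtins handled by the patch:
// vfma_lane, vfmaq_lane, vfma_laneq, vfmaq_laneq.
enum class FmaLaneKind { Lane, QLane, LaneQ, QLaneQ };

// Element count of the lane-source vector, given the result element count.
unsigned laneSourceElts(FmaLaneKind kind, unsigned numElts) {
  switch (kind) {
  case FmaLaneKind::QLane: // 128-bit result, 64-bit lane source
    return numElts / 2;
  case FmaLaneKind::LaneQ: // 64-bit result, 128-bit lane source
    return numElts * 2;
  case FmaLaneKind::Lane:
  case FmaLaneKind::QLaneQ: // same width on both sides
    return numElts;
  }
  return numElts;
}

// Shuffle mask that selects the same source lane for every result element.
std::vector<std::int64_t> splatMask(unsigned numElts, std::int64_t lane) {
  return std::vector<std::int64_t>(numElts, lane);
}

// Model of a single-source vector shuffle: out[i] = src[mask[i]].
template <typename T>
std::vector<T> shuffleModel(const std::vector<T> &src,
                            const std::vector<std::int64_t> &mask) {
  std::vector<T> out;
  out.reserve(mask.size());
  for (std::int64_t idx : mask) {
    assert(idx >= 0 && static_cast<std::size_t>(idx) < src.size() &&
           "mask index must select an existing source element");
    out.push_back(src[static_cast<std::size_t>(idx)]);
  }
  return out;
}
```

For a `vfmaq_lane` on four floats, `laneSourceElts` gives a two-element source, and `shuffleModel(src, splatMask(4, 1))` produces four copies of `src[1]`, which then serve as the multiplier vector.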
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
https://github.com/yairbenavraham updated
https://github.com/llvm/llvm-project/pull/188190
>From 4ceade9630502af988e42a046e7568b3a71e96f5 Mon Sep 17 00:00:00 2001
From: Yair Ben Avraham
Date: Wed, 25 Mar 2026 12:08:26 +0200
Subject: [PATCH 1/5] [CIR][AArch64] Lower vfma lane builtins
Lower the AArch64 vfma lane and laneq builtins in CIR codegen.
This adds handling for the vector and scalar vfma lane forms,
including the vfmaq_laneq_v family called out in the issue, and
keeps the CIR builtin structure aligned with the existing AArch64
builtin lowering pattern while preserving the original case order.
The scalar lane forms are dispatched before getNeonType() so the
f16 cases do not fall through the unsupported Poly128 path during
ClangIR lowering.
---
.../lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp | 66 ++-
1 file changed, 50 insertions(+), 16 deletions(-)
diff --git a/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
b/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
index a3488bfcc3dec..c972e9e12c430 100644
--- a/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
@@ -139,11 +139,10 @@ static cir::VectorType getNeonType(CIRGenFunction *cgf,
NeonTypeFlags typeFlags,
cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: BFloat16"));
[[fallthrough]];
case NeonTypeFlags::Float16:
-if (hasLegalHalfType)
+if (!hasLegalHalfType)
cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: Float16"));
-else
- cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: Float16"));
-[[fallthrough]];
+return cir::VectorType::get(cgf->getCIRGenModule().fP16Ty,
+v1Ty ? 1 : (4 << isQuad));
case NeonTypeFlags::Int32:
return cir::VectorType::get(typeFlags.isUnsigned() ? cgf->uInt32Ty
: cgf->sInt32Ty,
@@ -628,11 +627,6 @@ static bool hasExtraNeonArgument(unsigned builtinID) {
case ARM::BI__builtin_arm_vcvtr_d:
mask = 1;
}
- switch (builtinID) {
- default:
-break;
- }
-
return mask != 0;
}
@@ -2186,6 +2180,23 @@ CIRGenFunction::emitAArch64BuiltinExpr(unsigned
builtinID, const CallExpr *expr,
return mlir::Value{};
}
+ switch (builtinID) {
+ case NEON::BI__builtin_neon_vfmah_lane_f16:
+ case NEON::BI__builtin_neon_vfmas_lane_f32:
+ case NEON::BI__builtin_neon_vfmah_laneq_f16:
+ case NEON::BI__builtin_neon_vfmas_laneq_f32:
+ case NEON::BI__builtin_neon_vfmad_lane_f64:
+ case NEON::BI__builtin_neon_vfmad_laneq_f64: {
+mlir::Value lane = cir::VecExtractOp::create(builder, loc, ops[2], ops[3]);
+mlir::Type scalarTy = convertType(expr->getType());
+llvm::SmallVector fmaOps = {ops[1], lane, ops[0]};
+return emitCallMaybeConstrainedBuiltin(builder, loc, "fma", scalarTy,
+ fmaOps);
+ }
+ default:
+break;
+ }
+
cir::VectorType ty = getNeonType(this, type, loc);
if (!ty)
return nullptr;
@@ -2200,13 +2211,36 @@ CIRGenFunction::emitAArch64BuiltinExpr(unsigned
builtinID, const CallExpr *expr,
case NEON::BI__builtin_neon_vfma_lane_v:
case NEON::BI__builtin_neon_vfmaq_lane_v:
case NEON::BI__builtin_neon_vfma_laneq_v:
- case NEON::BI__builtin_neon_vfmaq_laneq_v:
- case NEON::BI__builtin_neon_vfmah_lane_f16:
- case NEON::BI__builtin_neon_vfmas_lane_f32:
- case NEON::BI__builtin_neon_vfmah_laneq_f16:
- case NEON::BI__builtin_neon_vfmas_laneq_f32:
- case NEON::BI__builtin_neon_vfmad_lane_f64:
- case NEON::BI__builtin_neon_vfmad_laneq_f64:
+ case NEON::BI__builtin_neon_vfmaq_laneq_v: {
+mlir::Value addend = ops[0];
+mlir::Value multiplicand = ops[1];
+mlir::Value laneSource = ops[2];
+auto vecTy = mlir::cast<cir::VectorType>(ty);
+auto elemTy = vecTy.getElementType();
+auto numElts = vecTy.getSize();
+
+if (addend.getType() != ty)
+ addend = builder.createBitcast(loc, addend, ty);
+if (multiplicand.getType() != ty)
+ multiplicand = builder.createBitcast(loc, multiplicand, ty);
+
+cir::VectorType sourceTy = ty;
+if (builtinID == NEON::BI__builtin_neon_vfmaq_lane_v)
+ sourceTy = cir::VectorType::get(elemTy, numElts / 2);
+else if (builtinID == NEON::BI__builtin_neon_vfma_laneq_v)
+ sourceTy = cir::VectorType::get(elemTy, numElts * 2);
+
+if (laneSource.getType() != sourceTy)
+ laneSource = builder.createBitcast(loc, laneSource, sourceTy);
+
+int64_t lane =
+expr->getArg(3)->EvaluateKnownConstInt(getContext()).getSExtValue();
+llvm::SmallVector mask(numElts, lane);
+mlir::Value splat = builder.createVecShuffle(loc, laneSource, mask);
+
+llvm::SmallVector fmaOps = {multiplicand, splat, addend};
+return emitCallMaybeConstrainedBuiltin(builder, loc, "fma", ty, fmaOps);
+ }
case NEON::BI__builtin_neon_vmull_v:
case NEON::BI__builtin_neon_vmax_v:
case NEON::BI__builtin_neon_vmaxq_v:
>From ae6b618b696899275c9009
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
@@ -2230,12 +2223,53 @@ CIRGenFunction::emitAArch64BuiltinExpr(unsigned
builtinID, const CallExpr *expr,
case NEON::BI__builtin_neon_vfma_lane_v:
case NEON::BI__builtin_neon_vfmaq_lane_v:
case NEON::BI__builtin_neon_vfma_laneq_v:
- case NEON::BI__builtin_neon_vfmaq_laneq_v:
+ case NEON::BI__builtin_neon_vfmaq_laneq_v: {
+// Keep the NEON vector type local to each vector-only builtin block.
+cir::VectorType ty = getNeonType(this, type, loc);
+if (!ty)
+ return nullptr;
+mlir::Value addend = ops[0];
+mlir::Value multiplicand = ops[1];
+mlir::Value laneSource = ops[2];
+auto vecTy = mlir::cast<cir::VectorType>(ty);
+auto elemTy = vecTy.getElementType();
+auto numElts = vecTy.getSize();
+
+if (addend.getType() != ty)
+ addend = builder.createBitcast(loc, addend, ty);
+if (multiplicand.getType() != ty)
+ multiplicand = builder.createBitcast(loc, multiplicand, ty);
+
+cir::VectorType sourceTy = ty;
+if (builtinID == NEON::BI__builtin_neon_vfmaq_lane_v)
+ sourceTy = cir::VectorType::get(elemTy, numElts / 2);
+else if (builtinID == NEON::BI__builtin_neon_vfma_laneq_v)
+ sourceTy = cir::VectorType::get(elemTy, numElts * 2);
+
+if (laneSource.getType() != sourceTy)
+ laneSource = builder.createBitcast(loc, laneSource, sourceTy);
+
+int64_t lane =
+expr->getArg(3)->EvaluateKnownConstInt(getContext()).getSExtValue();
+llvm::SmallVector mask(numElts, lane);
+mlir::Value splat = builder.createVecShuffle(loc, laneSource, mask);
+
+llvm::SmallVector fmaOps = {multiplicand, splat, addend};
+return emitCallMaybeConstrainedBuiltin(builder, loc, "fma", ty, fmaOps);
+ }
banach-space wrote:
This block is very involved compared to
http://github.com/llvm/llvm-project/blob/8d7823ea8f40cf5df1c623018bf9c0a308fa4a36/clang/lib/CodeGen/TargetBuiltins/ARM.cpp?plain=1#L6153-L6162
and
https://github.com/llvm/clangir/blob/main/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp#L4246-L4264
Why not follow the pre-existing logic that is much shorter?
https://github.com/llvm/llvm-project/pull/188190
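For context on the question above, the arithmetic that the vector lane forms have to reproduce is small. The following is a minimal standalone C++ sketch of that semantics only, not of the CIR or classic codegen code: the selected lane is splatted across the result width and combined with an element-wise fused multiply-add, with the addend moved from first position (NEON operand order) to last (`fma` operand order). The helper name `fmaLane` and the use of `std::array` are illustrative assumptions and do not appear in the patch.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical model of the vfma{q}_lane{q} forms. N is the result width and
// M is the width of the vector the lane is read from (N, N / 2 or N * 2,
// matching the sourceTy adjustment in the quoted block).
template <std::size_t N, std::size_t M>
std::array<float, N> fmaLane(const std::array<float, N> &addend,
                             const std::array<float, N> &multiplicand,
                             const std::array<float, M> &laneSource,
                             std::size_t lane) {
  assert(lane < M && "lane index must be within the source vector");
  std::array<float, N> result{};
  for (std::size_t i = 0; i < N; ++i) {
    // Splat: every result element reads the same source lane.
    // std::fma(x, y, z) computes x * y + z, so the addend goes last.
    result[i] = std::fma(multiplicand[i], laneSource[lane], addend[i]);
  }
  return result;
}
```

With `N = 2` and `M = 4` this corresponds to the `vfma_laneq` shape, where the lane source is twice as wide as the result.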
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
@@ -2216,10 +2213,6 @@ CIRGenFunction::emitAArch64BuiltinExpr(unsigned
builtinID, const CallExpr *expr,
return mlir::Value{};
}
- cir::VectorType ty = getNeonType(this, type, loc);
- if (!ty)
-return nullptr;
-
banach-space wrote:
Unrelated change?
https://github.com/llvm/llvm-project/pull/188190
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
https://github.com/banach-space commented: Thanks for the updates, I am sending some fresh comments. https://github.com/llvm/llvm-project/pull/188190
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
https://github.com/banach-space edited https://github.com/llvm/llvm-project/pull/188190
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
@@ -138,11 +138,8 @@ static cir::VectorType getNeonType(CIRGenFunction *cgf,
NeonTypeFlags typeFlags,
v1Ty ? 1 : (4 << isQuad));
return cir::VectorType::get(cgf->uInt16Ty, v1Ty ? 1 : (4 << isQuad));
case NeonTypeFlags::Float16:
-if (hasLegalHalfType)
- cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: Float16"));
-else
- cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: Float16"));
-[[fallthrough]];
+return cir::VectorType::get(cgf->getCIRGenModule().fP16Ty,
+v1Ty ? 1 : (4 << isQuad));
banach-space wrote:
We still need to make this logic depend on `hasLegalHalfType`, see
https://github.com/llvm/llvm-project/blob/8d7823ea8f40cf5df1c623018bf9c0a308fa4a36/clang/lib/CodeGen/TargetBuiltins/ARM.cpp?plain=1#L380-L382
for reference.
https://github.com/llvm/llvm-project/pull/188190
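A minimal standalone sketch of the shape being asked for, assuming the referenced classic-codegen lines pick a half element when the half type is legal and fall back to a 16-bit integer element of the same width otherwise. The lane-count expression is taken from the quoted diff; `ElemKind`, `VecShape` and `float16VecShape` are hypothetical stand-ins, not CIR or LLVM types.

```cpp
// Hypothetical stand-ins for the two candidate element types.
enum class ElemKind { Half, Int16 };

struct VecShape {
  ElemKind elem;
  unsigned numElts;
};

// Models the NeonTypeFlags::Float16 case of getNeonType().
VecShape float16VecShape(bool hasLegalHalfType, bool v1Ty, bool isQuad) {
  // Same lane count as in the diff: v1Ty ? 1 : (4 << isQuad).
  unsigned numElts = v1Ty ? 1u : (4u << (isQuad ? 1 : 0));
  // Assumption: the element type, not the lane count, is what depends on
  // hasLegalHalfType.
  ElemKind elem = hasLegalHalfType ? ElemKind::Half : ElemKind::Int16;
  return {elem, numElts};
}
```

The point of the sketch is that both branches return a vector type, so neither needs to report the case as not yet implemented.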
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
@@ -139,11 +139,10 @@ static cir::VectorType getNeonType(CIRGenFunction *cgf,
NeonTypeFlags typeFlags,
cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: BFloat16"));
[[fallthrough]];
case NeonTypeFlags::Float16:
-if (hasLegalHalfType)
+if (!hasLegalHalfType)
cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: Float16"));
banach-space wrote:
> I understand that since CIR now has the required f16 type support for this
> path, rejecting NeonTypeFlags::Float16 in getNeonType() is unnecessary?
Indeed. Also, apologies, that link is out-of-date. Here's an updated one that
points to the updated definition of `hasLegalHalfType`:
https://github.com/llvm/llvm-project/blob/8d7823ea8f40cf5df1c623018bf9c0a308fa4a36/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp?plain=1#L228
https://github.com/llvm/llvm-project/pull/188190
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
https://github.com/yairbenavraham updated
https://github.com/llvm/llvm-project/pull/188190
>From 4ceade9630502af988e42a046e7568b3a71e96f5 Mon Sep 17 00:00:00 2001
From: Yair Ben Avraham
Date: Wed, 25 Mar 2026 12:08:26 +0200
Subject: [PATCH 1/4] [CIR][AArch64] Lower vfma lane builtins
Lower the AArch64 vfma lane and laneq builtins in CIR codegen.
This adds handling for the vector and scalar vfma lane forms,
including the vfmaq_laneq_v family called out in the issue, and
keeps the CIR builtin structure aligned with the existing AArch64
builtin lowering pattern while preserving the original case order.
The scalar lane forms are dispatched before getNeonType() so the
f16 cases do not fall through the unsupported Poly128 path during
ClangIR lowering.
---
.../lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp | 66 ++-
1 file changed, 50 insertions(+), 16 deletions(-)
diff --git a/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
b/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
index a3488bfcc3dec..c972e9e12c430 100644
--- a/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
@@ -139,11 +139,10 @@ static cir::VectorType getNeonType(CIRGenFunction *cgf,
NeonTypeFlags typeFlags,
cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: BFloat16"));
[[fallthrough]];
case NeonTypeFlags::Float16:
-if (hasLegalHalfType)
+if (!hasLegalHalfType)
cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: Float16"));
-else
- cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: Float16"));
-[[fallthrough]];
+return cir::VectorType::get(cgf->getCIRGenModule().fP16Ty,
+v1Ty ? 1 : (4 << isQuad));
case NeonTypeFlags::Int32:
return cir::VectorType::get(typeFlags.isUnsigned() ? cgf->uInt32Ty
: cgf->sInt32Ty,
@@ -628,11 +627,6 @@ static bool hasExtraNeonArgument(unsigned builtinID) {
case ARM::BI__builtin_arm_vcvtr_d:
mask = 1;
}
- switch (builtinID) {
- default:
-break;
- }
-
return mask != 0;
}
@@ -2186,6 +2180,23 @@ CIRGenFunction::emitAArch64BuiltinExpr(unsigned
builtinID, const CallExpr *expr,
return mlir::Value{};
}
+ switch (builtinID) {
+ case NEON::BI__builtin_neon_vfmah_lane_f16:
+ case NEON::BI__builtin_neon_vfmas_lane_f32:
+ case NEON::BI__builtin_neon_vfmah_laneq_f16:
+ case NEON::BI__builtin_neon_vfmas_laneq_f32:
+ case NEON::BI__builtin_neon_vfmad_lane_f64:
+ case NEON::BI__builtin_neon_vfmad_laneq_f64: {
+mlir::Value lane = cir::VecExtractOp::create(builder, loc, ops[2], ops[3]);
+mlir::Type scalarTy = convertType(expr->getType());
+llvm::SmallVector fmaOps = {ops[1], lane, ops[0]};
+return emitCallMaybeConstrainedBuiltin(builder, loc, "fma", scalarTy,
+ fmaOps);
+ }
+ default:
+break;
+ }
+
cir::VectorType ty = getNeonType(this, type, loc);
if (!ty)
return nullptr;
@@ -2200,13 +2211,36 @@ CIRGenFunction::emitAArch64BuiltinExpr(unsigned
builtinID, const CallExpr *expr,
case NEON::BI__builtin_neon_vfma_lane_v:
case NEON::BI__builtin_neon_vfmaq_lane_v:
case NEON::BI__builtin_neon_vfma_laneq_v:
- case NEON::BI__builtin_neon_vfmaq_laneq_v:
- case NEON::BI__builtin_neon_vfmah_lane_f16:
- case NEON::BI__builtin_neon_vfmas_lane_f32:
- case NEON::BI__builtin_neon_vfmah_laneq_f16:
- case NEON::BI__builtin_neon_vfmas_laneq_f32:
- case NEON::BI__builtin_neon_vfmad_lane_f64:
- case NEON::BI__builtin_neon_vfmad_laneq_f64:
+ case NEON::BI__builtin_neon_vfmaq_laneq_v: {
+mlir::Value addend = ops[0];
+mlir::Value multiplicand = ops[1];
+mlir::Value laneSource = ops[2];
+auto vecTy = mlir::cast<cir::VectorType>(ty);
+auto elemTy = vecTy.getElementType();
+auto numElts = vecTy.getSize();
+
+if (addend.getType() != ty)
+ addend = builder.createBitcast(loc, addend, ty);
+if (multiplicand.getType() != ty)
+ multiplicand = builder.createBitcast(loc, multiplicand, ty);
+
+cir::VectorType sourceTy = ty;
+if (builtinID == NEON::BI__builtin_neon_vfmaq_lane_v)
+ sourceTy = cir::VectorType::get(elemTy, numElts / 2);
+else if (builtinID == NEON::BI__builtin_neon_vfma_laneq_v)
+ sourceTy = cir::VectorType::get(elemTy, numElts * 2);
+
+if (laneSource.getType() != sourceTy)
+ laneSource = builder.createBitcast(loc, laneSource, sourceTy);
+
+int64_t lane =
+expr->getArg(3)->EvaluateKnownConstInt(getContext()).getSExtValue();
+llvm::SmallVector mask(numElts, lane);
+mlir::Value splat = builder.createVecShuffle(loc, laneSource, mask);
+
+llvm::SmallVector fmaOps = {multiplicand, splat, addend};
+return emitCallMaybeConstrainedBuiltin(builder, loc, "fma", ty, fmaOps);
+ }
case NEON::BI__builtin_neon_vmull_v:
case NEON::BI__builtin_neon_vmax_v:
case NEON::BI__builtin_neon_vmaxq_v:
>From ae6b618b696899275c9009
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
https://github.com/yairbenavraham updated
https://github.com/llvm/llvm-project/pull/188190
>From 4ceade9630502af988e42a046e7568b3a71e96f5 Mon Sep 17 00:00:00 2001
From: Yair Ben Avraham
Date: Wed, 25 Mar 2026 12:08:26 +0200
Subject: [PATCH 1/3] [CIR][AArch64] Lower vfma lane builtins
Lower the AArch64 vfma lane and laneq builtins in CIR codegen.
This adds handling for the vector and scalar vfma lane forms,
including the vfmaq_laneq_v family called out in the issue, and
keeps the CIR builtin structure aligned with the existing AArch64
builtin lowering pattern while preserving the original case order.
The scalar lane forms are dispatched before getNeonType() so the
f16 cases do not fall through the unsupported Poly128 path during
ClangIR lowering.
---
.../lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp | 66 ++-
1 file changed, 50 insertions(+), 16 deletions(-)
diff --git a/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
b/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
index a3488bfcc3dec..c972e9e12c430 100644
--- a/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
@@ -139,11 +139,10 @@ static cir::VectorType getNeonType(CIRGenFunction *cgf,
NeonTypeFlags typeFlags,
cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: BFloat16"));
[[fallthrough]];
case NeonTypeFlags::Float16:
-if (hasLegalHalfType)
+if (!hasLegalHalfType)
cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: Float16"));
-else
- cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: Float16"));
-[[fallthrough]];
+return cir::VectorType::get(cgf->getCIRGenModule().fP16Ty,
+v1Ty ? 1 : (4 << isQuad));
case NeonTypeFlags::Int32:
return cir::VectorType::get(typeFlags.isUnsigned() ? cgf->uInt32Ty
: cgf->sInt32Ty,
@@ -628,11 +627,6 @@ static bool hasExtraNeonArgument(unsigned builtinID) {
case ARM::BI__builtin_arm_vcvtr_d:
mask = 1;
}
- switch (builtinID) {
- default:
-break;
- }
-
return mask != 0;
}
@@ -2186,6 +2180,23 @@ CIRGenFunction::emitAArch64BuiltinExpr(unsigned
builtinID, const CallExpr *expr,
return mlir::Value{};
}
+ switch (builtinID) {
+ case NEON::BI__builtin_neon_vfmah_lane_f16:
+ case NEON::BI__builtin_neon_vfmas_lane_f32:
+ case NEON::BI__builtin_neon_vfmah_laneq_f16:
+ case NEON::BI__builtin_neon_vfmas_laneq_f32:
+ case NEON::BI__builtin_neon_vfmad_lane_f64:
+ case NEON::BI__builtin_neon_vfmad_laneq_f64: {
+mlir::Value lane = cir::VecExtractOp::create(builder, loc, ops[2], ops[3]);
+mlir::Type scalarTy = convertType(expr->getType());
+llvm::SmallVector fmaOps = {ops[1], lane, ops[0]};
+return emitCallMaybeConstrainedBuiltin(builder, loc, "fma", scalarTy,
+ fmaOps);
+ }
+ default:
+break;
+ }
+
cir::VectorType ty = getNeonType(this, type, loc);
if (!ty)
return nullptr;
@@ -2200,13 +2211,36 @@ CIRGenFunction::emitAArch64BuiltinExpr(unsigned
builtinID, const CallExpr *expr,
case NEON::BI__builtin_neon_vfma_lane_v:
case NEON::BI__builtin_neon_vfmaq_lane_v:
case NEON::BI__builtin_neon_vfma_laneq_v:
- case NEON::BI__builtin_neon_vfmaq_laneq_v:
- case NEON::BI__builtin_neon_vfmah_lane_f16:
- case NEON::BI__builtin_neon_vfmas_lane_f32:
- case NEON::BI__builtin_neon_vfmah_laneq_f16:
- case NEON::BI__builtin_neon_vfmas_laneq_f32:
- case NEON::BI__builtin_neon_vfmad_lane_f64:
- case NEON::BI__builtin_neon_vfmad_laneq_f64:
+ case NEON::BI__builtin_neon_vfmaq_laneq_v: {
+mlir::Value addend = ops[0];
+mlir::Value multiplicand = ops[1];
+mlir::Value laneSource = ops[2];
+auto vecTy = mlir::cast<cir::VectorType>(ty);
+auto elemTy = vecTy.getElementType();
+auto numElts = vecTy.getSize();
+
+if (addend.getType() != ty)
+ addend = builder.createBitcast(loc, addend, ty);
+if (multiplicand.getType() != ty)
+ multiplicand = builder.createBitcast(loc, multiplicand, ty);
+
+cir::VectorType sourceTy = ty;
+if (builtinID == NEON::BI__builtin_neon_vfmaq_lane_v)
+ sourceTy = cir::VectorType::get(elemTy, numElts / 2);
+else if (builtinID == NEON::BI__builtin_neon_vfma_laneq_v)
+ sourceTy = cir::VectorType::get(elemTy, numElts * 2);
+
+if (laneSource.getType() != sourceTy)
+ laneSource = builder.createBitcast(loc, laneSource, sourceTy);
+
+int64_t lane =
+expr->getArg(3)->EvaluateKnownConstInt(getContext()).getSExtValue();
+llvm::SmallVector mask(numElts, lane);
+mlir::Value splat = builder.createVecShuffle(loc, laneSource, mask);
+
+llvm::SmallVector fmaOps = {multiplicand, splat, addend};
+return emitCallMaybeConstrainedBuiltin(builder, loc, "fma", ty, fmaOps);
+ }
case NEON::BI__builtin_neon_vmull_v:
case NEON::BI__builtin_neon_vmax_v:
case NEON::BI__builtin_neon_vmaxq_v:
>From ae6b618b696899275c9009
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
@@ -0,0 +1,136 @@
+// REQUIRES: aarch64-registered-target
+
+// RUN: %clang_cc1 -triple arm64-none-linux-gnu -target-feature +neon \
+// RUN: -target-feature +fullfp16 -disable-O0-optnone \
+// RUN: -flax-vector-conversions=none -emit-llvm -o - %s | \
+// RUN: opt -S -passes=mem2reg,sroa | FileCheck %s --check-prefix=LLVM
+// RUN: %if cir-enabled %{%clang_cc1 -triple arm64-none-linux-gnu \
+// RUN: -target-feature +neon -target-feature +fullfp16 \
+// RUN: -disable-O0-optnone -flax-vector-conversions=none \
+// RUN: -fclangir -emit-llvm -o - %s | \
+// RUN: opt -S -passes=mem2reg,sroa | FileCheck %s --check-prefix=LLVM %}
+// RUN: %if cir-enabled %{%clang_cc1 -triple arm64-none-linux-gnu \
+// RUN: -target-feature +neon -target-feature +fullfp16 \
+// RUN: -disable-O0-optnone -flax-vector-conversions=none \
+// RUN: -fclangir -emit-cir -o - %s | FileCheck %s --check-prefix=CIR %}
+
+#include <arm_neon.h>
+
+// LLVM-LABEL: @test_vfma_lane_f16(
+// LLVM: shufflevector <4 x half>
+// LLVM: call <4 x half> @llvm.fma.v4f16(
+// CIR-LABEL: @test_vfma_lane_f16(
+// CIR: cir.vec.shuffle
+// CIR: cir.call_llvm_intrinsic "fma"
+float16x4_t test_vfma_lane_f16(float16x4_t a, float16x4_t b, float16x4_t c) {
+ return vfma_lane_f16(a, b, c, 3);
+}
banach-space wrote:
This comment has not been addressed yet. Below is the expected format.
```suggestion
// LLVM-LABEL: @test_vfma_lane_f16(
// CIR-LABEL: @test_vfma_lane_f16(
float16x4_t test_vfma_lane_f16(float16x4_t a, float16x4_t b, float16x4_t c) {
// CIR: cir.vec.shuffle
// CIR: cir.call_llvm_intrinsic "fma"
// LLVM: shufflevector <4 x half>
// LLVM: call <4 x half> @llvm.fma.v4f16(
return vfma_lane_f16(a, b, c, 3);
}
```
https://github.com/llvm/llvm-project/pull/188190
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
@@ -2186,6 +2180,23 @@ CIRGenFunction::emitAArch64BuiltinExpr(unsigned
builtinID, const CallExpr *expr,
return mlir::Value{};
}
+ switch (builtinID) {
+ case NEON::BI__builtin_neon_vfmah_lane_f16:
+ case NEON::BI__builtin_neon_vfmas_lane_f32:
+ case NEON::BI__builtin_neon_vfmah_laneq_f16:
+ case NEON::BI__builtin_neon_vfmas_laneq_f32:
+ case NEON::BI__builtin_neon_vfmad_lane_f64:
+ case NEON::BI__builtin_neon_vfmad_laneq_f64: {
+mlir::Value lane = cir::VecExtractOp::create(builder, loc, ops[2], ops[3]);
+mlir::Type scalarTy = convertType(expr->getType());
+llvm::SmallVector<mlir::Value> fmaOps = {ops[1], lane, ops[0]};
+return emitCallMaybeConstrainedBuiltin(builder, loc, "fma", scalarTy,
+ fmaOps);
+ }
+ default:
+break;
+ }
banach-space wrote:
Why do we need this dedicated switch? You should be able to implement this
without it.
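For reference, the scalar lane forms in the hunk above reduce to "extract one lane, then fused multiply-add". The following is a minimal sketch of that semantics in plain C++, not the CIR API; the function name `fmaScalarLane` and the use of `std::array` are assumptions made for illustration only. It mirrors the operand order `{ops[1], lane, ops[0]}` passed to `"fma"` in the patch.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical reference model (not part of the patch).
// addend       ~ ops[0]
// multiplicand ~ ops[1]
// laneSource   ~ ops[2], lane ~ ops[3]
template <std::size_t N>
float fmaScalarLane(float addend, float multiplicand,
                    const std::array<float, N> &laneSource, std::size_t lane) {
  assert(lane < N && "lane index must be in range");
  // Step 1: extract the selected lane (cir::VecExtractOp in the patch).
  float laneValue = laneSource[lane];
  // Step 2: fused multiply-add: multiplicand * laneValue + addend.
  return std::fma(multiplicand, laneValue, addend);
}
```

Because both steps are independent of which builtin ID was matched, the model has no per-builtin branching beyond the lane index.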
https://github.com/llvm/llvm-project/pull/188190
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
@@ -404,83 +404,6 @@ uint32x2_t test_vmul_laneq_u32(uint32x2_t a, uint32x4_t v) {
uint32x4_t test_vmulq_laneq_u32(uint32x4_t a, uint32x4_t v) {
return vmulq_laneq_u32(a, v, 3);
}
-
-// CHECK-LABEL: @test_vfma_lane_f32(
-// CHECK-NEXT: entry:
-// CHECK-NEXT:[[TMP0:%.*]] = bitcast <2 x float> [[A:%.*]] to <2 x i32>
-// CHECK-NEXT:[[TMP1:%.*]] = bitcast <2 x float> [[B:%.*]] to <2 x i32>
-// CHECK-NEXT:[[TMP2:%.*]] = bitcast <2 x float> [[V:%.*]] to <2 x i32>
-// CHECK-NEXT:[[TMP3:%.*]] = bitcast <2 x i32> [[TMP0]] to <8 x i8>
-// CHECK-NEXT:[[TMP4:%.*]] = bitcast <2 x i32> [[TMP1]] to <8 x i8>
-// CHECK-NEXT:[[TMP5:%.*]] = bitcast <2 x i32> [[TMP2]] to <8 x i8>
-// CHECK-NEXT:[[TMP6:%.*]] = bitcast <8 x i8> [[TMP5]] to <2 x float>
-// CHECK-NEXT:[[LANE:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> [[TMP6]], <2 x i32>
-// CHECK-NEXT:[[FMLA:%.*]] = bitcast <8 x i8> [[TMP4]] to <2 x float>
-// CHECK-NEXT:[[FMLA1:%.*]] = bitcast <8 x i8> [[TMP3]] to <2 x float>
banach-space wrote:
These check lines that track arguments should also be included in the new tests
(similar comment for other tests). Please also add `LLVM-SAME`, see examples in
https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGen/AArch64/neon/intrinsics.c
https://github.com/llvm/llvm-project/pull/188190
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
@@ -2197,16 +2213,6 @@ CIRGenFunction::emitAArch64BuiltinExpr(unsigned builtinID, const CallExpr *expr,
return std::nullopt;
case NEON::BI__builtin_neon_vbsl_v:
case NEON::BI__builtin_neon_vbslq_v:
- case NEON::BI__builtin_neon_vfma_lane_v:
- case NEON::BI__builtin_neon_vfmaq_lane_v:
- case NEON::BI__builtin_neon_vfma_laneq_v:
- case NEON::BI__builtin_neon_vfmaq_laneq_v:
- case NEON::BI__builtin_neon_vfmah_lane_f16:
- case NEON::BI__builtin_neon_vfmas_lane_f32:
- case NEON::BI__builtin_neon_vfmah_laneq_f16:
- case NEON::BI__builtin_neon_vfmas_laneq_f32:
- case NEON::BI__builtin_neon_vfmad_lane_f64:
- case NEON::BI__builtin_neon_vfmad_laneq_f64:
banach-space wrote:
This comment has not been addressed yet.
https://github.com/llvm/llvm-project/pull/188190
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
@@ -139,11 +139,10 @@ static cir::VectorType getNeonType(CIRGenFunction *cgf, NeonTypeFlags typeFlags,
cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: BFloat16"));
[[fallthrough]];
case NeonTypeFlags::Float16:
-if (hasLegalHalfType)
+if (!hasLegalHalfType)
cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: Float16"));
banach-space wrote:
Why not support this case? Note that `hasLegalHalfType` has recently been
updated. Here's the latest definition:
https://github.com/llvm/llvm-project/blob/main/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp#L203
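For context, the Float16 case in this PR returns a vector whose lane count is computed as `v1Ty ? 1 : (4 << isQuad)`. A minimal sketch of that arithmetic in plain C++ follows; the helper name `float16LaneCount` is hypothetical and not part of the patch, and the sketch says nothing about how `hasLegalHalfType` itself is defined.

```cpp
// Hypothetical helper mirroring `v1Ty ? 1 : (4 << isQuad)` from the diff.
// A 64-bit NEON vector holds four 16-bit lanes; a 128-bit (quad) vector
// holds eight. The v1Ty flag requests a single-element vector instead.
constexpr unsigned float16LaneCount(bool v1Ty, bool isQuad) {
  return v1Ty ? 1u : (4u << (isQuad ? 1u : 0u));
}

// Compile-time checks of the three shapes the expression can produce.
static_assert(float16LaneCount(false, false) == 4, "float16x4_t");
static_assert(float16LaneCount(false, true) == 8, "float16x8_t");
static_assert(float16LaneCount(true, false) == 1, "single-element vector");
```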
https://github.com/llvm/llvm-project/pull/188190
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
@@ -628,11 +627,6 @@ static bool hasExtraNeonArgument(unsigned builtinID) {
case ARM::BI__builtin_arm_vcvtr_d:
mask = 1;
}
- switch (builtinID) {
- default:
-break;
- }
-
banach-space wrote:
Unrelated change?
https://github.com/llvm/llvm-project/pull/188190
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
banach-space wrote:
> @banach-space, do the recent changes I made seem reasonable, or have I
> messed this up?
It looks like you are heading in the right direction, but since you rebased and
force-pushed, it is hard for me to see exactly what has changed. Sometimes a
rebase is unavoidable, but please keep in mind that it effectively erases Git
history and is discouraged:
https://llvm.org/docs/GitHub.html#rebasing-pull-requests-and-force-pushes
In any case, thank you for working on this! I am about to post more comments :)
https://github.com/llvm/llvm-project/pull/188190
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
https://github.com/yairbenavraham updated
https://github.com/llvm/llvm-project/pull/188190
>From 4ceade9630502af988e42a046e7568b3a71e96f5 Mon Sep 17 00:00:00 2001
From: Yair Ben Avraham
Date: Wed, 25 Mar 2026 12:08:26 +0200
Subject: [PATCH 1/2] [CIR][AArch64] Lower vfma lane builtins
Lower the AArch64 vfma lane and laneq builtins in CIR codegen.
This adds handling for the vector and scalar vfma lane forms,
including the vfmaq_laneq_v family called out in the issue, and
keeps the CIR builtin structure aligned with the existing AArch64
builtin lowering pattern while preserving the original case order.
The scalar lane forms are dispatched before getNeonType() so the
f16 cases do not fall through the unsupported Poly128 path during
ClangIR lowering.
---
.../lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp | 66 ++-
1 file changed, 50 insertions(+), 16 deletions(-)
diff --git a/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
b/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
index a3488bfcc3dec..c972e9e12c430 100644
--- a/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
@@ -139,11 +139,10 @@ static cir::VectorType getNeonType(CIRGenFunction *cgf,
NeonTypeFlags typeFlags,
cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: BFloat16"));
[[fallthrough]];
case NeonTypeFlags::Float16:
-if (hasLegalHalfType)
+if (!hasLegalHalfType)
cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: Float16"));
-else
- cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: Float16"));
-[[fallthrough]];
+return cir::VectorType::get(cgf->getCIRGenModule().fP16Ty,
+v1Ty ? 1 : (4 << isQuad));
case NeonTypeFlags::Int32:
return cir::VectorType::get(typeFlags.isUnsigned() ? cgf->uInt32Ty
: cgf->sInt32Ty,
@@ -628,11 +627,6 @@ static bool hasExtraNeonArgument(unsigned builtinID) {
case ARM::BI__builtin_arm_vcvtr_d:
mask = 1;
}
- switch (builtinID) {
- default:
-break;
- }
-
return mask != 0;
}
@@ -2186,6 +2180,23 @@ CIRGenFunction::emitAArch64BuiltinExpr(unsigned
builtinID, const CallExpr *expr,
return mlir::Value{};
}
+ switch (builtinID) {
+ case NEON::BI__builtin_neon_vfmah_lane_f16:
+ case NEON::BI__builtin_neon_vfmas_lane_f32:
+ case NEON::BI__builtin_neon_vfmah_laneq_f16:
+ case NEON::BI__builtin_neon_vfmas_laneq_f32:
+ case NEON::BI__builtin_neon_vfmad_lane_f64:
+ case NEON::BI__builtin_neon_vfmad_laneq_f64: {
+mlir::Value lane = cir::VecExtractOp::create(builder, loc, ops[2], ops[3]);
+mlir::Type scalarTy = convertType(expr->getType());
+llvm::SmallVector<mlir::Value> fmaOps = {ops[1], lane, ops[0]};
+return emitCallMaybeConstrainedBuiltin(builder, loc, "fma", scalarTy,
+ fmaOps);
+ }
+ default:
+break;
+ }
+
cir::VectorType ty = getNeonType(this, type, loc);
if (!ty)
return nullptr;
@@ -2200,13 +2211,36 @@ CIRGenFunction::emitAArch64BuiltinExpr(unsigned
builtinID, const CallExpr *expr,
case NEON::BI__builtin_neon_vfma_lane_v:
case NEON::BI__builtin_neon_vfmaq_lane_v:
case NEON::BI__builtin_neon_vfma_laneq_v:
- case NEON::BI__builtin_neon_vfmaq_laneq_v:
- case NEON::BI__builtin_neon_vfmah_lane_f16:
- case NEON::BI__builtin_neon_vfmas_lane_f32:
- case NEON::BI__builtin_neon_vfmah_laneq_f16:
- case NEON::BI__builtin_neon_vfmas_laneq_f32:
- case NEON::BI__builtin_neon_vfmad_lane_f64:
- case NEON::BI__builtin_neon_vfmad_laneq_f64:
+ case NEON::BI__builtin_neon_vfmaq_laneq_v: {
+mlir::Value addend = ops[0];
+mlir::Value multiplicand = ops[1];
+mlir::Value laneSource = ops[2];
+auto vecTy = mlir::cast<cir::VectorType>(ty);
+auto elemTy = vecTy.getElementType();
+auto numElts = vecTy.getSize();
+
+if (addend.getType() != ty)
+ addend = builder.createBitcast(loc, addend, ty);
+if (multiplicand.getType() != ty)
+ multiplicand = builder.createBitcast(loc, multiplicand, ty);
+
+cir::VectorType sourceTy = ty;
+if (builtinID == NEON::BI__builtin_neon_vfmaq_lane_v)
+ sourceTy = cir::VectorType::get(elemTy, numElts / 2);
+else if (builtinID == NEON::BI__builtin_neon_vfma_laneq_v)
+ sourceTy = cir::VectorType::get(elemTy, numElts * 2);
+
+if (laneSource.getType() != sourceTy)
+ laneSource = builder.createBitcast(loc, laneSource, sourceTy);
+
+int64_t lane =
+expr->getArg(3)->EvaluateKnownConstInt(getContext()).getSExtValue();
+llvm::SmallVector<int64_t> mask(numElts, lane);
+mlir::Value splat = builder.createVecShuffle(loc, laneSource, mask);
+
+llvm::SmallVector<mlir::Value> fmaOps = {multiplicand, splat, addend};
+return emitCallMaybeConstrainedBuiltin(builder, loc, "fma", ty, fmaOps);
+ }
case NEON::BI__builtin_neon_vmull_v:
case NEON::BI__builtin_neon_vmax_v:
case NEON::BI__builtin_neon_vmaxq_v:
>From ae6b618b696899275c9009
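As a rough model of the vector lane lowering in the hunk above (splat the selected lane with a shuffle mask of `numElts` copies of `lane`, then call `fma`), here is a minimal sketch in plain C++. It is not the CIR API: `fmaVectorLane` and the `std::array` operands are hypothetical names chosen for illustration. Letting the lane source have a different length `M` than the result length `N` models the `lane`/`laneq` variants, where the source vector is half or twice as wide.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical reference model (not part of the patch).
template <std::size_t N, std::size_t M>
std::array<float, N> fmaVectorLane(const std::array<float, N> &addend,
                                   const std::array<float, N> &multiplicand,
                                   const std::array<float, M> &laneSource,
                                   std::size_t lane) {
  assert(lane < M && "lane index must be in range");
  // Splat: every result lane reads laneSource[lane], which is what a
  // shuffle with a mask of N copies of `lane` produces.
  std::array<float, N> splat;
  splat.fill(laneSource[lane]);
  // Element-wise fused multiply-add: multiplicand[i] * splat[i] + addend[i].
  std::array<float, N> result;
  for (std::size_t i = 0; i < N; ++i)
    result[i] = std::fma(multiplicand[i], splat[i], addend[i]);
  return result;
}
```

With `N = 2` and `M = 4` this corresponds to the shape of a `laneq` call on a 64-bit result vector; with `N = M` it corresponds to the same-width forms.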
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
https://github.com/yairbenavraham updated
https://github.com/llvm/llvm-project/pull/188190
>From 5cc336f971d79932ea425da7253a05428a65a610 Mon Sep 17 00:00:00 2001
From: Yair Ben Avraham
Date: Sun, 22 Mar 2026 04:57:07 +0200
Subject: [PATCH 1/3] [CIR] Fix generated type constraint header dependencies
Add the missing MLIRCIRTypeConstraintsIncGen dependencies in the
CIR dialect and lowering CMake targets so clean CIR-enabled builds
generate the required headers before the lowering libraries are
compiled.
---
clang/include/clang/CIR/Dialect/IR/CMakeLists.txt | 2 +-
clang/lib/CIR/Lowering/CMakeLists.txt | 3 +++
clang/lib/CIR/Lowering/DirectToLLVM/CMakeLists.txt | 1 +
3 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/clang/include/clang/CIR/Dialect/IR/CMakeLists.txt
b/clang/include/clang/CIR/Dialect/IR/CMakeLists.txt
index 870f9e3f5d052..1388e5bc612f2 100644
--- a/clang/include/clang/CIR/Dialect/IR/CMakeLists.txt
+++ b/clang/include/clang/CIR/Dialect/IR/CMakeLists.txt
@@ -27,5 +27,5 @@ clang_tablegen(CIRLowering.inc -gen-cir-lowering
set(LLVM_TARGET_DEFINITIONS CIRTypeConstraints.td)
mlir_tablegen(CIRTypeConstraints.h.inc -gen-type-constraint-decls)
mlir_tablegen(CIRTypeConstraints.cpp.inc -gen-type-constraint-defs)
-add_public_tablegen_target(MLIRCIRTypeConstraintsIncGen)
+add_mlir_generic_tablegen_target(MLIRCIRTypeConstraintsIncGen)
add_dependencies(mlir-headers MLIRCIRTypeConstraintsIncGen)
diff --git a/clang/lib/CIR/Lowering/CMakeLists.txt
b/clang/lib/CIR/Lowering/CMakeLists.txt
index 28ec3c551018c..77d28ef72d11d 100644
--- a/clang/lib/CIR/Lowering/CMakeLists.txt
+++ b/clang/lib/CIR/Lowering/CMakeLists.txt
@@ -9,6 +9,9 @@ add_clang_library(clangCIRLoweringCommon
CIRPasses.cpp
LoweringHelpers.cpp
+ DEPENDS
+ MLIRCIRTypeConstraintsIncGen
+
LINK_LIBS
clangCIR
${dialect_libs}
diff --git a/clang/lib/CIR/Lowering/DirectToLLVM/CMakeLists.txt
b/clang/lib/CIR/Lowering/DirectToLLVM/CMakeLists.txt
index c7467fe40ba30..5b197ddca12c0 100644
--- a/clang/lib/CIR/Lowering/DirectToLLVM/CMakeLists.txt
+++ b/clang/lib/CIR/Lowering/DirectToLLVM/CMakeLists.txt
@@ -13,6 +13,7 @@ add_clang_library(clangCIRLoweringDirectToLLVM
MLIRCIREnumsGen
MLIRCIROpsIncGen
MLIRCIROpInterfacesIncGen
+ MLIRCIRTypeConstraintsIncGen
LINK_LIBS
clangCIRLoweringCommon
>From a674c9106368033a28229601026f11815ae69648 Mon Sep 17 00:00:00 2001
From: Yair Ben Avraham
Date: Mon, 23 Mar 2026 17:55:19 +0200
Subject: [PATCH 2/3] [CIR][AArch64] Lower vfma lane builtins
Lower the AArch64 vfma lane and laneq builtins in CIR codegen.
This adds handling for the vector and scalar vfma lane forms,
including the vfmaq_laneq_v family called out in the issue, and
keeps the CIR builtin structure aligned with the existing AArch64
builtin lowering pattern.
The patch also includes the required formatting adjustment so the
implementation matches the repository clang-format style.
---
.../lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp | 67 +++
1 file changed, 53 insertions(+), 14 deletions(-)
diff --git a/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
b/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
index a3488bfcc3dec..2f1e974927fd8 100644
--- a/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
@@ -139,11 +139,10 @@ static cir::VectorType getNeonType(CIRGenFunction *cgf,
NeonTypeFlags typeFlags,
cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: BFloat16"));
[[fallthrough]];
case NeonTypeFlags::Float16:
-if (hasLegalHalfType)
+if (!hasLegalHalfType)
cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: Float16"));
-else
- cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: Float16"));
-[[fallthrough]];
+return cir::VectorType::get(cgf->getCIRGenModule().fP16Ty,
+v1Ty ? 1 : (4 << isQuad));
case NeonTypeFlags::Int32:
return cir::VectorType::get(typeFlags.isUnsigned() ? cgf->uInt32Ty
: cgf->sInt32Ty,
@@ -2186,6 +2185,23 @@ CIRGenFunction::emitAArch64BuiltinExpr(unsigned
builtinID, const CallExpr *expr,
return mlir::Value{};
}
+ switch (builtinID) {
+ case NEON::BI__builtin_neon_vfmah_lane_f16:
+ case NEON::BI__builtin_neon_vfmas_lane_f32:
+ case NEON::BI__builtin_neon_vfmah_laneq_f16:
+ case NEON::BI__builtin_neon_vfmas_laneq_f32:
+ case NEON::BI__builtin_neon_vfmad_lane_f64:
+ case NEON::BI__builtin_neon_vfmad_laneq_f64: {
+mlir::Value lane = cir::VecExtractOp::create(builder, loc, ops[2], ops[3]);
+mlir::Type scalarTy = convertType(expr->getType());
+llvm::SmallVector<mlir::Value> fmaOps = {ops[1], lane, ops[0]};
+return emitCallMaybeConstrainedBuiltin(builder, loc, "fma", scalarTy,
+ fmaOps);
+ }
+ default:
+break;
+ }
+
cir::VectorType ty = getNeonType(this, type, loc);
if (!ty)
ret
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
https://github.com/yairbenavraham updated
https://github.com/llvm/llvm-project/pull/188190
>From 3aa2a1dcd459df3235b33430e02d99d9f76fe00d Mon Sep 17 00:00:00 2001
From: Yair Ben Avraham
Date: Sun, 22 Mar 2026 04:57:07 +0200
Subject: [PATCH 1/3] [CIR] Fix generated type constraint header dependencies
Add the missing MLIRCIRTypeConstraintsIncGen dependencies in the
CIR dialect and lowering CMake targets so clean CIR-enabled builds
generate the required headers before the lowering libraries are
compiled.
---
clang/include/clang/CIR/Dialect/IR/CMakeLists.txt | 2 +-
clang/lib/CIR/Lowering/CMakeLists.txt | 3 +++
clang/lib/CIR/Lowering/DirectToLLVM/CMakeLists.txt | 1 +
3 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/clang/include/clang/CIR/Dialect/IR/CMakeLists.txt
b/clang/include/clang/CIR/Dialect/IR/CMakeLists.txt
index 870f9e3f5d052..1388e5bc612f2 100644
--- a/clang/include/clang/CIR/Dialect/IR/CMakeLists.txt
+++ b/clang/include/clang/CIR/Dialect/IR/CMakeLists.txt
@@ -27,5 +27,5 @@ clang_tablegen(CIRLowering.inc -gen-cir-lowering
set(LLVM_TARGET_DEFINITIONS CIRTypeConstraints.td)
mlir_tablegen(CIRTypeConstraints.h.inc -gen-type-constraint-decls)
mlir_tablegen(CIRTypeConstraints.cpp.inc -gen-type-constraint-defs)
-add_public_tablegen_target(MLIRCIRTypeConstraintsIncGen)
+add_mlir_generic_tablegen_target(MLIRCIRTypeConstraintsIncGen)
add_dependencies(mlir-headers MLIRCIRTypeConstraintsIncGen)
diff --git a/clang/lib/CIR/Lowering/CMakeLists.txt
b/clang/lib/CIR/Lowering/CMakeLists.txt
index 28ec3c551018c..77d28ef72d11d 100644
--- a/clang/lib/CIR/Lowering/CMakeLists.txt
+++ b/clang/lib/CIR/Lowering/CMakeLists.txt
@@ -9,6 +9,9 @@ add_clang_library(clangCIRLoweringCommon
CIRPasses.cpp
LoweringHelpers.cpp
+ DEPENDS
+ MLIRCIRTypeConstraintsIncGen
+
LINK_LIBS
clangCIR
${dialect_libs}
diff --git a/clang/lib/CIR/Lowering/DirectToLLVM/CMakeLists.txt
b/clang/lib/CIR/Lowering/DirectToLLVM/CMakeLists.txt
index c7467fe40ba30..5b197ddca12c0 100644
--- a/clang/lib/CIR/Lowering/DirectToLLVM/CMakeLists.txt
+++ b/clang/lib/CIR/Lowering/DirectToLLVM/CMakeLists.txt
@@ -13,6 +13,7 @@ add_clang_library(clangCIRLoweringDirectToLLVM
MLIRCIREnumsGen
MLIRCIROpsIncGen
MLIRCIROpInterfacesIncGen
+ MLIRCIRTypeConstraintsIncGen
LINK_LIBS
clangCIRLoweringCommon
>From edf952a72486ae41b7720f92067638ee76b09251 Mon Sep 17 00:00:00 2001
From: Yair Ben Avraham
Date: Mon, 23 Mar 2026 17:55:19 +0200
Subject: [PATCH 2/3] [CIR][AArch64] Lower vfma lane builtins
Lower the AArch64 vfma lane and laneq builtins in CIR codegen.
This adds handling for the vector and scalar vfma lane forms,
including the vfmaq_laneq_v family called out in the issue, and
keeps the CIR builtin structure aligned with the existing AArch64
builtin lowering pattern.
The patch also includes the required formatting adjustment so the
implementation matches the repository clang-format style.
---
.../lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp | 67 +++
1 file changed, 53 insertions(+), 14 deletions(-)
diff --git a/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
b/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
index 5d7b8d839fa84..26560b2ab3447 100644
--- a/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
@@ -801,11 +801,10 @@ static cir::VectorType getNeonType(CIRGenFunction *cgf,
NeonTypeFlags typeFlags,
cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: BFloat16"));
[[fallthrough]];
case NeonTypeFlags::Float16:
-if (hasLegalHalfType)
+if (!hasLegalHalfType)
cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: Float16"));
-else
- cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: Float16"));
-[[fallthrough]];
+return cir::VectorType::get(cgf->getCIRGenModule().fP16Ty,
+v1Ty ? 1 : (4 << isQuad));
case NeonTypeFlags::Int32:
return cir::VectorType::get(typeFlags.isUnsigned() ? cgf->uInt32Ty
: cgf->sInt32Ty,
@@ -2848,6 +2847,23 @@ CIRGenFunction::emitAArch64BuiltinExpr(unsigned
builtinID, const CallExpr *expr,
return mlir::Value{};
}
+ switch (builtinID) {
+ case NEON::BI__builtin_neon_vfmah_lane_f16:
+ case NEON::BI__builtin_neon_vfmas_lane_f32:
+ case NEON::BI__builtin_neon_vfmah_laneq_f16:
+ case NEON::BI__builtin_neon_vfmas_laneq_f32:
+ case NEON::BI__builtin_neon_vfmad_lane_f64:
+ case NEON::BI__builtin_neon_vfmad_laneq_f64: {
+mlir::Value lane = cir::VecExtractOp::create(builder, loc, ops[2], ops[3]);
+mlir::Type scalarTy = convertType(expr->getType());
+llvm::SmallVector<mlir::Value> fmaOps = {ops[1], lane, ops[0]};
+return emitCallMaybeConstrainedBuiltin(builder, loc, "fma", scalarTy,
+ fmaOps);
+ }
+ default:
+break;
+ }
+
cir::VectorType ty = getNeonType(this, type, loc);
if (!ty)
ret
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
llvmbot wrote:
@llvm/pr-subscribers-clangir
Author: Yair Ben Avraham (yairbenavraham)
Changes
This PR implements the AArch64 NEON ClangIR lowering for the vfma lane/laneq
builtins and adds CIR-enabled regression tests.
Covered scope:
- vector lane/laneq forms
- scalar lane/laneq forms
- includes the vfmaq_laneq_v family called out in #185382
Validation:
- clean build from scratch
- post-build sanity check
- focused llvm-lit validation for the touched AArch64 NEON tests
Part of #185382
---
Full diff: https://github.com/llvm/llvm-project/pull/188190.diff
6 Files Affected:
- (modified) clang/include/clang/CIR/Dialect/IR/CMakeLists.txt (+1-1)
- (modified) clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp (+53-14)
- (modified) clang/lib/CIR/Lowering/CMakeLists.txt (+3)
- (modified) clang/lib/CIR/Lowering/DirectToLLVM/CMakeLists.txt (+1)
- (added) clang/test/CodeGen/AArch64/neon/vfma-lane.c (+136)
- (added) clang/test/CodeGen/AArch64/neon/vfma-scalar-lane.c (+77)
``diff
diff --git a/clang/include/clang/CIR/Dialect/IR/CMakeLists.txt
b/clang/include/clang/CIR/Dialect/IR/CMakeLists.txt
index 870f9e3f5d052..1388e5bc612f2 100644
--- a/clang/include/clang/CIR/Dialect/IR/CMakeLists.txt
+++ b/clang/include/clang/CIR/Dialect/IR/CMakeLists.txt
@@ -27,5 +27,5 @@ clang_tablegen(CIRLowering.inc -gen-cir-lowering
set(LLVM_TARGET_DEFINITIONS CIRTypeConstraints.td)
mlir_tablegen(CIRTypeConstraints.h.inc -gen-type-constraint-decls)
mlir_tablegen(CIRTypeConstraints.cpp.inc -gen-type-constraint-defs)
-add_public_tablegen_target(MLIRCIRTypeConstraintsIncGen)
+add_mlir_generic_tablegen_target(MLIRCIRTypeConstraintsIncGen)
add_dependencies(mlir-headers MLIRCIRTypeConstraintsIncGen)
diff --git a/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
b/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
index 5d7b8d839fa84..26560b2ab3447 100644
--- a/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
@@ -801,11 +801,10 @@ static cir::VectorType getNeonType(CIRGenFunction *cgf,
NeonTypeFlags typeFlags,
cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: BFloat16"));
[[fallthrough]];
case NeonTypeFlags::Float16:
-if (hasLegalHalfType)
+if (!hasLegalHalfType)
cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: Float16"));
-else
- cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: Float16"));
-[[fallthrough]];
+return cir::VectorType::get(cgf->getCIRGenModule().fP16Ty,
+v1Ty ? 1 : (4 << isQuad));
case NeonTypeFlags::Int32:
return cir::VectorType::get(typeFlags.isUnsigned() ? cgf->uInt32Ty
: cgf->sInt32Ty,
@@ -2848,6 +2847,23 @@ CIRGenFunction::emitAArch64BuiltinExpr(unsigned
builtinID, const CallExpr *expr,
return mlir::Value{};
}
+ switch (builtinID) {
+ case NEON::BI__builtin_neon_vfmah_lane_f16:
+ case NEON::BI__builtin_neon_vfmas_lane_f32:
+ case NEON::BI__builtin_neon_vfmah_laneq_f16:
+ case NEON::BI__builtin_neon_vfmas_laneq_f32:
+ case NEON::BI__builtin_neon_vfmad_lane_f64:
+ case NEON::BI__builtin_neon_vfmad_laneq_f64: {
+mlir::Value lane = cir::VecExtractOp::create(builder, loc, ops[2], ops[3]);
+mlir::Type scalarTy = convertType(expr->getType());
+llvm::SmallVector<mlir::Value> fmaOps = {ops[1], lane, ops[0]};
+return emitCallMaybeConstrainedBuiltin(builder, loc, "fma", scalarTy,
+ fmaOps);
+ }
+ default:
+break;
+ }
+
cir::VectorType ty = getNeonType(this, type, loc);
if (!ty)
return nullptr;
@@ -2859,16 +2875,6 @@ CIRGenFunction::emitAArch64BuiltinExpr(unsigned
builtinID, const CallExpr *expr,
return std::nullopt;
case NEON::BI__builtin_neon_vbsl_v:
case NEON::BI__builtin_neon_vbslq_v:
- case NEON::BI__builtin_neon_vfma_lane_v:
- case NEON::BI__builtin_neon_vfmaq_lane_v:
- case NEON::BI__builtin_neon_vfma_laneq_v:
- case NEON::BI__builtin_neon_vfmaq_laneq_v:
- case NEON::BI__builtin_neon_vfmah_lane_f16:
- case NEON::BI__builtin_neon_vfmas_lane_f32:
- case NEON::BI__builtin_neon_vfmah_laneq_f16:
- case NEON::BI__builtin_neon_vfmas_laneq_f32:
- case NEON::BI__builtin_neon_vfmad_lane_f64:
- case NEON::BI__builtin_neon_vfmad_laneq_f64:
case NEON::BI__builtin_neon_vmull_v:
case NEON::BI__builtin_neon_vmax_v:
case NEON::BI__builtin_neon_vmaxq_v:
@@ -2886,6 +2892,39 @@ CIRGenFunction::emitAArch64BuiltinExpr(unsigned
builtinID, const CallExpr *expr,
if (cir::isFPOrVectorOfFPType(ty))
intrName = "aarch64.neon.fabd";
return emitNeonCall(cgm, builder, {ty, ty}, ops, intrName, ty, loc);
+ case NEON::BI__builtin_neon_vfma_lane_v:
+ case NEON::BI__builtin_neon_vfmaq_lane_v:
+ case NEON::BI__builtin_neon_vfma_laneq_v:
+ case NEON::BI__builtin_neon_vfmaq_laneq_v: {
+mlir::Value addend = ops[0];
+mlir::Value multiplicand = ops[1];
+mlir::Value laneS
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
llvmbot wrote:
@llvm/pr-subscribers-clang
Author: Yair Ben Avraham (yairbenavraham)
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
github-actions[bot] wrote:
Thank you for submitting a Pull Request (PR) to the LLVM Project!
This PR will be automatically labeled and the relevant teams will be notified.
If you wish to, you can add reviewers by using the "Reviewers" section on this
page.
If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using `@` followed by their GitHub username.
If you have received no comments on your PR for a week, you can request a
review by "ping"ing the PR by adding a comment “Ping”. The common courtesy
"ping" rate is once a week. Please remember that you are asking for valuable
time from other developers.
If you have further questions, they may be answered by the [LLVM GitHub User
Guide](https://llvm.org/docs/GitHub.html).
You can also ask questions in a comment on this PR, on the [LLVM
Discord](https://discord.com/invite/xS7Z362) or on the
[forums](https://discourse.llvm.org/).
https://github.com/llvm/llvm-project/pull/188190
[clang] [CIR][AArch64] Lower vfma lane builtins (PR #188190)
https://github.com/yairbenavraham created
https://github.com/llvm/llvm-project/pull/188190
This PR implements the AArch64 NEON ClangIR lowering for the vfma lane/laneq
builtins and adds CIR-enabled regression tests.
Covered scope:
- vector lane/laneq forms
- scalar lane/laneq forms
- includes the vfmaq_laneq_v family called out in #185382
Validation:
- clean build from scratch
- post-build sanity check
- focused llvm-lit validation for the touched AArch64 NEON tests
Part of #185382
>From 3aa2a1dcd459df3235b33430e02d99d9f76fe00d Mon Sep 17 00:00:00 2001
From: Yair Ben Avraham
Date: Sun, 22 Mar 2026 04:57:07 +0200
Subject: [PATCH 1/3] [CIR] Fix generated type constraint header dependencies
Add the missing MLIRCIRTypeConstraintsIncGen dependencies in the
CIR dialect and lowering CMake targets so clean CIR-enabled builds
generate the required headers before the lowering libraries are
compiled.
---
clang/include/clang/CIR/Dialect/IR/CMakeLists.txt | 2 +-
clang/lib/CIR/Lowering/CMakeLists.txt | 3 +++
clang/lib/CIR/Lowering/DirectToLLVM/CMakeLists.txt | 1 +
3 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/clang/include/clang/CIR/Dialect/IR/CMakeLists.txt
b/clang/include/clang/CIR/Dialect/IR/CMakeLists.txt
index 870f9e3f5d052..1388e5bc612f2 100644
--- a/clang/include/clang/CIR/Dialect/IR/CMakeLists.txt
+++ b/clang/include/clang/CIR/Dialect/IR/CMakeLists.txt
@@ -27,5 +27,5 @@ clang_tablegen(CIRLowering.inc -gen-cir-lowering
set(LLVM_TARGET_DEFINITIONS CIRTypeConstraints.td)
mlir_tablegen(CIRTypeConstraints.h.inc -gen-type-constraint-decls)
mlir_tablegen(CIRTypeConstraints.cpp.inc -gen-type-constraint-defs)
-add_public_tablegen_target(MLIRCIRTypeConstraintsIncGen)
+add_mlir_generic_tablegen_target(MLIRCIRTypeConstraintsIncGen)
add_dependencies(mlir-headers MLIRCIRTypeConstraintsIncGen)
diff --git a/clang/lib/CIR/Lowering/CMakeLists.txt
b/clang/lib/CIR/Lowering/CMakeLists.txt
index 28ec3c551018c..77d28ef72d11d 100644
--- a/clang/lib/CIR/Lowering/CMakeLists.txt
+++ b/clang/lib/CIR/Lowering/CMakeLists.txt
@@ -9,6 +9,9 @@ add_clang_library(clangCIRLoweringCommon
CIRPasses.cpp
LoweringHelpers.cpp
+ DEPENDS
+ MLIRCIRTypeConstraintsIncGen
+
LINK_LIBS
clangCIR
${dialect_libs}
diff --git a/clang/lib/CIR/Lowering/DirectToLLVM/CMakeLists.txt b/clang/lib/CIR/Lowering/DirectToLLVM/CMakeLists.txt
index c7467fe40ba30..5b197ddca12c0 100644
--- a/clang/lib/CIR/Lowering/DirectToLLVM/CMakeLists.txt
+++ b/clang/lib/CIR/Lowering/DirectToLLVM/CMakeLists.txt
@@ -13,6 +13,7 @@ add_clang_library(clangCIRLoweringDirectToLLVM
MLIRCIREnumsGen
MLIRCIROpsIncGen
MLIRCIROpInterfacesIncGen
+ MLIRCIRTypeConstraintsIncGen
LINK_LIBS
clangCIRLoweringCommon
From edf952a72486ae41b7720f92067638ee76b09251 Mon Sep 17 00:00:00 2001
From: Yair Ben Avraham
Date: Mon, 23 Mar 2026 17:55:19 +0200
Subject: [PATCH 2/3] [CIR][AArch64] Lower vfma lane builtins
Lower the AArch64 vfma lane and laneq builtins in CIR codegen.
This adds handling for the vector and scalar vfma lane forms,
including the vfmaq_laneq_v family called out in the issue, and
keeps the CIR builtin structure aligned with the existing AArch64
builtin lowering pattern.
The patch also includes the required formatting adjustment so the
implementation matches the repository clang-format style.
---
.../lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp | 67 +++
1 file changed, 53 insertions(+), 14 deletions(-)
diff --git a/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp b/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
index 5d7b8d839fa84..26560b2ab3447 100644
--- a/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
@@ -801,11 +801,10 @@ static cir::VectorType getNeonType(CIRGenFunction *cgf, NeonTypeFlags typeFlags,
cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: BFloat16"));
[[fallthrough]];
case NeonTypeFlags::Float16:
-if (hasLegalHalfType)
+if (!hasLegalHalfType)
cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: Float16"));
-else
- cgf->getCIRGenModule().errorNYI(loc, std::string("NEON type: Float16"));
-[[fallthrough]];
+return cir::VectorType::get(cgf->getCIRGenModule().fP16Ty,
+v1Ty ? 1 : (4 << isQuad));
case NeonTypeFlags::Int32:
return cir::VectorType::get(typeFlags.isUnsigned() ? cgf->uInt32Ty
: cgf->sInt32Ty,
@@ -2848,6 +2847,23 @@ CIRGenFunction::emitAArch64BuiltinExpr(unsigned builtinID, const CallExpr *expr,
return mlir::Value{};
}
+ switch (builtinID) {
+ case NEON::BI__builtin_neon_vfmah_lane_f16:
+ case NEON::BI__builtin_neon_vfmas_lane_f32:
+ case NEON::BI__builtin_neon_vfmah_laneq_f16:
+ case NEON::BI__builtin_neon_vfmas_laneq_f32:
+ case NEON::BI__builtin_neon_vfmad_lane_f64:
+ case NEON::BI__builtin_neon_vfmad_laneq_f64: {
+mlir::Value l
