[clang] [clang] Introduce "binary" StringLiteral for #embed data (PR #127629)

2025-04-05 Thread LLVM Continuous Integration via cfe-commits

llvm-ci wrote:

LLVM Buildbot has detected a new failure on builder `sanitizer-aarch64-linux` 
running on `sanitizer-buildbot7` while building `clang` at step 2 "annotate".

Full details are available at: 
https://lab.llvm.org/buildbot/#/builders/51/builds/12901


Here is the relevant piece of the build log for reference

```
Step 2 (annotate) failure: 'python 
../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py'
 (failure)
...
[3990/5460] Building CXX object 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaOpenACC.cpp.o
[3991/5460] Building CXX object 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaDeclObjC.cpp.o
[3992/5460] Building CXX object 
tools/clang/lib/ASTMatchers/CMakeFiles/obj.clangASTMatchers.dir/ASTMatchFinder.cpp.o
[3993/5460] Building CXX object 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaObjCProperty.cpp.o
[3994/5460] Building CXX object 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaInit.cpp.o
[3995/5460] Building CXX object 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaObjC.cpp.o
[3996/5460] Building CXX object 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaPseudoObject.cpp.o
[3997/5460] Building CXX object 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaWasm.cpp.o
[3998/5460] Building CXX object 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaX86.cpp.o
[3999/5460] Building CXX object 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaExprCXX.cpp.o
FAILED: tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaExprCXX.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache 
/home/b/sanitizer-aarch64-linux/build/llvm_build0/bin/clang++ -DCLANG_EXPORTS 
-DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE 
-D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS 
-I/home/b/sanitizer-aarch64-linux/build/build_default/tools/clang/lib/Sema 
-I/home/b/sanitizer-aarch64-linux/build/llvm-project/clang/lib/Sema 
-I/home/b/sanitizer-aarch64-linux/build/llvm-project/clang/include 
-I/home/b/sanitizer-aarch64-linux/build/build_default/tools/clang/include 
-I/home/b/sanitizer-aarch64-linux/build/build_default/include 
-I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/include -fPIC 
-fno-semantic-interposition -fvisibility-inlines-hidden -Werror 
-Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra 
-Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers 
-pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough 
-Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor 
-Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion 
-Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color 
-ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual 
-Wno-nested-anon-types -O3 -DNDEBUG -std=c++17  -fno-exceptions -funwind-tables 
-fno-rtti -UNDEBUG -MD -MT 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaExprCXX.cpp.o -MF 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaExprCXX.cpp.o.d -o 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaExprCXX.cpp.o -c 
/home/b/sanitizer-aarch64-linux/build/llvm-project/clang/lib/Sema/SemaExprCXX.cpp
/home/b/sanitizer-aarch64-linux/build/llvm-project/clang/lib/Sema/SemaExprCXX.cpp:4139:27:
 error: enumeration value 'Binary' not handled in switch [-Werror,-Wswitch]
 4139 |   switch (StrLit->getKind()) {
  |   ^
1 error generated.
[4000/5460] Building CXX object 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaStmtAsm.cpp.o
[4001/5460] Building CXX object 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaLambda.cpp.o
[4002/5460] Building CXX object 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaStmtAttr.cpp.o
[4003/5460] Building CXX object 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaTemplateVariadic.cpp.o
[4004/5460] Building CXX object 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaRISCV.cpp.o
[4005/5460] Building CXX object 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/Sema.cpp.o
[4006/5460] Building CXX object 
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGLoopInfo.cpp.o
[4007/5460] Building CXX object 
tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/ExprConstant.cpp.o
[4008/5460] Building CXX object 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaDeclAttr.cpp.o
[4009/5460] Building CXX object 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaChecking.cpp.o
[4010/5460] Building CXX object 
tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/ByteCode/EvalEmitter.cpp.o
[4011/5460] Building CXX object 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaLookup.cpp.o
[4012/5460] Building CXX object 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaCodeComplete.cpp.o
[4013/5460] Building CXX object 
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/ABIInfo.cpp.o
[4014/5460] Building CXX object 
tools/clang/lib/CodeG
```

[clang] [clang] Introduce "binary" StringLiteral for #embed data (PR #127629)

2025-04-05 Thread Mariya Podchishchaeva via cfe-commits

Fznamznon wrote:

Oops, I'll fix the bots quickly.
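The build failure above is `-Wswitch` promoted to an error by `-Werror`: a `switch` over `StringLiteralKind` in `SemaExprCXX.cpp` has no case for the new `Binary` enumerator, and `-Wswitch` only stays quiet when every enumerator is handled or a `default:` label is present. A standalone sketch of the pattern follows. The enum is a local copy rather than clang's header, and `prefixFor` is a hypothetical function modeled on the prefix logic in `StringLiteral::outputString`, not the switch that actually failed.

```cpp
#include <string_view>

// Standalone stand-in for clang's StringLiteralKind. The enumerator names
// follow the patch; this is not the real clang header.
enum class StringLiteralKind {
  Ordinary,
  Wide,
  UTF8,
  UTF16,
  UTF32,
  Unevaluated,
  Binary // enumerator added by the patch
};

// Hypothetical helper modeled on the prefix logic of
// StringLiteral::outputString. There is deliberately no `default:` label, so
// -Wswitch flags any enumerator a later change adds but this switch ignores.
constexpr std::string_view prefixFor(StringLiteralKind K) {
  switch (K) {
  case StringLiteralKind::Ordinary:
  case StringLiteralKind::Unevaluated:
  case StringLiteralKind::Binary: // drop this line and -Wswitch fires
    return "";
  case StringLiteralKind::Wide:
    return "L";
  case StringLiteralKind::UTF8:
    return "u8";
  case StringLiteralKind::UTF16:
    return "u";
  case StringLiteralKind::UTF32:
    return "U";
  }
  return ""; // only reachable with an out-of-range value; avoids -Wreturn-type
}

static_assert(prefixFor(StringLiteralKind::Binary).empty());
static_assert(prefixFor(StringLiteralKind::UTF8) == "u8");
```

Leaving out the `default:` label is the usual LLVM style for switches over enums (the build flags above even include `-Wcovered-switch-default`, which warns when a fully covered switch has one), so adding an enumerator is expected to surface every switch that needs updating.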


https://github.com/llvm/llvm-project/pull/127629
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Introduce "binary" StringLiteral for #embed data (PR #127629)

2025-03-20 Thread Mariya Podchishchaeva via cfe-commits

https://github.com/Fznamznon closed 
https://github.com/llvm/llvm-project/pull/127629


[clang] [clang] Introduce "binary" StringLiteral for #embed data (PR #127629)

2025-03-20 Thread LLVM Continuous Integration via cfe-commits

llvm-ci wrote:

LLVM Buildbot has detected a new failure on builder 
`ppc64le-lld-multistage-test` running on `ppc64le-lld-multistage-test` while 
building `clang` at step 12 "build-stage2-unified-tree".

Full details are available at: 
https://lab.llvm.org/buildbot/#/builders/168/builds/9949


Here is the relevant piece of the build log for reference

```
Step 12 (build-stage2-unified-tree) failure: build (failure)
...
326.595 [830/214/5449] Building CXX object 
lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64RegisterInfo.cpp.o
326.686 [830/213/5450] Building CXX object 
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGObjCMac.cpp.o
326.731 [830/212/5451] Building CXX object 
lib/Target/X86/CMakeFiles/LLVMX86CodeGen.dir/GISel/X86InstructionSelector.cpp.o
326.739 [830/211/5452] Building CXX object 
tools/llvm-exegesis/lib/RISCV/CMakeFiles/LLVMExegesisRISCV.dir/Target.cpp.o
326.775 [830/210/5453] Building CXX object 
tools/clang/unittests/Frontend/CMakeFiles/FrontendTests.dir/OutputStreamTest.cpp.o
326.821 [830/209/5454] Building CXX object 
tools/clang/unittests/Frontend/CMakeFiles/FrontendTests.dir/FrontendActionTest.cpp.o
326.822 [830/208/5455] Building CXX object 
tools/clang/unittests/Frontend/CMakeFiles/FrontendTests.dir/CodeGenActionTest.cpp.o
326.834 [830/207/5456] Building CXX object 
tools/clang/tools/clang-installapi/CMakeFiles/clang-installapi.dir/ClangInstallAPI.cpp.o
326.887 [830/206/5457] Building CXX object 
lib/Target/RISCV/CMakeFiles/LLVMRISCVCodeGen.dir/RISCVGatherScatterLowering.cpp.o
326.888 [830/205/5458] Building CXX object 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaExprCXX.cpp.o
FAILED: tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaExprCXX.cpp.o 
ccache 
/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/install/stage1/bin/clang++
 -DCLANG_EXPORTS -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS 
-D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS 
-D__STDC_LIMIT_MACROS 
-I/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/build/stage2/tools/clang/lib/Sema
 
-I/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/clang/lib/Sema
 
-I/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/clang/include
 
-I/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/build/stage2/tools/clang/include
 
-I/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/build/stage2/include
 
-I/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/include
 -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror 
-Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra 
-Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers 
-pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough 
-Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor 
-Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion 
-Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color 
-ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual 
-Wno-nested-anon-types -O3 -DNDEBUG -std=c++17  -fno-exceptions -funwind-tables 
-fno-rtti -UNDEBUG -MD -MT 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaExprCXX.cpp.o -MF 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaExprCXX.cpp.o.d -o 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaExprCXX.cpp.o -c 
/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/clang/lib/Sema/SemaExprCXX.cpp
/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/clang/lib/Sema/SemaExprCXX.cpp:4139:27:
 error: enumeration value 'Binary' not handled in switch [-Werror,-Wswitch]
 4139 |   switch (StrLit->getKind()) {
  |   ^
1 error generated.
326.895 [830/204/5459] Building CXX object 
tools/clang/lib/Tooling/Refactoring/CMakeFiles/obj.clangToolingRefactoring.dir/ASTSelection.cpp.o
326.915 [830/203/5460] Building CXX object 
tools/clang/unittests/Tooling/Syntax/CMakeFiles/SyntaxTests.dir/TreeTest.cpp.o
326.960 [830/202/5461] Building CXX object 
tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/ParentMapContext.cpp.o
327.008 [830/201/5462] Building CXX object 
lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/GISel/AArch64PostLegalizerCombiner.cpp.o
327.013 [830/200/5463] Building CXX object 
lib/Target/RISCV/CMakeFiles/LLVMRISCVCodeGen.dir/RISCVAsmPrinter.cpp.o
327.046 [830/199/5464] Building CXX object 
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CoverageMappingGen.cpp.o
327.
```

[clang] [clang] Introduce "binary" StringLiteral for #embed data (PR #127629)

2025-03-20 Thread LLVM Continuous Integration via cfe-commits

llvm-ci wrote:

LLVM Buildbot has detected a new failure on builder `sanitizer-ppc64le-linux` 
running on `ppc64le-sanitizer` while building `clang` at step 2 "annotate".

Full details are available at: 
https://lab.llvm.org/buildbot/#/builders/72/builds/9355


Here is the relevant piece of the build log for reference

```
Step 2 (annotate) failure: 'python 
../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py'
 (failure)
...
[4025/4152] Building CXX object 
tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/ASTImporterLookupTable.cpp.o
[4026/4152] Building CXX object 
tools/clang/lib/Interpreter/CMakeFiles/obj.clangInterpreter.dir/Interpreter.cpp.o
[4027/4152] Building CXX object 
tools/clang/tools/clang-scan-deps/CMakeFiles/clang-scan-deps.dir/ClangScanDeps.cpp.o
[4028/4152] Building CXX object 
tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/WebKit/RetainPtrCtorAdoptChecker.cpp.o
[4029/4152] Building CXX object 
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CoverageMappingGen.cpp.o
[4030/4152] Building CXX object 
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGObjCGNU.cpp.o
[4031/4152] Building CXX object 
tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/VTableBuilder.cpp.o
[4032/4152] Building CXX object 
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/BackendUtil.cpp.o
[4033/4152] Building CXX object 
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/ItaniumCXXABI.cpp.o
[4034/4152] Building CXX object 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaExprCXX.cpp.o
FAILED: tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaExprCXX.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache 
/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm_build0/bin/clang++
 -DCLANG_EXPORTS -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS 
-D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS 
-D__STDC_LIMIT_MACROS 
-I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_default/tools/clang/lib/Sema
 
-I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/clang/lib/Sema
 
-I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/clang/include
 
-I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_default/tools/clang/include
 
-I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_default/include
 
-I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/llvm/include
 -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror 
-Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra 
-Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers 
-pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough 
-Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor 
-Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion 
-Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color 
-ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual 
-Wno-nested-anon-types -O3 -DNDEBUG -std=c++17  -fno-exceptions -funwind-tables 
-fno-rtti -UNDEBUG -MD -MT 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaExprCXX.cpp.o -MF 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaExprCXX.cpp.o.d -o 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaExprCXX.cpp.o -c 
/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/clang/lib/Sema/SemaExprCXX.cpp
/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/clang/lib/Sema/SemaExprCXX.cpp:4139:27:
 error: enumeration value 'Binary' not handled in switch [-Werror,-Wswitch]
 4139 |   switch (StrLit->getKind()) {
  |   ^
1 error generated.
[4035/4152] Building CXX object 
tools/clang/lib/Tooling/Refactoring/CMakeFiles/obj.clangToolingRefactoring.dir/Rename/USRFindingAction.cpp.o
[4036/4152] Building CXX object 
tools/clang/lib/Frontend/CMakeFiles/obj.clangFrontend.dir/InterfaceStubFunctionsConsumer.cpp.o
[4037/4152] Building CXX object 
tools/clang/lib/Serialization/CMakeFiles/obj.clangSerialization.dir/ASTReaderDecl.cpp.o
[4038/4152] Building CXX object 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaARM.cpp.o
[4039/4152] Building CXX object 
tools/clang/tools/libclang/CMakeFiles/libclang.dir/CIndex.cpp.o
[4040/4152] Building CXX object 
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/ParsedAttr.cpp.o
[4041/4152] Building CXX object 
tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/WebKit/ForwardDeclChecker.cpp.o
[4042/4152] Building CXX object 
tools/clang/lib/CodeGen/CMakeFil
```

[clang] [clang] Introduce "binary" StringLiteral for #embed data (PR #127629)

2025-03-20 Thread via cfe-commits

https://github.com/cor3ntin approved this pull request.


https://github.com/llvm/llvm-project/pull/127629


[clang] [clang] Introduce "binary" StringLiteral for #embed data (PR #127629)

2025-03-20 Thread Mariya Podchishchaeva via cfe-commits

https://github.com/Fznamznon updated 
https://github.com/llvm/llvm-project/pull/127629

>From 700ec6f78c0a24729801bea381bafbcafb06826b Mon Sep 17 00:00:00 2001
From: "Podchishchaeva, Mariya" 
Date: Tue, 18 Feb 2025 05:12:07 -0800
Subject: [PATCH 1/3] [clang] Introduce "binary" StringLiteral for #embed data

StringLiteral is used as internal data of EmbedExpr, and we use it directly as
an initializer if a single EmbedExpr appears in the initializer list of a char
array. This is fast and convenient, but it causes problems when string literal
character values are checked, because #embed data values are within the range
[0, 2^(char width) - 1] while an ordinary StringLiteral has plain char type,
which may be signed.
This PR introduces a new kind of StringLiteral to hold binary data coming from
an embedded resource to mitigate these problems. The new kind of
StringLiteral is not assumed to have signed char type. It also helps prevent
crashes when trying to find StringLiteral token locations, since these simply
do not exist for binary data.

Fixes https://github.com/llvm/llvm-project/issues/119256
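To make the signedness problem described above concrete, here is a minimal sketch assuming 8-bit bytes. `readAsSignedChar`, `readAsBinary`, and `fitsUnsignedChar` are hypothetical helpers, not clang APIs: the same embedded byte 0xFF reads as -1 through a signed char but as 255 when treated as binary data, and only the latter can initialize an `unsigned char` unchanged.

```cpp
#include <climits>
#include <cstdint>

// Hypothetical helpers, assuming 8-bit bytes (CHAR_BIT == 8).

// Value of a byte read through `signed char`; a plain `char` behaves this way
// on targets where char is signed, such as x86-64 Linux.
constexpr int readAsSignedChar(std::uint8_t Byte) {
  return static_cast<signed char>(Byte);
}

// Value of the same byte treated as raw binary data: always in
// [0, 2^CHAR_BIT - 1].
constexpr int readAsBinary(std::uint8_t Byte) {
  return static_cast<unsigned char>(Byte);
}

// Whether V can initialize an `unsigned char` without changing its value, the
// kind of representability check a C23 constexpr initializer must pass.
constexpr bool fitsUnsignedChar(int V) { return V >= 0 && V <= UCHAR_MAX; }

// The byte 0xFF taken from an embedded file:
static_assert(readAsSignedChar(0xFF) == -1); // out of range for unsigned char
static_assert(readAsBinary(0xFF) == 255);    // representable
```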
---
 clang/include/clang/AST/Expr.h|  7 ---
 clang/lib/AST/Expr.cpp|  8 
 clang/lib/Parse/ParseInit.cpp |  2 +-
 clang/lib/Sema/SemaInit.cpp   |  1 +
 clang/test/Preprocessor/embed_constexpr.c | 21 +
 5 files changed, 35 insertions(+), 4 deletions(-)
 create mode 100644 clang/test/Preprocessor/embed_constexpr.c

diff --git a/clang/include/clang/AST/Expr.h b/clang/include/clang/AST/Expr.h
index cd584d9621a22..cf6f63b8711b8 100644
--- a/clang/include/clang/AST/Expr.h
+++ b/clang/include/clang/AST/Expr.h
@@ -1752,7 +1752,8 @@ enum class StringLiteralKind {
   UTF8,
   UTF16,
   UTF32,
-  Unevaluated
+  Unevaluated,
+  Binary
 };
 
 /// StringLiteral - This represents a string literal expression, e.g. "foo"
@@ -4965,9 +4966,9 @@ class EmbedExpr final : public Expr {
   assert(EExpr && CurOffset != ULLONG_MAX &&
  "trying to dereference an invalid iterator");
   IntegerLiteral *N = EExpr->FakeChildNode;
-  StringRef DataRef = EExpr->Data->BinaryData->getBytes();
   N->setValue(*EExpr->Ctx,
-  llvm::APInt(N->getValue().getBitWidth(), DataRef[CurOffset],
+  llvm::APInt(N->getValue().getBitWidth(),
+  EExpr->Data->BinaryData->getCodeUnit(CurOffset),
   N->getType()->isSignedIntegerType()));
   // We want to return a reference to the fake child node in the
   // EmbedExpr, not the local variable N.
diff --git a/clang/lib/AST/Expr.cpp b/clang/lib/AST/Expr.cpp
index 6f570139630d8..2747480f00d68 100644
--- a/clang/lib/AST/Expr.cpp
+++ b/clang/lib/AST/Expr.cpp
@@ -1104,6 +1104,7 @@ unsigned StringLiteral::mapCharByteWidth(TargetInfo const &Target,
   switch (SK) {
   case StringLiteralKind::Ordinary:
   case StringLiteralKind::UTF8:
+  case StringLiteralKind::Binary:
     CharByteWidth = Target.getCharWidth();
     break;
   case StringLiteralKind::Wide:
@@ -1216,6 +1217,7 @@ void StringLiteral::outputString(raw_ostream &OS) const {
   switch (getKind()) {
   case StringLiteralKind::Unevaluated:
   case StringLiteralKind::Ordinary:
+  case StringLiteralKind::Binary:
     break; // no prefix.
   case StringLiteralKind::Wide:
     OS << 'L';
@@ -1332,11 +1334,17 @@ StringLiteral::getLocationOfByte(unsigned ByteNo, const SourceManager &SM,
                                  const LangOptions &Features,
                                  const TargetInfo &Target, unsigned *StartToken,
                                  unsigned *StartTokenByteOffset) const {
+  // No source location of bytes for binary literals since they don't come from
+  // source.
+  if (getKind() == StringLiteralKind::Binary)
+    return getStrTokenLoc(0);
+
   assert((getKind() == StringLiteralKind::Ordinary ||
           getKind() == StringLiteralKind::UTF8 ||
           getKind() == StringLiteralKind::Unevaluated) &&
          "Only narrow string literals are currently supported");
 
+
   // Loop over all of the tokens in this string until we find the one that
   // contains the byte we're looking for.
   unsigned TokNo = 0;
diff --git a/clang/lib/Parse/ParseInit.cpp b/clang/lib/Parse/ParseInit.cpp
index 63b1d7bd9db53..471b3eaf28287 100644
--- a/clang/lib/Parse/ParseInit.cpp
+++ b/clang/lib/Parse/ParseInit.cpp
@@ -445,7 +445,7 @@ ExprResult Parser::createEmbedExpr() {
   Context.MakeIntValue(Str.size(), Context.getSizeType());
   QualType ArrayTy = Context.getConstantArrayType(
   Ty, ArraySize, nullptr, ArraySizeModifier::Normal, 0);
-  return StringLiteral::Create(Context, Str, StringLiteralKind::Ordinary,
+  return StringLiteral::Create(Context, Str, StringLiteralKind::Binary,
                                false, ArrayTy, StartLoc);
 };
 
diff --git a/clang/lib/Sema/SemaInit.cpp b/clang/lib/Sema/SemaInit.cpp
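A simplified model of the `EmbedExpr` iterator change in the `Expr.h` hunk above, which replaces `DataRef[CurOffset]` (a `char`) with `getCodeUnit(CurOffset)`. This is a sketch, not clang code: `makeInt`, `oldElement`, and `newElement` are hypothetical names, `makeInt` only approximates truncating a value into an `llvm::APInt` of the element's bit width, and the old path assumes a signed-`char` target. Under those assumptions, an `int` element initialized from the byte 0x80 comes out as -128 on the old path and 128 on the new one.

```cpp
#include <cstdint>

// Simplified model of building an integer of width Bits (1..64) from a 64-bit
// payload by truncation, roughly what constructing an llvm::APInt of that
// width does. When IsSigned is true the truncated bits are read back as a
// two's complement value.
constexpr std::int64_t makeInt(unsigned Bits, std::uint64_t Payload,
                               bool IsSigned) {
  const std::uint64_t Mask =
      Bits == 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << Bits) - 1;
  const std::uint64_t V = Payload & Mask;
  if (IsSigned && Bits < 64 && ((V >> (Bits - 1)) & 1))
    return static_cast<std::int64_t>(V | ~Mask); // sign-extend
  return static_cast<std::int64_t>(V);
}

// Old path: the element came from indexing a StringRef, i.e. a `char`.
// A signed-char target is modeled with `signed char`, so 0x80 is -128 and is
// sign-extended when widened to the 64-bit payload.
constexpr std::int64_t oldElement(std::uint8_t Byte, unsigned Bits,
                                  bool IsSigned) {
  const signed char C = static_cast<signed char>(Byte);
  return makeInt(Bits, static_cast<std::uint64_t>(C), IsSigned);
}

// New path: the element is an unsigned code unit, so 0x80 stays 128.
constexpr std::int64_t newElement(std::uint8_t Byte, unsigned Bits,
                                  bool IsSigned) {
  const std::uint32_t Unit = Byte;
  return makeInt(Bits, Unit, IsSigned);
}

// An `int` element initialized from the embedded byte 0x80:
static_assert(oldElement(0x80, 32, true) == -128);
static_assert(newElement(0x80, 32, true) == 128);
```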

[clang] [clang] Introduce "binary" StringLiteral for #embed data (PR #127629)

2025-03-20 Thread via cfe-commits


@@ -1752,7 +1752,8 @@ enum class StringLiteralKind {
   UTF8,
   UTF16,
   UTF32,
-  Unevaluated
+  Unevaluated,
+  Binary

cor3ntin wrote:

Yup, Thanks!

https://github.com/llvm/llvm-project/pull/127629


[clang] [clang] Introduce "binary" StringLiteral for #embed data (PR #127629)

2025-03-18 Thread Mariya Podchishchaeva via cfe-commits

Fznamznon wrote:

> No changelog because we want to backport?

No changelog because I forgot. Should we backport though?

https://github.com/llvm/llvm-project/pull/127629


[clang] [clang] Introduce "binary" StringLiteral for #embed data (PR #127629)

2025-03-18 Thread Mariya Podchishchaeva via cfe-commits

https://github.com/Fznamznon updated 
https://github.com/llvm/llvm-project/pull/127629

[clang] [clang] Introduce "binary" StringLiteral for #embed data (PR #127629)

2025-03-18 Thread Mariya Podchishchaeva via cfe-commits

https://github.com/Fznamznon edited 
https://github.com/llvm/llvm-project/pull/127629


[clang] [clang] Introduce "binary" StringLiteral for #embed data (PR #127629)

2025-03-18 Thread Mariya Podchishchaeva via cfe-commits


@@ -1752,7 +1752,8 @@ enum class StringLiteralKind {
   UTF8,
   UTF16,
   UTF32,
-  Unevaluated
+  Unevaluated,
+  Binary

Fznamznon wrote:

I added a comment. Do you think it is helpful?

https://github.com/llvm/llvm-project/pull/127629


[clang] [clang] Introduce "binary" StringLiteral for #embed data (PR #127629)

2025-03-18 Thread Aaron Ballman via cfe-commits

https://github.com/AaronBallman approved this pull request.

LGTM!

https://github.com/llvm/llvm-project/pull/127629


[clang] [clang] Introduce "binary" StringLiteral for #embed data (PR #127629)

2025-03-18 Thread Aaron Ballman via cfe-commits

AaronBallman wrote:

> > No changelog because we want to backport?
> 
> No changelog because I forgot. Should we backport though?

This seems simple enough to warrant backporting.

https://github.com/llvm/llvm-project/pull/127629


[clang] [clang] Introduce "binary" StringLiteral for #embed data (PR #127629)

2025-03-15 Thread via cfe-commits

https://github.com/cor3ntin edited 
https://github.com/llvm/llvm-project/pull/127629


[clang] [clang] Introduce "binary" StringLiteral for #embed data (PR #127629)

2025-03-15 Thread via cfe-commits

https://github.com/cor3ntin edited 
https://github.com/llvm/llvm-project/pull/127629


[clang] [clang] Introduce "binary" StringLiteral for #embed data (PR #127629)

2025-03-15 Thread via cfe-commits

https://github.com/cor3ntin approved this pull request.

No changelog because we want to backport?

LGTM, but I think this could use some comments.

https://github.com/llvm/llvm-project/pull/127629


[clang] [clang] Introduce "binary" StringLiteral for #embed data (PR #127629)

2025-03-15 Thread via cfe-commits


@@ -1752,7 +1752,8 @@ enum class StringLiteralKind {
   UTF8,
   UTF16,
   UTF32,
-  Unevaluated
+  Unevaluated,
+  Binary

cor3ntin wrote:

Can you add a comment explaining this is for embed?

I'm sorry it took me a while to understand how this patch works.
(The reason is that this lets us avoid "casting" to char in `getCodeUnitS()`,
which is only used in C23 mode.)

Maybe also add a comment in `getCodeUnitS` and/or 
`CheckC23ConstexprInitStringLiteral`
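The mechanism described in this comment can be sketched as follows, assuming (not verified here against the clang sources) that the signed accessor sign-extends a code unit only for ordinary and wide literals. With an 8-bit char, the byte 0xFF then reads as -1 for an ordinary literal but 255 for a binary one, so a C23 `constexpr unsigned char` array initialized from `#embed` no longer sees an out-of-range value. `Kind`, `codeUnitSigned`, and `fitsUnsigned` are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical subset of clang's StringLiteralKind, for illustration only.
enum class Kind { Ordinary, Wide, Binary };

// Sketch of a "signed code unit" accessor, assuming it reinterprets the raw
// unit as a signed CharBits-wide value only for Ordinary and Wide literals
// and leaves Binary data unsigned. CharBits must be in [1, 32].
constexpr std::int64_t codeUnitSigned(std::uint32_t Unit, Kind K,
                                      unsigned CharBits) {
  std::int64_t V = Unit;
  if (K == Kind::Ordinary || K == Kind::Wide) {
    const std::int64_t SignBit = std::int64_t{1} << (CharBits - 1);
    const std::int64_t Mask = (std::int64_t{1} << CharBits) - 1;
    V = ((V & Mask) ^ SignBit) - SignBit; // sign-extend from CharBits
  }
  return V;
}

// Hypothetical C23-style check: does V fit an unsigned integer of Bits width?
// Bits must be in [1, 62].
constexpr bool fitsUnsigned(std::int64_t V, unsigned Bits) {
  return V >= 0 && V < (std::int64_t{1} << Bits);
}

// Byte 0xFF initializing a `constexpr unsigned char` element (8-bit char):
static_assert(!fitsUnsigned(codeUnitSigned(0xFF, Kind::Ordinary, 8), 8));
static_assert(fitsUnsigned(codeUnitSigned(0xFF, Kind::Binary, 8), 8));
```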

https://github.com/llvm/llvm-project/pull/127629


[clang] [clang] Introduce "binary" StringLiteral for #embed data (PR #127629)

2025-03-12 Thread Mariya Podchishchaeva via cfe-commits

https://github.com/Fznamznon edited 
https://github.com/llvm/llvm-project/pull/127629
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Introduce "binary" StringLiteral for #embed data (PR #127629)

2025-03-12 Thread Mariya Podchishchaeva via cfe-commits


@@ -1332,6 +1334,11 @@ StringLiteral::getLocationOfByte(unsigned ByteNo, const SourceManager &SM,
                                  const LangOptions &Features,
                                  const TargetInfo &Target, unsigned *StartToken,
                                  unsigned *StartTokenByteOffset) const {
+  // No source location of bytes for binary literals since they don't come from
+  // source.
+  if (getKind() == StringLiteralKind::Binary)
+    return getStrTokenLoc(0);

Fznamznon wrote:

It will point at the beginning of the StringLiteral, but not at a particular
problematic byte (which this function intends to find). In the case of `#embed`,
it will point to the directive location. The test I'm adding should exercise that.
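The fallback discussed here can be modeled in a few lines. This is a toy sketch, not clang's `getLocationOfByte`: `Loc`, `LiteralInfo`, and `locationOfByte` are hypothetical, and the real function walks the literal's tokens rather than doing offset arithmetic. It illustrates the answer to the question of where diagnostics point: for binary data every byte maps to the literal's start, so a diagnostic still has a location, just not a byte-precise one.

```cpp
#include <cstddef>

// Hypothetical model: a source location is an offset into a buffer.
struct Loc {
  unsigned Offset;
};

// Hypothetical model of a literal: where it starts, and whether its bytes
// came from an embedded resource rather than from spelled source text.
struct LiteralInfo {
  Loc Start;
  bool IsBinary;
};

// Best location for byte ByteNo. Binary data has no spelling, so fall back to
// the literal's start (for #embed, the directive). For spelled literals this
// toy model assumes a single token with one source character per byte,
// placed right after the opening quote; escape sequences are ignored.
constexpr Loc locationOfByte(LiteralInfo L, std::size_t ByteNo) {
  if (L.IsBinary)
    return L.Start;
  return Loc{L.Start.Offset + 1 + static_cast<unsigned>(ByteNo)};
}

static_assert(locationOfByte(LiteralInfo{Loc{100}, true}, 5).Offset == 100);
static_assert(locationOfByte(LiteralInfo{Loc{10}, false}, 2).Offset == 13);
```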

https://github.com/llvm/llvm-project/pull/127629


[clang] [clang] Introduce "binary" StringLiteral for #embed data (PR #127629)

2025-03-12 Thread Aaron Ballman via cfe-commits

https://github.com/AaronBallman commented:

Thanks for this! It also needs a release note for the fix. :-)

In general, I think this seems reasonable, but I'd like confirmation from 
@cor3ntin given his somewhat recent thinking about unevaluated string literals 
and where those end up touching the rest of the compiler.

https://github.com/llvm/llvm-project/pull/127629


[clang] [clang] Introduce "binary" StringLiteral for #embed data (PR #127629)

2025-03-12 Thread Aaron Ballman via cfe-commits


@@ -1332,6 +1334,11 @@ StringLiteral::getLocationOfByte(unsigned ByteNo, const SourceManager &SM,
                                  const LangOptions &Features,
                                  const TargetInfo &Target, unsigned *StartToken,
                                  unsigned *StartTokenByteOffset) const {
+  // No source location of bytes for binary literals since they don't come from
+  // source.
+  if (getKind() == StringLiteralKind::Binary)
+    return getStrTokenLoc(0);

AaronBallman wrote:

Does this mean that diagnostics involving invalid use of binary string literals 
will have no source location to point to?

https://github.com/llvm/llvm-project/pull/127629


[clang] [clang] Introduce "binary" StringLiteral for #embed data (PR #127629)

2025-03-12 Thread Aaron Ballman via cfe-commits

https://github.com/AaronBallman edited 
https://github.com/llvm/llvm-project/pull/127629


[clang] [clang] Introduce "binary" StringLiteral for #embed data (PR #127629)

2025-03-07 Thread Mariya Podchishchaeva via cfe-commits

Fznamznon wrote:

ping

https://github.com/llvm/llvm-project/pull/127629


[clang] [clang] Introduce "binary" StringLiteral for #embed data (PR #127629)

2025-02-18 Thread Mariya Podchishchaeva via cfe-commits


@@ -4965,9 +4966,9 @@ class EmbedExpr final : public Expr {
   assert(EExpr && CurOffset != ULLONG_MAX &&
  "trying to dereference an invalid iterator");
   IntegerLiteral *N = EExpr->FakeChildNode;
-  StringRef DataRef = EExpr->Data->BinaryData->getBytes();
   N->setValue(*EExpr->Ctx,
-  llvm::APInt(N->getValue().getBitWidth(), DataRef[CurOffset],
+  llvm::APInt(N->getValue().getBitWidth(),
+  EExpr->Data->BinaryData->getCodeUnit(CurOffset),

Fznamznon wrote:

I wish it did. This particular line helps with the generic case of `#embed` inside
an initializer list. The rest of this patch handles the `#embed` "fast path", where
we put the StringLiteral directly into the initializer list instead of an EmbedExpr
when a single `#embed` is used to initialize a char array.
In that case the EmbedExpr iterators are not used, so the failure from
https://github.com/llvm/llvm-project/issues/119256 still occurs.
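The difference the changed `getCodeUnit` line makes can be sketched without LLVM. Both functions below are hypothetical. Our understanding is that `StringLiteral::getCodeUnit` reads 1-byte literals through `unsigned char`, while indexing the `StringRef` returned by `getBytes()` yields a plain `char`, whose signedness depends on the target; since `llvm::APInt`'s constructor takes its value as `uint64_t`, that `char` gets widened before use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical model of the old path (DataRef[CurOffset]): the byte is read as
// plain char and then widened to 64 bits. Where char is signed (e.g. x86-64),
// 0xFF becomes -1 and is sign-extended to 0xFFFFFFFFFFFFFFFF; where char is
// unsigned (e.g. AArch64 Linux), it stays 255.
uint64_t readViaPlainChar(const std::string &Data, std::size_t I) {
  assert(I < Data.size() && "out of bounds access");
  return static_cast<uint64_t>(static_cast<int64_t>(Data[I]));
}

// Hypothetical model of the new path: go through unsigned char first, so every
// byte maps to 0..255 regardless of the target's char signedness.
uint32_t readCodeUnit(const std::string &Data, std::size_t I) {
  assert(I < Data.size() && "out of bounds access");
  return static_cast<unsigned char>(Data[I]);
}
```

As the comment above notes, this only covers the iterator path; the single-`#embed` fast path never goes through these iterators.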

https://github.com/llvm/llvm-project/pull/127629


[clang] [clang] Introduce "binary" StringLiteral for #embed data (PR #127629)

2025-02-18 Thread via cfe-commits


@@ -4965,9 +4966,9 @@ class EmbedExpr final : public Expr {
   assert(EExpr && CurOffset != ULLONG_MAX &&
  "trying to dereference an invalid iterator");
   IntegerLiteral *N = EExpr->FakeChildNode;
-  StringRef DataRef = EExpr->Data->BinaryData->getBytes();
   N->setValue(*EExpr->Ctx,
-  llvm::APInt(N->getValue().getBitWidth(), DataRef[CurOffset],
+  llvm::APInt(N->getValue().getBitWidth(),
+  EExpr->Data->BinaryData->getCodeUnit(CurOffset),

cor3ntin wrote:

If you change that line (and nothing else), does it still work?
I have a feeling this is enough.

https://github.com/llvm/llvm-project/pull/127629


[clang] [clang] Introduce "binary" StringLiteral for #embed data (PR #127629)

2025-02-18 Thread via cfe-commits

github-actions[bot] wrote:




:warning: C/C++ code formatter, clang-format found issues in your code. :warning:



You can test this locally with the following command:


```bash
git-clang-format --diff 93d3e20bb226507c6eb777cfb15ea13f2cd129e8 700ec6f78c0a24729801bea381bafbcafb06826b --extensions cpp,c,h -- clang/test/Preprocessor/embed_constexpr.c clang/include/clang/AST/Expr.h clang/lib/AST/Expr.cpp clang/lib/Parse/ParseInit.cpp clang/lib/Sema/SemaInit.cpp
```





View the diff from clang-format here.


```diff
diff --git a/clang/lib/AST/Expr.cpp b/clang/lib/AST/Expr.cpp
index 2747480f00..e48b389fbc 100644
--- a/clang/lib/AST/Expr.cpp
+++ b/clang/lib/AST/Expr.cpp
@@ -1344,7 +1344,6 @@ StringLiteral::getLocationOfByte(unsigned ByteNo, const SourceManager &SM,
   getKind() == StringLiteralKind::Unevaluated) &&
  "Only narrow string literals are currently supported");
 
-
   // Loop over all of the tokens in this string until we find the one that
   // contains the byte we're looking for.
   unsigned TokNo = 0;

```




https://github.com/llvm/llvm-project/pull/127629


[clang] [clang] Introduce "binary" StringLiteral for #embed data (PR #127629)

2025-02-18 Thread Mariya Podchishchaeva via cfe-commits

https://github.com/Fznamznon updated 
https://github.com/llvm/llvm-project/pull/127629

>From 700ec6f78c0a24729801bea381bafbcafb06826b Mon Sep 17 00:00:00 2001
From: "Podchishchaeva, Mariya" 
Date: Tue, 18 Feb 2025 05:12:07 -0800
Subject: [PATCH 1/2] [clang] Introduce "binary" StringLiteral for #embed data

StringLiteral is used as the internal data of EmbedExpr, and we use it
directly as an initializer if a single EmbedExpr appears in the
initializer list of a char array. This is fast and convenient, but it
causes problems when string literal character values are checked,
because #embed data values are within the range [0, 2^(char width) - 1]
while an ordinary StringLiteral has plain char type, which may be signed.
This PR introduces a new kind of StringLiteral to hold binary data coming
from an embedded resource to mitigate these problems. The new kind of
StringLiteral is not assumed to have signed char type. It also helps
prevent crashes when trying to find StringLiteral token locations, since
these simply do not exist for binary data.

Fixes https://github.com/llvm/llvm-project/issues/119256
---
 clang/include/clang/AST/Expr.h|  7 ---
 clang/lib/AST/Expr.cpp|  8 
 clang/lib/Parse/ParseInit.cpp |  2 +-
 clang/lib/Sema/SemaInit.cpp   |  1 +
 clang/test/Preprocessor/embed_constexpr.c | 21 +
 5 files changed, 35 insertions(+), 4 deletions(-)
 create mode 100644 clang/test/Preprocessor/embed_constexpr.c

diff --git a/clang/include/clang/AST/Expr.h b/clang/include/clang/AST/Expr.h
index cd584d9621a22..cf6f63b8711b8 100644
--- a/clang/include/clang/AST/Expr.h
+++ b/clang/include/clang/AST/Expr.h
@@ -1752,7 +1752,8 @@ enum class StringLiteralKind {
   UTF8,
   UTF16,
   UTF32,
-  Unevaluated
+  Unevaluated,
+  Binary
 };
 
 /// StringLiteral - This represents a string literal expression, e.g. "foo"
@@ -4965,9 +4966,9 @@ class EmbedExpr final : public Expr {
   assert(EExpr && CurOffset != ULLONG_MAX &&
  "trying to dereference an invalid iterator");
   IntegerLiteral *N = EExpr->FakeChildNode;
-  StringRef DataRef = EExpr->Data->BinaryData->getBytes();
   N->setValue(*EExpr->Ctx,
-  llvm::APInt(N->getValue().getBitWidth(), DataRef[CurOffset],
+  llvm::APInt(N->getValue().getBitWidth(),
+  EExpr->Data->BinaryData->getCodeUnit(CurOffset),
   N->getType()->isSignedIntegerType()));
   // We want to return a reference to the fake child node in the
   // EmbedExpr, not the local variable N.
diff --git a/clang/lib/AST/Expr.cpp b/clang/lib/AST/Expr.cpp
index 6f570139630d8..2747480f00d68 100644
--- a/clang/lib/AST/Expr.cpp
+++ b/clang/lib/AST/Expr.cpp
@@ -1104,6 +1104,7 @@ unsigned StringLiteral::mapCharByteWidth(TargetInfo const &Target,
   switch (SK) {
   case StringLiteralKind::Ordinary:
   case StringLiteralKind::UTF8:
+  case StringLiteralKind::Binary:
     CharByteWidth = Target.getCharWidth();
     break;
   case StringLiteralKind::Wide:
@@ -1216,6 +1217,7 @@ void StringLiteral::outputString(raw_ostream &OS) const {
   switch (getKind()) {
   case StringLiteralKind::Unevaluated:
   case StringLiteralKind::Ordinary:
+  case StringLiteralKind::Binary:
     break; // no prefix.
   case StringLiteralKind::Wide:
     OS << 'L';
@@ -1332,11 +1334,17 @@ StringLiteral::getLocationOfByte(unsigned ByteNo, const SourceManager &SM,
                                  const LangOptions &Features,
                                  const TargetInfo &Target, unsigned *StartToken,
                                  unsigned *StartTokenByteOffset) const {
+  // No source location of bytes for binary literals since they don't come from
+  // source.
+  if (getKind() == StringLiteralKind::Binary)
+    return getStrTokenLoc(0);
+
   assert((getKind() == StringLiteralKind::Ordinary ||
   getKind() == StringLiteralKind::UTF8 ||
   getKind() == StringLiteralKind::Unevaluated) &&
  "Only narrow string literals are currently supported");
 
+
   // Loop over all of the tokens in this string until we find the one that
   // contains the byte we're looking for.
   unsigned TokNo = 0;
diff --git a/clang/lib/Parse/ParseInit.cpp b/clang/lib/Parse/ParseInit.cpp
index 63b1d7bd9db53..471b3eaf28287 100644
--- a/clang/lib/Parse/ParseInit.cpp
+++ b/clang/lib/Parse/ParseInit.cpp
@@ -445,7 +445,7 @@ ExprResult Parser::createEmbedExpr() {
   Context.MakeIntValue(Str.size(), Context.getSizeType());
   QualType ArrayTy = Context.getConstantArrayType(
   Ty, ArraySize, nullptr, ArraySizeModifier::Normal, 0);
-  return StringLiteral::Create(Context, Str, StringLiteralKind::Ordinary,
+  return StringLiteral::Create(Context, Str, StringLiteralKind::Binary,
false, ArrayTy, StartLoc);
 };
 
diff --git a/clang/lib/Sema/SemaInit.cpp b/clang/lib/Sema/SemaInit.cpp

[clang] [clang] Introduce "binary" StringLiteral for #embed data (PR #127629)

2025-02-18 Thread via cfe-commits

llvmbot wrote:




@llvm/pr-subscribers-clang

Author: Mariya Podchishchaeva (Fznamznon)


Changes

StringLiteral is used as the internal data of EmbedExpr, and we use it directly as
an initializer if a single EmbedExpr appears in the initializer list of a char
array. This is fast and convenient, but it causes problems when string literal
character values are checked, because #embed data values are within the range
[0, 2^(char width) - 1] while an ordinary StringLiteral has plain char type, which
may be signed.
This PR introduces a new kind of StringLiteral to hold binary data coming from an
embedded resource to mitigate these problems. The new kind of StringLiteral is
not assumed to have signed char type. It also helps prevent crashes when trying
to find StringLiteral token locations, since these simply do not exist for
binary data.

Fixes https://github.com/llvm/llvm-project/issues/119256

---
Full diff: https://github.com/llvm/llvm-project/pull/127629.diff


5 Files Affected:

- (modified) clang/include/clang/AST/Expr.h (+4-3) 
- (modified) clang/lib/AST/Expr.cpp (+8) 
- (modified) clang/lib/Parse/ParseInit.cpp (+1-1) 
- (modified) clang/lib/Sema/SemaInit.cpp (+1) 
- (added) clang/test/Preprocessor/embed_constexpr.c (+21) 


``diff
diff --git a/clang/include/clang/AST/Expr.h b/clang/include/clang/AST/Expr.h
index cd584d9621a22..cf6f63b8711b8 100644
--- a/clang/include/clang/AST/Expr.h
+++ b/clang/include/clang/AST/Expr.h
@@ -1752,7 +1752,8 @@ enum class StringLiteralKind {
   UTF8,
   UTF16,
   UTF32,
-  Unevaluated
+  Unevaluated,
+  Binary
 };
 
 /// StringLiteral - This represents a string literal expression, e.g. "foo"
@@ -4965,9 +4966,9 @@ class EmbedExpr final : public Expr {
   assert(EExpr && CurOffset != ULLONG_MAX &&
  "trying to dereference an invalid iterator");
   IntegerLiteral *N = EExpr->FakeChildNode;
-  StringRef DataRef = EExpr->Data->BinaryData->getBytes();
   N->setValue(*EExpr->Ctx,
-  llvm::APInt(N->getValue().getBitWidth(), DataRef[CurOffset],
+  llvm::APInt(N->getValue().getBitWidth(),
+  EExpr->Data->BinaryData->getCodeUnit(CurOffset),
   N->getType()->isSignedIntegerType()));
   // We want to return a reference to the fake child node in the
   // EmbedExpr, not the local variable N.
diff --git a/clang/lib/AST/Expr.cpp b/clang/lib/AST/Expr.cpp
index 6f570139630d8..2747480f00d68 100644
--- a/clang/lib/AST/Expr.cpp
+++ b/clang/lib/AST/Expr.cpp
@@ -1104,6 +1104,7 @@ unsigned StringLiteral::mapCharByteWidth(TargetInfo const &Target,
   switch (SK) {
   case StringLiteralKind::Ordinary:
   case StringLiteralKind::UTF8:
+  case StringLiteralKind::Binary:
     CharByteWidth = Target.getCharWidth();
     break;
   case StringLiteralKind::Wide:
@@ -1216,6 +1217,7 @@ void StringLiteral::outputString(raw_ostream &OS) const {
   switch (getKind()) {
   case StringLiteralKind::Unevaluated:
   case StringLiteralKind::Ordinary:
+  case StringLiteralKind::Binary:
     break; // no prefix.
   case StringLiteralKind::Wide:
     OS << 'L';
@@ -1332,11 +1334,17 @@ StringLiteral::getLocationOfByte(unsigned ByteNo, const SourceManager &SM,
                                  const LangOptions &Features,
                                  const TargetInfo &Target, unsigned *StartToken,
                                  unsigned *StartTokenByteOffset) const {
+  // No source location of bytes for binary literals since they don't come from
+  // source.
+  if (getKind() == StringLiteralKind::Binary)
+    return getStrTokenLoc(0);
+
   assert((getKind() == StringLiteralKind::Ordinary ||
   getKind() == StringLiteralKind::UTF8 ||
   getKind() == StringLiteralKind::Unevaluated) &&
  "Only narrow string literals are currently supported");
 
+
   // Loop over all of the tokens in this string until we find the one that
   // contains the byte we're looking for.
   unsigned TokNo = 0;
diff --git a/clang/lib/Parse/ParseInit.cpp b/clang/lib/Parse/ParseInit.cpp
index 63b1d7bd9db53..471b3eaf28287 100644
--- a/clang/lib/Parse/ParseInit.cpp
+++ b/clang/lib/Parse/ParseInit.cpp
@@ -445,7 +445,7 @@ ExprResult Parser::createEmbedExpr() {
   Context.MakeIntValue(Str.size(), Context.getSizeType());
   QualType ArrayTy = Context.getConstantArrayType(
   Ty, ArraySize, nullptr, ArraySizeModifier::Normal, 0);
-  return StringLiteral::Create(Context, Str, StringLiteralKind::Ordinary,
+  return StringLiteral::Create(Context, Str, StringLiteralKind::Binary,
false, ArrayTy, StartLoc);
 };
 
diff --git a/clang/lib/Sema/SemaInit.cpp b/clang/lib/Sema/SemaInit.cpp
index 6a76e6d74a4b0..013e57df6615c 100644
--- a/clang/lib/Sema/SemaInit.cpp
+++ b/clang/lib/Sema/SemaInit.cpp
@@ -106,6 +106,7 @@ static StringInitFailureKind IsStringInit(Expr *Init, const ArrayType *AT,
   return SIF_None;
 [[fallth

[clang] [clang] Introduce "binary" StringLiteral for #embed data (PR #127629)

2025-02-18 Thread Mariya Podchishchaeva via cfe-commits

https://github.com/Fznamznon created 
https://github.com/llvm/llvm-project/pull/127629

StringLiteral is used as the internal data of EmbedExpr, and we use it directly as
an initializer if a single EmbedExpr appears in the initializer list of a char
array. This is fast and convenient, but it causes problems when string literal
character values are checked, because #embed data values are within the range
[0, 2^(char width) - 1] while an ordinary StringLiteral has plain char type, which
may be signed.
This PR introduces a new kind of StringLiteral to hold binary data coming from an
embedded resource to mitigate these problems. The new kind of StringLiteral is
not assumed to have signed char type. It also helps prevent crashes when trying
to find StringLiteral token locations, since these simply do not exist for
binary data.

Fixes https://github.com/llvm/llvm-project/issues/119256
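The fast-path problem described above can be illustrated with a small, hypothetical model. `isRepresentable` loosely mirrors the kind of value check applied to constant initializers (for example, C23 `constexpr` objects must be initialized with values exactly representable in their type), and `interpretByte` contrasts reading a byte as a possibly signed plain `char` with the full unsigned range the `Binary` kind is meant to preserve. Neither function is part of Clang's API.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: can Value be stored in an integer type with the given
// signedness and bit width without changing its value?
bool isRepresentable(long long Value, bool DestSigned, unsigned Width) {
  assert(Width > 0 && Width < 63 && "keep the shifts below within long long");
  long long Min = DestSigned ? -(1LL << (Width - 1)) : 0;
  long long Max = DestSigned ? (1LL << (Width - 1)) - 1 : (1LL << Width) - 1;
  return Value >= Min && Value <= Max;
}

// Hypothetical model of how one raw #embed byte is seen. Treated as a signed
// plain char, 0xFF reads as -1; treated as binary data, it keeps its unsigned
// value in [0, 2^CHAR_BIT - 1].
long long interpretByte(unsigned char Raw, bool AsSignedChar) {
  if (AsSignedChar)
    return static_cast<signed char>(Raw); // modular conversion: 0xFF -> -1
  return Raw;
}
```

In this model the same byte passes or fails the check depending only on how it is interpreted, which is the distinction the new `Binary` kind encodes.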

>From 700ec6f78c0a24729801bea381bafbcafb06826b Mon Sep 17 00:00:00 2001
From: "Podchishchaeva, Mariya" 
Date: Tue, 18 Feb 2025 05:12:07 -0800
Subject: [PATCH] [clang] Introduce "binary" StringLiteral for #embed data

StringLiteral is used as the internal data of EmbedExpr, and we use it
directly as an initializer if a single EmbedExpr appears in the
initializer list of a char array. This is fast and convenient, but it
causes problems when string literal character values are checked,
because #embed data values are within the range [0, 2^(char width) - 1]
while an ordinary StringLiteral has plain char type, which may be signed.
This PR introduces a new kind of StringLiteral to hold binary data coming
from an embedded resource to mitigate these problems. The new kind of
StringLiteral is not assumed to have signed char type. It also helps
prevent crashes when trying to find StringLiteral token locations, since
these simply do not exist for binary data.

Fixes https://github.com/llvm/llvm-project/issues/119256
---
 clang/include/clang/AST/Expr.h|  7 ---
 clang/lib/AST/Expr.cpp|  8 
 clang/lib/Parse/ParseInit.cpp |  2 +-
 clang/lib/Sema/SemaInit.cpp   |  1 +
 clang/test/Preprocessor/embed_constexpr.c | 21 +
 5 files changed, 35 insertions(+), 4 deletions(-)
 create mode 100644 clang/test/Preprocessor/embed_constexpr.c

diff --git a/clang/include/clang/AST/Expr.h b/clang/include/clang/AST/Expr.h
index cd584d9621a22..cf6f63b8711b8 100644
--- a/clang/include/clang/AST/Expr.h
+++ b/clang/include/clang/AST/Expr.h
@@ -1752,7 +1752,8 @@ enum class StringLiteralKind {
   UTF8,
   UTF16,
   UTF32,
-  Unevaluated
+  Unevaluated,
+  Binary
 };
 
 /// StringLiteral - This represents a string literal expression, e.g. "foo"
@@ -4965,9 +4966,9 @@ class EmbedExpr final : public Expr {
   assert(EExpr && CurOffset != ULLONG_MAX &&
  "trying to dereference an invalid iterator");
   IntegerLiteral *N = EExpr->FakeChildNode;
-  StringRef DataRef = EExpr->Data->BinaryData->getBytes();
   N->setValue(*EExpr->Ctx,
-  llvm::APInt(N->getValue().getBitWidth(), DataRef[CurOffset],
+  llvm::APInt(N->getValue().getBitWidth(),
+  EExpr->Data->BinaryData->getCodeUnit(CurOffset),
   N->getType()->isSignedIntegerType()));
   // We want to return a reference to the fake child node in the
   // EmbedExpr, not the local variable N.
diff --git a/clang/lib/AST/Expr.cpp b/clang/lib/AST/Expr.cpp
index 6f570139630d8..2747480f00d68 100644
--- a/clang/lib/AST/Expr.cpp
+++ b/clang/lib/AST/Expr.cpp
@@ -1104,6 +1104,7 @@ unsigned StringLiteral::mapCharByteWidth(TargetInfo const &Target,
   switch (SK) {
   case StringLiteralKind::Ordinary:
   case StringLiteralKind::UTF8:
+  case StringLiteralKind::Binary:
     CharByteWidth = Target.getCharWidth();
     break;
   case StringLiteralKind::Wide:
@@ -1216,6 +1217,7 @@ void StringLiteral::outputString(raw_ostream &OS) const {
   switch (getKind()) {
   case StringLiteralKind::Unevaluated:
   case StringLiteralKind::Ordinary:
+  case StringLiteralKind::Binary:
     break; // no prefix.
   case StringLiteralKind::Wide:
     OS << 'L';
@@ -1332,11 +1334,17 @@ StringLiteral::getLocationOfByte(unsigned ByteNo, const SourceManager &SM,
                                  const LangOptions &Features,
                                  const TargetInfo &Target, unsigned *StartToken,
                                  unsigned *StartTokenByteOffset) const {
+  // No source location of bytes for binary literals since they don't come from
+  // source.
+  if (getKind() == StringLiteralKind::Binary)
+    return getStrTokenLoc(0);
+
   assert((getKind() == StringLiteralKind::Ordinary ||
   getKind() == StringLiteralKind::UTF8 ||
   getKind() == StringLiteralKind::Unevaluated) &&
  "Only narrow string literals are currently supported");
 
+
   // Loop over all of the tokens in this string until we find the one