[llvm-branch-commits] [llvm] [IR2Vec] Scale embeddings once in vocab analysis instead of repetitive scaling (PR #143986)
https://github.com/mtrofin approved this pull request.
doc nit, otherwise lgtm
https://github.com/llvm/llvm-project/pull/143986
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [IR2Vec] Scale embeddings once in vocab analysis instead of repetitive scaling (PR #143986)
https://github.com/mtrofin edited https://github.com/llvm/llvm-project/pull/143986
[llvm-branch-commits] [llvm] [IR2Vec] Scale embeddings once in vocab analysis instead of repetitive scaling (PR #143986)
@@ -448,7 +448,10 @@ downstream tasks, including ML-guided compiler optimizations.
 The core components are:
 - **Vocabulary**: A mapping from IR entities (opcodes, types, etc.) to their
-vector representations. This is managed by ``IR2VecVocabAnalysis``.
+vector representations. This is managed by ``IR2VecVocabAnalysis``. The
+vocabulary (.json file) contains three sections -- Opcodes, Types, and
+Arguments, each containing the representations of the corresponding
+entities.
mtrofin wrote:
document that the sections are mandatory, but the order in which they appear isn't
https://github.com/llvm/llvm-project/pull/143986
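The rule this comment asks to document (all three sections mandatory, their order free) can be sketched as a presence check. The sketch assumes the parsed file is modeled by a hypothetical `VocabFile` map from section name to entity embeddings; it is illustrative standard C++, not the `llvm::json` API that `IR2VecVocabAnalysis` uses. Because each section is looked up by key, its position in the file is irrelevant, and an empty section still counts as present, as confirmed in this thread.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory model of the vocabulary file (not LLVM's types):
// section name -> (entity name -> embedding).
using Embedding = std::vector<double>;
using Section = std::map<std::string, Embedding>;
using VocabFile = std::map<std::string, Section>;

// Returns the names of mandatory sections that are absent. Sections are
// looked up by key, so the order they appear in does not matter; an empty
// section still counts as present.
std::vector<std::string> missingSections(const VocabFile &File) {
  static const char *const Required[] = {"Opcodes", "Types", "Arguments"};
  std::vector<std::string> Missing;
  for (const char *Key : Required)
    if (File.find(Key) == File.end())
      Missing.push_back(Key);
  return Missing;
}
```

For instance, a file that contains only `Types` and `Opcodes` (in either order) is reported as missing `Arguments`.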
[llvm-branch-commits] [llvm] [IR2Vec] Scale embeddings once in vocab analysis instead of repetitive scaling (PR #143986)
@@ -259,32 +306,40 @@ Error IR2VecVocabAnalysis::readVocabulary() {
     return createFileError(VocabFile, BufOrError.getError());
   auto Content = BufOrError.get()->getBuffer();
-  json::Path::Root Path("");
+
   Expected<json::Value> ParsedVocabValue = json::parse(Content);
   if (!ParsedVocabValue)
     return ParsedVocabValue.takeError();
-  bool Res = json::fromJSON(*ParsedVocabValue, Vocabulary, Path);
-  if (!Res)
-    return createStringError(errc::illegal_byte_sequence,
-                             "Unable to parse the vocabulary");
+  ir2vec::Vocab OpcodeVocab, TypeVocab, ArgVocab;
+  unsigned OpcodeDim, TypeDim, ArgDim;
+  if (auto Err = parseVocabSection("Opcodes", *ParsedVocabValue, OpcodeVocab,
svkeerthy wrote:
Correct. Will put it in the doc.
https://github.com/llvm/llvm-project/pull/143986
[llvm-branch-commits] [llvm] [IR2Vec] Scale embeddings once in vocab analysis instead of repetitive scaling (PR #143986)
@@ -104,7 +106,10 @@ MODULE_PASS("lower-ifunc", LowerIFuncPass())
 MODULE_PASS("simplify-type-tests", SimplifyTypeTestsPass())
 MODULE_PASS("lowertypetests", LowerTypeTestsPass())
 MODULE_PASS("fatlto-cleanup", FatLtoCleanup())
-MODULE_PASS("pgo-force-function-attrs", PGOForceFunctionAttrsPass(PGOOpt ? PGOOpt->ColdOptType : PGOOptions::ColdFuncOpt::Default))
+MODULE_PASS("pgo-force-function-attrs",
+            PGOForceFunctionAttrsPass(PGOOpt
svkeerthy wrote:
Yeah, will do. Missed the unrelated formatting changes.
https://github.com/llvm/llvm-project/pull/143986
[llvm-branch-commits] [llvm] [IR2Vec] Scale embeddings once in vocab analysis instead of repetitive scaling (PR #143986)
@@ -104,7 +106,10 @@ MODULE_PASS("lower-ifunc", LowerIFuncPass())
 MODULE_PASS("simplify-type-tests", SimplifyTypeTestsPass())
 MODULE_PASS("lowertypetests", LowerTypeTestsPass())
 MODULE_PASS("fatlto-cleanup", FatLtoCleanup())
-MODULE_PASS("pgo-force-function-attrs", PGOForceFunctionAttrsPass(PGOOpt ? PGOOpt->ColdOptType : PGOOptions::ColdFuncOpt::Default))
+MODULE_PASS("pgo-force-function-attrs",
+            PGOForceFunctionAttrsPass(PGOOpt
mtrofin wrote:
can you make the unrelated stylistic changes to this file in a separate patch?
https://github.com/llvm/llvm-project/pull/143986
[llvm-branch-commits] [llvm] [IR2Vec] Scale embeddings once in vocab analysis instead of repetitive scaling (PR #143986)
https://github.com/mtrofin edited https://github.com/llvm/llvm-project/pull/143986
[llvm-branch-commits] [llvm] [IR2Vec] Scale embeddings once in vocab analysis instead of repetitive scaling (PR #143986)
@@ -259,32 +306,40 @@ Error IR2VecVocabAnalysis::readVocabulary() {
     return createFileError(VocabFile, BufOrError.getError());
   auto Content = BufOrError.get()->getBuffer();
-  json::Path::Root Path("");
+
   Expected<json::Value> ParsedVocabValue = json::parse(Content);
   if (!ParsedVocabValue)
     return ParsedVocabValue.takeError();
-  bool Res = json::fromJSON(*ParsedVocabValue, Vocabulary, Path);
-  if (!Res)
-    return createStringError(errc::illegal_byte_sequence,
-                             "Unable to parse the vocabulary");
+  ir2vec::Vocab OpcodeVocab, TypeVocab, ArgVocab;
+  unsigned OpcodeDim, TypeDim, ArgDim;
+  if (auto Err = parseVocabSection("Opcodes", *ParsedVocabValue, OpcodeVocab,
mtrofin wrote:
This changes the format, best to also update the doc.
Also, this means the sections must all be present, even if empty, correct?
SGTM, just something worth spelling out.
https://github.com/llvm/llvm-project/pull/143986
[llvm-branch-commits] [llvm] [IR2Vec] Scale embeddings once in vocab analysis instead of repetitive scaling (PR #143986)
@@ -259,32 +306,40 @@ Error IR2VecVocabAnalysis::readVocabulary() {
     return createFileError(VocabFile, BufOrError.getError());
   auto Content = BufOrError.get()->getBuffer();
-  json::Path::Root Path("");
+
   Expected<json::Value> ParsedVocabValue = json::parse(Content);
   if (!ParsedVocabValue)
     return ParsedVocabValue.takeError();
-  bool Res = json::fromJSON(*ParsedVocabValue, Vocabulary, Path);
-  if (!Res)
-    return createStringError(errc::illegal_byte_sequence,
-                             "Unable to parse the vocabulary");
+  ir2vec::Vocab OpcodeVocab, TypeVocab, ArgVocab;
+  unsigned OpcodeDim, TypeDim, ArgDim;
mtrofin wrote:
Initialize at declaration
https://github.com/llvm/llvm-project/pull/143986
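Applied to the declarations in this hunk, the nit would read `unsigned OpcodeDim = 0, TypeDim = 0, ArgDim = 0;`, so the variables hold a defined value even if a section parse returns before assigning them. The standalone sketch below pairs that with a cross-section check, which seems necessary because opcode, type, and argument vectors are summed into one instruction embedding and `scaleAndAdd` asserts that dimensions match. `Section`, `inferDim`, and `commonDim` are hypothetical names, and rejecting a dimension of 0 (for example, one left unset by an empty section) is an assumption of this sketch, not necessarily what the patch does.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

using Embedding = std::vector<double>;
using Section = std::map<std::string, Embedding>;

// Hypothetical helper: infers a section's dimension from its first entry and
// leaves Dim untouched when the section is empty.
void inferDim(const Section &S, unsigned &Dim) {
  if (!S.empty())
    Dim = static_cast<unsigned>(S.begin()->second.size());
}

// Returns the dimension shared by all three sections, or std::nullopt when
// they disagree or a dimension was never set.
std::optional<unsigned> commonDim(const Section &Opcodes, const Section &Types,
                                  const Section &Args) {
  // Initialized at declaration: if inferDim() does not assign a value, the
  // comparison below reads 0 instead of an indeterminate value.
  unsigned OpcodeDim = 0, TypeDim = 0, ArgDim = 0;
  inferDim(Opcodes, OpcodeDim);
  inferDim(Types, TypeDim);
  inferDim(Args, ArgDim);
  if (OpcodeDim == 0 || OpcodeDim != TypeDim || TypeDim != ArgDim)
    return std::nullopt;
  return OpcodeDim;
}
```

With three 2-element sections this yields 2; a 3-element `Types` entry, or an empty `Arguments` section, yields no common dimension.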
[llvm-branch-commits] [llvm] [IR2Vec] Scale embeddings once in vocab analysis instead of repetitive scaling (PR #143986)
@@ -234,6 +237,8 @@ class IR2VecVocabResult {
 class IR2VecVocabAnalysis : public AnalysisInfoMixin<IR2VecVocabAnalysis> {
   ir2vec::Vocab Vocabulary;
   Error readVocabulary();
+  Error parseVocabSection(const char *Key, const json::Value ParsedVocabValue,
mtrofin wrote:
s/const char*/StringRef
s/const json::Value/const json::Value&
https://github.com/llvm/llvm-project/pull/143986
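Applied to the declaration in the diff, the two substitutions give `Error parseVocabSection(StringRef Key, const json::Value &ParsedVocabValue, ir2vec::Vocab &TargetVocab, unsigned &Dim);`. `StringRef` is a cheap, non-owning view of the key, and taking `json::Value` by const reference avoids copying the whole parsed document on every call (presumably one per section). The sketch below is a standard-library analog, not LLVM code: `std::string_view` stands in for `StringRef`, a hypothetical `Document` map stands in for `json::Value`, and the checks in the body (missing key, entries of unequal length) are assumptions about what such a helper should validate.

```cpp
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

using Embedding = std::vector<double>;
using Section = std::map<std::string, Embedding>;
// Hypothetical stand-in for the parsed json::Value: section name -> section.
// std::less<> enables heterogeneous lookup with a std::string_view key.
using Document = std::map<std::string, Section, std::less<>>;

// Analog of the revised parseVocabSection: Key is a non-owning view (like
// StringRef) and Doc is taken by const reference, so the parsed document is
// never copied. Returns std::nullopt on success, else an error message.
std::optional<std::string> parseSection(std::string_view Key,
                                        const Document &Doc, Section &Target,
                                        unsigned &Dim) {
  auto It = Doc.find(Key);
  if (It == Doc.end())
    return "missing section: " + std::string(Key);
  const Section &S = It->second;
  // Infer the dimension from the first entry; an empty section yields 0.
  Dim = S.empty() ? 0u : static_cast<unsigned>(S.begin()->second.size());
  for (const auto &[Name, Vec] : S)
    if (Vec.size() != Dim)
      return "inconsistent dimension for entity: " + Name;
  Target = S; // Only copied into the caller's vocabulary on success.
  return std::nullopt;
}
```

Looking up `"Opcodes"` in a document that has only a `Types` section returns `missing section: Opcodes` and leaves `Target` untouched.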
[llvm-branch-commits] [llvm] [IR2Vec] Scale embeddings once in vocab analysis instead of repetitive scaling (PR #143986)
https://github.com/svkeerthy edited https://github.com/llvm/llvm-project/pull/143986
[llvm-branch-commits] [llvm] [IR2Vec] Scale embeddings once in vocab analysis instead of repetitive scaling (PR #143986)
https://github.com/svkeerthy commented:
@albertcohen - Please have a look. I am not able to add you as reviewer.
https://github.com/llvm/llvm-project/pull/143986
[llvm-branch-commits] [llvm] [IR2Vec] Scale embeddings once in vocab analysis instead of repetitive scaling (PR #143986)
llvmbot wrote:
@llvm/pr-subscribers-llvm-analysis
Author: S. VenkataKeerthy (svkeerthy)
Changes
This patch scales the opcode, type, and argument embeddings once in
`IR2VecVocabAnalysis`, so they no longer have to be rescaled every time an
embedding is computed. It also refactors the vocabulary to define three
explicit sections (Opcodes, Types, and Arguments) that are used to compute
embeddings.
(Tracking issue: #141817; partly fixes #141832)
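The central idea can be illustrated with a small plain-C++ sketch, assuming the hypothetical helpers `scaleInPlace`, `prescale`, and `add` (not LLVM API; `scaleInPlace` only mirrors the element-wise `Embedding::operator*=` added in the diff). Multiplying every vector in a section by that section's weight once, when the vocabulary is loaded, lets the embedder accumulate with a plain add instead of calling `scaleAndAdd` each time the entity is used.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Embedding = std::vector<double>;
using Section = std::map<std::string, Embedding>;

// Element-wise scaling, mirroring the Embedding::operator*= added in the diff.
Embedding &scaleInPlace(Embedding &E, double Factor) {
  for (double &Elem : E)
    Elem *= Factor;
  return E;
}

// Done once, at vocabulary-load time: every vector in the section is
// multiplied by the section's weight.
void prescale(Section &S, double Weight) {
  for (auto &Entry : S)
    scaleInPlace(Entry.second, Weight);
}

// Old style: the factor is applied on every use (Acc += Src * Factor).
void scaleAndAdd(Embedding &Acc, const Embedding &Src, double Factor) {
  assert(Acc.size() == Src.size() && "Vectors must have the same dimension");
  for (std::size_t I = 0; I < Acc.size(); ++I)
    Acc[I] += Src[I] * Factor;
}

// New style: the vocabulary is already scaled, so a plain add suffices.
void add(Embedding &Acc, const Embedding &Src) {
  assert(Acc.size() == Src.size() && "Vectors must have the same dimension");
  for (std::size_t I = 0; I < Acc.size(); ++I)
    Acc[I] += Src[I];
}
```

For example, accumulating `{1.0, 2.0}` three times with `scaleAndAdd(Acc, V, 0.5)` gives `{1.5, 3.0}`, and so does prescaling the vector by 0.5 and then calling `add` three times; the multiplication moves from every use to a single pass over the vocabulary.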
---
Patch is 149.98 KiB, truncated to 20.00 KiB below, full version:
https://github.com/llvm/llvm-project/pull/143986.diff
16 Files Affected:
- (modified) llvm/include/llvm/Analysis/IR2Vec.h (+15-1)
- (modified) llvm/lib/Analysis/IR2Vec.cpp (+102-39)
- (modified) llvm/lib/Analysis/models/seedEmbeddingVocab75D.json (+70-63)
- (modified) llvm/lib/Passes/PassRegistry.def (+24-17)
- (added) llvm/test/Analysis/IR2Vec/Inputs/dummy_2D_vocab.json (+11)
- (modified) llvm/test/Analysis/IR2Vec/Inputs/dummy_3D_vocab.json (+13-5)
- (modified) llvm/test/Analysis/IR2Vec/Inputs/dummy_5D_vocab.json (+15-9)
- (added) llvm/test/Analysis/IR2Vec/Inputs/incorrect_vocab1.json (+11)
- (added) llvm/test/Analysis/IR2Vec/Inputs/incorrect_vocab2.json (+12)
- (added) llvm/test/Analysis/IR2Vec/Inputs/incorrect_vocab3.json (+12)
- (added) llvm/test/Analysis/IR2Vec/Inputs/incorrect_vocab4.json (+16)
- (modified) llvm/test/Analysis/IR2Vec/basic.ll (+13-1)
- (added) llvm/test/Analysis/IR2Vec/dbg-inst.ll (+13)
- (added) llvm/test/Analysis/IR2Vec/unreachable.ll (+42)
- (added) llvm/test/Analysis/IR2Vec/vocab-test.ll (+20)
- (modified) llvm/unittests/Analysis/IR2VecTest.cpp (+6)
``diff
diff --git a/llvm/include/llvm/Analysis/IR2Vec.h b/llvm/include/llvm/Analysis/IR2Vec.h
index de67955d85d7c..f1aaf4cd2e013 100644
--- a/llvm/include/llvm/Analysis/IR2Vec.h
+++ b/llvm/include/llvm/Analysis/IR2Vec.h
@@ -108,6 +108,7 @@ struct Embedding {
   /// Arithmetic operators
   Embedding &operator+=(const Embedding &RHS);
   Embedding &operator-=(const Embedding &RHS);
+  Embedding &operator*=(double Factor);
   /// Adds Src Embedding scaled by Factor with the called Embedding.
   /// Called_Embedding += Src * Factor
@@ -116,6 +117,8 @@ struct Embedding {
   /// Returns true if the embedding is approximately equal to the RHS embedding
   /// within the specified tolerance.
   bool approximatelyEquals(const Embedding &RHS, double Tolerance = 1e-6) const;
+
+  void print(raw_ostream &OS) const;
 };
 using InstEmbeddingsMap = DenseMap<const Instruction *, Embedding>;
@@ -234,6 +237,8 @@ class IR2VecVocabResult {
 class IR2VecVocabAnalysis : public AnalysisInfoMixin<IR2VecVocabAnalysis> {
   ir2vec::Vocab Vocabulary;
   Error readVocabulary();
+  Error parseVocabSection(const char *Key, const json::Value ParsedVocabValue,
+                          ir2vec::Vocab &TargetVocab, unsigned &Dim);
   void emitError(Error Err, LLVMContext &Ctx);
 public:
@@ -249,7 +254,6 @@ class IR2VecVocabAnalysis : public AnalysisInfoMixin<IR2VecVocabAnalysis> {
 /// functions.
 class IR2VecPrinterPass : public PassInfoMixin<IR2VecPrinterPass> {
   raw_ostream &OS;
-  void printVector(const ir2vec::Embedding &Vec) const;
 public:
   explicit IR2VecPrinterPass(raw_ostream &OS) : OS(OS) {}
@@ -257,6 +261,16 @@ class IR2VecPrinterPass : public PassInfoMixin<IR2VecPrinterPass> {
   static bool isRequired() { return true; }
 };
+/// This pass prints the embeddings in the vocabulary
+class IR2VecVocabPrinterPass : public PassInfoMixin<IR2VecVocabPrinterPass> {
+  raw_ostream &OS;
+
+public:
+  explicit IR2VecVocabPrinterPass(raw_ostream &OS) : OS(OS) {}
+  PreservedAnalyses run(Module &M, ModuleAnalysisManager &MAM);
+  static bool isRequired() { return true; }
+};
+
 } // namespace llvm
 #endif // LLVM_ANALYSIS_IR2VEC_H
diff --git a/llvm/lib/Analysis/IR2Vec.cpp b/llvm/lib/Analysis/IR2Vec.cpp
index fa38c35796a0e..f51d3252d6606 100644
--- a/llvm/lib/Analysis/IR2Vec.cpp
+++ b/llvm/lib/Analysis/IR2Vec.cpp
@@ -85,6 +85,12 @@ Embedding &Embedding::operator-=(const Embedding &RHS) {
   return *this;
 }
+Embedding &Embedding::operator*=(double Factor) {
+  std::transform(this->begin(), this->end(), this->begin(),
+                 [Factor](double Elem) { return Elem * Factor; });
+  return *this;
+}
+
 Embedding &Embedding::scaleAndAdd(const Embedding &Src, float Factor) {
   assert(this->size() == Src.size() && "Vectors must have the same dimension");
   for (size_t Itr = 0; Itr < this->size(); ++Itr)
@@ -101,6 +107,13 @@ bool Embedding::approximatelyEquals(const Embedding &RHS,
   return true;
 }
+void Embedding::print(raw_ostream &OS) const {
+  OS << " [";
+  for (const auto &Elem : Data)
+    OS << " " << format("%.2f", Elem) << " ";
+  OS << "]\n";
+}
+
 //===----------------------------------------------------------------------===//
 // Embedder and its subclasses
 //===----------------------------------------------------------------------===//
@@ -196,18 +209,12 @@ void SymbolicEmbedder::computeEmbeddings(const BasicBlock &BB) const {
   for (const auto &I : BB.instructionsWithoutDebug()) {
     Embedding InstVector(Dimension, 0);
-    const
