[clang] [Clang] [Lexer] Optimize CharInfo.h utilities (PR #171052)
https://github.com/Thibault-Monnier closed https://github.com/llvm/llvm-project/pull/171052 ___ cfe-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] [Lexer] Optimize CharInfo.h utilities (PR #171052)
Thibault-Monnier wrote: Closing this PR as it doesn't seem to make any big difference in cycles count anyway. https://github.com/llvm/llvm-project/pull/171052 ___ cfe-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] [Lexer] Optimize CharInfo.h utilities (PR #171052)
nikic wrote: > Update: it seems to be worse after all, which is unexpected, since it > achieved `~1.5%` faster lexing locally. > > The reason I opened this PR in the first place is because the cycles count > was reduced in almost all averages on > [llvm-compile-time-tracker](https://llvm-compile-time-tracker.com/?config=Overview&stat=cycles&remote=Thibault-Monnier) > (see `cf26169083` commit overview). However, I do admit that the results > look quite noisy, so I'll wait for someone with more experience to give their > opinion. > > EDIT: Seeing other experiments I made, it seems the `cycles` metric is too > noisy for any meaningful comparison anyway. Yes, that's why there is the red warning at the top, though it's only present on the commit-to-commit comparison page: > Warning: The cycles metric is very noisy and likely not meaningful for > comparisons between specific revisions. Statistically significant changes are indicated with red/green coloring. https://github.com/llvm/llvm-project/pull/171052 ___ cfe-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] [Lexer] Optimize CharInfo.h utilities (PR #171052)
Thibault-Monnier wrote: Update: it seems to be worse after all, which is unexpected, since it achieved `~1.5%` faster lexing locally. The reason I opened this PR in the first place is because the cycles count was reduced in almost all averages on [llvm-compile-time-tracker](https://llvm-compile-time-tracker.com/?config=Overview&stat=cycles&remote=Thibault-Monnier) (see `cf26169083` commit overview). However, I do admit that the results look quite noisy, so I'll wait for someone with more experience to give their opinion. https://github.com/llvm/llvm-project/pull/171052 ___ cfe-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] [Lexer] Optimize CharInfo.h utilities (PR #171052)
Thibault-Monnier wrote: I seem to have achieved better performance by removing InfoTable completely. I'm looking into it, and I'll link the PR if it's better. https://github.com/llvm/llvm-project/pull/171052 ___ cfe-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] [Lexer] Optimize CharInfo.h utilities (PR #171052)
cor3ntin wrote: > As far as the changes go, I prefer the original code because it was more > clear what was going on. So unless this measurably improves performance, I'm > not certain it's a direction we should head (though @cor3ntin is the code > owner there, so maybe he feels differently). I do agree with you - trying to improve performance of `CharInfo` is tantalizing because it directly impacts lexing performance. However, I was never able to convince myself that changes such as the one proposed here offer improvements that are not in the noise - I fully admit that this is in part because I'm not an expert in profiling. But I do expect the table to always stay in cache during translation (exactly because how frequently it's used), so intuitively i would expect all of these things to be somewhat equivalent. But yeah maybe @nikic or other backend people would have a better feel for what is the most efficient pattern here. https://github.com/llvm/llvm-project/pull/171052 ___ cfe-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] [Lexer] Optimize CharInfo.h utilities (PR #171052)
AaronBallman wrote: > This change _does_ increase instruction count, but seems to be faster > nonetheless: > [llvm-compile-time-tracker](https://llvm-compile-time-tracker.com/compare.php?from=93e18db3e48dc28818d4880e813b9027bfbf3c16&to=cf26169083f2a68766df65607bec1547d3079aad&stat=task-clock) > (that's a different commit, but basically the same changes). I'd like to understand this a bit more because as best I can tell, this looks to regress performance. CC @nikic for more opinions on the perf side of things. As far as the changes go, I prefer the original code because it was more clear what was going on. So unless this measurably improves performance, I'm not certain it's a direction we should head (though @cor3ntin is the code owner there, so maybe he feels differently). https://github.com/llvm/llvm-project/pull/171052 ___ cfe-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] [Lexer] Optimize CharInfo.h utilities (PR #171052)
https://github.com/Thibault-Monnier edited https://github.com/llvm/llvm-project/pull/171052 ___ cfe-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] [Lexer] Optimize CharInfo.h utilities (PR #171052)
llvmbot wrote:
@llvm/pr-subscribers-clang
Author: Thibault Monnier (Thibault-Monnier)
Changes
This PR optimizes some of the utilities in `Charinfo.h` by replacing lookup
table checks with a simple bounds or mask check when possible. This reduces
instruction latency, allowing for a faster compilation overall.
This change _does_ increase instruction count, but seems to be faster
nonetheless:
[llvm-compile-time-tracker](https://llvm-compile-time-tracker.com/compare.php?from=93e18db3e48dc28818d4880e813b9027bfbf3c16&to=cf26169083f2a68766df65607bec1547d3079aad&stat=cycles)
(that's a different commit, but basically the same changes).
@cor3ntin @AaronBallman
---
Full diff: https://github.com/llvm/llvm-project/pull/171052.diff
1 Files Affected:
- (modified) clang/include/clang/Basic/CharInfo.h (+9-16)
``diff
diff --git a/clang/include/clang/Basic/CharInfo.h
b/clang/include/clang/Basic/CharInfo.h
index 87626eeb8a700..c34bcf4fbf88e 100644
--- a/clang/include/clang/Basic/CharInfo.h
+++ b/clang/include/clang/Basic/CharInfo.h
@@ -89,16 +89,15 @@ LLVM_READONLY inline bool
isAsciiIdentifierContinue(unsigned char c,
///
/// Note that this returns false for '\\0'.
LLVM_READONLY inline bool isHorizontalWhitespace(unsigned char c) {
- using namespace charinfo;
- return (InfoTable[c] & (CHAR_HORZ_WS|CHAR_SPACE)) != 0;
+ constexpr unsigned long long Mask = 0b100011010;
+ return (c <= 32) && (Mask >> c) & 1;
}
/// Returns true if this character is vertical ASCII whitespace: '\\n', '\\r'.
///
/// Note that this returns false for '\\0'.
LLVM_READONLY inline bool isVerticalWhitespace(unsigned char c) {
- using namespace charinfo;
- return (InfoTable[c] & CHAR_VERT_WS) != 0;
+ return c == '\n' || c == '\r';
}
/// Return true if this character is horizontal or vertical ASCII whitespace:
@@ -106,26 +105,23 @@ LLVM_READONLY inline bool isVerticalWhitespace(unsigned
char c) {
///
/// Note that this returns false for '\\0'.
LLVM_READONLY inline bool isWhitespace(unsigned char c) {
- using namespace charinfo;
- return (InfoTable[c] & (CHAR_HORZ_WS|CHAR_VERT_WS|CHAR_SPACE)) != 0;
+ constexpr unsigned long long Mask = 0b10010;
+ return (c <= 32) && (Mask >> c) & 1;
}
/// Return true if this character is an ASCII digit: [0-9]
LLVM_READONLY inline bool isDigit(unsigned char c) {
- using namespace charinfo;
- return (InfoTable[c] & CHAR_DIGIT) != 0;
+ return c >= '0' && c <= '9';
}
/// Return true if this character is a lowercase ASCII letter: [a-z]
LLVM_READONLY inline bool isLowercase(unsigned char c) {
- using namespace charinfo;
- return (InfoTable[c] & CHAR_LOWER) != 0;
+ return c >= 'a' && c <= 'z';
}
/// Return true if this character is an uppercase ASCII letter: [A-Z]
LLVM_READONLY inline bool isUppercase(unsigned char c) {
- using namespace charinfo;
- return (InfoTable[c] & CHAR_UPPER) != 0;
+ return c >= 'A' && c <= 'Z';
}
/// Return true if this character is an ASCII letter: [a-zA-Z]
@@ -158,9 +154,7 @@ LLVM_READONLY inline bool isPunctuation(unsigned char c) {
/// character that should take exactly one column to print in a fixed-width
/// terminal.
LLVM_READONLY inline bool isPrintable(unsigned char c) {
- using namespace charinfo;
- return (InfoTable[c] & (CHAR_UPPER | CHAR_LOWER | CHAR_PERIOD | CHAR_PUNCT |
- CHAR_DIGIT | CHAR_UNDER | CHAR_SPACE)) != 0;
+ return c >= 32 && c <= 126;
}
/// Return true if this is the body character of a C preprocessing number,
@@ -236,7 +230,6 @@ LLVM_READONLY inline char toUppercase(char c) {
return c;
}
-
/// Return true if this is a valid ASCII identifier.
///
/// Note that this is a very simple check; it does not accept UCNs as valid
``
https://github.com/llvm/llvm-project/pull/171052
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] [Lexer] Optimize CharInfo.h utilities (PR #171052)
https://github.com/Thibault-Monnier created
https://github.com/llvm/llvm-project/pull/171052
This PR optimizes some of the utilities in `Charinfo.h` by replacing lookup
table checks with a simple bounds or mask check when possible. This reduces
instruction latency, allowing for a faster compilation overall.
This change _does_ increase instruction count, but seems to be faster
nonetheless:
[llvm-compile-time-tracker](https://llvm-compile-time-tracker.com/compare.php?from=93e18db3e48dc28818d4880e813b9027bfbf3c16&to=cf26169083f2a68766df65607bec1547d3079aad&stat=cycles)
(that's a different commit, but basically the same changes).
@cor3ntin @AaronBallman
>From d7e1102d5985cfbcc5427c8aa1a3dd8127951f26 Mon Sep 17 00:00:00 2001
From: Thibault-Monnier
Date: Sun, 7 Dec 2025 15:37:43 +0100
Subject: [PATCH] Optimize CharInfo.h utilities
---
clang/include/clang/Basic/CharInfo.h | 25 +
1 file changed, 9 insertions(+), 16 deletions(-)
diff --git a/clang/include/clang/Basic/CharInfo.h
b/clang/include/clang/Basic/CharInfo.h
index 87626eeb8a700..c34bcf4fbf88e 100644
--- a/clang/include/clang/Basic/CharInfo.h
+++ b/clang/include/clang/Basic/CharInfo.h
@@ -89,16 +89,15 @@ LLVM_READONLY inline bool
isAsciiIdentifierContinue(unsigned char c,
///
/// Note that this returns false for '\\0'.
LLVM_READONLY inline bool isHorizontalWhitespace(unsigned char c) {
- using namespace charinfo;
- return (InfoTable[c] & (CHAR_HORZ_WS|CHAR_SPACE)) != 0;
+ constexpr unsigned long long Mask = 0b100011010;
+ return (c <= 32) && (Mask >> c) & 1;
}
/// Returns true if this character is vertical ASCII whitespace: '\\n', '\\r'.
///
/// Note that this returns false for '\\0'.
LLVM_READONLY inline bool isVerticalWhitespace(unsigned char c) {
- using namespace charinfo;
- return (InfoTable[c] & CHAR_VERT_WS) != 0;
+ return c == '\n' || c == '\r';
}
/// Return true if this character is horizontal or vertical ASCII whitespace:
@@ -106,26 +105,23 @@ LLVM_READONLY inline bool isVerticalWhitespace(unsigned
char c) {
///
/// Note that this returns false for '\\0'.
LLVM_READONLY inline bool isWhitespace(unsigned char c) {
- using namespace charinfo;
- return (InfoTable[c] & (CHAR_HORZ_WS|CHAR_VERT_WS|CHAR_SPACE)) != 0;
+ constexpr unsigned long long Mask = 0b10010;
+ return (c <= 32) && (Mask >> c) & 1;
}
/// Return true if this character is an ASCII digit: [0-9]
LLVM_READONLY inline bool isDigit(unsigned char c) {
- using namespace charinfo;
- return (InfoTable[c] & CHAR_DIGIT) != 0;
+ return c >= '0' && c <= '9';
}
/// Return true if this character is a lowercase ASCII letter: [a-z]
LLVM_READONLY inline bool isLowercase(unsigned char c) {
- using namespace charinfo;
- return (InfoTable[c] & CHAR_LOWER) != 0;
+ return c >= 'a' && c <= 'z';
}
/// Return true if this character is an uppercase ASCII letter: [A-Z]
LLVM_READONLY inline bool isUppercase(unsigned char c) {
- using namespace charinfo;
- return (InfoTable[c] & CHAR_UPPER) != 0;
+ return c >= 'A' && c <= 'Z';
}
/// Return true if this character is an ASCII letter: [a-zA-Z]
@@ -158,9 +154,7 @@ LLVM_READONLY inline bool isPunctuation(unsigned char c) {
/// character that should take exactly one column to print in a fixed-width
/// terminal.
LLVM_READONLY inline bool isPrintable(unsigned char c) {
- using namespace charinfo;
- return (InfoTable[c] & (CHAR_UPPER | CHAR_LOWER | CHAR_PERIOD | CHAR_PUNCT |
- CHAR_DIGIT | CHAR_UNDER | CHAR_SPACE)) != 0;
+ return c >= 32 && c <= 126;
}
/// Return true if this is the body character of a C preprocessing number,
@@ -236,7 +230,6 @@ LLVM_READONLY inline char toUppercase(char c) {
return c;
}
-
/// Return true if this is a valid ASCII identifier.
///
/// Note that this is a very simple check; it does not accept UCNs as valid
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
