[clang] [Clang] [Lexer] Optimize CharInfo.h utilities (PR #171052)

2026-01-08 Thread Thibault Monnier via cfe-commits

https://github.com/Thibault-Monnier closed 
https://github.com/llvm/llvm-project/pull/171052
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] [Lexer] Optimize CharInfo.h utilities (PR #171052)

2026-01-08 Thread Thibault Monnier via cfe-commits

Thibault-Monnier wrote:

Closing this PR as it doesn't seem to make any big difference in cycles count 
anyway.

https://github.com/llvm/llvm-project/pull/171052
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] [Lexer] Optimize CharInfo.h utilities (PR #171052)

2026-01-07 Thread Nikita Popov via cfe-commits

nikic wrote:

> Update: it seems to be worse after all, which is unexpected, since it 
> achieved `~1.5%` faster lexing locally.
> 
> The reason I opened this PR in the first place is because the cycles count 
> was reduced in almost all averages on 
> [llvm-compile-time-tracker](https://llvm-compile-time-tracker.com/?config=Overview&stat=cycles&remote=Thibault-Monnier)
>  (see `cf26169083` commit overview). However, I do admit that the results 
> look quite noisy, so I'll wait for someone with more experience to give their 
> opinion.
> 
> EDIT: Seeing other experiments I made, it seems the `cycles` metric is too 
> noisy for any meaningful comparison anyway.

Yes, that's why there is the red warning at the top, though it's only present 
on the commit-to-commit comparison page:

> Warning: The cycles metric is very noisy and likely not meaningful for 
> comparisons between specific revisions.

Statistically significant changes are indicated with red/green coloring.

https://github.com/llvm/llvm-project/pull/171052
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] [Lexer] Optimize CharInfo.h utilities (PR #171052)

2025-12-10 Thread Thibault Monnier via cfe-commits

Thibault-Monnier wrote:

Update: it seems to be worse after all, which is unexpected, since it achieved 
`~1.5%` faster lexing locally.

The reason I opened this PR in the first place is because the cycles count was 
reduced in almost all averages on 
[llvm-compile-time-tracker](https://llvm-compile-time-tracker.com/?config=Overview&stat=cycles&remote=Thibault-Monnier)
 (see `cf26169083` commit overview). However, I do admit that the results look 
quite noisy, so I'll wait for someone with more experience to give their 
opinion.

https://github.com/llvm/llvm-project/pull/171052
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] [Lexer] Optimize CharInfo.h utilities (PR #171052)

2025-12-09 Thread Thibault Monnier via cfe-commits

Thibault-Monnier wrote:

I seem to have achieved better performance by removing InfoTable completely. 
I'm looking into it, and I'll link the PR if it's better.

https://github.com/llvm/llvm-project/pull/171052
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] [Lexer] Optimize CharInfo.h utilities (PR #171052)

2025-12-09 Thread Corentin Jabot via cfe-commits

cor3ntin wrote:

> As far as the changes go, I prefer the original code because it was more 
> clear what was going on. So unless this measurably improves performance, I'm 
> not certain it's a direction we should head (though @cor3ntin is the code 
> owner there, so maybe he feels differently).

I do agree with you - trying to improve performance of `CharInfo` is 
tantalizing because it directly impacts lexing performance.
However, I was never able to convince myself that changes such as the one 
proposed here offer improvements that are not in the noise - I fully admit that 
this is in part because I'm not an expert in profiling.

But I do expect the table to always stay in cache during translation (exactly 
because how frequently it's used), so intuitively
i would expect all of these things to be somewhat equivalent. 

But yeah maybe @nikic or other backend people would have a better feel for what 
is the most efficient pattern here.


https://github.com/llvm/llvm-project/pull/171052
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] [Lexer] Optimize CharInfo.h utilities (PR #171052)

2025-12-08 Thread Aaron Ballman via cfe-commits

AaronBallman wrote:

> This change _does_ increase instruction count, but seems to be faster 
> nonetheless: 
> [llvm-compile-time-tracker](https://llvm-compile-time-tracker.com/compare.php?from=93e18db3e48dc28818d4880e813b9027bfbf3c16&to=cf26169083f2a68766df65607bec1547d3079aad&stat=task-clock)
>  (that's a different commit, but basically the same changes).

I'd like to understand this a bit more because as best I can tell, this looks 
to regress performance.  CC @nikic for more opinions on the perf side of things.

As far as the changes go, I prefer the original code because it was more clear 
what was going on. So unless this measurably improves performance, I'm not 
certain it's a direction we should head (though @cor3ntin is the code owner 
there, so maybe he feels differently).

https://github.com/llvm/llvm-project/pull/171052
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] [Lexer] Optimize CharInfo.h utilities (PR #171052)

2025-12-07 Thread Thibault Monnier via cfe-commits

https://github.com/Thibault-Monnier edited 
https://github.com/llvm/llvm-project/pull/171052
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] [Lexer] Optimize CharInfo.h utilities (PR #171052)

2025-12-07 Thread via cfe-commits

llvmbot wrote:




@llvm/pr-subscribers-clang

Author: Thibault Monnier (Thibault-Monnier)


Changes

This PR optimizes some of the utilities in `Charinfo.h` by replacing lookup 
table checks with a simple bounds or mask check when possible. This reduces 
instruction latency, allowing for a faster compilation overall.

This change _does_ increase instruction count, but seems to be faster 
nonetheless: 
[llvm-compile-time-tracker](https://llvm-compile-time-tracker.com/compare.php?from=93e18db3e48dc28818d4880e813b9027bfbf3c16&to=cf26169083f2a68766df65607bec1547d3079aad&stat=cycles)
 (that's a different commit, but basically the same changes).

@cor3ntin @AaronBallman

---
Full diff: https://github.com/llvm/llvm-project/pull/171052.diff


1 Files Affected:

- (modified) clang/include/clang/Basic/CharInfo.h (+9-16) 


``diff
diff --git a/clang/include/clang/Basic/CharInfo.h 
b/clang/include/clang/Basic/CharInfo.h
index 87626eeb8a700..c34bcf4fbf88e 100644
--- a/clang/include/clang/Basic/CharInfo.h
+++ b/clang/include/clang/Basic/CharInfo.h
@@ -89,16 +89,15 @@ LLVM_READONLY inline bool 
isAsciiIdentifierContinue(unsigned char c,
 ///
 /// Note that this returns false for '\\0'.
 LLVM_READONLY inline bool isHorizontalWhitespace(unsigned char c) {
-  using namespace charinfo;
-  return (InfoTable[c] & (CHAR_HORZ_WS|CHAR_SPACE)) != 0;
+  constexpr unsigned long long Mask = 0b100011010;
+  return (c <= 32) && (Mask >> c) & 1;
 }
 
 /// Returns true if this character is vertical ASCII whitespace: '\\n', '\\r'.
 ///
 /// Note that this returns false for '\\0'.
 LLVM_READONLY inline bool isVerticalWhitespace(unsigned char c) {
-  using namespace charinfo;
-  return (InfoTable[c] & CHAR_VERT_WS) != 0;
+  return c == '\n' || c == '\r';
 }
 
 /// Return true if this character is horizontal or vertical ASCII whitespace:
@@ -106,26 +105,23 @@ LLVM_READONLY inline bool isVerticalWhitespace(unsigned 
char c) {
 ///
 /// Note that this returns false for '\\0'.
 LLVM_READONLY inline bool isWhitespace(unsigned char c) {
-  using namespace charinfo;
-  return (InfoTable[c] & (CHAR_HORZ_WS|CHAR_VERT_WS|CHAR_SPACE)) != 0;
+  constexpr unsigned long long Mask = 0b10010;
+  return (c <= 32) && (Mask >> c) & 1;
 }
 
 /// Return true if this character is an ASCII digit: [0-9]
 LLVM_READONLY inline bool isDigit(unsigned char c) {
-  using namespace charinfo;
-  return (InfoTable[c] & CHAR_DIGIT) != 0;
+  return c >= '0' && c <= '9';
 }
 
 /// Return true if this character is a lowercase ASCII letter: [a-z]
 LLVM_READONLY inline bool isLowercase(unsigned char c) {
-  using namespace charinfo;
-  return (InfoTable[c] & CHAR_LOWER) != 0;
+  return c >= 'a' && c <= 'z';
 }
 
 /// Return true if this character is an uppercase ASCII letter: [A-Z]
 LLVM_READONLY inline bool isUppercase(unsigned char c) {
-  using namespace charinfo;
-  return (InfoTable[c] & CHAR_UPPER) != 0;
+  return c >= 'A' && c <= 'Z';
 }
 
 /// Return true if this character is an ASCII letter: [a-zA-Z]
@@ -158,9 +154,7 @@ LLVM_READONLY inline bool isPunctuation(unsigned char c) {
 /// character that should take exactly one column to print in a fixed-width
 /// terminal.
 LLVM_READONLY inline bool isPrintable(unsigned char c) {
-  using namespace charinfo;
-  return (InfoTable[c] & (CHAR_UPPER | CHAR_LOWER | CHAR_PERIOD | CHAR_PUNCT |
-  CHAR_DIGIT | CHAR_UNDER | CHAR_SPACE)) != 0;
+  return c >= 32 && c <= 126;
 }
 
 /// Return true if this is the body character of a C preprocessing number,
@@ -236,7 +230,6 @@ LLVM_READONLY inline char toUppercase(char c) {
   return c;
 }
 
-
 /// Return true if this is a valid ASCII identifier.
 ///
 /// Note that this is a very simple check; it does not accept UCNs as valid

``




https://github.com/llvm/llvm-project/pull/171052
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] [Lexer] Optimize CharInfo.h utilities (PR #171052)

2025-12-07 Thread Thibault Monnier via cfe-commits

https://github.com/Thibault-Monnier created 
https://github.com/llvm/llvm-project/pull/171052

This PR optimizes some of the utilities in `Charinfo.h` by replacing lookup 
table checks with a simple bounds or mask check when possible. This reduces 
instruction latency, allowing for a faster compilation overall.

This change _does_ increase instruction count, but seems to be faster 
nonetheless: 
[llvm-compile-time-tracker](https://llvm-compile-time-tracker.com/compare.php?from=93e18db3e48dc28818d4880e813b9027bfbf3c16&to=cf26169083f2a68766df65607bec1547d3079aad&stat=cycles)
 (that's a different commit, but basically the same changes).

@cor3ntin @AaronBallman

>From d7e1102d5985cfbcc5427c8aa1a3dd8127951f26 Mon Sep 17 00:00:00 2001
From: Thibault-Monnier 
Date: Sun, 7 Dec 2025 15:37:43 +0100
Subject: [PATCH] Optimize CharInfo.h utilities

---
 clang/include/clang/Basic/CharInfo.h | 25 +
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/clang/include/clang/Basic/CharInfo.h 
b/clang/include/clang/Basic/CharInfo.h
index 87626eeb8a700..c34bcf4fbf88e 100644
--- a/clang/include/clang/Basic/CharInfo.h
+++ b/clang/include/clang/Basic/CharInfo.h
@@ -89,16 +89,15 @@ LLVM_READONLY inline bool 
isAsciiIdentifierContinue(unsigned char c,
 ///
 /// Note that this returns false for '\\0'.
 LLVM_READONLY inline bool isHorizontalWhitespace(unsigned char c) {
-  using namespace charinfo;
-  return (InfoTable[c] & (CHAR_HORZ_WS|CHAR_SPACE)) != 0;
+  constexpr unsigned long long Mask = 0b100011010;
+  return (c <= 32) && (Mask >> c) & 1;
 }
 
 /// Returns true if this character is vertical ASCII whitespace: '\\n', '\\r'.
 ///
 /// Note that this returns false for '\\0'.
 LLVM_READONLY inline bool isVerticalWhitespace(unsigned char c) {
-  using namespace charinfo;
-  return (InfoTable[c] & CHAR_VERT_WS) != 0;
+  return c == '\n' || c == '\r';
 }
 
 /// Return true if this character is horizontal or vertical ASCII whitespace:
@@ -106,26 +105,23 @@ LLVM_READONLY inline bool isVerticalWhitespace(unsigned 
char c) {
 ///
 /// Note that this returns false for '\\0'.
 LLVM_READONLY inline bool isWhitespace(unsigned char c) {
-  using namespace charinfo;
-  return (InfoTable[c] & (CHAR_HORZ_WS|CHAR_VERT_WS|CHAR_SPACE)) != 0;
+  constexpr unsigned long long Mask = 0b10010;
+  return (c <= 32) && (Mask >> c) & 1;
 }
 
 /// Return true if this character is an ASCII digit: [0-9]
 LLVM_READONLY inline bool isDigit(unsigned char c) {
-  using namespace charinfo;
-  return (InfoTable[c] & CHAR_DIGIT) != 0;
+  return c >= '0' && c <= '9';
 }
 
 /// Return true if this character is a lowercase ASCII letter: [a-z]
 LLVM_READONLY inline bool isLowercase(unsigned char c) {
-  using namespace charinfo;
-  return (InfoTable[c] & CHAR_LOWER) != 0;
+  return c >= 'a' && c <= 'z';
 }
 
 /// Return true if this character is an uppercase ASCII letter: [A-Z]
 LLVM_READONLY inline bool isUppercase(unsigned char c) {
-  using namespace charinfo;
-  return (InfoTable[c] & CHAR_UPPER) != 0;
+  return c >= 'A' && c <= 'Z';
 }
 
 /// Return true if this character is an ASCII letter: [a-zA-Z]
@@ -158,9 +154,7 @@ LLVM_READONLY inline bool isPunctuation(unsigned char c) {
 /// character that should take exactly one column to print in a fixed-width
 /// terminal.
 LLVM_READONLY inline bool isPrintable(unsigned char c) {
-  using namespace charinfo;
-  return (InfoTable[c] & (CHAR_UPPER | CHAR_LOWER | CHAR_PERIOD | CHAR_PUNCT |
-  CHAR_DIGIT | CHAR_UNDER | CHAR_SPACE)) != 0;
+  return c >= 32 && c <= 126;
 }
 
 /// Return true if this is the body character of a C preprocessing number,
@@ -236,7 +230,6 @@ LLVM_READONLY inline char toUppercase(char c) {
   return c;
 }
 
-
 /// Return true if this is a valid ASCII identifier.
 ///
 /// Note that this is a very simple check; it does not accept UCNs as valid

___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits