maartenbreddels opened a new pull request #7434: URL: https://github.com/apache/arrow/pull/7434
Following up on #7418, I tried and benchmarked a different approach for:

 * ascii_lower
 * ascii_upper

Before (lower is similar):

```
--------------------------------------------------
Benchmark           Time         CPU        Iterations
--------------------------------------------------
AsciiUpper_median   4922843 ns   4918961 ns   10   bytes_per_second=3.1457G/s items_per_second=213.17M/s
```

After:

```
--------------------------------------------------
Benchmark           Time         CPU        Iterations
--------------------------------------------------
AsciiUpper_median   1391272 ns   1390014 ns   10   bytes_per_second=11.132G/s items_per_second=754.363M/s
```

This is a 3.7x speedup (on an AMD machine). Using http://quick-bench.com/JaDErmVCY23Z1tu6YZns_KBt0qU I found a 4.6x speedup for clang 9 and 6.4x for GCC 9.2.

Also, the test is expanded a bit to include a non-ASCII codepoint, to make explicit that it is fine to upper- or lower-case a UTF-8 string with the ASCII kernels. The non-overlapping encoding of UTF-8 makes this OK: every byte of a multi-byte sequence has its high bit set, so it can never be mistaken for an ASCII letter (see section 2.5 of the Unicode Standard Core Specification v13.0).
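
To illustrate the UTF-8 argument, here is a minimal sketch of byte-wise ASCII case conversion. The helper names (`AsciiUpperByte`, `AsciiLowerByte`, `AsciiUpper`, `AsciiLower`) are hypothetical and chosen for this example only; they are not Arrow's API, and this is not necessarily the implementation benchmarked above. It only shows that bytes at or above 0x80 pass through untouched, so multi-byte codepoints survive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: upper-case a single byte if it is ASCII 'a'..'z'.
// Any byte >= 0x80 (i.e. every byte of a multi-byte UTF-8 sequence)
// falls outside that range and is returned unchanged.
inline uint8_t AsciiUpperByte(uint8_t c) {
  return (c >= 'a' && c <= 'z') ? static_cast<uint8_t>(c - 32) : c;
}

// Hypothetical helper: lower-case a single byte if it is ASCII 'A'..'Z'.
inline uint8_t AsciiLowerByte(uint8_t c) {
  return (c >= 'A' && c <= 'Z') ? static_cast<uint8_t>(c + 32) : c;
}

// Apply the per-byte transform to a whole string. The output has the same
// length as the input, so the result can be written into a preallocated buffer.
inline std::string AsciiUpper(const std::string& input) {
  std::string out(input.size(), '\0');
  for (std::size_t i = 0; i < input.size(); ++i) {
    out[i] = static_cast<char>(AsciiUpperByte(static_cast<uint8_t>(input[i])));
  }
  return out;
}

inline std::string AsciiLower(const std::string& input) {
  std::string out(input.size(), '\0');
  for (std::size_t i = 0; i < input.size(); ++i) {
    out[i] = static_cast<char>(AsciiLowerByte(static_cast<uint8_t>(input[i])));
  }
  return out;
}

// Usage sketch: the Greek letter alpha (U+03B1, bytes 0xCE 0xB1) is left
// intact while the surrounding ASCII letters change case.
inline void CheckUtf8Example() {
  assert(AsciiUpper("a\xCE\xB1z") == "A\xCE\xB1Z");
  assert(AsciiLower("A\xCE\xB1Z") == "a\xCE\xB1z");
}
```

Because the transform is one byte in, one byte out, with no dependence on neighbouring bytes, a loop like this is simple for a compiler to vectorize, although the speedups quoted above come from the PR's own code and not from this sketch.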