maartenbreddels opened a new pull request #7434:
URL: https://github.com/apache/arrow/pull/7434


   Following up on #7418, I tried and benchmarked a different approach for:
    * ascii_lower
    * ascii_upper
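A common way to make per-byte ASCII case conversion cheap is to avoid data-dependent branches and per-character calls such as `std::toupper`, so the compiler has a chance to auto-vectorize the loop. The sketch below illustrates that general idea under that assumption. It is not the code from this PR, and `AsciiToUpperByte`, `AsciiToLowerByte`, `AsciiUpperString` and `AsciiLowerString` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: uppercase one byte without a branch.
// Only 'a'..'z' change; all other bytes, including every byte >= 0x80
// used by multi-byte UTF-8 sequences, pass through unchanged.
inline std::uint8_t AsciiToUpperByte(std::uint8_t c) {
  // Wrapping (c - 'a') into uint8_t turns the range test 'a' <= c <= 'z'
  // into a single unsigned comparison; the result is 0 or 1.
  const std::uint8_t is_lower = static_cast<std::uint8_t>(c - 'a') < 26;
  // Subtracting 0x20 (is_lower << 5) maps 'a'..'z' onto 'A'..'Z'.
  return static_cast<std::uint8_t>(c - (is_lower << 5));
}

// Hypothetical helper: lowercase one byte; only 'A'..'Z' change.
inline std::uint8_t AsciiToLowerByte(std::uint8_t c) {
  const std::uint8_t is_upper = static_cast<std::uint8_t>(c - 'A') < 26;
  // Adding 0x20 maps 'A'..'Z' onto 'a'..'z'.
  return static_cast<std::uint8_t>(c + (is_upper << 5));
}

// Plain loops with no early exits or data-dependent branches, which
// gives the compiler a chance to auto-vectorize them.
inline std::string AsciiUpperString(const std::string& s) {
  std::string out(s.size(), '\0');
  for (std::size_t i = 0; i < s.size(); ++i) {
    out[i] = static_cast<char>(AsciiToUpperByte(static_cast<std::uint8_t>(s[i])));
  }
  return out;
}

inline std::string AsciiLowerString(const std::string& s) {
  std::string out(s.size(), '\0');
  for (std::size_t i = 0; i < s.size(); ++i) {
    out[i] = static_cast<char>(AsciiToLowerByte(static_cast<std::uint8_t>(s[i])));
  }
  return out;
}
```

For example, `AsciiUpperString("hello, World!")` yields `"HELLO, WORLD!"`. Whether the loop is actually vectorized depends on the compiler and optimization flags.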
   
   Before (lower is similar):
   ```
   --------------------------------------------------
   Benchmark           Time           CPU Iterations
   --------------------------------------------------
   AsciiUpper_median    4922843 ns      4918961 ns           10 bytes_per_second=3.1457G/s items_per_second=213.17M/s
   ```
   
   After:
   ```
   --------------------------------------------------
   Benchmark           Time           CPU Iterations
   --------------------------------------------------
   AsciiUpper_median    1391272 ns      1390014 ns           10 bytes_per_second=11.132G/s items_per_second=754.363M/s
   ```
   
   This is a 3.7x speedup (on an AMD machine).
   
   Using http://quick-bench.com/JaDErmVCY23Z1tu6YZns_KBt0qU, I found a 4.6x
   speedup with Clang 9 and 6.4x with GCC 9.2.
   
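   Also, the test is expanded a bit to include a non-ASCII codepoint, to make
   explicit that it is fine to upper- or lowercase a UTF-8 string. The
   non-overlap property of the UTF-8 encoding makes this safe: every byte of a
   multi-byte sequence is 0x80 or higher, so it can never be mistaken for an
   ASCII letter (see section 2.5 of the Unicode Standard Core Specification
   v13.0).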
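A minimal sketch of why this is safe: in UTF-8, code points U+0000..U+007F are single bytes 0x00..0x7F, while lead and continuation bytes of multi-byte sequences are all 0x80 or higher. An ASCII-only transform therefore touches only genuine ASCII letters. `IsAsciiByte` and `AsciiUpperUtf8` are hypothetical helpers for illustration, not Arrow APIs.

```cpp
#include <cassert>
#include <string>

// Bytes 0x00..0x7F encode ASCII code points on their own; any byte that is
// part of a multi-byte UTF-8 sequence is 0x80 or higher.
inline bool IsAsciiByte(unsigned char b) { return b < 0x80; }

// Hypothetical ASCII-only uppercasing applied byte by byte to a UTF-8 string.
// Non-ASCII bytes never fall in 'a'..'z', so they are copied through as-is.
inline std::string AsciiUpperUtf8(std::string s) {
  for (char& ch : s) {
    const unsigned char b = static_cast<unsigned char>(ch);
    if (b >= 'a' && b <= 'z') {
      ch = static_cast<char>(b - ('a' - 'A'));
    }
  }
  return s;
}
```

Uppercasing `"café"` (bytes `63 61 66 C3 A9`) this way yields `"CAFé"`: the ASCII letters change, the two bytes of `é` are left intact, and the result is still valid UTF-8. Mapping `é` to `É` would need full Unicode case mapping, which is outside the scope of an ASCII-only transform.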

