[issue46020] Optimize long_pow for the common case

2022-01-12 Thread Tim Peters

[issue46020] Optimize long_pow for the common case

2022-01-11 Thread Tim Peters
Tim Peters added the comment: GH-30555 helps a bit by leaving the giant-exponent table of small odd powers as uninitialized stack trash unless it's actually used.
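To illustrate the idea in that comment, here is a minimal Python sketch of an exponentiation routine that only builds its table of small odd powers when the exponent is large. It is not CPython's long_pow() code: the names pow_sketch and build_odd_powers, the window width, and the size threshold are all hypothetical, and a modulus is required here only to keep giant exponents computable.

```python
WINDOW_BITS = 5    # hypothetical window width, not taken from CPython
BIG_EXP_BITS = 64  # hypothetical threshold for "giant" exponents


def build_odd_powers(base, mod):
    """Return [base**1, base**3, ..., base**(2**WINDOW_BITS - 1)], all mod `mod`."""
    square = base * base % mod
    table = [base % mod]
    for _ in range((1 << (WINDOW_BITS - 1)) - 1):
        table.append(table[-1] * square % mod)
    return table


def pow_sketch(base, exp, mod):
    """Compute base**exp % mod for exp >= 0, building the table only if needed."""
    if exp < 0:
        raise ValueError("negative exponents are not handled in this sketch")
    result = 1 % mod
    if exp.bit_length() <= BIG_EXP_BITS:
        # Common case: plain left-to-right binary exponentiation, no table at all.
        for bit in bin(exp)[2:]:
            result = result * result % mod
            if bit == "1":
                result = result * base % mod
        return result
    # Rare case: sliding window over the exponent bits, using odd powers.
    table = build_odd_powers(base, mod)
    i = exp.bit_length() - 1
    while i >= 0:
        if not (exp >> i) & 1:
            result = result * result % mod
            i -= 1
            continue
        # Take the widest window (at most WINDOW_BITS) that ends on a set bit.
        j = max(i - WINDOW_BITS + 1, 0)
        while not (exp >> j) & 1:
            j += 1
        width = i - j + 1
        window = (exp >> j) & ((1 << width) - 1)  # always odd
        for _ in range(width):
            result = result * result % mod
        result = result * table[window >> 1] % mod
        i = j - 1
    return result
```

The point of the split is that the small-exponent path never pays for setting up the table, which is the cost the comment describes avoiding.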

[issue46020] Optimize long_pow for the common case

2022-01-11 Thread Tim Peters
Change by Tim Peters: keywords: +patch; pull_requests: +28756; stage: -> patch review; pull_request: https://github.com/python/cpython/pull/30555

[issue46020] Optimize long_pow for the common case

2022-01-03 Thread Tim Peters
Tim Peters added the comment: I was surprised that https://bugs.python.org/issue44376 managed to get i**2 to within a factor of 2 of i*i's speed. The overheads of running long_pow() at all are high! Don't overlook that initialization of stack variables at the start, like PyLongObject *

[issue46020] Optimize long_pow for the common case

2022-01-02 Thread Mark Dickinson
Change by Mark Dickinson: nosy: +tim.peters

[issue46020] Optimize long_pow for the common case

2021-12-09 Thread Ken Jin
Ken Jin added the comment: I'm not sure about the original 10:1 difference in 3.10, but in 3.11, the 2:1 difference might be due to the PEP 659 machinery optimizing for the int * int and float * float cases (see BINARY_OP_MULTIPLY_INT and BINARY_OP_MULTIPLY_FLOAT in ceval). Last I recall, thes

[issue46020] Optimize long_pow for the common case

2021-12-09 Thread Raymond Hettinger
Raymond Hettinger added the comment: The situation for floats is also disappointing:
$ python3.11 -m timeit -s 'x=1.1' 'x ** 2'
500 loops, best of 5: 60.8 nsec per loop
$ python3.11 -m timeit -s 'x=1.1' 'x ** 2.0'
500 loops, best of 5: 51.5 nsec per loop
$ python3.11 -m timeit -s 'x=1.

[issue46020] Optimize long_pow for the common case

2021-12-09 Thread Raymond Hettinger
Raymond Hettinger added the comment: Hmm, I had just looked at that code and it wasn't at all obvious that an optimization had been added. I expected something like: if (exp==2) return PyNumber_Multiply(x, x); I wonder where the extra clock cycles are going. Looking at the ceval.c disp
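As a rough Python-level analogue of the one-line C shortcut quoted in that comment, the sketch below special-cases an exponent of 2 and falls back to the general operator otherwise. The function name pow_with_square_shortcut is hypothetical, and this only illustrates the shape of such a fast path; it says nothing about how long_pow() is actually structured.

```python
def pow_with_square_shortcut(x, exp):
    """Return x ** exp, routing the common exp == 2 case through a multiply."""
    if exp == 2:
        # Fast path: a single multiplication instead of the general power routine.
        return x * x
    # General path: defer to the built-in power operator.
    return x ** exp
```

Both paths return the same value for integers; the shortcut only changes which operation does the work.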

[issue46020] Optimize long_pow for the common case

2021-12-09 Thread Mark Dickinson
Change by Mark Dickinson: nosy: +mark.dickinson

[issue46020] Optimize long_pow for the common case

2021-12-09 Thread Dennis Sweeney
Dennis Sweeney added the comment: I believe https://bugs.python.org/issue44376 added a special case for 2nd and 3rd powers, and that's the 3.10/3.11 difference in the speed of x**2, not ceval optimizations. (nosy: +Dennis Sweeney)

[issue46020] Optimize long_pow for the common case

2021-12-08 Thread Raymond Hettinger
New submission from Raymond Hettinger: The expression 'x * x' is faster than 'x ** 2'. In Python 3.10, the speed difference was enormous. Due to ceval optimizations, the difference in Python 3.11 is tighter; however, there is still room for improvement. The code for long_pow() doesn't curren
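The comparison in this report can be reproduced with the standard timeit module. The sketch below is one way to do that from a script; the helper name compare_square_timings and its default arguments are hypothetical, and the numbers it reports will vary by machine and Python version.

```python
import timeit


def compare_square_timings(x=10, number=200_000, repeat=5):
    """Return (ns per 'x * x', ns per 'x ** 2'), taking the best of `repeat` runs."""
    namespace = {"x": x}
    mul_best = min(timeit.repeat("x * x", globals=namespace,
                                 number=number, repeat=repeat))
    pow_best = min(timeit.repeat("x ** 2", globals=namespace,
                                 number=number, repeat=repeat))
    # Convert total seconds for `number` iterations to nanoseconds per iteration.
    return mul_best / number * 1e9, pow_best / number * 1e9


if __name__ == "__main__":
    mul_ns, pow_ns = compare_square_timings()
    print(f"x * x : {mul_ns:.1f} nsec per loop")
    print(f"x ** 2: {pow_ns:.1f} nsec per loop")
```

Taking the minimum over several repeats follows the usual timeit practice of reporting the least-disturbed run.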