[issue43014] tokenize spends a lot of time in `re.compile(...)`

2021-01-24 Thread Pablo Galindo Salgado
Change by Pablo Galindo Salgado: nosy: +pablogsal; pull_request: https://github.com/python/cpython/pull/24313

[issue43014] tokenize spends a lot of time in `re.compile(...)`

2021-01-24 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: re.compile() already uses caching, but its internal cache is less efficient here for a few reasons. To Steven: the time is *reduced* by 28%, but the speed is *increased* by 39%. -- nosy: +serhiy.storchaka
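As a side note on the first sentence: in CPython, re.compile() memoizes compiled patterns in an internal cache keyed by the pattern and flags, so repeated calls with the same arguments return the same object. The snippet below only illustrates that behaviour; it is not code from the issue.

```python
import re

# CPython's re module caches compiled patterns internally, so compiling
# the same pattern with the same flags twice returns the very same object.
p1 = re.compile(r"[A-Za-z_]\w*")
p2 = re.compile(r"[A-Za-z_]\w*")
print(p1 is p2)  # True: served from re's internal cache
```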

[issue43014] tokenize spends a lot of time in `re.compile(...)`

2021-01-24 Thread Steven D'Aprano
Steven D'Aprano added the comment: Just for the record:

> The optimization takes the execution from ~6300ms to ~4500ms on my machine (representing a 28% - 39% improvement depending on how you calculate it)

The correct answer is 28%, which uses the initial value as the base: (6300 - 4500) / 6300 ≈ 28.6%.
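Spelling out both calculations with the rounded timings quoted above (the ~6300 ms and ~4500 ms figures are approximate, so the exact percentages shift slightly):

```python
before_ms, after_ms = 6300, 4500  # approximate timings quoted in the thread

time_reduction = (before_ms - after_ms) / before_ms  # base = initial value
speedup = before_ms / after_ms - 1                   # base = final value

print(f"time reduced by {time_reduction:.1%}")  # ~28.6%
print(f"speed increased by {speedup:.1%}")      # ~40.0% with these rounded inputs
```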

[issue43014] tokenize spends a lot of time in `re.compile(...)`

2021-01-24 Thread Batuhan Taskaya
Change by Batuhan Taskaya: resolution: -> fixed; stage: patch review -> resolved; status: open -> closed

[issue43014] tokenize spends a lot of time in `re.compile(...)`

2021-01-24 Thread Batuhan Taskaya
Batuhan Taskaya added the comment: New changeset 15bd9efd01e44087664e78bf766865a6d2e06626 by Anthony Sottile in branch 'master': bpo-43014: Improve performance of tokenize.tokenize by 20-30% https://github.com/python/cpython/commit/15bd9efd01e44087664e78bf766865a6d2e06626

[issue43014] tokenize spends a lot of time in `re.compile(...)`

2021-01-24 Thread Anthony Sottile
Anthony Sottile added the comment: attached out3.pstats / out3.svg, which represent the optimization using lru_cache instead. -- Added file: https://bugs.python.org/file49764/out3.svg
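For context, the lru_cache variant being profiled here is along these lines; a minimal sketch, assuming a module-level helper named `_compile` (the helper name and maxsize are assumptions, not necessarily the exact code in the pull request):

```python
import re
from functools import lru_cache


@lru_cache(maxsize=None)
def _compile(expr):
    # Compile once per distinct pattern string; later calls with the same
    # string hit the lru_cache and skip re.compile()'s own slower lookup.
    return re.compile(expr, re.UNICODE)
```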

[issue43014] tokenize spends a lot of time in `re.compile(...)`

2021-01-24 Thread Anthony Sottile
Change by Anthony Sottile: Added file: https://bugs.python.org/file49763/out3.pstats

[issue43014] tokenize spends a lot of time in `re.compile(...)`

2021-01-24 Thread Anthony Sottile
Anthony Sottile added the comment: admittedly anecdotal, but here's another data point in addition to the attached profiles: the test.test_tokenize suite. Before: $ ./python -m test.test_tokenize ..

[issue43014] tokenize spends a lot of time in `re.compile(...)`

2021-01-24 Thread Anthony Sottile
Change by Anthony Sottile: keywords: +patch; stage: -> patch review; pull_request: https://github.com/python/cpython/pull/24311

[issue43014] tokenize spends a lot of time in `re.compile(...)`

2021-01-24 Thread Anthony Sottile
Change by Anthony Sottile: Added file: https://bugs.python.org/file49762/out2.svg

[issue43014] tokenize spends a lot of time in `re.compile(...)`

2021-01-24 Thread Anthony Sottile
Change by Anthony Sottile: Added file: https://bugs.python.org/file49761/out2.pstats

[issue43014] tokenize spends a lot of time in `re.compile(...)`

2021-01-24 Thread Anthony Sottile
Change by Anthony Sottile: Added file: https://bugs.python.org/file49760/out.svg

[issue43014] tokenize spends a lot of time in `re.compile(...)`

2021-01-24 Thread Anthony Sottile
New submission from Anthony Sottile: I did some profiling (attached a few files here with svgs) of running this script:

```python
import io
import tokenize

# picked as the second longest file in cpython
with open('Lib/test/test_socket.py', 'rb') as f:
    bio = io.BytesIO(f.read())

def
```
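The rest of the script is cut off in this archive. A minimal sketch of what such a profiling driver could look like, assuming a `main()` function, a repeat count of 10, and cProfile writing a .pstats file (all of these are assumptions, not the exact script attached to the issue):

```python
import cProfile
import io
import tokenize

# picked as the second longest file in cpython (same file as above)
with open('Lib/test/test_socket.py', 'rb') as f:
    bio = io.BytesIO(f.read())


def main():
    # Re-tokenize the buffered file several times so the regex work
    # (and any re.compile overhead) dominates the profile.
    for _ in range(10):
        bio.seek(0)
        for _tok in tokenize.tokenize(bio.readline):
            pass


if __name__ == '__main__':
    # Dump stats in a format tools like gprof2dot or snakeviz can read,
    # similar to the out*.pstats attachments on this issue.
    cProfile.run('main()', 'out.pstats')
```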