[issue37587] JSON loads performance improvement for long strings

2019-08-15 Thread Marco Paolini
Marco Paolini added the comment: also worth noting escape sequences for non-ascii characters are slower, even when encoded length is the same. python -m pyperf timeit -s 'import json;' -s 'c = "€"; s = json.dumps(c * (2**10 // len(json.dumps(c)) - 2))' 'json.loads(s)' -o nonascii2k.json
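
The comparison described in this comment can be reproduced in a few lines. This is only a rough sketch using the standard `timeit` module rather than `pyperf`; the ASCII control document and the iteration count are assumptions added for illustration, and absolute timings depend on the build.

```python
import json
import timeit

# Same construction as in the comment: a run of "€" characters sized so the
# encoded JSON document stays under 2**10 characters.
c = "€"
text = c * (2**10 // len(json.dumps(c)) - 2)

escaped_doc = json.dumps(text)                      # ASCII-only, \u20ac escapes
ascii_doc = json.dumps("a" * (len(escaped_doc) - 2))  # plain ASCII, same length

# Hypothetical iteration count, chosen only to keep the run short.
for label, doc in (("escaped non-ASCII", escaped_doc), ("plain ASCII", ascii_doc)):
    seconds = timeit.timeit(lambda: json.loads(doc), number=10_000)
    print(f"{label:18s} len={len(doc)}  {seconds:.4f}s")
```

Both documents have the same encoded length, so any difference in timing comes from decoding the escape sequences rather than from the amount of input scanned.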

[issue37587] JSON loads performance improvement for long strings

2019-08-15 Thread Marco Paolini
Marco Paolini added the comment: I also confirm Inada's patch further improves performance! All my previous benchmarks were done with gcc and PGO optimizations performed only with test_json task... maybe this explains the weird results? I tested the performance of new master

[issue37810] ndiff reports incorrect location when diff strings contain tabs

2019-08-11 Thread Anthony Sottile
Anthony Sottile added the comment: That's actually a good point, I don't think this should land in python3.7 since it changes output -- I'm removing that from the versions (though the bug does affect every version I have access to) -- versions: -Python 3.7
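
For readers unfamiliar with the `?` hint lines that this issue is about, here is a small sketch that runs `difflib.ndiff` on two tab-indented lines. The sample strings are made up for illustration; the exact content of the hint lines differs between Python versions, which is the point of the issue, so the sketch only prints them.

```python
import difflib

# Two nearly identical lines that both start with a tab (hypothetical input).
old = ["\tfoo bar baz\n"]
new = ["\tfoo bar bax\n"]

diff = list(difflib.ndiff(old, new))

# repr() makes tabs visible, so the position of the "^" markers in the
# "? " hint lines can be compared against the "- " and "+ " lines.
for line in diff:
    print(repr(line))
```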

[issue37496] Support annotations in signature strings.

2019-08-11 Thread hai shi
Change by hai shi : -- nosy: +shihai1991 ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue37810] ndiff reports incorrect location when diff strings contain tabs

2019-08-10 Thread Raymond Hettinger
Raymond Hettinger added the comment: This seems like a reasonable suggestion to me. Am not sure whether it should be backported. Tim, what do you think? -- assignee: -> tim.peters components: +Library (Lib) nosy: +rhettinger, tim.peters ___

[issue37810] ndiff reports incorrect location when diff strings contain tabs

2019-08-10 Thread Anthony Sottile
Change by Anthony Sottile : -- keywords: +patch pull_requests: +14930 stage: -> patch review pull_request: https://github.com/python/cpython/pull/15201 ___ Python tracker ___

[issue37810] ndiff reports incorrect location when diff strings contain tabs

2019-08-10 Thread Anthony Sottile
strings contain tabs versions: Python 3.7, Python 3.8, Python 3.9 ___ Python tracker <https://bugs.python.org/issue37810> ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue37587] JSON loads performance improvement for long strings

2019-08-08 Thread Inada Naoki
Inada Naoki added the comment: New changeset 2a570af12ac5e4ac5575a68f8739b31c24d01367 by Inada Naoki in branch 'master': bpo-37587: optimize json.loads (GH-15134) https://github.com/python/cpython/commit/2a570af12ac5e4ac5575a68f8739b31c24d01367 --

[issue37496] Support annotations in signature strings.

2019-08-07 Thread Guido van Rossum
Guido van Rossum added the comment: > 1. Should I create a new function in the `ast` module that exposes that C > function in Python in order to use it in `Lib/inspect.py`? > 2. Would it be better to just re-use the _AST to string_ implementation in > `Tools/unparse.py`? I would vote for

[issue37789] Update doc strings for test.bytecode_helper

2019-08-07 Thread Joannah Nanjekye
Joannah Nanjekye added the comment: *supposed not supported. -- ___ Python tracker ___ ___ Python-bugs-list mailing list

[issue37789] Update doc strings for test.bytecode_helper

2019-08-07 Thread Joannah Nanjekye
Change by Joannah Nanjekye : -- assignee: -> docs@python components: +Documentation nosy: +docs@python versions: +Python 3.9 ___ Python tracker ___

[issue37789] Update doc strings for test.bytecode_helper

2019-08-07 Thread Joannah Nanjekye
New submission from Joannah Nanjekye : I want to believe there is a mistake in the doc strings for these methods: def assertInBytecode(self, x, opname, argval=_UNSPECIFIED): """Returns instr if op is found, otherwise throws AssertionError""" for i

[issue37496] Support annotations in signature strings.

2019-08-07 Thread Ivan Levkivskyi
Ivan Levkivskyi added the comment: > I have a couple of questions about how to use `ast_unparse.c`'s > `expr_as_unicode` function: > 1. Should I create a new function in the `ast` module that exposes that C > function in Python in order to use it in `Lib/inspect.py`? > 2. Would it be better

[issue36757] uuid constructor accept invalid strings (extra dash)

2019-08-05 Thread Tal Einat
Tal Einat added the comment: I too find this surprising, especially given how thoroughly UUID validates inputs of types other than "hex". The documentation simply states that for hex input, hyphens, curly braces and a URN prefix are optional. In practice, though, it is much more lenient
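
A quick way to check the leniency described here on a given interpreter is sketched below. The oddly hyphenated input is an invented example, and whether it is accepted is treated as an open question that the code probes rather than assumes.

```python
import uuid

canonical = uuid.UUID("12345678-1234-5678-1234-567812345678")

# Same 32 hex digits, but with hyphens in non-standard positions.
odd_text = "1234-5678-1234-5678-1234-5678-1234-5678"
try:
    odd = uuid.UUID(odd_text)
except ValueError:
    odd = None  # this interpreter rejects the unusual hyphenation

print("accepted:", odd is not None)
if odd is not None:
    print("same UUID as canonical form:", odd == canonical)
```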

[issue37587] JSON loads performance improvement for long strings

2019-08-05 Thread Inada Naoki
Inada Naoki added the comment: And I confirmed performance improvement by my patch (GH-15134) on all of 4 compilers. $ ./python -m pyperf timeit -s "import json; x = json.dumps({'k': '1' * 2 ** 20})" "json.loads(x)" old: 9211e2 new: 8a758f opt2: 284e47 gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0

[issue37587] JSON loads performance improvement for long strings

2019-08-05 Thread Inada Naoki
Change by Inada Naoki : -- pull_requests: +14873 pull_request: https://github.com/python/cpython/pull/15134 ___ Python tracker ___

[issue37587] JSON loads performance improvement for long strings

2019-08-05 Thread Inada Naoki
Inada Naoki added the comment: I tried without PGO and confirmed performance improved on GCC 7.2.0. No change on other compiler versions. $ ./python -m pyperf timeit -s "import json; x = json.dumps({'k': '1' * 2 ** 20})" "json.loads(x)" old: 9211e2 new: 8a758f gcc (Ubuntu 8.3.0-6ubuntu1)

[issue37587] JSON loads performance improvement for long strings

2019-07-30 Thread Inada Naoki
Inada Naoki added the comment: This issue is very compiler sensitive. Please don't report performance without compiler version and PGO option. Now I'm facing strange behavior. pyperf reports slower time (1ms) for PGO builds, although disasm looks good. But it's 2:30am already... Please wait
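
One way to collect the compiler and build options asked for here is to query the interpreter itself. This is a sketch: `CONFIG_ARGS` is only populated on builds that used `configure` (it may be missing on Windows), and checking for the `--enable-optimizations` and `--with-lto` flags is an assumption about how PGO and LTO were requested.

```python
import platform
import sysconfig

# Compiler string recorded when this interpreter was built.
compiler = platform.python_compiler()

# Arguments passed to ./configure, if the build recorded them.
config_args = sysconfig.get_config_var("CONFIG_ARGS") or ""

built_with_pgo = "--enable-optimizations" in config_args
built_with_lto = "--with-lto" in config_args

print("compiler:", compiler)
print("PGO requested:", built_with_pgo)
print("LTO requested:", built_with_lto)
```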

[issue37587] JSON loads performance improvement for long strings

2019-07-30 Thread Inada Naoki
Inada Naoki added the comment: I'm sorry, I was wrong. PGO did very nice job on all cases. gcc allocates `c` to register in the hot loop. -- ___ Python tracker ___

[issue37587] JSON loads performance improvement for long strings

2019-07-30 Thread Inada Naoki
Inada Naoki added the comment: I tested before, after, Steve's patch, and my patch with gcc 8.3.0 and PGO+LTO. https://gist.github.com/methane/f6077bd1b0b04d40a9c790d9ed670a44#file-gcc-8-3-0-pgo-md Surprisingly, there is no difference. Even my patch didn't help register allocation when PGO

[issue37587] JSON loads performance improvement for long strings

2019-07-30 Thread Steve Dower
Steve Dower added the comment: > compiler stores the `c` to stack every time The disassembly we looked at didn't do this, so it may just be certain compilers. Perhaps we can actually use the register keyword to help them out? :) Here's a slightly altered one that doesn't require rescanning

[issue37587] JSON loads performance improvement for long strings

2019-07-30 Thread Inada Naoki
Inada Naoki added the comment: Since scope of "c" is very wide, and there is even `&c` in the scope, compiler stores the `c` to stack every time on: c = PyUnicode_READ(kind, buf, next); That is the bottleneck. `if (strict && ...)` is not the bottleneck. My patch used a new variable with

[issue37587] JSON loads performance improvement for long strings

2019-07-30 Thread Marco Paolini
Marco Paolini added the comment: @steve.dower yes, that's what made me discard that experiment we did during the sprint. Ok will test your new patch soon -- ___ Python tracker

[issue37587] JSON loads performance improvement for long strings

2019-07-30 Thread Steve Dower
Steve Dower added the comment: Oh, we also need to capture "next"... but then again, since the success case is far more common, I'd be okay with scanning again to find it. -- ___ Python tracker

[issue37587] JSON loads performance improvement for long strings

2019-07-30 Thread Steve Dower
Steve Dower added the comment: While you're testing patches, can you try this version too? Py_UCS4 c = 0, minc = 0x20; for (next = end; next < len; next++) { c = PyUnicode_READ(kind, buf, next); if (c == '"' || c == '\\') { break;

[issue37587] JSON loads performance improvement for long strings

2019-07-30 Thread Inada Naoki
Inada Naoki added the comment: Wait... there is no benchmark for the "minimum change". I tested 4 compilers, and provide much better patch in https://bugs.python.org/issue37587#msg348114 -- ___ Python tracker

[issue37587] JSON loads performance improvement for long strings

2019-07-30 Thread miss-islington
miss-islington added the comment: New changeset 9265a877426af4fa5c44cc8482e0198806889350 by Miss Islington (bot) in branch '3.8': bpo-37587: Make json.loads faster for long strings (GH-14752) https://github.com/python/cpython/commit/9265a877426af4fa5c44cc8482e0198806889350 -- nosy

[issue37587] JSON loads performance improvement for long strings

2019-07-30 Thread Nick Coghlan
Nick Coghlan added the comment: I went ahead and merged the minimal PR and flagged it for backporting to 3.8 - it's an obviously beneficial change, that clearly does less work on each pass through the loop. Even if you are doing non-strict parsing of a string that consists entirely of
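
The strict versus non-strict distinction mentioned in this comment is visible from Python through the `strict` argument of `json.loads`. A minimal sketch, using a made-up document that contains a raw tab inside a JSON string:

```python
import json

# A JSON string literal containing an unescaped tab (a control character).
doc = '"a\tb"'

try:
    json.loads(doc)            # strict=True is the default
    strict_accepts = True
except json.JSONDecodeError:
    strict_accepts = False

# With strict=False, control characters are allowed inside strings.
lenient_value = json.loads(doc, strict=False)

print("strict accepts raw tab:", strict_accepts)
print("non-strict result:", repr(lenient_value))
```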

[issue37587] JSON loads performance improvement for long strings

2019-07-30 Thread miss-islington
Change by miss-islington : -- pull_requests: +14783 pull_request: https://github.com/python/cpython/pull/15022 ___ Python tracker ___

[issue37587] JSON loads performance improvement for long strings

2019-07-30 Thread Nick Coghlan
Nick Coghlan added the comment: New changeset 8a758f5b99c5fc3fd32edeac049d7d4a4b7cc163 by Nick Coghlan (Marco Paolini) in branch 'master': bpo-37587: Make json.loads faster for long strings (GH-14752) https://github.com/python/cpython/commit/8a758f5b99c5fc3fd32edeac049d7d4a4b7cc163

[issue37587] JSON loads performance improvement for long strings

2019-07-29 Thread Marco Paolini
Marco Paolini added the comment: I forgot to mention, I was inspired by @christian.heimes 's talk at EuroPython 2019 https://ep2019.europython.eu/talks/es2pZ6C-introduction-to-low-level-profiling-and-tracing/ (thanks!) -- ___ Python tracker

[issue37587] JSON loads performance improvement for long strings

2019-07-29 Thread Marco Paolini
Marco Paolini added the comment: I am also working on a different patch that uses the "pcmpestri" SSE4 processor instruction, it looks like this for now. While at it I realized there is (maybe) another potential speedup: avoiding the ucs4lib_find_max_char we do for each chunk of the string

[issue37587] JSON loads performance improvement for long strings

2019-07-29 Thread Marco Paolini
Marco Paolini added the comment: On gcc, running the tests above, the only change that is relevant for speedup is switching around the strict check. Removing the extra MOV related to the outer "c" variable is not significant (at least on gcc and the few tests I did) Unfortunately I had to

[issue37688] The results from os.path.isdir(...) an Path(...).is_dir() are not equivalent for empty path strings.

2019-07-29 Thread Brett Cannon
Brett Cannon added the comment: Changing the semantics of os.path.isdir() for something like this isn't worth breaking code; basically it's now a quirk of the function. -- ___ Python tracker

[issue37496] Support annotations in signature strings.

2019-07-29 Thread Joannah Nanjekye
Change by Joannah Nanjekye : -- nosy: -nanjekyejoannah ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue6114] distutils build_ext path comparison only based on strings

2019-07-29 Thread STINNER Victor
STINNER Victor added the comment: This issue is 10 years old and has a patch: it's far from being "newcomer friendly", I remove the "Easy" label. -- keywords: -easy nosy: +vstinner ___ Python tracker

[issue1185124] pydoc doesn't find all module doc strings

2019-07-29 Thread STINNER Victor
STINNER Victor added the comment: This issue is 14 years old, inactive for 5 years, has 3 patches: it's far from being "newcomer friendly", I remove the "Easy" label. -- keywords: -easy versions: +Python 3.9 -Python 3.5 ___ Python tracker

[issue37496] Support annotations in signature strings.

2019-07-27 Thread Giovanni Cappellotto
` return Python type classes. At the beginning, in order to keep `Parameter.annotation`'s return type consistent with the current implementation, I tried to evaluate the annotation's "unparse AST" string output, but I was getting errors evaluating type aliases and refined types. Retur

[issue37688] The results from os.path.isdir(...) an Path(...).is_dir() are not equivalent for empty path strings.

2019-07-26 Thread Kirill Balunov
Kirill Balunov added the comment: I am reading "equivalence" too strictly (like "as a substitute"), because this is part of the documentation :) and I agree that in ordinary speech I would use it rather in the sense of “similar”. In order to make sure, that everyone agrees only on that this

[issue37496] Support annotations in signature strings.

2019-07-26 Thread Eric Snow
Eric Snow added the comment: +1 on using a string for Parameter.annotation and Signature.return_annotation. -- ___ Python tracker ___

[issue37688] The results from os.path.isdir(...) an Path(...).is_dir() are not equivalent for empty path strings.

2019-07-26 Thread Antoine Pitrou
Antoine Pitrou added the comment: If "equivalent" is deceiving, perhaps replace it with "similar" or "roughly equivalent". Feel free to post a PR with your preferred wording. -- ___ Python tracker

[issue37688] The results from os.path.isdir(...) an Path(...).is_dir() are not equivalent for empty path strings.

2019-07-26 Thread Brett Cannon
Brett Cannon added the comment: I think you're reading "equivalence" too strictly here to mean "exactly the same semantics". In this instance it means "for similar functionality, the equivalent method is ..." (admittedly this might be a quirk of the use of the word "equivalent" in North

[issue37688] The results from os.path.isdir(...) an Path(...).is_dir() are not equivalent for empty path strings.

2019-07-26 Thread Karthikeyan Singaravelan
Change by Karthikeyan Singaravelan : -- nosy: +pitrou ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue37688] The results from os.path.isdir(...) an Path(...).is_dir() are not equivalent for empty path strings.

2019-07-26 Thread Kirill Balunov
Kirill Balunov added the comment: I understand the reasons, I only say that it does not correspond to my perception of their equivalence, because: os.path.isdir('') != os.path.isdir('.') while: Path('').is_dir() == Path('.').is_dir() and I can confirm that some libraries rely on

[issue37688] The results from os.path.isdir(...) an Path(...).is_dir() are not equivalent for empty path strings.

2019-07-26 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: It is because Path() == Path('') == Path('.'). -- nosy: +serhiy.storchaka ___ Python tracker ___
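
The difference under discussion, and the explanation given in this comment, can be checked directly. A short sketch, assuming the current working directory exists:

```python
import os.path
from pathlib import Path

# os.path treats the empty string as a path that does not exist.
empty_isdir = os.path.isdir("")
dot_isdir = os.path.isdir(".")

# pathlib normalises the empty string to the current directory.
empty_path_is_dir = Path("").is_dir()
same_as_dot = Path("") == Path(".")

print(empty_isdir, dot_isdir, empty_path_is_dir, same_as_dot)
```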

[issue37688] The results from os.path.isdir(...) an Path(...).is_dir() are not equivalent for empty path strings.

2019-07-26 Thread Kirill Balunov
Kirill Balunov added the comment: Forgot to write the result for Path variant: >>> Path(dummy).is_dir() True -- ___ Python tracker ___

[issue37688] The results from os.path.isdir(...) an Path(...).is_dir() are not equivalent for empty path strings.

2019-07-26 Thread Kirill Balunov
New submission from Kirill Balunov : In the documentation it is said that os.path.isdir(...) and Path(...).is_dir() are equivalent substitutes. https://docs.python.org/3/library/pathlib.html#correspondence-to-tools-in-the-os-module But they give different result for empty path strings

[issue37587] JSON loads performance improvement for long strings

2019-07-23 Thread Steve Dower
Steve Dower added the comment: Marco has a newer patch with better performance that we came up with at the sprints, but apparently it hasn't been pushed yet. Hopefully he'll get that up soon and we can review it instead - the current PR wasn't as reliably good as initial testing suggested.

[issue37496] Support annotations in signature strings.

2019-07-22 Thread Joannah Nanjekye
Change by Joannah Nanjekye : -- nosy: +nanjekyejoannah ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue37647] Wrong lineno in traceback when formatting strings with % and multilines

2019-07-22 Thread Bruno P. Kinoshita
Bruno P. Kinoshita added the comment: Hi, Thanks for the explanation. Could you elaborate a bit on this one: >In Python 3.8 the traceback points to the start of the subexpression that >raises an exception. Just wanted to understand why I get the desired line number when I call

[issue37647] Wrong lineno in traceback when formatting strings with % and multilines

2019-07-22 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Line 9 is not where the error actually happened. The exception is raised not when you call Z(), but when you implicitly call Z.__str__() when evaluate the % operator. In Python 3.8 the traceback points to the start of the subexpression that raises an
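
The point that the exception comes from the implicit `__str__` call, not from constructing the object, can be illustrated with a small stand-in for the reporter's code. The class below is a hypothetical reconstruction, not the original script, and the reported line number is printed rather than asserted because it varies by Python version.

```python
class Z:
    def __str__(self):
        raise ValueError("raised inside __str__")


z = Z()  # constructing the object does not raise

raised_during_format = False
try:
    # The % operator calls str(z) implicitly; that is where the error occurs.
    message = "value: %s" % (
        z,
    )
except ValueError as exc:
    raised_during_format = True
    print("traceback line in this frame:", exc.__traceback__.tb_lineno)
```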

[issue37647] Wrong lineno in traceback when formatting strings with % and multilines

2019-07-22 Thread Bruno P. Kinoshita
Bruno P. Kinoshita added the comment: Hi Ammar, thanks for the quick reply and for the suggestion. Tried on the latest version on master, and looks like it's indeed different, though still looks like it is displaying the wrong line number. ```

[issue37647] Wrong lineno in traceback when formatting strings with % and multilines

2019-07-22 Thread Ammar Askar
Ammar Askar added the comment: Please try this on the latest version of Python, there was a behavior change implemented in issue12458 that might make this a non-issue. -- nosy: +ammar2, serhiy.storchaka ___ Python tracker

[issue37647] Wrong lineno in traceback when formatting strings with % and multilines

2019-07-21 Thread Bruno P. Kinoshita
components: Interpreter Core messages: 348277 nosy: kinow priority: normal severity: normal status: open title: Wrong lineno in traceback when formatting strings with % and multilines type: behavior versions: Python 3.7 ___ P

[issue37496] Support annotations in signature strings.

2019-07-19 Thread Ivan Levkivskyi
Ivan Levkivskyi added the comment: You might want to look into how PEP 563 is implemented, it has a utility to turn an AST back into a string (I assume this is what you want). -- ___ Python tracker
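
As an illustration of turning an annotation's AST back into a string, the sketch below uses the public `ast.unparse` function, which exists in Python 3.9 and later. It is not the C helper discussed in this thread, and the sample function signature is invented for the example.

```python
import ast

# Hypothetical signature text; nothing here is evaluated, only parsed.
source = "def f(x: List[int], y: int = 0) -> Optional[str]: ..."
func = ast.parse(source).body[0]

# Map each parameter name to its annotation rendered back as source text.
annotations = {
    arg.arg: ast.unparse(arg.annotation)
    for arg in func.args.args
    if arg.annotation is not None
}
return_annotation = ast.unparse(func.returns)

print(annotations, return_annotation)
```

Keeping annotations as strings this way avoids having to resolve names such as `List` or `Optional` at the time the signature is built.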

[issue37587] JSON loads performance improvement for long strings

2019-07-18 Thread Inada Naoki
Change by Inada Naoki : -- versions: -Python 3.6, Python 3.7, Python 3.8 ___ Python tracker ___ ___ Python-bugs-list mailing list

[issue37496] Support annotations in signature strings.

2019-07-18 Thread Giovanni Cappellotto
Giovanni Cappellotto added the comment: I'd like to work on this, but I'm kind of new to the codebase. Do you think I should leave this task to someone more expert on the matter? I took a look at the function you mentioned and I was able to support simple annotations, for instance `x: int`,

[issue37587] JSON loads performance improvement for long strings

2019-07-18 Thread Inada Naoki
Inada Naoki added the comment: Some compilers produce inefficient code for PR-14752. I wrote another patch which is friendly to more compilers. $ perf record ./python -m pyperf timeit -s "import json; x = json.dumps({'k': '1' * 2 ** 20})" "json.loads(x)" # PR-14752 gcc-7 (Ubuntu

[issue37587] JSON loads performance improvement for long strings

2019-07-18 Thread Inada Naoki
Inada Naoki added the comment: > 1. remove the mov entirely. It is not needed inside the loop and it is only > needed later, outside the loop to access the variable How can we lazy "movDWORD PTR [rsp+0x44],eax"? -- nosy: +inada.naoki ___

[issue37496] Support annotations in signature strings.

2019-07-14 Thread Giovanni Cappellotto
Change by Giovanni Cappellotto : -- nosy: +potomak ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue37587] JSON loads performance improvement for long strings

2019-07-14 Thread Steve Dower
Change by Steve Dower : -- nosy: +steve.dower ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue37587] JSON loads performance improvement for long strings

2019-07-13 Thread Marco Paolini
Marco Paolini added the comment: Here's the real world example $ ls -hs events-100k.json 84M events-100k.json +---+-+-+ | Benchmark | vanilla-bpo-events-100k | patched-bpo-events-100k |

[issue37587] JSON loads performance improvement for long strings

2019-07-13 Thread Marco Paolini
Marco Paolini added the comment: Also on my real workload (loading 60GB jsonl file containing mostly strings) I measured a 10% improvement -- ___ Python tracker <https://bugs.python.org/issue37

[issue37587] JSON loads performance improvement for long strings

2019-07-13 Thread Marco Paolini
Change by Marco Paolini : -- nosy: +ezio.melotti, rhettinger ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue37587] JSON loads performance improvement for long strings

2019-07-13 Thread Marco Paolini
Change by Marco Paolini : -- keywords: +patch pull_requests: +14547 stage: -> patch review pull_request: https://github.com/python/cpython/pull/14752 ___ Python tracker ___

[issue37587] JSON loads performance improvement for long strings

2019-07-13 Thread Karthikeyan Singaravelan
Change by Karthikeyan Singaravelan : -- nosy: +serhiy.storchaka ___ Python tracker ___ ___ Python-bugs-list mailing list

[issue37587] JSON loads performance improvement for long strings

2019-07-13 Thread Marco Paolini
ority: normal severity: normal status: open title: JSON loads performance improvement for long strings type: performance versions: Python 3.6, Python 3.7, Python 3.8, Python 3.9 Added file: https://bugs.python.org/file48476/events.svg ___ Py

[issue37575] Python Documentation on strings (tutorial section 3.1.2.)

2019-07-13 Thread Mark Dickinson
Change by Mark Dickinson : -- resolution: -> not a bug status: open -> pending type: resource usage -> ___ Python tracker ___ ___

[issue37575] Python Documentation on strings (tutorial section 3.1.2.)

2019-07-12 Thread Mark Dickinson
Change by Mark Dickinson : -- title: Python Documentation on strings ( section 3.1.2.) -> Python Documentation on strings (tutorial section 3.1.2.) ___ Python tracker <https://bugs.python.org/issu

[issue37575] Python Documentation on strings ( section 3.1.2.)

2019-07-12 Thread Mark Dickinson
Mark Dickinson added the comment: The documentation is correct here; none of the examples you show demonstrates implicit concatenation of string-valued expressions. The tutorial documentation is referring to two strings placed directly next to each other with no other syntax (other than
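
The distinction drawn in this comment, that only adjacent string literals are concatenated and not arbitrary string-valued expressions, can be shown in a few lines. The variable name is an illustrative assumption.

```python
# Two string literals next to each other are joined at compile time.
literal = "Py" "thon"

# A variable next to a literal is not valid syntax, so compile it separately
# to observe the error without breaking this example.
prefix = "Py"
try:
    compile('prefix "thon"', "<example>", "eval")
    variable_form_compiles = True
except SyntaxError:
    variable_form_compiles = False

# Expressions have to be joined explicitly.
explicit = prefix + "thon"

print(literal, variable_form_compiles, explicit)
```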

[issue37575] Python Documentation on strings ( section 3.1.2.)

2019-07-12 Thread Srikanth
New submission from Srikanth : In section 3.1.2 of the python documentation, its mentioned as below: Two or more string literals (i.e. the ones enclosed between quotes) next to each other are automatically concatenated. This feature is particularly useful when you want to break long strings

[issue37540] vectorcall: keyword names must be strings

2019-07-10 Thread Jeroen Demeyer
Change by Jeroen Demeyer : -- keywords: +patch pull_requests: +14487 stage: -> patch review pull_request: https://github.com/python/cpython/pull/14682 ___ Python tracker ___

[issue37540] vectorcall: keyword names must be strings

2019-07-10 Thread Jeroen Demeyer
New submission from Jeroen Demeyer : Keyword names in calls are expected to be strings, however it's currently not clear who should enforce/check this. I suggest to fix this for vectorcall/METH_FASTCALL and specify that it's the caller's job to make sure that keyword names are strings (str
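
From pure Python, the expectation that keyword names are strings shows up when unpacking a mapping with a non-string key. This sketch only demonstrates the Python-level behaviour; it does not exercise the vectorcall C API that the issue is about.

```python
def f(**kwargs):
    return kwargs


# String keys are fine.
ok = f(**{"name": "value"})

# A non-string key is rejected when the call is made.
try:
    f(**{1: "value"})
    rejected = False
except TypeError as exc:
    rejected = True
    print("TypeError:", exc)
```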

[issue37496] Support annotations in signature strings.

2019-07-05 Thread Ivan Levkivskyi
Change by Ivan Levkivskyi : -- nosy: +levkivskyi ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue37496] Support annotations in signature strings.

2019-07-03 Thread Eric Snow
re_fromstr(). Annotations should be supported. I'd expect that PEP 563 (Postponed Evaluation of Annotations) could be leveraged. -- components: Library (Lib) messages: 347247 nosy: eric.snow priority: normal severity: normal stage: needs patch status: open title: Support annotations in signatu

[issue37348] Optimize PyUnicode_FromString for short ASCII strings

2019-06-24 Thread Inada Naoki
Inada Naoki added the comment: This optimization is only for short strings. There is no significant difference for long and non-ASCII strings. ``` # 1000 ASCII $ ./python -m pyperf timeit --compare-to=./python-master -s 'b=b"f"*1000' -- 'b.decode()' python-master: ...

[issue37348] Optimize PyUnicode_FromString for short ASCII strings

2019-06-24 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Could you please measure the performance for long strings (1000, 1 and 10 characters): a long ASCII string and a long ASCII string ending with a non-ASCII character? -- ___ Python tracker <ht
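
The two input shapes requested here can be built as follows. This is a sketch with `timeit` instead of `pyperf`; the sizes and iteration count are placeholders chosen for illustration, since the exact sizes in the snippet above are not recoverable.

```python
import timeit

SIZES = (1_000, 10_000, 100_000)  # hypothetical sizes, not from the thread

results = {}
for size in SIZES:
    ascii_only = b"f" * size
    # Same ASCII run, followed by one non-ASCII character encoded as UTF-8.
    ascii_then_non_ascii = ascii_only + "\u00e9".encode("utf-8")
    for label, data in (("ascii", ascii_only), ("ascii+1", ascii_then_non_ascii)):
        results[(size, label)] = timeit.timeit(data.decode, number=1_000)

for (size, label), seconds in sorted(results.items()):
    print(f"{size:7d} {label:8s} {seconds:.4f}s")
```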

[issue37348] Optimize PyUnicode_FromString for short ASCII strings

2019-06-23 Thread Inada Naoki
Change by Inada Naoki : -- resolution: -> fixed stage: patch review -> resolved status: open -> closed ___ Python tracker ___ ___

[issue37348] Optimize PyUnicode_FromString for short ASCII strings

2019-06-23 Thread Inada Naoki
Inada Naoki added the comment: New changeset 770847a7db33b3d4c451b42372b6942687aa6121 by Inada Naoki in branch 'master': bpo-37348: optimize decoding ASCII string (GH-14283) https://github.com/python/cpython/commit/770847a7db33b3d4c451b42372b6942687aa6121 --

[issue37348] Optimize PyUnicode_FromString for short ASCII strings

2019-06-21 Thread STINNER Victor
STINNER Victor added the comment: I'm confused by the issue title: PyUnicode_GetString() doesn't exist, it's PyUnicode_FromString() :-) I changed the title. -- title: Optimize PyUnicode_GetString for short ASCII strings -> Optimize PyUnicode_FromString for short ASCII stri

[issue37348] Optimize PyUnicode_GetString for short ASCII strings

2019-06-21 Thread Inada Naoki
Inada Naoki added the comment: PR 14291 seems simpler than PR 14283. But PR 14283 is faster because _PyUnicodeWriter is a large struct. master: 3.7sec PR 14283: 2.9sec PR 14291: 3.45sec compiler: gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 without LTO, without PGO --

[issue37348] Optimize PyUnicode_GetString for short ASCII strings

2019-06-21 Thread Inada Naoki
Change by Inada Naoki : -- pull_requests: +14114 pull_request: https://github.com/python/cpython/pull/14291 ___ Python tracker ___

[issue37348] Optimize PyUnicode_GetString for short ASCII strings

2019-06-21 Thread Inada Naoki
Inada Naoki added the comment: Another micro benchmark: ``` $ ./python-master -m pyperf timeit -o m1.json 'b=b"foobar"' -- 'b.decode()' . Mean +- std dev: 93.1 ns +- 2.4 ns $ ./python -m pyperf timeit -o m2.json 'b=b"foobar"' -- 'b.decode()' . Mean +-

[issue37348] Optimize PyUnicode_GetString for short ASCII strings

2019-06-21 Thread Inada Naoki
Inada Naoki added the comment: > I don't understand how _PyUnicodeWriter could be slow. It does not > overallocate by default. It's just wrapper to implement efficient memory > management. I misunderstood _PyUnicodeWriter. I thought it caused one more allocation, but it doesn't. But

[issue37348] Optimize PyUnicode_GetString for short ASCII strings

2019-06-21 Thread Inada Naoki
Change by Inada Naoki : -- pull_requests: +14105 pull_request: https://github.com/python/cpython/pull/14283 ___ Python tracker ___

[issue37348] Optimize PyUnicode_GetString for short ASCII strings

2019-06-20 Thread STINNER Victor
STINNER Victor added the comment: > _PyUnicode_FromASCII(s, len) is faster than PyUnicode_FromString(s) because > PyUnicode_FromString() uses temporary _PyUnicodeWriter to support UTF-8. I don't understand how _PyUnicodeWriter could be slow. It does not overallocate by default. It's just

[issue37348] Optimize PyUnicode_GetString for short ASCII strings

2019-06-20 Thread Inada Naoki
Inada Naoki added the comment: Oh, wait. Why we used _PyUnicodeWriter here? Decoding UTF-8 must not require it. 2-pass is enough. -- ___ Python tracker ___

[issue37348] Optimize PyUnicode_GetString for short ASCII strings

2019-06-20 Thread Inada Naoki
tor. Of course, I used it just for micro benchmarking. Optimizing it is not a goal. In case of PR 14273: $ ./python -m pyperf timeit -s 'd={}' -- 'repr(d)' . Mean +- std dev: 138 ns +- 2 ns -- title: add _PyUnicode_FROM_ASCII macro -> Optimize PyUnicode_GetString for

[issue36033] logging.makeLogRecord should update "rv" using a dict defined with bytes instead of strings

2019-06-19 Thread Vinay Sajip
Vinay Sajip added the comment: Closing, as no further feedback from issue reporter. Feel free to reopen if you have a good response to my last comment. -- resolution: -> not a bug stage: -> resolved status: open -> closed ___ Python tracker

[issue32846] Deletion of large sets of strings is extra slow

2019-06-15 Thread Tim Peters
Tim Peters added the comment: Thanks, Terry! Based on your latest results, "quadratic time" isn't plausible here anymore, so I'm closing this. Nasty cache effects certainly played a role, but they were just a flea on the dog ;-) -- resolution: -> fixed stage: commit review ->

[issue32846] Deletion of large sets of strings is extra slow

2019-06-14 Thread Terry J. Reedy
Terry J. Reedy added the comment: I reran the code in msg312188. Ints as before, string deletion +- linear up to 100 million, much better than before.

millions     old strings       new strings
of items    create  delete    create  delete
    4        1.55    .36       1.7     .38
    8        3.18    .76

[issue32846] Deletion of large sets of strings is extra slow

2019-06-14 Thread Tim Peters
have enough RAM even to run Terry's test code to confirm it. I can confirm that there's no quadratic-time behavior anymore deleting large sets of strings, but only until I run out of RAM. Neither is the behavior linear, but it's much closer to linear than quadratic now. If someone with more RAM

[issue32846] Deletion of large sets of strings is extra slow

2019-06-14 Thread Raymond Hettinger
Raymond Hettinger added the comment: Can we close this now? ISTM the issue has less to do with sets and more to do with memory allocation quirks and that on modern CPUs random memory accesses are slower than sequential memory accesses. It is not a bug that sets are unordered collections

[issue32846] Deletion of large sets of strings is extra slow

2019-06-14 Thread Tim Peters
Change by Tim Peters : -- stage: resolved -> commit review ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue32846] Deletion of large sets of strings is extra slow

2019-06-14 Thread Tim Peters
Tim Peters added the comment: Looks likely that the _major_ cause of the quadratic-time delete behavior was due to that obmalloc used a linear-time method to keep its linked list of usable arenas sorted in order of number of free pools. When a pool became unused, its arena's count of free

[issue32846] Deletion of large sets of strings is extra slow

2019-06-14 Thread Inada Naoki
Change by Inada Naoki : -- resolution: wont fix -> stage: resolved -> status: closed -> open versions: +Python 3.9 -Python 3.7, Python 3.8 ___ Python tracker ___

[issue19865] create_unicode_buffer() fails on non-BMP strings on Windows

2019-06-14 Thread STINNER Victor
STINNER Victor added the comment: Thanks Zackery Spytz for the fix. Thanks Gergely Erdélyi for the bug report! Sorry for the long delay. -- resolution: -> fixed stage: patch review -> resolved status: open -> closed ___ Python tracker

[issue19865] create_unicode_buffer() fails on non-BMP strings on Windows

2019-06-14 Thread miss-islington
miss-islington added the comment: New changeset b0f6fa8d7d4c6d8263094124df9ef9cf816bbed6 by Miss Islington (bot) in branch '3.8': bpo-19865: ctypes.create_unicode_buffer() supports non-BMP strings on Windows (GH-14081) https://github.com/python/cpython/commit

[issue19865] create_unicode_buffer() fails on non-BMP strings on Windows

2019-06-14 Thread miss-islington
miss-islington added the comment: New changeset 0b592d513b073cd3a4ba7632907c25b8282f15ce by Miss Islington (bot) in branch '3.7': bpo-19865: ctypes.create_unicode_buffer() supports non-BMP strings on Windows (GH-14081) https://github.com/python/cpython/commit

[issue19865] create_unicode_buffer() fails on non-BMP strings on Windows

2019-06-14 Thread miss-islington
Change by miss-islington : -- pull_requests: +13944 pull_request: https://github.com/python/cpython/pull/14088 ___ Python tracker ___
