[issue21667] Clarify status of O(1) indexing semantics of str objects

2014-06-10 Thread Jim Jewett
Jim Jewett added the comment: I think the new wording is an improvement, but keeping the changes minimal left it in an awkward in-between state. Proposal: A string is a sequence of Unicode code points. Strings can include any sequence of code points, including some which are semantically

[issue21667] Clarify status of O(1) indexing semantics of str objects

2014-06-10 Thread Jim Jewett
Jim Jewett added the comment: And even my rewrite showed path dependency; a slight further improvement is to re-order encoding ahead of bytes. I also added a paragraph that I hope answers the speed issue. Proposal: A string is a sequence of Unicode code points. Strings can include any

[issue21667] Clarify status of O(1) indexing semantics of str objects

2014-06-07 Thread Roundup Robot
Roundup Robot added the comment: New changeset 6ffb6909c439 by Nick Coghlan in branch '3.4': Issue #21667: Clarify string data model description http://hg.python.org/cpython/rev/6ffb6909c439 New changeset 7c120e77d6f7 by Nick Coghlan in branch 'default': Merge issue #21667 from 3.4

[issue21667] Clarify status of O(1) indexing semantics of str objects

2014-06-07 Thread Nick Coghlan
Nick Coghlan added the comment: I've merged the character-code point clarifications, without the implementation detail section. For the time being, that leaves "doesn't provide O(1) indexing of strings" as the kind of discrepancy that often makes an appearance in differences from the CPython

[issue21667] Clarify status of O(1) indexing semantics of str objects

2014-06-05 Thread Nick Coghlan
New submission from Nick Coghlan: Based on the recent python-dev thread, I propose the following CPython implementation detail note in the Strings entry of https://docs.python.org/3/reference/datamodel.html#objects-values-and-types CPython currently guarantees O(1) access to arbitrary code

[issue21667] Clarify status of O(1) indexing semantics of str objects

2014-06-05 Thread STINNER Victor
STINNER Victor added the comment: str[a:b] returns a substring (characters), not an array of code points (numbers).
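The distinction Victor draws can be checked directly. The following is a minimal sketch; the sample string and variable names are illustrative and not taken from the thread. Slicing yields another str, and the numeric code points only appear when ord() is applied explicitly.

```python
# Illustrative sample string (an assumption, not from the thread).
s = "h\u00e9llo"  # 'h', U+00E9 (e with acute), 'l', 'l', 'o'

# Slicing a str gives back a str, not a list of numbers.
piece = s[1:3]
assert isinstance(piece, str)

# Indexing also gives a length-1 str, not an int.
assert isinstance(s[1], str)

# The numeric code points must be requested explicitly with ord().
code_points = [ord(c) for c in piece]
print(repr(piece), code_points)
```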

[issue21667] Clarify status of O(1) indexing semantics of str objects

2014-06-05 Thread Nick Coghlan
Nick Coghlan added the comment: Guido, I think we need your call on whether or not to add a note about string indexing algorithmic complexity to the language reference, and to approve the exact wording of such a note (my proposed wording is in my initial comment on this issue).

[issue21667] Clarify status of O(1) indexing semantics of str objects

2014-06-05 Thread Nick Coghlan
Nick Coghlan added the comment: No, Python doesn't expose Unicode characters in its data model at all, except in those cases where a code point happens to correspond directly with a character. A length 1 str instance represents a Unicode code point, not a Unicode character.
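A small sketch of the point above, assuming Python 3.3 or later; the choice of "e with acute accent" as the example is mine, not Nick's. The same user-perceived character can be one code point or two, and len() and indexing count code points.

```python
# One user-perceived character, two different code point sequences.
composed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE (one code point)
decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT (two code points)

# len() counts code points, not characters.
print(len(composed), len(decomposed))

# Indexing splits the decomposed form between base letter and accent.
base, accent = decomposed[0], decomposed[1]
print(repr(base), hex(ord(accent)))
```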

[issue21667] Clarify status of O(1) indexing semantics of str objects

2014-06-05 Thread Nick Coghlan
Nick Coghlan added the comment: Although, you're right, that section of the data model docs misuses the word character to mean something other than what it means in the Unicode spec :(

[issue21667] Clarify status of O(1) indexing semantics of str objects

2014-06-05 Thread STINNER Victor
STINNER Victor added the comment: Python implementations are required to ... By the way, Python 3.3 doesn't implement this requirement :-)

[issue21667] Clarify status of O(1) indexing semantics of str objects

2014-06-05 Thread Nick Coghlan
Nick Coghlan added the comment: Saying that ord() and chr() switch between characters and code points is just plain wrong, since characters may be represented as multiple code points. We may also want to explicitly note that the Unicode normalisation is implementation dependent, and that
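As a hedged illustration of the ord()/chr() point: both functions work on a single code point, so a character written as two code points cannot be passed to ord(). The sketch below also uses the standard unicodedata.normalize() to convert between forms explicitly; the sample data is an assumption of mine, and the behaviour shown is that of CPython 3.

```python
import unicodedata

decomposed = "e\u0301"  # one perceived character, two code points

# ord() and chr() round-trip a single code point.
assert chr(ord("e")) == "e"

# ord() rejects a multi-code-point string, even if it renders as one character.
try:
    ord(decomposed)
    ord_failed = False
except TypeError:
    ord_failed = True

# Normalisation is an explicit step, not something str applies on its own.
nfc = unicodedata.normalize("NFC", decomposed)  # composed form
nfd = unicodedata.normalize("NFD", nfc)         # back to decomposed form
print(ord_failed, len(nfc), len(nfd))
```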

[issue21667] Clarify status of O(1) indexing semantics of str objects

2014-06-05 Thread Nick Coghlan
Nick Coghlan added the comment: Right, narrow builds have long been broken - that's a large part of why this is now the requirement :)

[issue21667] Clarify status of O(1) indexing semantics of str objects

2014-06-05 Thread Nick Coghlan
Nick Coghlan added the comment: Patch attached that also addresses the characters vs code points confusion. -- Added file: http://bugs.python.org/file35489/issue21667_clarify_str_specification.rst

[issue21667] Clarify status of O(1) indexing semantics of str objects

2014-06-05 Thread Nick Coghlan
Nick Coghlan added the comment: I ducked the Unicode normalisation question for now, since that's a *different* can of worms :)

[issue21667] Clarify status of O(1) indexing semantics of str objects

2014-06-05 Thread Antoine Pitrou
Antoine Pitrou added the comment: Two things: - I don't think it's very helpful to use the term code point without explaining or introducing it (character at least can be understood intuitively) - The mention of slicing is ambiguous: is slicing supposed to be O(1)? how is indexing related to

[issue21667] Clarify status of O(1) indexing semantics of str objects

2014-06-05 Thread Nick Coghlan
Nick Coghlan added the comment: If someone doesn't understand what Unicode code point means, that's going to be the least of their problems when it comes to implementing a conformant Python implementation. We could link to http://unicode.org/glossary/#code_point, but that doesn't really add

[issue21667] Clarify status of O(1) indexing semantics of str objects

2014-06-05 Thread Antoine Pitrou
Antoine Pitrou added the comment: Not sure what implementing a conformant Python implementation has to do with this; the language specification should be readable by any interested programmers, IMO. If you try to dive into the formal Unicode spec instead, you end up in a twisty maze of

[issue21667] Clarify status of O(1) indexing semantics of str objects

2014-06-05 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Then perhaps we need notes about algorithmic complexity of bytes, bytearray, list and tuple and dict indexing, set.add and set.discard, dict.__delitem__, list.pop, len(), + and += for all basic sequences and containers, memoryview() for bytes, bytearray and

[issue21667] Clarify status of O(1) indexing semantics of str objects

2014-06-05 Thread STINNER Victor
STINNER Victor added the comment: Then perhaps we need notes about algorithmic complexity of bytes, bytearray, list and tuple and dict indexing, set.add and set.discard, dict.__delitem__, list.pop, len(), + and += for all basic sequences and containers, memoryview() for bytes, bytearray

[issue21667] Clarify status of O(1) indexing semantics of str objects

2014-06-05 Thread Chris Angelico
Changes by Chris Angelico ros...@gmail.com: -- nosy: +Rosuav

[issue21667] Clarify status of O(1) indexing semantics of str objects

2014-06-05 Thread Guido van Rossum
Guido van Rossum added the comment: I don't want the O(1) property explicitly denounced in the reference manual. It's fine if the manual is silent on this -- maybe someone can prove that it isn't a problem based on benchmarks of an alternate implementation, but until then, I'm skeptical