[issue485951] repr diff between string and unicode.

2022-04-10 Thread admin
Change by admin : -- github: None -> 35601 ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue31398] TypeError: gdbm key must be string, not unicode

2021-06-18 Thread Irit Katriel
Change by Irit Katriel: -- stage: -> resolved status: open -> closed

[issue31398] TypeError: gdbm key must be string, not unicode

2021-06-18 Thread STINNER Victor
STINNER Victor added the comment: IMO you can close it immediately. Fixing Python 2.7 is not going to happen ever. -- status: pending -> open

[issue31398] TypeError: gdbm key must be string, not unicode

2021-06-18 Thread Irit Katriel
Irit Katriel added the comment: This seems like a Python 2-only issue; if nobody objects, I will close it in a couple of weeks. -- nosy: +iritkatriel resolution: -> out of date status: open -> pending

[issue13943] distutils’ build_py fails when package string is unicode

2021-02-03 Thread Steve Dower
Steve Dower added the comment: Distutils is now deprecated (see PEP 632) and all tagged issues are being closed. From now until removal, only release blocking issues will be considered for distutils. If this issue does not relate to distutils, please remove the component and reopen it. If

[issue39087] [C API] No efficient C API to get UTF-8 string from unicode object.

2020-03-14 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I thought there were at least 3-4 use cases in the core and stdlib.

[issue39087] [C API] No efficient C API to get UTF-8 string from unicode object.

2020-03-14 Thread Inada Naoki
Change by Inada Naoki: -- resolution: -> fixed stage: patch review -> resolved status: open -> closed

[issue39087] [C API] No efficient C API to get UTF-8 string from unicode object.

2020-03-14 Thread Inada Naoki
Inada Naoki added the comment: New changeset 3a8c56295d6272ad2177d2de8af4c3f824f3ef92 by Inada Naoki in branch 'master': Revert "bpo-39087: Add _PyUnicode_GetUTF8Buffer()" (GH-18985) https://github.com/python/cpython/commit/3a8c56295d6272ad2177d2de8af4c3f824f3ef92 --

[issue39087] [C API] No efficient C API to get UTF-8 string from unicode object.

2020-03-13 Thread Inada Naoki
Change by Inada Naoki: -- pull_requests: +18333 pull_request: https://github.com/python/cpython/pull/18985

[issue39087] [C API] No efficient C API to get UTF-8 string from unicode object.

2020-03-13 Thread Inada Naoki
Inada Naoki added the comment: I'm sorry about merging PR 18327, but I cannot find enough usage examples of _PyUnicode_GetUTF8Buffer. PyUnicode_AsUTF8AndSize is optimized, and utf8_cache is not so bad in most cases, so _PyUnicode_GetUTF8Buffer does not seem worthwhile. I will revert PR

[issue39087] [C API] No efficient C API to get UTF-8 string from unicode object.

2020-03-13 Thread Inada Naoki
Change by Inada Naoki: -- pull_requests: +18332 pull_request: https://github.com/python/cpython/pull/18984

[issue39087] [C API] No efficient C API to get UTF-8 string from unicode object.

2020-03-13 Thread Inada Naoki
Inada Naoki added the comment: New changeset c7ad974d341d3edb6b9d2a2dcae4d3d4794ada6b by Inada Naoki in branch 'master': bpo-39087: Add _PyUnicode_GetUTF8Buffer() (GH-17659) https://github.com/python/cpython/commit/c7ad974d341d3edb6b9d2a2dcae4d3d4794ada6b --

[issue39087] [C API] No efficient C API to get UTF-8 string from unicode object.

2020-02-26 Thread Inada Naoki
Inada Naoki added the comment: New changeset 02a4d57263a9846de35b0db12763ff9e7326f62c by Inada Naoki in branch 'master': bpo-39087: Optimize PyUnicode_AsUTF8AndSize() (GH-18327) https://github.com/python/cpython/commit/02a4d57263a9846de35b0db12763ff9e7326f62c --

[issue39087] [C API] No efficient C API to get UTF-8 string from unicode object.

2020-02-03 Thread Inada Naoki
Inada Naoki added the comment: The attached patch is the benchmark function I used in my previous post. -- Added file: https://bugs.python.org/file48879/bench-asutf8.patch

[issue39087] [C API] No efficient C API to get UTF-8 string from unicode object.

2020-02-03 Thread Inada Naoki
2.1 us ## vs AsUTF8String (ASCII) If we can not accept cache, PyUnicode_AsUTF8String is slower than PyUnicode_AsUTF8 when the unicode is ASCII string. PyUnicode_GetUTF8Buffer helps only this case. $ ./python -m pyperf timeit -s 'from _testcapi import unicode_bench_asutf8 as b' -- 'b(1000, "hello&

[issue39087] [C API] No efficient C API to get UTF-8 string from unicode object.

2020-02-03 Thread Inada Naoki
Change by Inada Naoki: -- pull_requests: +17701 pull_request: https://github.com/python/cpython/pull/18327

[issue39087] [C API] No efficient C API to get UTF-8 string from unicode object.

2019-12-24 Thread Inada Naoki
Inada Naoki added the comment: > I like this idea, but I think that we should at least notify Python-Dev about all additions to the public C API. If somebody has objections or a better idea, it is better to know earlier. I created a post about this issue on discuss.python.org.

[issue39087] [C API] No efficient C API to get UTF-8 string from unicode object.

2019-12-23 Thread Inada Naoki
Change by Inada Naoki: -- pull_requests: +17140 pull_request: https://github.com/python/cpython/pull/17683

[issue39087] [C API] No efficient C API to get UTF-8 string from unicode object.

2019-12-21 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I like this idea, but I think that we should at least notify Python-Dev about all additions to the public C API. If somebody has objections or a better idea, it is better to know earlier.

[issue39087] [C API] No efficient C API to get UTF-8 string from unicode object.

2019-12-19 Thread Inada Naoki
Change by Inada Naoki: -- keywords: +patch pull_requests: +17127 stage: -> patch review pull_request: https://github.com/python/cpython/pull/17659

[issue39087] [C API] No efficient C API to get UTF-8 string from unicode object.

2019-12-19 Thread Inada Naoki
Inada Naoki added the comment: > Don't you need to DECREF bytes somehow, at least, in case of failure? Thanks. I will create a pull request with suggested changes.

[issue39087] [C API] No efficient C API to get UTF-8 string from unicode object.

2019-12-19 Thread STINNER Victor
STINNER Victor added the comment: return PyBytesType.tp_as_buffer(bytes, view, PyBUF_CONTIG_RO); Don't you need to DECREF bytes somehow, at least, in case of failure?

[issue39087] [C API] No efficient C API to get UTF-8 string from unicode object.

2019-12-19 Thread Inada Naoki
Inada Naoki added the comment: s/return NULL/return -1/g

[issue39087] [C API] No efficient C API to get UTF-8 string from unicode object.

2019-12-19 Thread Inada Naoki
Inada Naoki added the comment: > Would it be possible to use a "container" object like a Py_buffer? Is there a way to customize the code executed when a Py_buffer is "released"? It looks like a nice idea! Py_buffer.obj is decref-ed when releasing the buffer.

[issue39087] [C API] No efficient C API to get UTF-8 string from unicode object.

2019-12-19 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: > Would it be possible to use a "container" object like a Py_buffer? Looks like a good idea. int PyUnicode_GetUTF8Buffer(Py_buffer *view, const char *errors) -- nosy: +skrah

[issue39087] [C API] No efficient C API to get UTF-8 string from unicode object.

2019-12-19 Thread STINNER Victor
STINNER Victor added the comment: > The returned object is the owner of the *utf8*. You need to Py_DECREF() it after you have finished using the *utf8*. The owner may not be the unicode. Would it be possible to use a "container" object like a Py_buffer? Is there a way to customize the

[issue39087] [C API] No efficient C API to get UTF-8 string from unicode object.

2019-12-19 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Do you mean some concrete code? Several times I have wished for a similar feature: to get the UTF-8 cache if it exists, and otherwise to encode to UTF-8 without creating a cache. The private _PyUnicode_UTF8() macro could help: if ((s = _PyUnicode_UTF8(str))) { size =

[issue39087] [C API] No efficient C API to get UTF-8 string from unicode object.

2019-12-18 Thread STINNER Victor
Change by STINNER Victor: -- title: No efficient API to get UTF-8 string from unicode object. -> [C API] No efficient C API to get UTF-8 string from unicode object.

[issue39087] No efficient API to get UTF-8 string from unicode object.

2019-12-18 Thread STINNER Victor
Change by STINNER Victor: -- nosy: +vstinner

[issue39087] No efficient API to get UTF-8 string from unicode object.

2019-12-18 Thread Inada Naoki
malloc + memcpy to create the cache. * PyUnicode_DecodeUTF8(): It creates bytes object even when the unicode object is ASCII-only or there is a UTF-8 cache already. For speed and efficiency, I propose a new API: ``` /* Borrow the UTF-8 C string from the unicode. * * Store a pointer

Re: TypeError: expected string or Unicode object, NoneType found

2018-05-19 Thread Terry Reedy
string.from_py.__pyx_convert_string_from_py_std__in_string (pycrfsuite/_pycrfsuite.cpp:10633) TypeError: expected string or Unicode object, NoneType found I have searched for solutions in web found the following links as, https://stackoverflow.com/questions/14219038/python-multiprocessing-typeer

Re: TypeError: expected string or Unicode object, NoneType found

2018-05-19 Thread Peter Otten
ite.pyx", line 312, in > pycrfsuite._pycrfsuite.BaseTrainer.append > (pycrfsuite/_pycrfsuite.cpp:3800) File "stringsource", line 53, in > vector.from_py.__pyx_convert_vector_from_py_std_3a__3a_string > (pycrfsuite/_pycrfsuite.cpp:10738) File "stringsource",

TypeError: expected string or Unicode object, NoneType found

2018-05-19 Thread subhabangalore
suite.cpp:3800) File "stringsource", line 53, in vector.from_py.__pyx_convert_vector_from_py_std_3a__3a_string (pycrfsuite/_pycrfsuite.cpp:10738) File "stringsource", line 15, in string.from_py.__pyx_convert_string_from_py_std__in_string (pycrfsuite/_pycrfsuite.cpp:10633) TypeError: expect

[issue31398] TypeError: gdbm key must be string, not unicode

2017-09-09 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: has_key(u"x") works if x is a valid key. has_key() parses the argument with PyArg_ParseTuple("s#") which implicitly converts unicode to str. __contains__() explicitly checks for str type. -- nosy: +serhiy.storchaka

[issue31398] TypeError: gdbm key must be string, not unicode

2017-09-09 Thread R. David Murray
R. David Murray added the comment: In Python 3, u"a" and "a" are the same thing. The equivalent in Python 3 would be b"a" vs "a", but I have no idea if we even support bytes keys in Python 3 gdbm. In 2.7, does has_key(u"x") work if x is a valid key? -- nosy: +r.david.murray
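A minimal sketch of the Python 3 point made in this comment, assuming it is run under Python 3 (the variable names are illustrative only and do not come from the issue). The u prefix is accepted but changes nothing, while bytes and str are distinct types that never compare equal:

```python
# In Python 3 the u"" prefix is a no-op: both literals are plain str objects.
text_plain = "a"
text_prefixed = u"a"
same_type = type(text_plain) is type(text_prefixed)
same_value = text_plain == text_prefixed

# bytes and str are different types; comparing them is simply unequal.
raw = b"a"
bytes_equals_str = raw == text_plain

# Moving between the two always goes through an explicit codec.
round_trip = raw.decode("ascii") == text_plain

print(same_type, same_value, bytes_equals_str, round_trip)
```

This says nothing about which key types dbm.gnu accepts; it only illustrates the str/bytes distinction the comment refers to.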

[issue31398] TypeError: gdbm key must be string, not unicode

2017-09-08 Thread sds
sds added the comment: The problem is not present in Python 3.6.2 (default, Jul 17 2017, 16:44:45): ``` >>> import dbm >>> import dbm.gnu >>> db = dbm.gnu.open("foo","c") >>> "a" in db False >>> u"a" in db False ```

[issue31398] TypeError: gdbm key must be string, not unicode

2017-09-08 Thread sds
>>> "a" in db False >>> u"a" in db Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: gdbm key must be string, not unicode ``` -- components: Unicode messages: 301728 nosy: ezio.melotti, haypo, sam-s priority: nor

[issue31398] TypeError: gdbm key must be string, not unicode

2017-09-08 Thread sds
sds added the comment: platform: Python 2.7.13 (default, Jul 18 2017, 09:17:00) [GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.42)] on darwin

[issue26873] xmlrpclib raises when trying to convert an int to string when unicode is available

2017-02-08 Thread Zsolt Endreffy
Zsolt Endreffy added the comment: I think this patch breaks compatibility between python 2.7 versions. Our rpc server has 2.7.10 Python version, and sends back tuples as responses (first value is a boolean, second is a string). If we connect with a computer, which has 2.7.11 or earlier

[issue26873] xmlrpclib raises when trying to convert an int to string when unicode is available

2016-05-04 Thread Roundup Robot
Roundup Robot added the comment: New changeset 0d015f6aba8b by Serhiy Storchaka in branch '3.5': Issue #26873: xmlrpc now raises ResponseError on unsupported type tags https://hg.python.org/cpython/rev/0d015f6aba8b New changeset 8f7cb3b171f3 by Serhiy Storchaka in branch 'default': Issue

[issue26873] xmlrpclib raises when trying to convert an int to string when unicode is available

2016-05-04 Thread Serhiy Storchaka
Changes by Serhiy Storchaka: -- resolution: -> fixed stage: patch review -> resolved status: open -> closed

[issue26873] xmlrpclib raises when trying to convert an int to string when unicode is available

2016-05-01 Thread Nathan Williams
Nathan Williams added the comment: Serhiy, that workaround worked for my needs, thanks.

[issue26873] xmlrpclib raises when trying to convert an int to string when unicode is available

2016-04-29 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Opened issue26885 for adding support of "ex:nil" and other types.

[issue26873] xmlrpclib raises when trying to convert an int to string when unicode is available

2016-04-28 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: The proposed patch makes the xmlrpc unmarshaller stricter: it raises ValueError instead of returning an incorrect value when it encounters an unsupported value type. The unmarshaller still silently skips unknown tags if they occur outside of the "value"

[issue26873] xmlrpclib raises when trying to convert an int to string when unicode is available

2016-04-28 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Minimal reproducer: >>> xmlrpclib.loads('ab') (({'a': 'b'},), None) The workaround for your particular case, Nathan: xmlrpclib.Unmarshaller.dispatch['ex:nil'] = xmlrpclib.Unmarshaller.dispatch['nil'] -- ___

[issue26873] xmlrpclib raises when trying to convert an int to string when unicode is available

2016-04-28 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Thank you. Your server produces a response containing nonstandard type tag "ex:nil" [1]. authorityROLE_USER fromDate userId15 xmlrpclib is unable to handle this tag, and error handling is poor. xmlrpclib can handle only standard types and "nil"

[issue26873] xmlrpclib raises when trying to convert an int to string when unicode is available

2016-04-27 Thread Nathan Williams
Nathan Williams added the comment: I have attached the response. As it is coming from our UMS, I had to redact a few values, but that shouldn't matter. For reference, they were the host name of my email address, and the hashes of passwords etc. Our UMS is a bit too chatty! -- Added

[issue26873] xmlrpclib raises when trying to convert an int to string when unicode is available

2016-04-27 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Could you provide a response of your server (pass verbose=True to ServerProxy)? -- nosy: +serhiy.storchaka

[issue26873] xmlrpclib raises when trying to convert an int to string when unicode is available

2016-04-27 Thread Nathan Williams
New submission from Nathan Williams: I am using xmlrpclib against an internal xmlrpc server. One of the responses returns integer values, and it raises an exception in "_stringify" The code for _stringify is (xmlrpclib.py:180 in python2.7): if unicode: def _string

[issue22575] bytearray documentation confuses string for unicode objects

2014-10-10 Thread Roundup Robot
Roundup Robot added the comment: New changeset 0c75819f1d86 by Terry Jan Reedy in branch '2.7': Issue #22575: Revise bytearray entry for 2.7. https://hg.python.org/cpython/rev/0c75819f1d86 -- nosy: +python-dev

[issue22575] bytearray documentation confuses string for unicode objects

2014-10-10 Thread Terry J. Reedy
Terry J. Reedy added the comment: I changed the line to * If it is unicode, you must also give the encoding (and optionally, errors) parameters; bytearray() then converts the unicode to bytes using unicode.encode(). Thank you for the report, and for your work answering questions on SO
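The revised documentation line can be illustrated with a short sketch. The issue concerns the Python 2.7 docs; the code below is a Python 3 analogue, where str plays the role of 2.7's unicode, and the sample text is an assumption made purely for illustration:

```python
# Python 3 analogue of the rule quoted above: text needs an explicit encoding.
text = "caf\u00e9"  # 'cafe' with an accented e, escaped to keep the source ASCII

try:
    bytearray(text)  # a str argument without an encoding is rejected
    needs_encoding = False
except TypeError:
    needs_encoding = True

encoded = bytearray(text, "utf-8")           # converted as by str.encode()
lossy = bytearray(text, "ascii", "replace")  # the errors argument is optional

print(needs_encoding, bytes(encoded), bytes(lossy))
```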

[issue22575] bytearray documentation confuses string for unicode objects

2014-10-07 Thread Martijn Pieters
string for unicode objects versions: Python 2.7

[issue13943] distutils’ build_py fails when package string is unicode

2014-01-10 Thread Boris FELD
Boris FELD added the comment: An issue has been opened in pip repository: https://github.com/pypa/pip/issues/1441

[issue13943] distutils’ build_py fails when package string is unicode

2014-01-03 Thread Boris FELD
Boris FELD added the comment: I had the same problem today with the package https://pypi.python.org/pypi/httpretty/0.7.1, but only when I try to install one of my projects that requires httpretty; if I install it directly, it works like a charm. pip install httpretty - works pip install

[issue13943] distutils’ build_py fails when package string is unicode

2014-01-03 Thread Éric Araujo
Éric Araujo added the comment: It’s strange that this would happen when installing as a dependency and not when installing directly. Pip can change faster than stdlib is released, could you report the bug to them and see if it’s possible to pass __file__ as a byte string? --

Re: Flexible string representation, unicode, typography, ...

2012-09-03 Thread Mark Lawrence
On 03/09/2012 06:39, wxjmfa...@gmail.com wrote: Le dimanche 2 septembre 2012 14:01:18 UTC+2, Serhiy Storchaka a écrit : On 02.09.12 12:52, Peter Otten wrote: Ian Kelly wrote: Rewriting the example to use locale.strcoll instead: sorted(li,

Re: Flexible string representation, unicode, typography, ...

2012-09-03 Thread Peter Otten
wxjmfa...@gmail.com wrote: Le dimanche 2 septembre 2012 14:01:18 UTC+2, Serhiy Storchaka a écrit : Hmm, and with locale.strxfrm Python 3.3 20% slower than 3.2. With a memory gain = 0 since my text contains non-latin-1 characters! I can't confirm this. At least users of wide builds will see

Re: Flexible string representation, unicode, typography, ...

2012-09-03 Thread Terry Reedy
On 9/3/2012 2:15 AM, Peter Otten wrote: At least users of wide builds will see a decrease in memory use: Everyone saves because everyone uses large parts of the stdlib. When 3.3 start up in a Windows console, there are 56 modules in sys.modules. With Idle, there are over 130. All the

Re: Flexible string representation, unicode, typography, ...

2012-09-03 Thread Roy Smith
In article 50440de2$0$29967$c3e8da3$54964...@news.astraweb.com, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: Indexing is O(0) for any string. I think you mean O(1) for constant-time lookups. Why settle for constant-time, when you can have zero-time instead :-) --

Re: Flexible string representation, unicode, typography, ...

2012-09-03 Thread Serhiy Storchaka
On 03.09.12 04:42, Steven D'Aprano wrote: If you are *seriously* interested in debugging why string code is slower for you, you can start by running the full suite of Python string benchmarks: see the stringbench benchmark in the Tools directory of source installations, or see here:

Re: Flexible string representation, unicode, typography, ...

2012-09-03 Thread Serhiy Storchaka
On 03.09.12 04:54, Steven D'Aprano wrote: This means that Python 3.3 will no longer have surrogate pairs. Am I right? As Terry said, basically, yes. Python 3.3 does not need surrogate pairs, but does not prevent their creation. You can create a surrogate code (U+D800..U+DFFF)
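A minimal sketch of the behaviour described in this message, assuming Python 3.3 or later (the variable names and the chosen code points are illustrative, not from the thread). A non-BMP character is a single code point, a lone surrogate can still be created, and the strict UTF-8 codec refuses to encode it:

```python
# A non-BMP character is one code point on Python 3.3+, not a surrogate pair.
astral = "\U0001F600"
length_of_astral = len(astral)

# A lone surrogate code point can still be created explicitly.
lone = "\ud800"

# The strict UTF-8 codec refuses to encode it.
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# The 'surrogatepass' error handler encodes it anyway.
passed = lone.encode("utf-8", "surrogatepass")

print(length_of_astral, strict_ok, passed)
```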

Re: Flexible string representation, unicode, typography, ...

2012-09-03 Thread Serhiy Storchaka
On 03.09.12 09:15, Peter Otten wrote: wxjmfa...@gmail.com wrote: Le dimanche 2 septembre 2012 14:01:18 UTC+2, Serhiy Storchaka a écrit : Hmm, and with locale.strxfrm Python 3.3 20% slower than 3.2. With a memory gain = 0 since my text contains non-latin-1 characters! I can't confirm

Re: Flexible string representation, unicode, typography, ...

2012-09-03 Thread Ian Kelly
On Sun, Sep 2, 2012 at 6:00 AM, Serhiy Storchaka storch...@gmail.com wrote: On 02.09.12 12:52, Peter Otten wrote: Ian Kelly wrote: Rewriting the example to use locale.strcoll instead: sorted(li, key=functools.cmp_to_key(locale.strcoll)) There is also locale.strxfrm() which you can use

Re: Flexible string representation, unicode, typography, ...

2012-09-03 Thread Steven D'Aprano
On Mon, 03 Sep 2012 18:26:02 +0300, Serhiy Storchaka wrote: On 03.09.12 04:42, Steven D'Aprano wrote: If you are *seriously* interested in debugging why string code is slower for you, you can start by running the full suite of Python string benchmarks: see the stringbench benchmark in the

Re: Flexible string representation, unicode, typography, ...

2012-09-02 Thread wxjmfauth
Le jeudi 30 août 2012 17:01:50 UTC+2, Antoine Pitrou a écrit : I honestly suggest you shut up until you have a clue. Désolé Antoine, I have not the knowledge to dive in the Python code, but I know what is a character. The coding of the characters is a domain per se, independent from the

Re: Flexible string representation, unicode, typography, ...

2012-09-02 Thread Mark Lawrence
On 02/09/2012 08:36, wxjmfa...@gmail.com wrote: Le jeudi 30 août 2012 17:01:50 UTC+2, Antoine Pitrou a écrit : I honestly suggest you shut up until you have a clue. Désolé Antoine, I have not the knowledge to dive in the Python code, but I know what is a character. You're a character,

Re: Flexible string representation, unicode, typography, ...

2012-09-02 Thread Ian Kelly
On Sun, Sep 2, 2012 at 1:36 AM, wxjmfa...@gmail.com wrote: I still remember my thoughts when I read the PEP 393 discussion: this is not logical, they do no understand typography, atomic character ???, ... That would indicate one of two possibilities. Either: 1) Everybody in the PEP 393

Re: Flexible string representation, unicode, typography, ...

2012-09-02 Thread Peter Otten
Ian Kelly wrote: Rewriting the example to use locale.strcoll instead: sorted(li, key=functools.cmp_to_key(locale.strcoll)) There is also locale.strxfrm() which you can use directly: sorted(li, key=locale.strxfrm) -- http://mail.python.org/mailman/listinfo/python-list
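The two sort keys mentioned here can be compared side by side. This is only a sketch: the list contents are made up for illustration, and the "C" locale is chosen as an assumption so that the snippet runs on any system (a real program would normally call locale.setlocale(locale.LC_COLLATE, "") to use the user's locale):

```python
import functools
import locale

# Assumption: use the always-available "C" locale so the sketch runs anywhere.
locale.setlocale(locale.LC_COLLATE, "C")

li = ["pear", "Apple", "banana", "apple"]  # sample data, not from the thread

# strcoll is a two-argument comparison function, so it needs cmp_to_key.
by_strcoll = sorted(li, key=functools.cmp_to_key(locale.strcoll))

# strxfrm turns each string into a sort key once, so it can be used directly.
by_strxfrm = sorted(li, key=locale.strxfrm)

print(by_strcoll)
print(by_strxfrm)
```

Both calls should produce the same ordering in a given locale. The strxfrm form transforms each element once rather than calling a comparison function for every pair, which is the usual reason to prefer it.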

Re: Flexible string representation, unicode, typography, ...

2012-09-02 Thread Mark Lawrence
I've found the white paper which gives the technical basis for the claims made by jmf so thought I'd better share in order to explain his rationale. http://www.montypython.net/scripts/right-think.php -- Cheers. Mark Lawrence. -- http://mail.python.org/mailman/listinfo/python-list

Re: Flexible string representation, unicode, typography, ...

2012-09-02 Thread Serhiy Storchaka
On 02.09.12 12:52, Peter Otten wrote: Ian Kelly wrote: Rewriting the example to use locale.strcoll instead: sorted(li, key=functools.cmp_to_key(locale.strcoll)) There is also locale.strxfrm() which you can use directly: sorted(li, key=locale.strxfrm) Hmm, and with locale.strxfrm Python

Re: Flexible string representation, unicode, typography, ...

2012-09-02 Thread Mark Lawrence
On 02/09/2012 13:00, Serhiy Storchaka wrote: On 02.09.12 12:52, Peter Otten wrote: Ian Kelly wrote: Rewriting the example to use locale.strcoll instead: sorted(li, key=functools.cmp_to_key(locale.strcoll)) There is also locale.strxfrm() which you can use directly: sorted(li,

Re: Flexible string representation, unicode, typography, ...

2012-09-02 Thread Roy Smith
In article mailman.84.1346588596.27098.python-l...@python.org, Mark Lawrence breamore...@yahoo.co.uk wrote: On 02/09/2012 13:00, Serhiy Storchaka wrote: On 02.09.12 12:52, Peter Otten wrote: Ian Kelly wrote: Rewriting the example to use locale.strcoll instead: sorted(li,

Re: Flexible string representation, unicode, typography, ...

2012-09-02 Thread Ramchandra Apte
On Sunday, 2 September 2012 17:53:16 UTC+5:30, Mark Lawrence wrote: On 02/09/2012 13:00, Serhiy Storchaka wrote: On 02.09.12 12:52, Peter Otten wrote: Ian Kelly wrote: Rewriting the example to use locale.strcoll instead: sorted(li,

Re: Flexible string representation, unicode, typography, ...

2012-09-02 Thread Mark Lawrence
On 02/09/2012 14:48, Ramchandra Apte wrote: please make it *heavily optimized* machine code Goes without saying. First thing I'll concentrate on is removing superfluous newlines sent by crappy mail clients or similar. -- Cheers. Mark Lawrence. --

Re: Flexible string representation, unicode, typography, ...

2012-09-02 Thread wxjmfauth
Le dimanche 2 septembre 2012 11:07:35 UTC+2, Ian a écrit : On Sun, Sep 2, 2012 at 1:36 AM, wxjmfa...@gmail.com wrote: I still remember my thoughts when I read the PEP 393 discussion: this is not logical, they do no understand typography, atomic character ???, ... That would

Re: Flexible string representation, unicode, typography, ...

2012-09-02 Thread Michael Torrie
On 09/02/2012 12:58 PM, wxjmfa...@gmail.com wrote: My rationale: very simple. 1) I never heard about something better than sticking with one of the Unicode coding scheme. (general theory) 2) I am not at all convinced by the new Py 3.3 algorithm. I'm not the only one guy, who noticed

Re: Flexible string representation, unicode, typography, ...

2012-09-02 Thread Dave Angel
On 09/02/2012 03:45 PM, Michael Torrie wrote: jmfauth snipped: In the worst case, Python's strings are as slow as Go because Python does the exact same thing as Go, but chooses between three encodings instead of just one. Best case scenario, Python's strings could be much faster than Go's

Re: Flexible string representation, unicode, typography, ...

2012-09-02 Thread Terry Reedy
On 9/2/2012 3:45 PM, Michael Torrie wrote: In the worst case, Python's strings are as slow as Go because Python does the exact same thing as Go, but chooses between three encodings instead of just one. Best case scenario, Python's strings could be much faster than Go's because indexing through

Re: Flexible string representation, unicode, typography, ...

2012-09-02 Thread Serhiy Storchaka
On 30.08.12 09:55, Steven D'Aprano wrote: And Python's solution uses those: UCS-2, UCS-4, and UTF-8. I see that this misconception is widely spread. In fact Python 3.3 uses four kinds of ready strings. * ASCII. All codes <= U+007F. * UCS1. All codes <= U+00FF, at least one code > U+007F. * UCS2.
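The storage kinds listed in this message can be observed indirectly. The sketch below assumes CPython 3.3 or later, because sys.getsizeof reports implementation-specific sizes; the helper name bytes_per_char and the sample characters are hypothetical choices for illustration. It estimates the per-character width by comparing two freshly built strings of different lengths, so that the fixed object header cancels out:

```python
import sys

def bytes_per_char(ch):
    """Estimate per-character storage for strings made only of `ch`.

    Assumes CPython's PEP 393 layout; other implementations may differ.
    """
    return (sys.getsizeof(ch * 2000) - sys.getsizeof(ch * 1000)) // 1000

samples = {
    "ASCII": "a",                   # all codes <= U+007F
    "UCS1 (Latin-1)": "\u00e9",     # at least one code > U+007F
    "UCS2 (BMP)": "\u20ac",         # at least one code > U+00FF
    "UCS4 (non-BMP)": "\U0001F600", # at least one code > U+FFFF
}

widths = {name: bytes_per_char(ch) for name, ch in samples.items()}

for name, width in widths.items():
    print(name, width)
```

ASCII and Latin-1 strings both use one byte per character here; per PEP 393, ASCII-only strings differ in their object header and in sharing their UTF-8 form, not in character width.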

Re: Flexible string representation, unicode, typography, ...

2012-09-02 Thread Serhiy Storchaka
On 02.09.12 23:38, Serhiy Storchaka wrote: Indexing is O(0) for any string. Typo. O(1) -- http://mail.python.org/mailman/listinfo/python-list

Re: Flexible string representation, unicode, typography, ...

2012-09-02 Thread Steven D'Aprano
On Sun, 02 Sep 2012 11:58:08 -0700, wxjmfauth wrote: - Unfortunately, I got opposite and even much worst results on my win box, considering - libfrancais is one of my module and it does a little bit more than the std sorting tools. How do we know that the problem isn't in your module? My

Re: Flexible string representation, unicode, typography, ...

2012-09-02 Thread Steven D'Aprano
On Sun, 02 Sep 2012 23:38:49 +0300, Serhiy Storchaka wrote: On 30.08.12 09:55, Steven D'Aprano wrote: And Python's solution uses those: UCS-2, UCS-4, and UTF-8. I see that this misconception widely spread. I am not familiar enough with the C implementation to tell what Python 3.3 actually

Re: Flexible string representation, unicode, typography, ...

2012-09-02 Thread Terry Reedy
On 9/2/2012 9:54 PM, Steven D'Aprano wrote: On Sun, 02 Sep 2012 23:38:49 +0300, Serhiy Storchaka wrote: On 30.08.12 09:55, Steven D'Aprano wrote: And Python's solution uses those: UCS-2, UCS-4, and UTF-8. I see that this misconception widely spread. I am not familiar enough with the C

Re: Flexible string representation, unicode, typography, ...

2012-09-02 Thread wxjmfauth
Le dimanche 2 septembre 2012 14:01:18 UTC+2, Serhiy Storchaka a écrit : On 02.09.12 12:52, Peter Otten wrote: Ian Kelly wrote: Rewriting the example to use locale.strcoll instead: sorted(li, key=functools.cmp_to_key(locale.strcoll)) There is also locale.strxfrm() which

Re: Flexible string representation, unicode, typography, ...

2012-08-31 Thread Steven D'Aprano
On Thu, 30 Aug 2012 16:44:32 -0400, Terry Reedy wrote: On 8/30/2012 12:00 PM, Steven D'Aprano wrote: On Thu, 30 Aug 2012 07:02:24 -0400, Roy Smith wrote: [...] Is the implementation smart enough to know that x == y is always False if x and y are using different internal representations?

Re: Flexible string representation, unicode, typography, ...

2012-08-31 Thread Roy Smith
In article 503f8e33$0$30001$c3e8da3$54964...@news.astraweb.com, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: On Thu, 30 Aug 2012 07:02:24 -0400, Roy Smith wrote: Is the implementation smart enough to know that x == y is always False if x and y are using different internal

Re: Flexible string representation, unicode, typography, ...

2012-08-31 Thread Steven D'Aprano
On Fri, 31 Aug 2012 08:43:55 -0400, Roy Smith wrote: In article 503f8e33$0$30001$c3e8da3$54964...@news.astraweb.com, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: On Thu, 30 Aug 2012 07:02:24 -0400, Roy Smith wrote: Is the implementation smart enough to know that x == y is

Re: Flexible string representation, unicode, typography, ...

2012-08-31 Thread Ian Kelly
On Fri, Aug 31, 2012 at 6:32 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: That's one thing that I'm unclear about -- under what circumstances will a string be in compact versus non-compact form? I understand it to be entirely dependent on which API is used to construct. The

Re: Flexible string representation, unicode, typography, ...

2012-08-30 Thread Steven D'Aprano
On Wed, 29 Aug 2012 08:43:05 -0700, wxjmfauth wrote: I can hit the nail a little more. I have even a better idea and I'm serious. If Python has found a new way to cover the set of the Unicode characters, why not proposing it to the Unicode consortium? Because the implementation of the str

Re: Flexible string representation, unicode, typography, ...

2012-08-30 Thread wxjmfauth
Le jeudi 30 août 2012 08:55:01 UTC+2, Steven D'Aprano a écrit : You are right. But as soon as you introduce artificially a latin-1 bottleneck, all this machinery just become useless. This flexible representation is working absurdly. It optimizes the characters you are not using (in one sense),

Re: Flexible string representation, unicode, typography, ...

2012-08-30 Thread Chris Angelico
On Thu, Aug 30, 2012 at 6:51 PM, wxjmfa...@gmail.com wrote: Pick up a random text and see the probability this text match the most optimized case 1 char / 1 byte, practically never. Only if you talk about a huge document. Try, instead, every string ever used in a Python script. Practically

Re: Flexible string representation, unicode, typography, ...

2012-08-30 Thread Roy Smith
In article 503f0e45$0$9416$c3e8da3$76491...@news.astraweb.com, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: The only thing which is innovative here is that instead of the Python compiler declaring that all strings will be stored in UCS-2, the compiler chooses an

Re: Flexible string representation, unicode, typography, ...

2012-08-30 Thread Antoine Pitrou
wxjmfauth at gmail.com writes: Pick up a random text and see the probability this text match the most optimized case 1 char / 1 byte, practically never. Funny that you posted a text which does just that: http://mail.python.org/pipermail/python-list/2012-August/629554.html In a funny way,

Re: Flexible string representation, unicode, typography, ...

2012-08-30 Thread Steven D'Aprano
On Thu, 30 Aug 2012 07:02:24 -0400, Roy Smith wrote: In article 503f0e45$0$9416$c3e8da3$76491...@news.astraweb.com, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: The only thing which is innovative here is that instead of the Python compiler declaring that all strings will be

Re: Flexible string representation, unicode, typography, ...

2012-08-30 Thread Ian Kelly
On Thu, Aug 30, 2012 at 2:51 AM, wxjmfa...@gmail.com wrote: But as soon as you introduce artificially a latin-1 bottleneck, all this machinery just become useless. How is this a bottleneck? If you removed the Latin-1 encoding altogether and limited the flexible representation to just UCS-2 /

Re: Flexible string representation, unicode, typography, ...

2012-08-30 Thread Terry Reedy
On 8/30/2012 12:00 PM, Steven D'Aprano wrote: On Thu, 30 Aug 2012 07:02:24 -0400, Roy Smith wrote: In article 503f0e45$0$9416$c3e8da3$76491...@news.astraweb.com, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: The only thing which is innovative here is that instead of the

Re: Flexible string representation, unicode, typography, ...

2012-08-29 Thread Steven D'Aprano
On Tue, 28 Aug 2012 22:15:31 -0600, Ian Kelly wrote: On Tue, Aug 28, 2012 at 8:42 PM, rusi rustompm...@gmail.com wrote: How difficult would it be to giving the choice of string engine as a command-line flag? This would avoid the nuisance of having two binaries -- narrow and wide. Quite

Re: Flexible string representation, unicode, typography, ...

2012-08-29 Thread wxjmfauth
Le lundi 27 août 2012 22:37:03 UTC+2, (inconnu) a écrit : Le lundi 27 août 2012 22:14:07 UTC+2, Ian a écrit : On Mon, Aug 27, 2012 at 1:16 PM, wxjmfa...@gmail.com wrote: - Why int32 and not uint32? No idea, I tried to find an answer without asking.

Re: Flexible string representation, unicode, typography, ...

2012-08-29 Thread wxjmfauth
Le mercredi 29 août 2012 06:16:05 UTC+2, Ian a écrit : On Tue, Aug 28, 2012 at 8:42 PM, rusi rustompm...@gmail.com wrote: In summary: 1. The problem is not on jmf's computer 2. It is not windows-only 3. It is not directly related to latin-1 encodable or not The only

Re: Flexible string representation, unicode, typography, ...

2012-08-29 Thread Dave Angel
On 08/29/2012 07:40 AM, wxjmfa...@gmail.com wrote: snip Forget Python and all these benchmarks. The problem is on an other level. Coding schemes, typography, usage of characters, ... For a given coding scheme, all code points/characters are equivalent. Expecting to handle a sub-range in a
