On Fri, Jun 06, 2014 at 12:51:11PM +1200, Greg Ewing wrote:
Steven D'Aprano wrote:
(1) I asked if it would be okay for MicroPython to *optionally* use
nominally Unicode strings limited to ASCII. Pretty much the only
response to this has been Guido saying "That would be a pretty lousy
option",
On 04/06/2014 16:52, Mark Lawrence wrote:
On 04/06/2014 16:32, Steve Dower wrote:
If copying into a separate list is a problem (memory-wise),
re.finditer('\\S+', string) also provides the same behaviour and gives
me the sliced string, so there's no need to index for anything.
Out of idle
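Here is a minimal sketch of the approach Steve Dower describes: iterate over whitespace-separated words with re.finditer instead of indexing. The generator name `words` is hypothetical. The regular expression engine still tracks positions internally, so this only removes indexing from user code.

```python
import re

def words(text: str):
    """Lazily yield whitespace-separated tokens without building a list."""
    # re.finditer returns an iterator of match objects; m.group() is the
    # matched substring, so the caller never indexes into text directly.
    for m in re.finditer(r"\S+", text):
        yield m.group()

print(list(words("  blink the  LED ")))   # ['blink', 'the', 'LED']
```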
On 06/04/2014 05:52 PM, Mark Lawrence wrote:
On 04/06/2014 16:32, Steve Dower wrote:
If copying into a separate list is a problem (memory-wise), re.finditer('\\S+',
string) also provides the same behaviour and gives me the sliced string, so
there's no need to index for anything.
Out of
Hello,
On Thu, 5 Jun 2014 22:21:30 +1000
Tim Delaney timothy.c.dela...@gmail.com wrote:
On 5 June 2014 22:01, Paul Sokolovsky pmis...@gmail.com wrote:
All these changes are what let me dream on and speculate on
possibility that Python4 could offer an encoding-neutral string type
Hello,
On Thu, 5 Jun 2014 22:38:13 +1000
Nick Coghlan ncogh...@gmail.com wrote:
On 5 June 2014 22:10, Stefan Krah ste...@bytereef.org wrote:
Paul Sokolovsky pmis...@gmail.com wrote:
In this regard, I'm glad to participate in mind-resetting
discussion. So, let's reiterate - there's nothing
Steven D'Aprano wrote:
I don't know about car engine controllers, but presumably they have
diagnostic ports, and they may sometimes output text. If they output
text, then at least hypothetically car mechanics in Russia might prefer
their car to output правда and ложный rather than true and
Paul Sokolovsky writes:
That kinda means "string is atomic", instead of your "characters are
atomic".
I would be very surprised if a language that behaved that way was
called a Python subset. No indexing, no slicing, no regexps, no
.split(), no .startswith(), no sorted() or .sort(), ...!?
If
Hello,
On Thu, 5 Jun 2014 23:15:54 +1000
Nick Coghlan ncogh...@gmail.com wrote:
On 5 June 2014 22:37, Paul Sokolovsky pmis...@gmail.com wrote:
On Thu, 5 Jun 2014 22:20:04 +1000
Nick Coghlan ncogh...@gmail.com wrote:
problems caused by trusting the locale encoding to be correct, but
the
Hello,
On Fri, 06 Jun 2014 20:11:27 +0900
Stephen J. Turnbull step...@xemacs.org wrote:
Paul Sokolovsky writes:
That kinda means "string is atomic", instead of your "characters
are atomic".
I would be very surprised if a language that behaved that way was
called a Python subset. No
On 6 June 2014 21:15, Paul Sokolovsky pmis...@gmail.com wrote:
Hello,
On Thu, 5 Jun 2014 23:15:54 +1000
Nick Coghlan ncogh...@gmail.com wrote:
On 5 June 2014 22:37, Paul Sokolovsky pmis...@gmail.com wrote:
On Thu, 5 Jun 2014 22:20:04 +1000
Nick Coghlan ncogh...@gmail.com wrote:
On 6 June 2014 21:34, Paul Sokolovsky pmis...@gmail.com wrote:
On Fri, 06 Jun 2014 20:11:27 +0900
Stephen J. Turnbull step...@xemacs.org wrote:
Paul Sokolovsky writes:
That kinda means "string is atomic", instead of your "characters
are atomic".
I would be very surprised if a
Hello,
On Fri, 06 Jun 2014 09:32:25 +0100
Mark Lawrence breamore...@yahoo.co.uk wrote:
On 04/06/2014 16:52, Mark Lawrence wrote:
On 04/06/2014 16:32, Steve Dower wrote:
If copying into a separate list is a problem (memory-wise),
re.finditer('\\S+', string) also provides the same
On 06/06/2014 09:53, Hrvoje Niksic wrote:
On 06/04/2014 05:52 PM, Mark Lawrence wrote:
On 04/06/2014 16:32, Steve Dower wrote:
If copying into a separate list is a problem (memory-wise),
re.finditer('\\S+', string) also provides the same behaviour and
gives me the sliced string, so there's no
Hello,
On Fri, 6 Jun 2014 21:48:41 +1000
Tim Delaney timothy.c.dela...@gmail.com wrote:
On 6 June 2014 21:34, Paul Sokolovsky pmis...@gmail.com wrote:
On Fri, 06 Jun 2014 20:11:27 +0900
Stephen J. Turnbull step...@xemacs.org wrote:
Paul Sokolovsky writes:
That kinda means
On Fri, Jun 6, 2014 at 8:15 PM, Paul Sokolovsky pmis...@gmail.com wrote:
I'm sorry if I was somehow related to that, my
bringing in the formal language spec was more a rhetorical figure, a
response to people claiming an O(1) requirement.
This was exactly why this whole discussion came up, though.
On 6/6/2014 4:53 AM, Hrvoje Niksic wrote:
On 06/04/2014 05:52 PM, Mark Lawrence wrote:
Out of idle curiosity is there anything that stops MicroPython, or any
other implementation for that matter, from providing views of a string
rather than copying every time? IIRC memoryviews in CPython
On 06/06/2014 05:59 PM, Terry Reedy wrote:
The other problem is that a small slice view of a large object keeps the
large object alive, so a view user needs to think carefully about
whether to make a copy or create a view, and later to copy views to
delete the base object. This is not for
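CPython's str has no slice views, but bytes and memoryview show the trade-off Terry Reedy describes. In this minimal sketch the buffer size is arbitrary. The 8-byte view keeps the entire base buffer referenced until it is released, while bytes(view) makes a small independent copy.

```python
base = bytes(range(256)) * 4096   # ~1 MiB base object (arbitrary size)
view = memoryview(base)[16:24]    # zero-copy 8-byte window into base
assert view.obj is base           # the view references, and keeps alive, all of base
snippet = bytes(view)             # explicit 8-byte copy, independent of base
view.release()                    # drop the view's hold on base
```

After release(), only `snippet` would remain if `base` were deleted. This is the "copy views to delete the base object" step in practice.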
Hello,
On Fri, 06 Jun 2014 11:59:31 -0400
Terry Reedy tjre...@udel.edu wrote:
[]
The other problem is that a small slice view of a large object keeps
the large object alive, so a view user needs to think carefully about
whether to make a copy or create a view, and later to copy views to
On 7 June 2014 00:52, Paul Sokolovsky pmis...@gmail.com wrote:
At heart, this is exactly what the Python 3 str type is. The
universal convention is code points.
Yes. Except for one small detail - Python3 specifies these code points
to be Unicode code points. And Unicode is a very bloated
On 7 Jun 2014 00:53, Paul Sokolovsky pmis...@gmail.com wrote:
Yes. Except for one small detail - Python3 specifies these code points
to be Unicode code points. And Unicode is a very bloated thing.
I rather suspect users of East Asian and African scripts might have a
different notion of what
Glenn Linderman writes:
3) (Most space efficient) One cached entry, that caches the last
codepoint/byte position referenced. UTF-8 is able to be traversed in
either direction, so next/previous codepoint access would be
relatively fast (and such are very common operations, even when
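Here is a rough sketch of option 3 as Glenn Linderman describes it. It assumes a UTF-8 byte buffer, a stored code point count, and a single cached (code point index, byte offset) pair. The class `Utf8Str` is hypothetical and is not MicroPython's actual implementation. Each lookup walks from the nearest of three points: the start, the cached entry, or the end. Sequential access then costs O(1) per step, while random access is still O(N) in the worst case.

```python
class Utf8Str:
    """Hypothetical UTF-8-backed string with one cached (code point, byte) position."""

    def __init__(self, text: str):
        self.data = text.encode("utf-8")
        self.length = len(text)       # code point count, stored once
        self._cp = 0                  # cached code point index...
        self._byte = 0                # ...and its byte offset

    def _is_start(self, pos: int) -> bool:
        # Every code point begins with a byte that is not 0b10xxxxxx.
        return (self.data[pos] & 0xC0) != 0x80

    def _seek(self, index: int) -> int:
        if not 0 <= index < self.length:
            raise IndexError("string index out of range")
        # Start from whichever known position is closest: start, cache, or end.
        known = [(0, 0), (self._cp, self._byte), (self.length, len(self.data))]
        cp, pos = min(known, key=lambda k: abs(k[0] - index))
        while cp < index:             # walk forward one code point at a time
            pos += 1
            while not self._is_start(pos):
                pos += 1
            cp += 1
        while cp > index:             # walk backward; UTF-8 is self-synchronizing
            pos -= 1
            while not self._is_start(pos):
                pos -= 1
            cp -= 1
        self._cp, self._byte = cp, pos    # refresh the single cache entry
        return pos

    def __getitem__(self, index: int) -> str:
        if index < 0:
            index += self.length
        start = self._seek(index)
        end = start + 1
        while end < len(self.data) and not self._is_start(end):
            end += 1
        return self.data[start:end].decode("utf-8")

    def __len__(self) -> int:
        return self.length

s = Utf8Str("aé€😀z")
print([s[i] for i in range(len(s))])   # ['a', 'é', '€', '😀', 'z']
```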
05.06.14 03:03, Greg Ewing wrote:
Serhiy Storchaka wrote:
html.HTMLParser, json.JSONDecoder, re.compile, tokenize.tokenize don't
use iterators. They use indices, str.find and/or regular expressions.
A common use case is quickly finding a substring starting from the current
position using str.find or
04.06.14 23:50, Glenn Linderman wrote:
3) (Most space efficient) One cached entry, that caches the last
codepoint/byte position referenced. UTF-8 is able to be traversed in
either direction, so next/previous codepoint access would be
relatively fast (and such are very common operations,
Paul Sokolovsky writes:
Please put that in perspective when alarming over O(1) indexing of
inherently problematic niche datatype. (Again, it's not my or
MicroPython's fault that it was forced as standard string type. Maybe
if CPython seriously considered now-standard UTF-8 encoding,
05.06.14 05:25, Terry Reedy wrote:
I mentioned it as an alternative during the '393 discussion. I more than
half agree that the FSR is the better choice for CPython, which had no
particular attachment to UTF-16 in the way that I think Jython, for
instance, does.
Yes, I remember. I think
Serhiy Storchaka writes:
Yes, I remember. I think that a hybrid FSR-UTF16 (like FSR, but UTF-16 is
used instead of UCS4) is the better choice for CPython. I suppose that
with the spread of emoticons and other icon characters in the next 5 or 10
years, even English text will often contain
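To make the storage trade-off concrete, this sketch computes the per-code-point width a PEP 393 (FSR) string would need and compares it with the size of the same text in UTF-16. Both helpers are hypothetical. They count payload bytes only; CPython's real objects add header overhead. The output also shows the widening effect raised in this thread: a single character outside Latin-1 widens the whole string under FSR.

```python
def pep393_width(s: str) -> int:
    """Bytes per code point a PEP 393 (FSR) string would use for s."""
    m = max(map(ord, s), default=0)
    if m < 0x100:
        return 1      # Latin-1 range
    if m < 0x10000:
        return 2      # Basic Multilingual Plane (UCS-2)
    return 4          # needs UCS-4

def utf16_units(s: str) -> int:
    """Number of 16-bit code units s occupies in UTF-16."""
    return len(s.encode("utf-16-le")) // 2

base = "hello world " * 100            # 1200 ASCII code points
for extra in ("", "é", "€", "😀"):
    t = base + extra
    # code points, FSR payload bytes, UTF-16 payload bytes
    print(len(t), pep393_width(t) * len(t), utf16_units(t) * 2)
# 1200 1200 2400
# 1201 1201 2402
# 1201 2402 2402
# 1201 4804 2404
```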
Hello,
On Wed, 04 Jun 2014 22:15:30 -0400
Terry Reedy tjre...@udel.edu wrote:
On 6/4/2014 6:52 PM, Paul Sokolovsky wrote:
"Well" is subjective (or should be defined formally based on the
requirements). With my MicroPython hat on, an implementation which
receives a string, transcodes it,
Hello,
On Thu, 05 Jun 2014 16:54:11 +0900
Stephen J. Turnbull step...@xemacs.org wrote:
Paul Sokolovsky writes:
Please put that in perspective when alarming over O(1) indexing of
inherently problematic niche datatype. (Again, it's not my or
MicroPython's fault that it was forced as
On 5 June 2014 17:54, Stephen J. Turnbull step...@xemacs.org wrote:
What matters to you is that str (unicode) is an opaque type -- there
is no specification of the internal representation in the language
reference, and in fact several different ones coexist happily across
existing Python
On 5 June 2014 21:25, Paul Sokolovsky pmis...@gmail.com wrote:
Well, I understand the plan - hoping that people will get over this.
And I'm personally happy to stay away from this trolling, but any
discussion related to Unicode goes in circles and returns to feeling
that Unicode at the central
Hello,
On Thu, 5 Jun 2014 21:43:16 +1000
Nick Coghlan ncogh...@gmail.com wrote:
On 5 June 2014 21:25, Paul Sokolovsky pmis...@gmail.com wrote:
Well, I understand the plan - hoping that people will get over
this. And I'm personally happy to stay away from this trolling,
but any discussion
Paul Sokolovsky pmis...@gmail.com wrote:
In this regard, I'm glad to participate in mind-resetting discussion.
So, let's reiterate - there's nothing like the best, the only right,
the only correct, righter than, more correct than in CPython's
implementation of Unicode storage. It is
On 5 June 2014 22:01, Paul Sokolovsky pmis...@gmail.com wrote:
Aside from
some of the POSIX locale handling issues on Linux, many of the
concerns are with the usability of bytes and bytearray, not with str -
that's why binary interpolation is coming back in 3.5, and there will
likely be other
On 5 June 2014 22:01, Paul Sokolovsky pmis...@gmail.com wrote:
All these changes are what let me dream on and speculate on
possibility that Python4 could offer an encoding-neutral string type
(which means based on bytes)
To me, an encoding neutral string type means roughly characters are
Hello,
On Thu, 5 Jun 2014 22:20:04 +1000
Nick Coghlan ncogh...@gmail.com wrote:
[]
problems caused by trusting the locale encoding to be correct, but the
startup code will need non-trivial changes for that to happen - the
C.UTF-8 locale may even become widespread before we get there).
...
On 5 June 2014 22:10, Stefan Krah ste...@bytereef.org wrote:
Paul Sokolovsky pmis...@gmail.com wrote:
In this regard, I'm glad to participate in mind-resetting discussion.
So, let's reiterate - there's nothing like the best, the only right,
the only correct, righter than, more correct than in
On 5 June 2014 22:37, Paul Sokolovsky pmis...@gmail.com wrote:
On Thu, 5 Jun 2014 22:20:04 +1000
Nick Coghlan ncogh...@gmail.com wrote:
problems caused by trusting the locale encoding to be correct, but the
startup code will need non-trivial changes for that to happen - the
C.UTF-8 locale may
On Wed, Jun 04, 2014 at 11:17:18AM +1000, Steven D'Aprano wrote:
There is a discussion over at MicroPython about the internal
representation of Unicode strings. Micropython is aimed at embedded
devices, and so minimizing memory use is important, possibly even
more important than
On 5 June 2014 14:15, Nick Coghlan ncogh...@gmail.com wrote:
As I've said before in other contexts, find me Windows, Mac OS X and
JVM developers, or educators and scientists that are as concerned by
the text model changes as folks that are primarily focused on Linux
system (including network)
On Thu, Jun 5, 2014 at 11:59 AM, Paul Moore p.f.mo...@gmail.com wrote:
On 5 June 2014 14:15, Nick Coghlan ncogh...@gmail.com wrote:
As I've said before in other contexts, find me Windows, Mac OS X and
JVM developers, or educators and scientists that are as concerned by
the text model changes
On 6/5/2014 3:10 AM, Paul Sokolovsky wrote:
Hello,
On Wed, 04 Jun 2014 22:15:30 -0400
Terry Reedy tjre...@udel.edu wrote:
think you are again batting at a strawman. If you mean 'read from a
file', and all you want to do is read bytes from and write bytes to
external 'files', then there is
On 6/5/2014 11:41 AM, Daniel Holth wrote:
discover new things
like dance-encoded strings, bytes decoded using an incorrect encoding
intended to be transcoded into the correct encoding later, surrogates
that work perfectly until .encode(), str(bytes), APIs that disagree
with you about whether the
On 04/06/2014 02:51, Chris Angelico wrote:
On Wed, Jun 4, 2014 at 3:17 PM, Nick Coghlan ncogh...@gmail.com wrote:
It would. The downsides of a UTF-8 representation would be slower
iteration and much slower (O(N)) indexing/slicing.
There's no reason for iteration to be slower. Slicing would
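UTF-8 iteration need not be slower. Here is a minimal sketch of a one-pass iterator over UTF-8 bytes, where each step derives the sequence length from the lead byte. It assumes the input is valid UTF-8, and `iter_utf8` is a hypothetical helper. Indexing s[i], by contrast, still has to walk from some known position.

```python
def iter_utf8(data: bytes):
    """Yield the code points of valid UTF-8 data in a single left-to-right pass."""
    pos, n = 0, len(data)
    while pos < n:
        lead = data[pos]
        # The lead byte's high bits give the length of the sequence.
        if lead < 0x80:
            size = 1          # 0xxxxxxx: ASCII
        elif lead >= 0xF0:
            size = 4          # 11110xxx
        elif lead >= 0xE0:
            size = 3          # 1110xxxx
        else:
            size = 2          # 110xxxxx
        yield data[pos:pos + size].decode("utf-8")
        pos += size

print(list(iter_utf8("aé€😀".encode("utf-8"))))   # ['a', 'é', '€', '😀']
```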
On 6 Jun 2014 05:13, Glenn Linderman v+pyt...@g.nevcal.com wrote:
On 6/5/2014 11:41 AM, Daniel Holth wrote:
discover new things
like dance-encoded strings, bytes decoded using an incorrect encoding
intended to be transcoded into the correct encoding later, surrogates
that work perfectly
Steven D'Aprano wrote:
(1) I asked if it would be okay for MicroPython to *optionally* use
nominally Unicode strings limited to ASCII. Pretty much the only
response to this has been Guido saying "That would be a pretty lousy
option",
It would be limiting to have this as the *only* way of
Paul Sokolovsky wrote:
All these changes are what let me dream on and speculate on
possibility that Python4 could offer an encoding-neutral string type
(which means based on bytes)
Can you elaborate on exactly what you have in mind?
You seem to want something different from Python 3 str,
Steven D'Aprano wrote:
(1) I asked if it would be okay for MicroPython to *optionally* use
nominally Unicode strings limited to ASCII. Pretty much the only
response to this has been Guido saying "That would be a pretty lousy
option", and since nobody has really defended the suggestion, I
On Wed, Jun 4, 2014 at 3:17 PM, Nick Coghlan ncogh...@gmail.com wrote:
On 4 June 2014 11:17, Steven D'Aprano st...@pearwood.info wrote:
My own feeling is that O(1) string indexing operations are a quality of
implementation issue, not a deal breaker to call it a Python.
If string indexing
Quoting Steven D'Aprano st...@pearwood.info:
* Having a build-time option to restrict all strings to ASCII-only.
(I think what they mean by that is that strings will be like Python 2
strings, ASCII-plus-arbitrary-bytes, not actually ASCII.)
An ASCII-plus-arbitrary-bytes type called str
On Wed, Jun 4, 2014 at 3:23 PM, Guido van Rossum gu...@python.org wrote:
On Tue, Jun 3, 2014 at 7:32 PM, Chris Angelico ros...@gmail.com wrote:
On Wed, Jun 4, 2014 at 11:17 AM, Steven D'Aprano st...@pearwood.info
wrote:
* Having a build-time option to restrict all strings to ASCII-only.
On Wed, Jun 4, 2014 at 5:02 PM, mar...@v.loewis.de wrote:
There are more things to consider for the internal implementation,
in particular how the string length is implemented. Several alternatives
exist:
1. store the UTF-8 length (i.e. memory size)
2. store the number of code points (i.e.
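This sketch shows how the first two length-storage alternatives in the quoted message relate. The UTF-8 memory size is simply the buffer length. The code point count can be derived by counting bytes that are not continuation bytes (0b10xxxxxx). Doing that on every len() call would be O(N), which is presumably why storing it is attractive. The helper name is hypothetical.

```python
def utf8_codepoint_count(data: bytes) -> int:
    """Count code points in valid UTF-8 by skipping continuation bytes (0b10xxxxxx)."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)

data = "naïve €".encode("utf-8")
print(len(data))                    # 10: option 1, memory size in bytes (O(1))
print(utf8_codepoint_count(data))   # 7: option 2, code points (O(N) unless stored)
```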
On Wed, Jun 04, 2014 at 03:17:00PM +1000, Nick Coghlan wrote:
There's a general expectation that indexing will be O(1) because all
the builtin containers that support that syntax use it for O(1) lookup
operations.
Depending on your definition of built in, there is at least one standard
Jython uses UTF-16 internally -- probably the only sensible choice in a
Python that can call Java. Indexing is O(N), fundamentally. By
fundamentally, I mean for those strings that have not yet noticed that
they contain no supplementary (above 0xFFFF) characters.
I've toyed with making this O(1)
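Here is a sketch of the situation Jeff Allen describes for Jython. It assumes a string stored as UTF-16 code units, plus a flag recording whether any surrogate pairs are present. `Utf16Str` is hypothetical and is not Jython's code. Without surrogates, the code point index equals the code unit index (O(1)). Otherwise the sketch falls back to an O(N) scan.

```python
class Utf16Str:
    """Hypothetical UTF-16-backed string; indexing is O(1) only without surrogates."""

    def __init__(self, text: str):
        raw = text.encode("utf-16-le")
        self.units = [int.from_bytes(raw[i:i + 2], "little")
                      for i in range(0, len(raw), 2)]
        self.length = len(text)
        # No supplementary characters means one code unit per code point.
        self.bmp_only = len(self.units) == self.length

    @staticmethod
    def _is_high(u: int) -> bool:
        return 0xD800 <= u <= 0xDBFF

    def __getitem__(self, index: int) -> str:
        if index < 0:
            index += self.length
        if not 0 <= index < self.length:
            raise IndexError("string index out of range")
        if self.bmp_only:
            return chr(self.units[index])          # O(1)
        pos = 0
        for _ in range(index):                     # O(N): step over surrogate pairs
            pos += 2 if self._is_high(self.units[pos]) else 1
        u = self.units[pos]
        if self._is_high(u):                       # combine a surrogate pair
            low = self.units[pos + 1]
            return chr(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
        return chr(u)

s = Utf16Str("a😀b€")
print(s.bmp_only, s[1], s[-1])   # False 😀 €
```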
dw+python-...@hmmz.org writes:
Given the specialized kinds of application this Python
implementation is targeted at, it seems UTF-8 is ideal considering
the huge memory savings resulting from the compressed
representation,
I think you really need to check what the applications are in
On Wed, Jun 4, 2014 at 11:36 AM, Stephen J. Turnbull step...@xemacs.org
wrote:
I think you really need to check what the applications are in detail.
UTF-8 costs about 35% more storage for Japanese, and even more for
Chinese, than does UTF-16.
UTF-8 can be smaller even for Asian languages,
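The relative cost depends heavily on content, which is the point of disagreement here. A quick way to check on real data is to compare encoded sizes. The two sample strings below are made up for illustration. Pure kana/kanji text is 50% larger in UTF-8 (3 bytes versus 2 per character), while the same words wrapped in ASCII markup come out smaller in UTF-8.

```python
def encoded_sizes(s: str) -> tuple:
    """Return (UTF-8 bytes, UTF-16 bytes) for s, without a BOM."""
    return len(s.encode("utf-8")), len(s.encode("utf-16-le"))

print(encoded_sizes("日本語のテキスト"))          # (24, 16)
print(encoded_sizes('<p class="x">日本語</p>'))   # (26, 40)
```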
On Wed, Jun 4, 2014 at 8:38 PM, Paul Sokolovsky pmis...@gmail.com wrote:
That's another reason why people don't like Unicode enforced upon them
- all the talk about supporting all languages and scripts is demagogy
and hypocrisy, given a choice, Unicode zealots would rather limit
people to
On Wed, Jun 4, 2014 at 8:38 PM, Paul Sokolovsky pmis...@gmail.com wrote:
And I'm saying that not to discourage Unicode addition to MicroPython,
but to hint that force-force approach implemented by CPython3 and
causing rage and split in the community is not appreciated.
FWIW, it's Python 3 (the
Hello,
On Wed, 4 Jun 2014 17:03:22 +1000
Chris Angelico ros...@gmail.com wrote:
[]
Why not support variable-width strings like CPython 3.4?
That was my first recommendation, and in fact I started writing code
to implement parts of PEP 393, with a view to basically doing it the
same way
Hello,
On Wed, 4 Jun 2014 12:32:12 +1000
Chris Angelico ros...@gmail.com wrote:
On Wed, Jun 4, 2014 at 11:17 AM, Steven D'Aprano
st...@pearwood.info wrote:
* Having a build-time option to restrict all strings to ASCII-only.
(I think what they mean by that is that strings will be like
Hello,
On Tue, 3 Jun 2014 22:23:07 -0700
Guido van Rossum gu...@python.org wrote:
[]
Never mind disabling assertions -- even with enabled assertions you'd
have to expect most Python programs to fail with non-ASCII input.
Then again the UTF-8 option would be pretty devastating too for
On Wed, Jun 4, 2014 at 9:12 PM, Paul Sokolovsky pmis...@gmail.com wrote:
An alternative view is that the discussion on the tracker showed Python
developers' mind-fixation on implementing something the way CPython does
it. And I didn't yet go to that argument, but in the end, MicroPython
does
Can of worms, opened.
On Jun 4, 2014 7:20 AM, Chris Angelico ros...@gmail.com wrote:
On Wed, Jun 4, 2014 at 9:12 PM, Paul Sokolovsky pmis...@gmail.com wrote:
An alternative view is that the discussion on the tracker showed Python
developers' mind-fixation on implementing something the way
Hello,
On Wed, 4 Jun 2014 20:53:46 +1000
Chris Angelico ros...@gmail.com wrote:
On Wed, Jun 4, 2014 at 8:38 PM, Paul Sokolovsky pmis...@gmail.com
wrote:
And I'm saying that not to discourage Unicode addition to
MicroPython, but to hint that force-force approach implemented by
CPython3
[Python-Dev] Internal representation of strings and Micropython
I think UTF8 is the best option.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
https://mail.python.org/mailman/options
If we're voting I think representing Unicode internally in micropython
as utf-8 with O(N) indexing is a great idea, partly because I'm not
sure indexing into strings is a good idea - lots of Unicode code
points don't make sense by themselves; see also grapheme clusters. It
would probably work
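The following illustrates Daniel Holth's point that code points are not always meaningful on their own: a combining accent and a flag emoji each span more than one index. The sample strings are arbitrary. Normalization fixes the accent but not the flag.

```python
import unicodedata

decomposed = "cafe\u0301"                # 'e' followed by COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)
flag = "\U0001F1FA\U0001F1E6"            # two regional indicator symbols, one flag

print(len(decomposed), len(composed))    # 5 4
print(decomposed[-1] == "\u0301")        # True: the last index is a lone accent
print(len(unicodedata.normalize("NFC", flag)))  # 2: NFC cannot merge these
```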
Hello,
On Wed, 4 Jun 2014 21:17:12 +1000
Chris Angelico ros...@gmail.com wrote:
On Wed, Jun 4, 2014 at 9:12 PM, Paul Sokolovsky pmis...@gmail.com
wrote:
An alternative view is that the discussion on the tracker showed
Python developers' mind-fixation on implementing something the way
enough for
O(N) indexing.
Cheers,
Steve
Top-posted from my Windows Phone
From: Daniel Holth <mailto:dho...@gmail.com>
Sent: 6/4/2014 5:17
To: Paul Sokolovsky <mailto:pmis...@gmail.com>
Cc: python-dev <mailto:python-dev@python.org>
Subject: Re: [Python-Dev] Internal
On 04/06/2014 11:53, Paul Sokolovsky wrote:
Hello,
On Tue, 3 Jun 2014 22:23:07 -0700
Guido van Rossum gu...@python.org wrote:
[]
Never mind disabling assertions -- even with enabled assertions you'd
have to expect most Python programs to fail with non-ASCII input.
Then again the UTF-8 option
On 4 June 2014 15:39, dw+python-...@hmmz.org wrote:
On Wed, Jun 04, 2014 at 03:17:00PM +1000, Nick Coghlan wrote:
There's a general expectation that indexing will be O(1) because all
the builtin containers that support that syntax use it for O(1) lookup
operations.
Depending on your
04.06.14 04:17, Steven D'Aprano wrote:
Would either of these trade-offs be acceptable while still claiming
Python 3.4 compatibility?
My own feeling is that O(1) string indexing operations are a quality of
implementation issue, not a deal breaker to call it a Python. I can't
see any
MicroPython is going to be significantly incompatible with Python
anyway. But you should be able to run your mp code on regular Python.
On Wed, Jun 4, 2014 at 9:39 AM, Serhiy Storchaka storch...@gmail.com wrote:
04.06.14 04:17, Steven D'Aprano wrote:
Would either of these trade-offs be
On 4 June 2014 14:39, Serhiy Storchaka storch...@gmail.com wrote:
I think that breaking the O(1) expectation for indexing makes the implementation
significantly incompatible with Python. Virtually all string operations in
Python operate with indices.
I don't use indexing on strings except in rare
On Wed, Jun 04, 2014 at 01:14:04PM +, Steve Dower wrote:
I agree with Daniel. Directly indexing into text suggests an
attempted optimization that is likely to be incorrect for a set of
strings.
I'm afraid I don't understand this argument. The language semantics says
that a string is
04.06.14 10:03, Chris Angelico wrote:
Right, which is why I don't like the idea. But you don't need
non-ASCII characters to blink an LED or turn a servo, and there is
significant resistance to the notion that appending a non-ASCII
character to a long ASCII-only string requires the whole
On Thu, Jun 5, 2014 at 12:17 AM, Serhiy Storchaka storch...@gmail.com wrote:
04.06.14 10:03, Chris Angelico wrote:
Right, which is why I don't like the idea. But you don't need
non-ASCII characters to blink an LED or turn a servo, and there is
significant resistance to the notion that
On Wed, Jun 04, 2014 at 01:38:57PM +0300, Paul Sokolovsky wrote:
That's another reason why people don't like Unicode enforced upon them
Enforcing design and language decisions is the job of the programming
language. You might as well complain that Python forces C doubles as the
floating point
Hello,
On Thu, 5 Jun 2014 00:26:10 +1000
Chris Angelico ros...@gmail.com wrote:
On Thu, Jun 5, 2014 at 12:17 AM, Serhiy Storchaka
storch...@gmail.com wrote:
04.06.14 10:03, Chris Angelico wrote:
Right, which is why I don't like the idea. But you don't need
non-ASCII characters to
On Thu, Jun 5, 2014 at 12:49 AM, Paul Sokolovsky pmis...@gmail.com wrote:
But you need non-ASCII characters to display the title of an MP3 track.
Yes, but to display a title, you don't need to do codepoint access at
random - you need to either take a block of memory (length in bytes) and
do
04.06.14 17:02, Paul Moore wrote:
On 4 June 2014 14:39, Serhiy Storchaka storch...@gmail.com wrote:
I think that breaking the O(1) expectation for indexing makes the implementation
significantly incompatible with Python. Virtually all string operations in
Python operate with indices.
I don't
On Wed, Jun 4, 2014 at 10:12 AM, Steven D'Aprano st...@pearwood.info wrote:
On Wed, Jun 04, 2014 at 01:14:04PM +, Steve Dower wrote:
I agree with Daniel. Directly indexing into text suggests an
attempted optimization that is likely to be incorrect for a set of
strings.
I'm afraid I
Steven D'Aprano wrote:
The language semantics says that a string is an array of code points. Every
index relates to a single code point, no code point extends over two or more
indexes.
There's a 1:1 relationship between code points and indexes. How is direct
indexing likely to be incorrect?
Hello,
On Wed, 04 Jun 2014 17:40:14 +0300
Serhiy Storchaka storch...@gmail.com wrote:
04.06.14 17:02, Paul Moore wrote:
On 4 June 2014 14:39, Serhiy Storchaka storch...@gmail.com wrote:
I think that breaking the O(1) expectation for indexing makes the
implementation significantly
Paul Sokolovsky wrote:
You just shouldn't write inefficient programs, voila. But if you want, you
can keep writing inefficient programs, they just will be inefficient. Peace.
Can I nominate this for QOTD? :)
Cheers,
Steve
On 04/06/2014 16:32, Steve Dower wrote:
If copying into a separate list is a problem (memory-wise), re.finditer('\\S+',
string) also provides the same behaviour and gives me the sliced string, so
there's no need to index for anything.
Out of idle curiosity is there anything that stops
Hello,
On Thu, 5 Jun 2014 01:00:52 +1000
Chris Angelico ros...@gmail.com wrote:
On Thu, Jun 5, 2014 at 12:49 AM, Paul Sokolovsky pmis...@gmail.com
wrote:
But you need non-ASCII characters to display the title of an MP3
track.
Yes, but to display a title, you don't need to do codepoint
For Jython and IronPython, UTF-16 may be best internal encoding.
Recent languages (Swift, Golang, Rust) chose UTF-8 as internal encoding.
Using UTF-8 is simple and efficient. For example, there is no need for a UTF-8
copy of the string when writing to a file
and serializing to JSON.
When implementing Python
04.06.14 18:38, Paul Sokolovsky wrote:
Any non-trivial text parsing uses indices or regular expressions (and
regular expressions themselves use indices internally).
I keep hearing this stuff, and unfortunately so far don't have enough
time to collect all that stuff and provide detailed
On 2014-06-04 14:33, Nick Coghlan wrote:
On 4 June 2014 15:39, dw+python-...@hmmz.org wrote:
On Wed, Jun 04, 2014 at 03:17:00PM +1000, Nick Coghlan wrote:
There's a general expectation that indexing will be O(1) because
all the builtin containers that support that syntax use it for
O(1)
Hello,
On Wed, 04 Jun 2014 19:49:18 +0300
Serhiy Storchaka storch...@gmail.com wrote:
[]
But show me a real-world case for that. The common use case is scanning
a string left-to-right, which should be done using an iterator and thus
O(N). Right-to-left scanning would be order(s) of magnitude less
04.06.14 19:52, MRAB wrote:
In order to avoid indexing, you could use some kind of 'cursor' class to
step forwards and backwards along strings. The cursor could include
both the codepoint index and the byte index.
So you need a different string library and a different regular expression
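Here is a minimal sketch of the cursor idea MRAB suggests, tracking the code point index and the byte offset together over a UTF-8 buffer. `Utf8Cursor` and its methods are hypothetical. As Serhiy Storchaka notes, existing str methods and the re module would not accept such a cursor. This shows only the data structure, not a drop-in replacement.

```python
class Utf8Cursor:
    """Hypothetical cursor over valid UTF-8 bytes, tracking both index kinds."""

    def __init__(self, data: bytes):
        self.data = data
        self.cp = 0       # code point index of the cursor
        self.byte = 0     # byte offset of the same position

    def _is_start(self, pos: int) -> bool:
        return (self.data[pos] & 0xC0) != 0x80   # not a continuation byte

    def at_end(self) -> bool:
        return self.byte >= len(self.data)

    def forward(self) -> str:
        """Return the code point under the cursor and advance past it."""
        if self.at_end():
            raise IndexError("cursor at end")
        end = self.byte + 1
        while end < len(self.data) and not self._is_start(end):
            end += 1
        ch = self.data[self.byte:end].decode("utf-8")
        self.byte, self.cp = end, self.cp + 1
        return ch

    def back(self) -> str:
        """Step back one code point and return it."""
        if self.byte == 0:
            raise IndexError("cursor at start")
        start = self.byte - 1
        while not self._is_start(start):
            start -= 1
        ch = self.data[start:self.byte].decode("utf-8")
        self.byte, self.cp = start, self.cp - 1
        return ch

cur = Utf8Cursor("aé😀".encode("utf-8"))
print(cur.forward(), cur.forward(), (cur.cp, cur.byte))   # a é (2, 3)
```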
04.06.14 17:49, Paul Sokolovsky wrote:
On Thu, 5 Jun 2014 00:26:10 +1000
Chris Angelico ros...@gmail.com wrote:
On Thu, Jun 5, 2014 at 12:17 AM, Serhiy Storchaka
storch...@gmail.com wrote:
04.06.14 10:03, Chris Angelico wrote:
Right, which is why I don't like the idea. But you
04.06.14 20:05, Paul Sokolovsky wrote:
On Wed, 04 Jun 2014 19:49:18 +0300
Serhiy Storchaka storch...@gmail.com wrote:
html.HTMLParser, json.JSONDecoder, re.compile, tokenize.tokenize
don't use iterators. They use indices, str.find and/or regular
expressions. A common use case is quickly
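Here is a toy scanner in the index-driven style Serhiy Storchaka attributes to the stdlib parsers. It resumes work at an arbitrary position with str.startswith(prefix, pos), str.find(sub, pos), and a compiled pattern's match(text, pos). The scanner itself is hypothetical. With O(1) indexing each resume is cheap. With a plain UTF-8 representation, each pos would first have to be translated to a byte offset.

```python
import re

_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def scan(text: str) -> list:
    """Collect identifiers, skipping '#' comments, by jumping between indices."""
    pos = 0
    out = []
    while pos < len(text):
        if text.startswith("#", pos):
            end = text.find("\n", pos)         # jump straight to end of comment
            pos = len(text) if end == -1 else end
            continue
        m = _NAME.match(text, pos)             # anchored match at index pos
        if m:
            out.append(m.group())
            pos = m.end()
        else:
            pos += 1                           # skip any other character
    return out

print(scan("foo = bar # baz\nqux"))   # ['foo', 'bar', 'qux']
```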
Serhiy Storchaka writes:
It would be interesting to collect a statistic about how many indexing
operations happened during the life of a string in a typical (Micro)Python
program.
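One rough way to gather the statistic Serhiy Storchaka suggests is to instrument a program with a str subclass, assuming that is acceptable. `CountingStr` is hypothetical, and it only counts explicit s[i] and s[i:j] expressions. Methods implemented in C, such as split(), find(), or regular expression matching, index internally without going through __getitem__ and are not counted. Results of those operations are also plain str, so derived strings are not tracked.

```python
class CountingStr(str):
    """str subclass counting explicit s[i] / s[i:j] operations (hypothetical)."""
    index_ops = 0

    def __getitem__(self, key):
        type(self).index_ops += 1           # count, then defer to str
        return super().__getitem__(key)

s = CountingStr("hello world")
first = s[0]
tail = s[6:]
word_count = len(s.split())     # split() runs in C and is not counted
print(CountingStr.index_ops)    # 2
```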
Probably irrelevant (I doubt anybody is going to be writing
programmers' editors in MicroPython), but by far
This thread has devolved into a flame war. I think we should trust the
Micropython implementers (whoever they are -- are they participating here?)
to know their users and let them do what feels right to them. We should
just ask them not to claim full compatibility with any particular Python
Hello,
On Wed, 04 Jun 2014 20:52:14 +0300
Serhiy Storchaka storch...@gmail.com wrote:
[]
That's sad, I agree.
Other languages (Go, Rust) can be happy without O(1) indexing of
strings. All string and regex operations work with iterators or
cursors, and I believe this approach is not
On Wed, Jun 04, 2014 at 03:32:25PM +, Steve Dower wrote:
Steven D'Aprano wrote:
The language semantics says that a string is an array of code points. Every
index relates to a single code point, no code point extends over two or more
indexes.
There's a 1:1 relationship between code
...@gmail.com
Cc: python-dev <mailto:python-dev@python.org>
Subject: Re: [Python-Dev] Internal representation of strings and
Micropython
If we're voting I think representing Unicode internally in micropython
as utf-8 with O(N) indexing is a great idea, partly because I'm not
sure indexing into strings
Hello,
On Wed, 4 Jun 2014 11:25:51 -0700
Guido van Rossum gu...@python.org wrote:
This thread has devolved into a flame war. I think we should trust the
Micropython implementers (whoever they are -- are they participating
here?)
I'm a regular contributor. I'm not sure if the author, Damien
On 6/4/2014 3:41 AM, Jeff Allen wrote:
Jython uses UTF-16 internally -- probably the only sensible choice in a
Python that can call Java. Indexing is O(N), fundamentally. By
fundamentally, I mean for those strings that have not yet noticed that
they contain no supplementary (above 0xFFFF) characters.