--
Neil Hodgson:
The counter-problem is that a French document that needs to include
one mathematical symbol (or emoji) outside Latin-1 will double in size
as a Python string.
Serious developers/typographers/users know that you can not compose
a text in French with latin-1. This is now also
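The size claim in this exchange can be checked directly. What follows is a minimal sketch, assuming CPython 3.3 or later (where the PEP 393 representation applies). The sample text and variable names are illustrative only, and `sys.getsizeof` reports implementation-specific sizes that differ between versions and platforms.

```python
import sys

# Hypothetical sample: 1000 characters that all fit in Latin-1.
latin1_text = "é" * 1000

# The same text plus one BMP character outside Latin-1 (a summation sign).
bmp_text = latin1_text + "\u2211"

# The same text plus one character outside the BMP (an emoji).
astral_text = latin1_text + "\U0001F600"

for label, text in [
    ("latin-1", latin1_text),
    ("bmp", bmp_text),
    ("astral", astral_text),
]:
    print(f"{label:8} len={len(text):5} bytes={sys.getsizeof(text)}")
```

On such builds the second string should come out at roughly two bytes per character and the third at roughly four, because the widest character in a string sets the storage width for all of it.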
On Sun, 31 Mar 2013 00:35:23 -0700, jmfauth wrote:
This is not really the problem. Serious users may notice sooner or
later, Python and Unicode are walking in opposite directions
(technically and in spirit).
timeit.repeat('a' * 1000 + 'ẞ')
[1.1088995672090292, 1.0842266613261913,
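The benchmark in this snippet is cut off, and the archive appears to have dropped the quoting around the timed statement. A hedged reconstruction of the kind of measurement being discussed follows. It assumes the intent was to time appending one character to a 1000-character ASCII string; the `setup` string, the repeat count, and the iteration count are choices made for this sketch, not taken from the post.

```python
import timeit

# Assumed reconstruction: build the base string in setup so the timed
# concatenation is not folded into a constant at compile time.
setup = "base = 'a' * 1000"

# Appending a Latin-1 character keeps the result at one byte per character.
latin1_times = timeit.repeat("base + 'é'", setup=setup, repeat=3, number=100_000)

# Appending a character outside Latin-1 forces a wider result string.
wide_times = timeit.repeat("base + 'ẞ'", setup=setup, repeat=3, number=100_000)

print("append Latin-1 char:    ", min(latin1_times))
print("append non-Latin-1 char:", min(wide_times))
```

Taking the minimum of the repeats is the usual way to read `timeit.repeat` results. The absolute numbers depend on the machine and the Python version, so none are quoted here.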
On 31/03/2013 08:35, jmfauth wrote:
--
Neil Hodgson:
The counter-problem is that a French document that needs to include
one mathematical symbol (or emoji) outside Latin-1 will double in size
as a Python string.
Serious developers/typographers/users know that you can not compose
a text in
On Thu, Mar 28, 2013 at 8:37 PM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
I also wonder why the implementation bothers keeping a UTF-8
representation. That sounds like premature optimization to me. Surely
you only need it when writing to a file with UTF-8 encoding? For most
On Fri, Mar 29, 2013 at 12:11 AM, Ian Kelly ian.g.ke...@gmail.com wrote:
From the PEP:
A new function PyUnicode_AsUTF8 is provided to access the UTF-8
representation. It is thus identical to the existing
_PyUnicode_AsString, which is removed. The function will compute the
utf8
On 2013-03-28, Ethan Furman et...@stoneleaf.us wrote:
I cannot speak for the borg mind, but for myself a troll is anyone
who continually posts rants (such as RR XL) or who continuously
hijacks threads to talk about their pet peeve (such as jmf).
Assuming jmf actually does care deeply and
On 03/29/2013 07:52 AM, Grant Edwards wrote:
On 2013-03-28, Ethan Furman et...@stoneleaf.us wrote:
I cannot speak for the borg mind, but for myself a troll is anyone
who continually posts rants (such as RR XL) or who continuously
hijacks threads to talk about their pet peeve (such as jmf).
On 2013-03-29, Ethan Furman et...@stoneleaf.us wrote:
On 03/29/2013 07:52 AM, Grant Edwards wrote:
On 2013-03-28, Ethan Furman et...@stoneleaf.us wrote:
I cannot speak for the borg mind, but for myself a troll is anyone
who continually posts rants (such as RR XL) or who continuously
hijacks
On 3/28/2013 10:37 PM, Steven D'Aprano wrote:
Under what circumstances will a string be created from a wchar_t string?
How, and why, would such a string be created? Why would Python still
support strings containing surrogates when it now has a nice, shiny,
surrogate-free flexible
On 03/28/2013 02:31 PM, Ethan Furman wrote:
On 03/28/2013 12:54 PM, ru...@yahoo.com wrote:
On 03/28/2013 01:48 AM, Steven D'Aprano wrote:
For someone who delights in pointing out the logical errors of
others you are often remarkably sloppy in your own logic.
Of course language can be both
On 03/29/2013 02:26 PM, ru...@yahoo.com wrote:
On 03/28/2013 02:31 PM, Ethan Furman wrote:
On 03/28/2013 12:54 PM, ru...@yahoo.com wrote:
On 03/28/2013 01:48 AM, Steven D'Aprano wrote:
For someone who delights in pointing out the logical errors of
others you are often remarkably sloppy in your
On 03/27/2013 08:49 PM, rusi wrote:
In particular You are a liar is as bad as You are an idiot
The same statement can be made non-abusively thus: ... is not true
because ...
I don't agree. With all the posts and micro benchmarks and other drivel that jmf has inflicted on us, I find it /very/
On Wed, 27 Mar 2013 22:42:18 -0700, rusi wrote:
More seriously I've never seen anyone -- cause or person -- aided by
the use of excessively strong language.
Of course not. By definition, if it helps, it wasn't *excessively* strong
language.
IOW I repeat my support for Ned's request: Ad
On 28 mar, 07:12, Ethan Furman et...@stoneleaf.us wrote:
On 03/27/2013 08:49 PM, rusi wrote:
In particular You are a liar is as bad as You are an idiot
The same statement can be made non-abusively thus: ... is not true
because ...
I don't agree. With all the posts and micro benchmarks
On 28/03/13 09:03, jmfauth wrote:
The problem is elsewhere. Nobody understand the examples
I gave on this list, because nobody understand Unicode.
These examples are not random examples, they are well
thought.
If you were understanding the coding of the characters,
Unicode and what this
On 28 March 2013 09:03, jmfauth wxjmfa...@gmail.com wrote:
The problem is elsewhere. Nobody understand the examples
I gave on this list, because nobody understand Unicode.
These examples are not random examples, they are well
thought.
There are many people here and among the Python devs who
On Thu, Mar 28, 2013 at 4:20 PM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
On Wed, 27 Mar 2013 20:49:20 -0700, rusi wrote:
In particular You are a liar is as bad as You are an idiot The same
statement can be made non-abusively thus: ... is not true because ...
I accept that
On Thu, Mar 28, 2013 at 8:03 PM, jmfauth wxjmfa...@gmail.com wrote:
Example of a good Unicode understanding.
If you wish 1) to preserve memory, 2) to cover the whole range
of Unicode, 3) to keep maximum performance while preserving the
good work Unicode.org has done (normalization, sorting),
Ian Foote:
Specifically, indexing a variable-length encoding like utf-8 is not as
efficient as indexing a fixed-length encoding.
Many common string operations do not require indexing by character
which reduces the impact of this inefficiency. UTF-8 seems like a
reasonable choice for an
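The indexing cost Ian Foote describes can be illustrated with a small sketch. The helper `utf8_char_at` is a hypothetical name introduced here, not part of any library. It finds the n-th character in UTF-8 bytes by scanning for lead bytes, which takes time proportional to the offset, whereas indexing a Python `str` is a direct lookup.

```python
def utf8_char_at(data: bytes, index: int) -> str:
    """Return the character at position `index` in UTF-8 encoded `data`.

    Hypothetical helper for illustration: it walks the bytes from the start,
    counting lead bytes, because UTF-8 characters vary from 1 to 4 bytes.
    """
    if index < 0:
        raise IndexError("negative indexes are not supported in this sketch")
    seen = -1      # number of characters seen so far, minus one
    start = None   # byte offset where the requested character begins
    for offset, byte in enumerate(data):
        if (byte & 0xC0) != 0x80:  # not a continuation byte, so a new character
            seen += 1
            if seen == index:
                start = offset
            elif seen == index + 1:
                return data[start:offset].decode("utf-8")
    if start is None:
        raise IndexError("character index out of range")
    return data[start:].decode("utf-8")


text = "aé€😀z"
encoded = text.encode("utf-8")

# Fixed-width indexing on str versus a linear scan over the UTF-8 bytes.
print(text[3], utf8_char_at(encoded, 3))
```

As Neil Hodgson notes in his reply, many operations scan the string anyway, so this cost matters mainly for code that jumps to arbitrary character positions.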
On 28/03/2013 03:18, Ethan Furman wrote:
I wouldn't call it unproductive -- a half-dozen amusing posts followed
because of Mark's initial post, and they were a great relief from the
tedium and (dare I say it?) idiocy of jmf's posts.
--
~Ethan~
Thanks for those words. They're a tonic as I've
On Thu, 28 Mar 2013 23:11:55 +1100, Neil Hodgson wrote:
Ian Foote:
Specifically, indexing a variable-length encoding like utf-8 is not as
efficient as indexing a fixed-length encoding.
Many common string operations do not require indexing by character
which reduces the impact of this
On 28 mar, 11:30, Chris Angelico ros...@gmail.com wrote:
On Thu, Mar 28, 2013 at 8:03 PM, jmfauth wxjmfa...@gmail.com wrote:
-
You really REALLY need to sort out in your head the difference between
correctness and performance. I still haven't seen one single piece of
evidence from you
On 28 mar, 14:01, Steven D'Aprano steve
+comp.lang.pyt...@pearwood.info wrote:
On Thu, 28 Mar 2013 23:11:55 +1100, Neil Hodgson wrote:
Ian Foote:
One benefit of
UTF-8 over Python's flexible representation is that it is, on average,
more compact over a wide set of samples.
Sure. And
On Fri, Mar 29, 2013 at 1:12 AM, jmfauth wxjmfa...@gmail.com wrote:
This flexible string representation is so absurd that not only
it does not know you can not write Western European Languages
with latin-1, it penalizes you by just attempting to optimize
latin-1. Shown in my multiple examples.
On 28/03/2013 12:11, Neil Hodgson wrote:
Ian Foote:
Specifically, indexing a variable-length encoding like utf-8 is not
as efficient as indexing a fixed-length encoding.
Many common string operations do not require indexing by character
which reduces the impact of this inefficiency. UTF-8
On Fri, Mar 29, 2013 at 1:51 AM, MRAB pyt...@mrabarnett.plus.com wrote:
On 28/03/2013 12:11, Neil Hodgson wrote:
Ian Foote:
Specifically, indexing a variable-length encoding like utf-8 is not
as efficient as indexing a fixed-length encoding.
Many common string operations do not require
On 28 mar, 15:38, Chris Angelico ros...@gmail.com wrote:
On Fri, Mar 29, 2013 at 1:12 AM, jmfauth wxjmfa...@gmail.com wrote:
This flexible string representation is so absurd that not only
it does not know you can not write Western European Languages
with latin-1, it penalizes you by just
On Fri, Mar 29, 2013 at 2:14 AM, jmfauth wxjmfa...@gmail.com wrote:
As long as you are attempting to divide a set of characters in
chunks and try to handle them separately, it will never work.
Okay. Let's look at integers. To properly represent the Python 3 'int'
type (or the Python 2 'long'),
On 28 mar, 16:14, jmfauth wxjmfa...@gmail.com wrote:
On 28 mar, 15:38, Chris Angelico ros...@gmail.com wrote:
On Fri, Mar 29, 2013 at 1:12 AM, jmfauth wxjmfa...@gmail.com wrote:
This flexible string representation is so absurd that not only
it does not know you can not write
On Thu, Mar 28, 2013 at 7:01 AM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
Any string method that takes a starting offset requires the method to
walk the string byte-by-byte. I've even seen languages put responsibility
for dealing with that onto the programmer: the start
On 3/28/2013 10:38 AM, Chris Angelico wrote:
PEP393 strings have two optimizations, or kinda three:
1a) ASCII-only strings
1b) Latin1-only strings
2) BMP-only strings
3) Everything else
Options 1a and 1b are almost identical - I'm not sure what the detail
is, but there's something flagging
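The categories Chris Angelico lists can be sketched as a small classifier. This is an illustration of the idea only, not CPython's actual C implementation, and `storage_width` is a hypothetical name. It picks the per-character width from the highest code point in the string.

```python
def storage_width(text: str) -> int:
    """Return the bytes per character a PEP 393 style string would need.

    Illustrative only: the width is chosen from the highest code point.
    """
    highest = max(map(ord, text), default=0)
    if highest < 0x100:       # options 1a and 1b: ASCII and Latin-1
        return 1
    if highest < 0x10000:     # option 2: rest of the Basic Multilingual Plane
        return 2
    return 4                  # option 3: supplementary planes


for sample in ["hello", "déjà vu", "€100", "smile 😀"]:
    print(repr(sample), storage_width(sample))
```

The sketch does not distinguish ASCII from Latin-1, since both use one byte per character. The flag that separates them, which the post mentions, is not modelled here.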
On Thu, Mar 28, 2013 at 8:38 AM, Chris Angelico ros...@gmail.com wrote:
PEP393 strings have two optimizations, or kinda three:
1a) ASCII-only strings
1b) Latin1-only strings
2) BMP-only strings
3) Everything else
Options 1a and 1b are almost identical - I'm not sure what the detail
is,
On Fri, Mar 29, 2013 at 3:01 AM, Terry Reedy tjre...@udel.edu wrote:
On 3/28/2013 10:38 AM, Chris Angelico wrote:
PEP393 strings have two optimizations, or kinda three:
1a) ASCII-only strings
1b) Latin1-only strings
2) BMP-only strings
3) Everything else
Options 1a and 1b are almost
On Thu, Mar 28, 2013 at 7:34 AM, jmfauth wxjmfa...@gmail.com wrote:
The flexible string representation takes the problem from the
other side, it attempts to work with the characters by using
their representations and it (can only) fails...
This is false. As I've pointed out to you before, the
Chris,
Your problem with int/long, the start of this thread, is
very interesting.
This is not a demonstration, a proof, rather an illustration.
Assume you have a set of integers {0...9} and an operator,
let say, the addition.
Idea.
Just divide this set in two chunks, {0...4} and {5...9}
and
On Fri, Mar 29, 2013 at 3:55 AM, jmfauth wxjmfa...@gmail.com wrote:
Assume you have a set of integers {0...9} and an operator,
let say, the addition.
Idea.
Just divide this set in two chunks, {0...4} and {5...9}
and work hardly to optimize the addition of 2 operands in
the sets {0...4}.
On 28 mar, 17:33, Ian Kelly ian.g.ke...@gmail.com wrote:
On Thu, Mar 28, 2013 at 7:34 AM, jmfauth wxjmfa...@gmail.com wrote:
The flexible string representation takes the problem from the
other side, it attempts to work with the characters by using
their representations and it (can only)
On Fri, Mar 29, 2013 at 4:48 AM, jmfauth wxjmfa...@gmail.com wrote:
If Python had implemented Unicode correctly, there would
be no difference in using an a, é, € or any character,
what the narrow builds did.
I'm not following your grammar perfectly here, but if Python were
implementing Unicode
On 03/28/2013 01:48 AM, Steven D'Aprano wrote:
On Wed, 27 Mar 2013 22:42:18 -0700, rusi wrote:
More seriously I've never seen anyone -- cause or person -- aided by
the use of excessively strong language.
Of course not. By definition, if it helps, it wasn't *excessively* strong
language.
In article
captjjmozdhsmuqx7vcpuii2bwrcnzcx76pm-6unb1duq4do...@mail.gmail.com,
Chris Angelico ros...@gmail.com wrote:
I'd rather this list have some vinegar than it devolve into
uselessness. Or, worse, if there's a hard-and-fast rule about
courtesy, devolve into aspartame... everyone's
On 28 mar, 18:55, Chris Angelico ros...@gmail.com wrote:
On Fri, Mar 29, 2013 at 4:48 AM, jmfauth wxjmfa...@gmail.com wrote:
If Python had implemented Unicode correctly, there would
be no difference in using an a, é, € or any character,
what the narrow builds did.
I'm not following your
On Thu, Mar 28, 2013 at 10:48 AM, jmfauth wxjmfa...@gmail.com wrote:
On 28 mar, 17:33, Ian Kelly ian.g.ke...@gmail.com wrote:
On Thu, Mar 28, 2013 at 7:34 AM, jmfauth wxjmfa...@gmail.com wrote:
The flexible string representation takes the problem from the
other side, it attempts to work with
On 03/28/2013 12:54 PM, ru...@yahoo.com wrote:
On 03/28/2013 01:48 AM, Steven D'Aprano wrote:
On Wed, 27 Mar 2013 22:42:18 -0700, rusi wrote:
For someone who delights in pointing out the logical errors
of others you are often remarkably sloppy in your own logic.
Of course language can be both
On 28 mar, 21:29, Benjamin Kaplan benjamin.kap...@case.edu wrote:
On Thu, Mar 28, 2013 at 10:48 AM, jmfauth wxjmfa...@gmail.com wrote:
On 28 mar, 17:33, Ian Kelly ian.g.ke...@gmail.com wrote:
On Thu, Mar 28, 2013 at 7:34 AM, jmfauth wxjmfa...@gmail.com wrote:
The flexible string
On 28 mar, 22:11, jmfauth wxjmfa...@gmail.com wrote:
On 28 mar, 21:29, Benjamin Kaplan benjamin.kap...@case.edu wrote:
On Thu, Mar 28, 2013 at 10:48 AM, jmfauth wxjmfa...@gmail.com wrote:
On 28 mar, 17:33, Ian Kelly ian.g.ke...@gmail.com wrote:
On Thu, Mar 28, 2013 at 7:34 AM,
On Fri, Mar 29, 2013 at 7:26 AM, jmfauth wxjmfa...@gmail.com wrote:
The wide build (I never used) is in my mind as correct as
the narrow build. It just covers a different range in unicode
(the whole range).
Actually it does; it covers all of the Unicode range, by using
(effectively) UTF-16.
On 28/03/2013 21:11, jmfauth wrote:
On 28 mar, 21:29, Benjamin Kaplan benjamin.kap...@case.edu wrote:
On Thu, Mar 28, 2013 at 10:48 AM, jmfauth wxjmfa...@gmail.com wrote:
On 28 mar, 17:33, Ian Kelly ian.g.ke...@gmail.com wrote:
On Thu, Mar 28, 2013 at 7:34 AM, jmfauth wxjmfa...@gmail.com
On Thu, Mar 28, 2013 at 2:11 PM, jmfauth wxjmfa...@gmail.com wrote:
On 28 mar, 21:29, Benjamin Kaplan benjamin.kap...@case.edu wrote:
On Thu, Mar 28, 2013 at 10:48 AM, jmfauth wxjmfa...@gmail.com wrote:
On 28 mar, 17:33, Ian Kelly ian.g.ke...@gmail.com wrote:
On Thu, Mar 28, 2013 at 7:34 AM,
On Thursday, March 28, 2013 at 11:40:17 AM UTC+8, Chris Angelico wrote:
On Thu, Mar 28, 2013 at 2:18 PM, Ethan Furman et...@stoneleaf.us wrote:
Has anybody else thought that [jmf's] last few responses are starting to
sound
bot'ish?
Yes, I did wonder. It's like he and Dihedral have been trading
On 3/28/2013 4:26 PM, jmfauth wrote:
Please provide references for your assertions. I have read the unicode
standard, parts more than once, and your assertions contradict my memory.
Unicode does not stipulate, one has to cover the whole range.
I believe it does. As I remember, the
On Fri, Mar 29, 2013 at 10:53 AM, Dennis Lee Bieber
wlfr...@ix.netcom.com wrote:
On Wed, 27 Mar 2013 23:12:21 -0700, Ethan Furman et...@stoneleaf.us
declaimed the following in gmane.comp.python.general:
At some point we have to stop being gentle / polite / politically correct
and call a
On 28/03/2013 23:53, Dennis Lee Bieber wrote:
On Wed, 27 Mar 2013 23:12:21 -0700, Ethan Furman et...@stoneleaf.us
declaimed the following in gmane.comp.python.general:
At some point we have to stop being gentle / polite / politically correct and
call a shovel a shovel... er, spade.
On Thu, 28 Mar 2013 10:11:59 -0600, Ian Kelly wrote:
On Thu, Mar 28, 2013 at 8:38 AM, Chris Angelico ros...@gmail.com
wrote:
PEP393 strings have two optimizations, or kinda three:
1a) ASCII-only strings
1b) Latin1-only strings
2) BMP-only strings
3) Everything else
Options 1a and 1b are
On Thu, 28 Mar 2013 12:54:20 -0700, rurpy wrote:
Even if you personally would prefer someone to respond by calling you a
liar, your personal preferences do not form a basis for desirable
posting behavior here.
Whereas yours apparently are.
Thanks for the feedback, I'll take it under
On Fri, Mar 29, 2013 at 11:39 AM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
ASCII and Latin-1 strings obviously do not have them. Nor do BMP-only
strings. It's only strings in the SMPs that could need surrogate pairs,
and they don't need them in Python's implementation since
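The point about surrogate pairs can be made concrete with a short sketch, assuming Python 3.3 or later. The variable names are illustrative. A character from a supplementary plane is a single element of a `str`, and the surrogate pair only appears when the text is encoded as UTF-16.

```python
import struct

# One character from a supplementary plane (U+1F600).
ch = "\U0001F600"

# In Python 3.3+ this is a single code point in the string.
print(len(ch), hex(ord(ch)))

# Encoding to UTF-16 (big-endian, no BOM) produces two 16-bit code units.
high, low = struct.unpack(">HH", ch.encode("utf-16-be"))
print(hex(high), hex(low))

# The same pair computed by hand from the UTF-16 algorithm.
offset = ord(ch) - 0x10000
expected_high = 0xD800 + (offset >> 10)
expected_low = 0xDC00 + (offset & 0x3FF)
print(hex(expected_high), hex(expected_low))
```

The first line of output reports a length of one, which is the behaviour the post attributes to the new representation. Narrow builds before 3.3 would have reported two.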
On 29/03/2013 00:54, Chris Angelico wrote:
Minor nitpick, btw:
(in which cast wstr_length differs form length)
Should be in which case and from. Who has the power to correct
typos in PEPs?
ChrisA
Sneak it in here? http://bugs.python.org/issue13604
--
If you're using GoogleCrap™ please
On Fri, Mar 29, 2013 at 12:03 PM, Mark Lawrence breamore...@yahoo.co.uk wrote:
On 29/03/2013 00:54, Chris Angelico wrote:
Minor nitpick, btw:
(in which cast wstr_length differs form length)
Should be in which case and from. Who has the power to correct
typos in PEPs?
Sneak it in here?
On 29/03/2013 00:54, Chris Angelico wrote:
On Fri, Mar 29, 2013 at 11:39 AM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
ASCII and Latin-1 strings obviously do not have them. Nor do BMP-only
strings. It's only strings in the SMPs that could need surrogate pairs,
and they don't
On Fri, 29 Mar 2013 11:54:41 +1100, Chris Angelico wrote:
On Fri, Mar 29, 2013 at 11:39 AM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
ASCII and Latin-1 strings obviously do not have them. Nor do BMP-only
strings. It's only strings in the SMPs that could need surrogate pairs,
On Fri, Mar 29, 2013 at 1:37 PM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
Under what circumstances will a string be created from a wchar_t string?
How, and why, would such a string be created? Why would Python still
support strings containing surrogates when it now has a
Steven D'Aprano:
Some string operations need to inspect every character, e.g. str.upper().
Even for them, the increased complexity of a variable-width encoding
costs. It's not sufficient to walk the string inspecting a fixed 1, 2 or
4 bytes per character. You have to walk the string grabbing 1
MRAB:
Implementing the regex module (http://pypi.python.org/pypi/regex) would
have been more difficult if the internal representation had been UTF-8,
because of the need to decode, and the implementation would also have
been slower for that reason.
One way to build regex support for UTF-8
On 03/28/2013 08:34 PM, Neil Hodgson wrote:
Steven D'Aprano:
Any string method that takes a starting offset requires the method to
walk the string byte-by-byte. I've even seen languages put responsibility
for dealing with that onto the programmer: the start offset is given in
*bytes*, not
On Fri, Mar 29, 2013 at 2:34 PM, Neil Hodgson nhodg...@iinet.net.au wrote:
It doesn't horrify me - I've been working this way for over 10 years and
it seems completely natural. You can wrap access in iterators that hide the
byte offsets if you like. This then ensures that all operations on
Chris Angelico:
But both this and your example of case conversion are, fundamentally,
iterating over the string. What if you aren't doing that? What if you
want to parse and process?
Parsing is also normally a scanning operation. If you want to
process pieces of the string based on the
On 03/27/2013 06:47 PM, Steven D'Aprano wrote:
On Wed, 27 Mar 2013 11:51:07 +, Mark Lawrence defending an
unproductive post flaming a troll:
I wouldn't call it unproductive -- a half-dozen amusing posts followed because of Mark's initial post, and they were a
great relief from the tedium
On Thu, Mar 28, 2013 at 2:18 PM, Ethan Furman et...@stoneleaf.us wrote:
Has anybody else thought that [jmf's] last few responses are starting to sound
bot'ish?
Yes, I did wonder. It's like he and Dihedral have been trading
accounts sometimes. Hey, Dihedral, I hear there's a discussion of
On Mar 28, 8:18 am, Ethan Furman et...@stoneleaf.us wrote:
So long as Mark doesn't start cussing and swearing I'm not going to get
worked up about it. I
find jmf's posts far more aggravating.
I support Ned's original gentle reminder -- Please be civil
irrespective of surrounding nonsensical
On Wed, 27 Mar 2013 20:49:20 -0700, rusi wrote:
On Mar 28, 8:18 am, Ethan Furman et...@stoneleaf.us wrote:
So long as Mark doesn't start cussing and swearing I'm not going to get
worked up about it. I find jmf's posts far more aggravating.
I support Ned's original gentle reminder --
On Mar 28, 10:20 am, Steven D'Aprano steve
+comp.lang.pyt...@pearwood.info wrote:
On Wed, 27 Mar 2013 20:49:20 -0700, rusi wrote:
On Mar 28, 8:18 am, Ethan Furman et...@stoneleaf.us wrote:
So long as Mark doesn't start cussing and swearing I'm not going to get
worked up about it. I find
70 matches