Re: hex dump w/ or w/out utf-8 chars

2013-07-24 Thread wxjmfauth
I cannot find the thread where a Python core dev spoke about French, so I'm putting this here. This stupid Flexible String Representation splits Unicode into chunks, and one of these chunks is latin-1 (iso-8859-1). If we consider that latin-1 is unusable for 17 (seventeen) European languages based on th

Re: Timing of string membership (was Re: hex dump w/ or w/out utf-8 chars)

2013-07-14 Thread Chris Angelico
On Mon, Jul 15, 2013 at 2:18 PM, Terry Reedy wrote: > On 7/14/2013 10:56 AM, Chris Angelico wrote: > An issue about finding strings in strings was opened last September and, as > reported on this list, fixes were applied about last March. As I remember, > some but not all of the optimizations were

Re: Timing of string membership (was Re: hex dump w/ or w/out utf-8 chars)

2013-07-14 Thread Terry Reedy
On 7/14/2013 10:56 AM, Chris Angelico wrote: On Sun, Jul 14, 2013 at 11:44 PM, wrote: timeit.repeat("a = 'hundred'; 'x' in a") [0.11785943134991479, 0.09850454944486256, 0.09761604599423179] timeit.repeat("a = 'hundreœ'; 'x' in a") [0.23955250303158593, 0.2195812612416752, 0.2213389699740
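The comparison quoted in this message can be reproduced with a small sketch along the following lines. The helper name `time_membership` and the `number` value are assumptions chosen for illustration, not something taken from the thread. Absolute timings depend on the interpreter version and the machine, so no expected figures are shown.

```python
import timeit

def time_membership(haystack, needle="x", repeat=3, number=100_000):
    """Time `needle in haystack` in the style of the quoted timeit calls."""
    # Hypothetical helper: the assignment stays inside the timed statement,
    # as in the original one-liners.
    stmt = f"a = {haystack!r}; {needle!r} in a"
    return timeit.repeat(stmt, repeat=repeat, number=number)

# 'hundred' is pure ASCII; U+0153 (oe ligature) lies outside Latin-1,
# so CPython 3.3+ stores that string with two bytes per character.
ascii_times = time_membership("hundred")
wide_times = time_membership("hundre\u0153")
print(min(ascii_times), min(wide_times))
```

Taking the minimum of the repeats is the usual way to read `timeit.repeat` results, since the higher values mostly reflect interference from other processes.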

Timing of string membership (was Re: hex dump w/ or w/out utf-8 chars)

2013-07-14 Thread Chris Angelico
On Sun, Jul 14, 2013 at 11:44 PM, wrote: > On Sunday, 14 July 2013 12:44:12 UTC+2, Steven D'Aprano wrote: >> On Sun, 14 Jul 2013 01:20:33 -0700, wxjmfauth wrote: >> >> >> >> > For a very simple reason, the latin-1 block: considered and accepted >> >> > today as being a Unicode design mist

Re: hex dump w/ or w/out utf-8 chars

2013-07-14 Thread wxjmfauth
On Sunday, 14 July 2013 12:44:12 UTC+2, Steven D'Aprano wrote: > On Sun, 14 Jul 2013 01:20:33 -0700, wxjmfauth wrote: > > > > > For a very simple reason, the latin-1 block: considered and accepted > > > today as being a Unicode design mistake. > > > > Latin-1 (also known as ISO-8859-

Re: hex dump w/ or w/out utf-8 chars

2013-07-14 Thread Steven D'Aprano
On Sun, 14 Jul 2013 01:20:33 -0700, wxjmfauth wrote: > For a very simple reason, the latin-1 block: considered and accepted > today as being a Unicode design mistake. Latin-1 (also known as ISO-8859-1) was based on DEC's "Multinational Character Set", which goes back to 1983. ISO-8859-1 was fir

Re: hex dump w/ or w/out utf-8 chars

2013-07-14 Thread wxjmfauth
Le samedi 13 juillet 2013 21:02:24 UTC+2, Dave Angel a écrit : > On 07/13/2013 10:37 AM, wxjmfa...@gmail.com wrote: > > > > > > Fortunately for us, Python (in version 3.3 and later) and Pike did it > > right. Some day the others may decide to do similarly. > > > --- Possible but

Re: hex dump w/ or w/out utf-8 chars

2013-07-13 Thread Neil Hodgson
wxjmfa...@gmail.com: The FSR is naive and badly working. I can not force people to understand the coding of the characters [*]. You could at least *try*. If there really was a problem with the FSR and you truly understood this problem then surely you would be able to communicate the pr

Re: hex dump w/ or w/out utf-8 chars

2013-07-13 Thread Dave Angel
On 07/13/2013 10:37 AM, wxjmfa...@gmail.com wrote: The FSR is naive and badly working. I can not force people to understand the coding of the characters [*]. That would be very hard, since you certainly do not. I'm the first to recognize that Python and/or Pike are free to do what they wis

Re: hex dump w/ or w/out utf-8 chars

2013-07-13 Thread wxjmfauth
On Saturday, 13 July 2013 11:49:10 UTC+2, Steven D'Aprano wrote: > On Sat, 13 Jul 2013 00:56:52 -0700, wxjmfauth wrote: > > > > > You are confusing the knowledge of a coding scheme and the intrinsic > > > information a "coding scheme" *may* have, in a mandatory way, to work > > > properly.

Re: hex dump w/ or w/out utf-8 chars

2013-07-13 Thread Chris Angelico
On Sat, Jul 13, 2013 at 7:49 PM, Steven D'Aprano wrote: > Ironically, Python has done the same thing for integers for many versions > too. They just didn't call it "Flexible Integer Representation", but > that's what it is. For integers smaller than 2**31, they are stored as C > longs (plus object
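The point about integers growing as needed can be observed directly. The sketch below assumes CPython 3, where every `int` is arbitrary-precision and `sys.getsizeof` reports the object's size in bytes. The helper name `int_sizes` is hypothetical, and the exact byte counts vary by build, so none are listed.

```python
import sys

def int_sizes(values):
    """Map each integer to the size of its object in bytes."""
    return {v: sys.getsizeof(v) for v in values}

# Larger magnitudes need more internal digits, so the object grows.
sizes = int_sizes([1, 2**31, 2**100, 2**1000])
for value, size in sizes.items():
    print(f"{value.bit_length():>5} bits -> {size} bytes")
```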

Re: hex dump w/ or w/out utf-8 chars

2013-07-13 Thread Steven D'Aprano
On Sat, 13 Jul 2013 00:56:52 -0700, wxjmfauth wrote: > You are confusing the knowledge of a coding scheme and the intrinsic > information a "coding scheme" *may* have, in a mandatory way, to work > properly. These are conceptually two different things. *May* have, in a *mandatory* way? JMF, I kno

Re: hex dump w/ or w/out utf-8 chars

2013-07-13 Thread Chris Angelico
On Sat, Jul 13, 2013 at 5:56 PM, wrote: > Try to write an editor, a text widget, with a coding > scheme like the Flexible String Representation. You will > quickly notice, it is impossible (understand correctly). > (You do not need a computer, just a sheet of paper and a pencil) > Hint: what

Re: hex dump w/ or w/out utf-8 chars

2013-07-13 Thread Steven D'Aprano
On Sat, 13 Jul 2013 00:56:52 -0700, wxjmfauth wrote: > I am convinced you are not conceptually understanding utf-8 very well. I > wrote many times, "utf-8 does not produce bytes, but Unicode Encoding > Units". Just because you write it many times, doesn't make it correct. You are simply wrong. U

Re: hex dump w/ or w/out utf-8 chars

2013-07-13 Thread Lele Gaifax
wxjmfa...@gmail.com writes: > Try to write an editor, a text widget, with a coding > scheme like the Flexible String Representation. You will > quickly notice, it is impossible (understand correctly). > (You do not need a computer, just a sheet of paper and a pencil) > Hint: what is the charac

Re: hex dump w/ or w/out utf-8 chars

2013-07-13 Thread wxjmfauth
On Friday, 12 July 2013 04:16:21 UTC+2, Chris Angelico wrote: > On Fri, Jul 12, 2013 at 4:42 AM, wrote: > > > BTW, since > > > when does a serious coding scheme need an external marker? > > > > > > > All of them. > > > > Content-type: text/plain; charset=UTF-8 > > > > ChrisA --

Re: hex dump w/ or w/out utf-8 chars

2013-07-12 Thread Steven D'Aprano
On Fri, 12 Jul 2013 23:01:47 +0100, Joshua Landau wrote: > Isn't a superscript "c" the symbol for radians? Only in the sense that a superscript "o" is the symbol for degrees. Semantically, both degree-sign and radian-sign are different "things" than merely an o or c in superscript. Neverthele

Re: hex dump w/ or w/out utf-8 chars

2013-07-12 Thread Tim Roberts
Joshua Landau wrote: > >Isn't a superscript "c" the symbol for radians? That's very rarely used. More common is "rad". The problem with a superscript "c" is that it looks too much like a degree symbol. -- Tim Roberts, t...@probo.com Providenza & Boekelheide, Inc. -- http://mail.python.org/mai

Re: hex dump w/ or w/out utf-8 chars

2013-07-12 Thread Joshua Landau
On 9 July 2013 10:34, wrote: > There is no symbol for radian because mathematically > radian is a pure number, a unitless number. You can > however specify a = ... in radian (rad). > Isn't a superscript "c" the symbol for radians?

Re: hex dump w/ or w/out utf-8 chars

2013-07-12 Thread wxjmfauth
Le vendredi 12 juillet 2013 05:18:44 UTC+2, Steven D'Aprano a écrit : > On Thu, 11 Jul 2013 11:42:26 -0700, wxjmfauth wrote: > > > Now all your strings will be just as heavy, every single variable name > > and attribute name will use four times as much memory. Happy now? > >>> 㑖 = 9

Re: hex dump w/ or w/out utf-8 chars

2013-07-12 Thread Chris Angelico
On Fri, Jul 12, 2013 at 4:42 AM, wrote: > BTW, since > when does a serious coding scheme need an external marker? > All of them. Content-type: text/plain; charset=UTF-8 ChrisA
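The reply's point is that bytes alone do not say which encoding produced them; a header such as Content-type carries that information. A minimal sketch, assuming the standard-library `email.message.Message` class is used to parse the header. The function name `decode_with_header` and the sample word are illustrative and do not come from the thread.

```python
from email.message import Message

def decode_with_header(payload: bytes, content_type: str) -> str:
    """Decode payload using the charset named in a Content-Type header."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset in lower case, or None.
    charset = msg.get_content_charset() or "ascii"
    return payload.decode(charset)

payload = "Stra\u00dfe".encode("utf-8")
# The same bytes read correctly or as mojibake depending on the marker.
print(decode_with_header(payload, "text/plain; charset=UTF-8"))
print(decode_with_header(payload, "text/plain; charset=ISO-8859-1"))
```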

Re: hex dump w/ or w/out utf-8 chars

2013-07-11 Thread Steven D'Aprano
On Thu, 11 Jul 2013 11:42:26 -0700, wxjmfauth wrote: > And what to say about this "ucs4" char/string '\U0001d11e' which is > weighting 18 bytes more than an "a". > sys.getsizeof('\U0001d11e') > 44 > > A total absurdity. You should stick to Python 3.1 and 3.2 then: py> print(sys.version)

Re: hex dump w/ or w/out utf-8 chars

2013-07-11 Thread wxjmfauth
Le jeudi 11 juillet 2013 15:32:00 UTC+2, Chris Angelico a écrit : > On Thu, Jul 11, 2013 at 11:18 PM, wrote: > > > Just to stick with this funny character ẞ, a ucs-2 char > > > in the Flexible String Representation nomenclature. > > > > > > It seems to me that, when one needs more than ten by

Re: hex dump w/ or w/out utf-8 chars

2013-07-11 Thread wxjmfauth
Le jeudi 11 juillet 2013 20:42:26 UTC+2, wxjm...@gmail.com a écrit : > Le jeudi 11 juillet 2013 15:32:00 UTC+2, Chris Angelico a écrit : > > > On Thu, Jul 11, 2013 at 11:18 PM, wrote: > > > > > > > Just to stick with this funny character ẞ, a ucs-2 char > > > > > > > in the Flexible String

Re: hex dump w/ or w/out utf-8 chars

2013-07-11 Thread Chris Angelico
On Thu, Jul 11, 2013 at 11:18 PM, wrote: > Just to stick with this funny character ẞ, a ucs-2 char > in the Flexible String Representation nomenclature. > > It seems to me that, when one needs more than ten bytes > to encode it, > sys.getsizeof('a') > 26 sys.getsizeof('ẞ') > 40 > > this
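The single-character sizes quoted here are dominated by the fixed object header. A rough way to see the per-character cost instead is to compare a long string with a short one made of the same character. This sketch assumes CPython 3.3 or later (PEP 393 storage); the helper name `per_char_bytes` is hypothetical.

```python
import sys

def per_char_bytes(ch: str, n: int = 1000) -> int:
    """Estimate bytes used per character, with the header cancelled out."""
    # Both strings share the same header size, so the difference is
    # only the storage for the n extra characters.
    return (sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch)) // n

for ch in ("a", "\u1e9e", "\U0001d11e"):
    print(f"U+{ord(ch):04X}: {per_char_bytes(ch)} byte(s) per character")
```

Under that assumption the three sample characters fall into the 1-byte, 2-byte, and 4-byte storage classes respectively.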

Re: hex dump w/ or w/out utf-8 chars

2013-07-11 Thread wxjmfauth
Le lundi 8 juillet 2013 19:52:17 UTC+2, Chris Angelico a écrit : > On Tue, Jul 9, 2013 at 3:31 AM, wrote: > > > Unfortunately (as probably I told you before) I will never pass to > > > Python 3... Guido should not always listen only to gurus like him... > > > I don't like Python as before...s

Re: hex dump w/ or w/out utf-8 chars

2013-07-10 Thread wxjmfauth
For those who are interested. The official proposal request for the encoding of the Latin uppercase letter Sharp S in ISO/IEC 10646; DIN (The German Institute for Standardization) proposal is available on the web. A pdf with the rationale. I do not remember from where I got it, probably from a Germ

Re: hex dump w/ or w/out utf-8 chars

2013-07-09 Thread Steven D'Aprano
On Tue, 09 Jul 2013 12:15:29 +0200, Chris “Kwpolska” Warrick wrote: > On Tue, Jul 9, 2013 at 11:34 AM, wrote: >> Note the difference between SS and ẞ 'FRANZ-JOSEF-STRAUSS-STRAẞE' > > This is a capital Eszett. Which just happens not to exist in German. > Germans do not use this character, it is

Re: hex dump w/ or w/out utf-8 chars

2013-07-09 Thread Chris “Kwpolska” Warrick
On Tue, Jul 9, 2013 at 11:34 AM, wrote: > Note the difference between SS and ẞ > 'FRANZ-JOSEF-STRAUSS-STRAẞE' This is a capital Eszett. Which just happens not to exist in German. Germans do not use this character, it is not available on German keyboards, and the German spelling rules have you r

Re: hex dump w/ or w/out utf-8 chars

2013-07-09 Thread Dave Angel
On 07/09/2013 09:00 AM, Neil Cerutti wrote: Interestingly similar scheme. I wonder if 5-bit chars was a common compression scheme. The Z-machine spec was never officially published either. I believe a "task force" reverse engineered it sometime in the 90's. Baudot was 5 bits. It used s

Re: hex dump w/ or w/out utf-8 chars

2013-07-09 Thread Skip Montanaro
> I wonder if 5-bit chars was a > common compression scheme. http://en.wikipedia.org/wiki/List_of_binary_codes Baudot was pretty common, as I recall, though ASCII and EBCDIC ruled by the time I started punching cards. Skip

Re: hex dump w/ or w/out utf-8 chars

2013-07-09 Thread Neil Cerutti
On 2013-07-09, Dave Angel wrote: >> One of the first Python project I undertook was a program to >> dump the ZSCII strings from Infocom game files. They are >> mostly packed one character per 5 bits, with escapes to (I had >> to recheck the Z-machine spec) latin-1. Oh, those clever >> implementors

Re: hex dump w/ or w/out utf-8 chars

2013-07-09 Thread Dave Angel
On 07/09/2013 08:22 AM, Neil Cerutti wrote: On 2013-07-08, Dave Angel wrote: I appreciate you've been around a long time, and worked in a lot of languages. I've programmed professionally in at least 35 languages since 1967. But we've come a long way from the 6bit characters I used in 1968. A

Re: hex dump w/ or w/out utf-8 chars

2013-07-09 Thread Neil Cerutti
On 2013-07-08, Dave Angel wrote: > I appreciate you've been around a long time, and worked in a > lot of languages. I've programmed professionally in at least > 35 languages since 1967. But we've come a long way from the > 6bit characters I used in 1968. At that time, we packed them > 10 charac

Re: hex dump w/ or w/out utf-8 chars

2013-07-09 Thread wxjmfauth
Le mardi 9 juillet 2013 09:00:02 UTC+2, Steven D'Aprano a écrit : > On Mon, 08 Jul 2013 10:53:18 -0700, ferdy.blatsco wrote: > > > > > Not using python 3, for me (a programmer which was present at the > > > beginning of computer science, badly interacting with many languages > > > from assembl

Re: hex dump w/ or w/out utf-8 chars

2013-07-09 Thread Steven D'Aprano
On Mon, 08 Jul 2013 10:53:18 -0700, ferdy.blatsco wrote: > Not using python 3, for me (a programmer which was present at the > beginning of computer science, badly interacting with many languages > from assembler to Fortran and from c to Pascal and so on) it was an hard > job to arrange the abrupt

Re: hex dump w/ or w/out utf-8 chars

2013-07-08 Thread Steven D'Aprano
On Tue, 09 Jul 2013 07:49:45 +1000, Chris Angelico wrote: > On Tue, Jul 9, 2013 at 6:56 AM, Dave Angel wrote: >> But Unicode has nothing to do with Guido, and it has existed for about >> 25 years (if I recall correctly). > > Depends how you measure. According to [1], the work kinda began back >

Re: hex dump w/ or w/out utf-8 chars

2013-07-08 Thread Steven D'Aprano
On Tue, 09 Jul 2013 00:32:00 +0100, MRAB wrote: > On 08/07/2013 23:02, Joshua Landau wrote: >> On 8 July 2013 22:38, MRAB wrote: >>> On 08/07/2013 21:56, Dave Angel wrote: Characters do not have a width. >>> >>> [snip] >>> >>> It depends what you mean by "width"! :-) >>> >>> Try this (Python

Re: hex dump w/ or w/out utf-8 chars

2013-07-08 Thread MRAB
On 08/07/2013 23:02, Joshua Landau wrote: On 8 July 2013 22:38, MRAB wrote: On 08/07/2013 21:56, Dave Angel wrote: Characters do not have a width. [snip] It depends what you mean by "width"! :-) Try this (Python 3): print("A\N{FULLWIDTH LATIN CAPITAL LETTER A}") AA Serious question: H

Re: hex dump w/ or w/out utf-8 chars

2013-07-08 Thread Chris Angelico
On Tue, Jul 9, 2013 at 8:45 AM, Dave Angel wrote: > On 07/08/2013 05:49 PM, Chris Angelico wrote: >> >> On Tue, Jul 9, 2013 at 6:56 AM, Dave Angel wrote: >>> >>> But Unicode has nothing to do with Guido, and it has existed for about 25 >>> years (if I recall correctly). >> >> >> Depends how you m

Re: hex dump w/ or w/out utf-8 chars

2013-07-08 Thread Dave Angel
On 07/08/2013 05:49 PM, Chris Angelico wrote: On Tue, Jul 9, 2013 at 6:56 AM, Dave Angel wrote: But Unicode has nothing to do with Guido, and it has existed for about 25 years (if I recall correctly). Depends how you measure. According to [1], the work kinda began back then (25 years ago bein

Re: hex dump w/ or w/out utf-8 chars

2013-07-08 Thread Joshua Landau
On 8 July 2013 22:38, MRAB wrote: > On 08/07/2013 21:56, Dave Angel wrote: >> Characters do not have a width. > > [snip] > > It depends what you mean by "width"! :-) > > Try this (Python 3): > print("A\N{FULLWIDTH LATIN CAPITAL LETTER A}") > AA Serious question: How would one find the width
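One possible way to approach the width question, not necessarily the answer given in the thread, is the East Asian Width property exposed by the standard-library `unicodedata` module. The sketch below counts characters classified as Fullwidth ("F") or Wide ("W") as two terminal columns and everything else as one. The function name `display_width` is hypothetical, and the approach deliberately ignores combining marks and other zero-width characters.

```python
import unicodedata

def display_width(text: str) -> int:
    """Approximate the number of terminal columns text occupies."""
    # "F" (Fullwidth) and "W" (Wide) characters take two columns;
    # this simplification treats every other character as one column.
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("F", "W") else 1
        for ch in text
    )

sample = "A\N{FULLWIDTH LATIN CAPITAL LETTER A}"
print(len(sample), display_width(sample))
```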

Re: hex dump w/ or w/out utf-8 chars

2013-07-08 Thread Chris Angelico
On Tue, Jul 9, 2013 at 6:56 AM, Dave Angel wrote: > But Unicode has nothing to do with Guido, and it has existed for about 25 > years (if I recall correctly). Depends how you measure. According to [1], the work kinda began back then (25 years ago being 1988), but it wasn't till 1991/92 that the s

Re: hex dump w/ or w/out utf-8 chars

2013-07-08 Thread MRAB
On 08/07/2013 21:56, Dave Angel wrote: On 07/08/2013 01:53 PM, ferdy.blat...@gmail.com wrote: Hi Steven, thank you for your reply... I really needed another python guru which is also an English teacher! Sorry if English is not my mother tongue... "uncorrect" instead of "incorrect" (I misapplied

Re: hex dump w/ or w/out utf-8 chars

2013-07-08 Thread Dave Angel
On 07/08/2013 01:53 PM, ferdy.blat...@gmail.com wrote: Hi Steven, thank you for your reply... I really needed another python guru which is also an English teacher! Sorry if English is not my mother tongue... "uncorrect" instead of "incorrect" (I misapplied the "similarity principle" like "unplea

Re: hex dump w/ or w/out utf-8 chars

2013-07-08 Thread Chris Angelico
On Tue, Jul 9, 2013 at 3:53 AM, wrote: >>> All characters are UTF-8, characters. "a" is a UTF-8 character. So is "ă". > Not using python 3, for me (a programmer which was present at the beginning of > computer science, badly interacting with many languages from assembler to > Fortran and from c t

Re: hex dump w/ or w/out utf-8 chars

2013-07-08 Thread ferdy . blatsco
Hi Steven, thank you for your reply... I really needed another python guru which is also an English teacher! Sorry if English is not my mother tongue... "uncorrect" instead of "incorrect" (I misapplied the "similarity principle" like "unpleasant...>...uncorrect"). Apart from these trifles, you sa

Re: hex dump w/ or w/out utf-8 chars

2013-07-08 Thread Chris Angelico
On Tue, Jul 9, 2013 at 3:31 AM, wrote: > Unfortunately (as probably I told you before) I will never pass to > Python 3... Guido should not always listen only to gurus like him... > I don't like Python as before...starting from OOP and ending with codecs > like utf-8. Regarding OOP, much apprecia

Re: hex dump w/ or w/out utf-8 chars

2013-07-08 Thread ferdy . blatsco
Hi Chris, glad to have received your contribution, but I was expecting much more critics... Starting from the "little nitpick" about the comment dispositon in my script... you are correct... It is a bad habit on my part to place variables subjected to change at the beginning of the script... and t

Re: hex dump w/ or w/out utf-8 chars

2013-07-07 Thread Steven D'Aprano
re not a guru The script > seems very long but I commented too much ... sorry. It is very useful > (at least IMHO...) > It works under Linux. but there is still a little problem which I didn't > solve (at least programmatically...). > > > # -*- coding: utf-8 -*- >

Re: hex dump w/ or w/out utf-8 chars

2013-07-07 Thread Chris Angelico
> As I already told to Chris... critics are welcome! No problem. > # -*- coding: utf-8 -*- > # px.py vers. 11 (pxb.py) # python 2.6.6 > # hex-dump w/ or w/out utf-8 chars > # Using spaces as separators, this script shows > # (better than tabnanny) uncorrect indentations. >

hex dump w/ or w/out utf-8 chars

2013-07-07 Thread blatt
ump w/ or w/out utf-8 chars # Using spaces as separators, this script shows # (better than tabnanny) uncorrect indentations. # to save output > python pxb.py hex.txt > px9_out_hex.txt nLenN=3 # n. of digits for lines # version almost thoroughly rewritten on the ground of # the
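For readers who want something to experiment with, here is a minimal hex-dump sketch in Python 3. It is not the poster's px.py script (which targeted Python 2.6.6 and is only partly visible here); the function name `hex_dump` and its parameters are assumptions made for illustration. It prints an offset, the bytes in hex, and optionally a text column in which non-ASCII bytes appear as dots.

```python
def hex_dump(data: bytes, width: int = 16, show_text: bool = True):
    """Return hex-dump lines for data, width bytes per line."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Pad the hex column so the text column lines up on short rows.
        line = f"{offset:06x}  {hex_part:<{width * 3 - 1}}"
        if show_text:
            text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
            line += f"  |{text}|"
        lines.append(line)
    return lines

# "a" is one UTF-8 byte, "ă" (U+0103) is two.
for line in hex_dump("a\u0103".encode("utf-8"), width=4):
    print(line)
```

Showing multi-byte characters in the text column instead of dots would require decoding across chunk boundaries, which this sketch leaves out.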