Re: harmful str(bytes)

2010-10-12 Thread Hrvoje Niksic
Stefan Behnel writes: > Hallvard B Furuseth, 11.10.2010 23:45: >> If there were a __plain_str__() method which was supposed to fail rather >> than start to babble Python syntax, and if there were not plenty of >> Python code around which invoked __str__, I'd agree. > > Yes, calling str() "just in

Re: harmful str(bytes)

2010-10-11 Thread Stefan Behnel
Hallvard B Furuseth, 11.10.2010 23:45: If there were a __plain_str__() method which was supposed to fail rather than start to babble Python syntax, and if there were not plenty of Python code around which invoked __str__, I'd agree. Yes, calling str() "just in case" has a clear code smell. I th
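
The __plain_str__() idea quoted above, a conversion that fails rather than falling back to Python syntax, can be approximated in user code. The following is only a sketch: plain_str and its encoding parameter are hypothetical names chosen for illustration, not part of Python or of any proposal in this thread.

```python
def plain_str(value, encoding=None):
    """Hypothetical helper: return text, or fail instead of emitting b'...'."""
    if isinstance(value, str):
        return value
    if isinstance(value, (bytes, bytearray)):
        # Assumption: the caller must name the charset; there is no default.
        if encoding is None:
            raise TypeError("bytes need an explicit encoding to become text")
        return bytes(value).decode(encoding)
    # Everything else keeps the ordinary str() behaviour.
    return str(value)
```

Code that currently calls str() "just in case" could call such a helper instead, so a stray bytes object surfaces as an exception at the conversion site.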

Re: harmful str(bytes)

2010-10-11 Thread Antoine Pitrou
On Mon, 11 Oct 2010 21:50:32 +0200 Hallvard B Furuseth wrote: > > I'd just posted an example in article : > > urllib.parse.urlunparse(('', '', '/foo', b'bar', '', '')) returns > "/foo;b'bar'" instead of raising an exception or returning 2.6's correct > "/foo;bar". Oh, this looks like a bug in u
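
Until the library itself rejects mixed input, a caller can guard against the urlunparse case quoted above by checking component types first. This is a minimal sketch under that assumption; checked_urlunparse is a hypothetical wrapper name, and only urllib.parse.urlunparse is a real library call.

```python
from urllib.parse import urlunparse


def checked_urlunparse(components):
    """Hypothetical wrapper: refuse non-str components before joining."""
    for index, part in enumerate(components):
        if not isinstance(part, str):
            raise TypeError(
                "component %d is %s, expected str" % (index, type(part).__name__)
            )
    return urlunparse(components)


# With text components the result matches the "/foo;bar" cited in the thread.
url = checked_urlunparse(("", "", "/foo", "bar", "", ""))
```

Passing b'bar' as the params component raises TypeError here instead of producing a URL containing b'bar'.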

Re: harmful str(bytes)

2010-10-11 Thread Hallvard B Furuseth
Stefan Behnel writes: >Hallvard B Furuseth, 11.10.2010 21:50: >> Fine, so programs will have to do it themselves... > > Yes, they can finally handle bytes and Unicode data correctly and > safely. Having byte data turn into Unicode strings unexpectedly makes > the behaviour of your code hardly predi

Re: harmful str(bytes)

2010-10-11 Thread Hallvard B Furuseth
Terry Reedy writes: >On 10/8/2010 9:45 AM, Hallvard B Furuseth wrote: >>> Actually, the implicit contract of __str__ is that it never fails, so >>> that everything can be printed out (for debugging purposes, etc.). >> >> Nope: >> >> $ python2 -c 'str(u"\u1000")' >> Traceback (most recent call last)

Re: harmful str(bytes)

2010-10-11 Thread Stefan Behnel
Hallvard B Furuseth, 11.10.2010 21:50: Antoine Pitrou writes: 2) some unicode objects didn't have a successful str() Python 3 fixes both these issues. Fixing 1) means there's no automatic coercion when trying to mix bytes and unicode. Fine, so programs will have to do it themselves... Yes, t

Re: harmful str(bytes)

2010-10-11 Thread Hallvard B Furuseth
Antoine Pitrou writes: >Hallvard B Furuseth wrote: >>Antoine Pitrou writes: >>>Hallvard B Furuseth wrote: The offender is bytes.__str__: str(b'foo') == "b'foo'". It's often not clear from looking at a piece of code whether some data is treated as strings or bytes, particularly when

Re: harmful str(bytes)

2010-10-11 Thread Hallvard B Furuseth
Terry Reedy writes: >On 10/8/2010 9:31 AM, Hallvard B Furuseth wrote: >> That's not the point - the point is that for 2.* code which _uses_ str >> vs unicode, the equivalent 3.* code uses str vs bytes. Yet not the >> same way - a 2.* 'str' will sometimes be 3.* bytes, sometimes str. So >> upgraded

Re: harmful str(bytes)

2010-10-08 Thread Terry Reedy
On 10/8/2010 9:31 AM, Hallvard B Furuseth wrote: That's not the point - the point is that for 2.* code which _uses_ str vs unicode, the equivalent 3.* code uses str vs bytes. Yet not the same way - a 2.* 'str' will sometimes be 3.* bytes, sometimes str. So upgraded old code will have to expect

Re: harmful str(bytes)

2010-10-08 Thread Terry Reedy
On 10/8/2010 9:45 AM, Hallvard B Furuseth wrote: Actually, the implicit contract of __str__ is that it never fails, so that everything can be printed out (for debugging purposes, etc.). Nope: $ python2 -c 'str(u"\u1000")' Traceback (most recent call last): File "<string>", line 1, in ? UnicodeEnco

Re: harmful str(bytes)

2010-10-08 Thread Antoine Pitrou
On Fri, 08 Oct 2010 15:45:58 +0200 Hallvard B Furuseth wrote: > Antoine Pitrou writes: > >Hallvard B Furuseth wrote: > >> The offender is bytes.__str__: str(b'foo') == "b'foo'". > >> It's often not clear from looking at a piece of code whether > >> some data is treated as strings or bytes, partic

Re: harmful str(bytes)

2010-10-08 Thread Hallvard B Furuseth
Steven D'Aprano writes: >On Fri, 08 Oct 2010 15:31:27 +0200, Hallvard B Furuseth wrote: >> That's not the point - the point is that for 2.* code which _uses_ str >> vs unicode, the equivalent 3.* code uses str vs bytes. Yet not the same >> way - a 2.* 'str' will sometimes be 3.* bytes, sometimes st

Re: harmful str(bytes)

2010-10-08 Thread Steven D'Aprano
On Fri, 08 Oct 2010 15:31:27 +0200, Hallvard B Furuseth wrote: > Arnaud Delobelle writes: >>Hallvard B Furuseth writes: >>> I've been playing a bit with Python3.2a2, and frankly its charset >>> handling looks _less_ safe than in Python 2. (...) >>> With 2. conversion Unicode <-> string the equiva

Re: harmful str(bytes)

2010-10-08 Thread Hallvard B Furuseth
Antoine Pitrou writes: >Hallvard B Furuseth wrote: >> The offender is bytes.__str__: str(b'foo') == "b'foo'". >> It's often not clear from looking at a piece of code whether >> some data is treated as strings or bytes, particularly when >> translating from old code. Which means one cannot see fro

Re: harmful str(bytes)

2010-10-08 Thread Hallvard B Furuseth
Arnaud Delobelle writes: >Hallvard B Furuseth writes: >> I've been playing a bit with Python3.2a2, and frankly its charset >> handling looks _less_ safe than in Python 2. >> (...) >> With 2. conversion Unicode <-> string the equivalent operation did >> not silently produce garbage: it raised Unico

Re: harmful str(bytes)

2010-10-08 Thread Antoine Pitrou
On Thu, 07 Oct 2010 23:33:35 +0200 Hallvard B Furuseth wrote: > > The offender is bytes.__str__: str(b'foo') == "b'foo'". > It's often not clear from looking at a piece of code whether > some data is treated as strings or bytes, particularly when > translating from old code. Which means one cann

Re: harmful str(bytes)

2010-10-07 Thread Arnaud Delobelle
Hallvard B Furuseth writes: > I've been playing a bit with Python3.2a2, and frankly its charset > handling looks _less_ safe than in Python 2. > > The offender is bytes.__str__: str(b'foo') == "b'foo'". > It's often not clear from looking at a piece of code whether > some data is treated as strin

harmful str(bytes)

2010-10-07 Thread Hallvard B Furuseth
I've been playing a bit with Python3.2a2, and frankly its charset handling looks _less_ safe than in Python 2. The offender is bytes.__str__: str(b'foo') == "b'foo'". It's often not clear from looking at a piece of code whether some data is treated as strings or bytes, particularly when translatin
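
The behaviour complained about in the original post can be reproduced in a few lines of Python 3. This is an illustrative sketch; the variable names and the /tmp path are made up for the example.

```python
data = b"foo"

# str() on bytes returns the repr-style text, not the decoded content.
as_text = str(data)
assert as_text == "b'foo'"

# %s formatting goes through str() as well, so the b'...' wrapper leaks out.
path = "/tmp/%s" % data
assert path == "/tmp/b'foo'"

# Decoding explicitly names the charset and yields the intended text.
decoded = data.decode("ascii")
assert decoded == "foo"
```

Running the interpreter with -b makes str() on a bytes object emit a BytesWarning, and -bb turns that warning into an error, which helps locate such conversions in translated code.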