[issue18291] codecs.open interprets space as line ends

2013-07-04 Thread Paul
Paul added the comment: Right, #7643 indeed seems to be about exactly the issue I described here (as far as I understand Unicode, which isn't all that much). So maybe they should be merged. That issue was closed in March 2010; is that after 2.7.3 was released? By the way, where I wrote \x12, \x13, \

[issue18291] codecs.open interprets space as line ends

2013-07-01 Thread R. David Murray
R. David Murray added the comment: There are two issues that I could find related to these characters, one of them still open: #18236 and #7643. The latter contains a fairly complete discussion of the underlying issue, but on a quick read through it is not clear to me if the linebreak issue

[issue18291] codecs.open interprets space as line ends

2013-07-01 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Contrary to the documentation, str.splitlines() splits lines not only on '\n', '\r\n' and '\r'.
>>> 'a'.join(chr(i) for i in range(32)).splitlines(True)
['\x00a\x01a\x02a\x03a\x04a\x05a\x06a\x07a\x08a\ta\n', 'a\x0b', 'a\x0c', 'a\r', 'a\x0ea\x0fa\x10a\x11a\x12a
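The splitting behaviour described in this comment can be checked one character at a time. The following is a minimal sketch, assuming Python 3; the helper name `line_break_controls` is made up for illustration and is not part of the tracker discussion.

```python
def line_break_controls(limit=32):
    """Return the code points below `limit` that str.splitlines() treats as line boundaries."""
    found = []
    for i in range(limit):
        # If chr(i) is a line boundary, "a" and "b" end up in separate pieces.
        parts = ("a" + chr(i) + "b").splitlines()
        if len(parts) == 2:
            found.append(i)
    return found


breaks = line_break_controls()
print([hex(i) for i in breaks])
```

Beyond '\n' and '\r', the list should include the vertical tab, the form feed, and the three separator characters chr(28), chr(29) and chr(30).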

[issue18291] codecs.open interprets space as line ends

2013-06-25 Thread STINNER Victor
STINNER Victor added the comment:
>> So I guess there is little interest in fixing codecs because io is the preferred package for reading unicode files.
> I guess Victor have an interest. ;)
Ah ah, good joke. I wrote PEP 400: http://www.python.org/dev/peps/pep-0400/ And yes, for best pe

[issue18291] codecs.open interprets space as line ends

2013-06-25 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I guess Victor has an interest. ;) -- versions: +Python 3.3, Python 3.4

[issue18291] codecs.open interprets space as line ends

2013-06-25 Thread Paul
Paul added the comment: You're absolutely right. I tested it on another machine now, with Python 2.7.3 installed, and io is actually twice as fast as codecs. Thanks. So I guess there is little interest in fixing codecs because io is the preferred package for reading Unicode files.

[issue18291] codecs.open interprets space as line ends

2013-06-24 Thread R. David Murray
R. David Murray added the comment: Is the "slower" test on 2.6? io would definitely be slower there, since it is pure Python. 2.7 has the C-accelerated version. -- nosy: +r.david.murray
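A rough way to compare the two readers on a given interpreter is to time them on the same file. The sketch below is not the script attached to the issue; the file contents, the helper names and the repeat count are assumptions chosen for illustration, and `codecs.getreader("utf-8")` stands in for the reader that codecs.open wraps. Absolute numbers will vary by machine and Python version.

```python
import codecs
import io
import os
import tempfile
import timeit


def read_with_io(path):
    # Count lines using the io module's text layer.
    with io.open(path, "r", encoding="utf-8") as f:
        return sum(1 for _ in f)


def read_with_codecs(path):
    # Count lines using a codecs StreamReader over the raw bytes.
    with io.open(path, "rb") as raw:
        reader = codecs.getreader("utf-8")(raw)
        return sum(1 for _ in reader)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with io.open(path, "w", encoding="utf-8", newline="") as f:
        f.write("some ordinary text\n" * 20000)

    io_count = read_with_io(path)
    codecs_count = read_with_codecs(path)
    io_seconds = timeit.timeit(lambda: read_with_io(path), number=3)
    codecs_seconds = timeit.timeit(lambda: read_with_codecs(path), number=3)

print("io:     %d lines, %.4f s" % (io_count, io_seconds))
print("codecs: %d lines, %.4f s" % (codecs_count, codecs_seconds))
```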

[issue18291] codecs.open interprets space as line ends

2013-06-24 Thread Paul
Paul added the comment: Sorry for bringing that up, as I suppose it is unrelated to the bug I am reporting, but you can find an example file attached, with timings. -- Added file: http://bugs.python.org/file30688/codecs-io-example.py

[issue18291] codecs.open interprets space as line ends

2013-06-24 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Could you please provide an example that shows io.open() being slower than codecs.open()? -- nosy: +belopolsky, doerwalter, haypo, lemburg, serhiy.storchaka versions: -Python 2.6

[issue18291] codecs.open interprets space as line ends

2013-06-24 Thread Paul
New submission from Paul: I hope I am writing in the right place. When using codecs.open with UTF-8 encoding, it seems characters \x12, \x13, and \x14 are interpreted as end-of-line. Example code:
>>> with open('unicodetest.txt', 'w') as f:
...     f.write('a'+chr(28)+'b'+chr(29)+'c'+chr(30)+'d
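The report can be reproduced along the following lines. This is a sketch under assumptions rather than the reporter's full script: it targets Python 3, writes to a temporary directory, and uses `codecs.getreader("utf-8")` as a stand-in for the stream reader behind codecs.open. Note that chr(28), chr(29) and chr(30) are '\x1c', '\x1d' and '\x1e'.

```python
import codecs
import io
import os
import tempfile

# One logical line containing the three separator control characters.
text = "a" + chr(28) + "b" + chr(29) + "c" + chr(30) + "d\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "unicodetest.txt")
    with io.open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)

    # The io module only recognises '\n', '\r' and '\r\n' as line endings.
    with io.open(path, "r", encoding="utf-8") as f:
        io_lines = f.readlines()

    # The codecs stream reader splits decoded text with str.splitlines().
    with io.open(path, "rb") as raw:
        codecs_lines = codecs.getreader("utf-8")(raw).readlines()

print("io:    ", io_lines)
print("codecs:", codecs_lines)
```

If the behaviour is as reported, the io reader yields a single line while the codecs reader yields four.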