[issue9593] utf8 codec readlines error after "\x85 "

2010-08-13 Thread Joseph Copenhaver
Joseph Copenhaver added the comment: It is better, thanks.

[issue9593] utf8 codec readlines error after "\x85 "

2010-08-13 Thread Antoine Pitrou
Antoine Pitrou added the comment:
> Is there any way to get the efficiency of codecs I/O readlines() chunking behavior and specify a list of characters to use? Can the file delimiter be changed in python as in perl?
No, but you can use readlines() from the standard open() function (which wi
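
The question above (chunked reading with a caller-chosen set of line delimiters) has no built-in answer according to the reply, but it can be approximated in user code. The following is only a minimal sketch, not something proposed in the thread: the helper name `iter_lines`, its parameters, and the restriction to single-character delimiters are all assumptions made for illustration.

```python
import io
import re


def iter_lines(stream, delimiters="\n", chunk_size=8192):
    """Yield lines from a text stream, splitting only on the given
    single-character delimiters (hypothetical helper, not a stdlib API).
    Each yielded line keeps its trailing delimiter."""
    # One alternative per delimiter character; re.escape guards special characters.
    pattern = re.compile("|".join(re.escape(ch) for ch in delimiters))
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)  # read a bounded chunk, not the whole file
        if not chunk:
            break
        buffer += chunk
        start = 0
        for match in pattern.finditer(buffer):
            yield buffer[start:match.end()]
            start = match.end()
        buffer = buffer[start:]  # keep the unterminated tail for the next chunk
    if buffer:
        yield buffer  # final line without a trailing delimiter


# U+0085 is not in the delimiter set, so it stays inside the first line.
sample = io.StringIO("a\x85 b\nc;d")
lines = list(iter_lines(sample, delimiters="\n;", chunk_size=3))
print(lines)
```

When the stream comes from the built-in open(), passing newline="" keeps '\r' and '\r\n' untranslated, so the delimiter set alone decides where lines end.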

[issue9593] utf8 codec readlines error after "\x85 "

2010-08-13 Thread Joseph Copenhaver
Joseph Copenhaver added the comment: I now recognize the issue was in regard to format problems and not python, but the area where this code will be used requires the use of the codecs module. Is there any way to get the efficiency of codecs I/O readlines() chunking behavior and specify a list

[issue9593] utf8 codec readlines error after "\x85 "

2010-08-13 Thread Antoine Pitrou
Antoine Pitrou added the comment: U+0085 corresponds to a line terminator (*), and codecs.open() observes this convention. Do note that the new io.open() (or the built-in open() in 3.x) only recognizes '\r' and '\n' as line separators. In any case, changing this behaviour would break compatib
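
The difference described above can be observed with a short script. This is a sketch under the assumption of a Python 3 interpreter; it uses in-memory streams and `codecs.getreader()` as a stand-in for a file opened with codecs.open(), and the sample text is made up for illustration.

```python
import codecs
import io

# U+0085 (NEL) is encoded in UTF-8 as the two bytes C2 85.
data = "first\u0085 second\nthird\n".encode("utf-8")

# codecs stream reader: lines are split with str.splitlines(),
# which treats U+0085 as a line boundary.
codecs_lines = codecs.getreader("utf-8")(io.BytesIO(data)).readlines()

# io text layer (what the built-in open() uses in 3.x):
# U+0085 is ordinary text and does not end a line.
io_lines = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8").readlines()

print(codecs_lines)  # three entries, the first one ends at U+0085
print(io_lines)      # two entries, U+0085 stays inside the first line
```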

[issue9593] utf8 codec readlines error after "\x85 "

2010-08-13 Thread Ezio Melotti
Changes by Ezio Melotti: components: -Regular Expressions; nosy: +ezio.melotti

[issue9593] utf8 codec readlines error after "\x85 "

2010-08-13 Thread Joseph Copenhaver
New submission from Joseph Copenhaver: The IO readlines() facility incorrectly processes utf8 files for some unknown reason. Specifically, the call generates too many entries in the lines array result after a character sequence "\x85 blah" which gets cut as ("\x85 ","blah") according to the