Danny Yoo <d...@hashcollision.org> Wrote in message:
>> Hopefully, this makes the point clearer: we must not try to decode
>> individual lines. By that time, the damage has been done: the act of
>> trying to break the file into lines by looking naively at newline byte
>> characters is invalid when certain characters can themselves have
>> newline characters.
>
> Confusing last sentence. Let me try that again. The act of trying to
> break the file into lines by looking naively at newline byte
> characters is invalid because certain characters, under encoding,
> themselves consist of newline characters. We've got to open the file
> with the right encoding in play.
>
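To make the quoted point concrete, here is a small sketch. It assumes UTF-16-LE as the encoding and uses U+010A as the troublesome character; both are picked only for illustration, and the helper try_decode is hypothetical. The character's encoded form contains the byte 0x0A, so splitting the raw bytes on newline bytes cuts it in half.

```python
# Assumption for illustration: the data is UTF-16-LE.
# U+010A encodes as the bytes 0x0A 0x01, so the encoded text
# contains a newline byte that is not a line break at all.
text = "A\u010aB\nC"          # two lines: "A\u010aB" and "C"
raw = text.encode("utf-16-le")

# Naive approach: split the raw bytes on the newline byte first,
# then try to decode each piece separately.
naive_pieces = raw.split(b"\n")


def try_decode(piece):
    """Hypothetical helper: decode one piece, or return None on failure."""
    try:
        return piece.decode("utf-16-le")
    except UnicodeDecodeError:
        return None


naive_lines = [try_decode(p) for p in naive_pieces]

# Safer approach: decode the whole byte stream first, then split.
decoded_lines = raw.decode("utf-16-le").split("\n")

print(len(naive_pieces))   # 3 pieces, although the text has 2 lines
print(naive_lines)         # pieces after the first fail to decode
print(decoded_lines)
```

With a real file, the same effect comes from passing the encoding to open(), for example open(path, encoding="utf-16-le"), and iterating over the resulting text file rather than over one opened in binary mode.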
When the file is encoded, it's a binary file until you decode it. You
should never use readline or equivalent on a binary file. Some
encodings go out of their way to make it seem to work, but taking
advantage of such details leaves you at risk when a new file having a
different encoding comes along.

-- 
DaveA

_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor