Re: Detecting line endings

2006-02-08 Thread Fuzzyman
Fuzzyman wrote: > Alex Martelli wrote: > > Fuzzyman <[EMAIL PROTECTED]> wrote: > >... > > > > Open the file with 'rU' mode, and check the file object's newline > > > > attribute. > > > > > Just to confirm, for a UTF16 encoded file, the newlines attribute is > ``None``. > Hmmm... having read t

Re: Detecting line endings

2006-02-08 Thread Fuzzyman
Alex Martelli wrote: > Fuzzyman <[EMAIL PROTECTED]> wrote: >... > > > Open the file with 'rU' mode, and check the file object's newline > > > attribute. > > Just to confirm, for a UTF16 encoded file, the newlines attribute is ``None``. All the best, Fuzzyman http://www.voidspace.org.uk/pyth

Re: Detecting line endings

2006-02-08 Thread Fuzzyman
Alex Martelli wrote: > Fuzzyman <[EMAIL PROTECTED]> wrote: >... > > I can't open with a codec unless an encoding is explicitly supplied. I > > still want to detect UTF16 even if the encoding isn't specified. > > > > As I said, I ought to test this... Without metadata I wonder how Python > > d

Re: Detecting line endings

2006-02-07 Thread Alex Martelli
Fuzzyman <[EMAIL PROTECTED]> wrote: ... > I can't open with a codec unless an encoding is explicitly supplied. I > still want to detect UTF16 even if the encoding isn't specified. > > As I said, I ought to test this... Without metadata I wonder how Python > determines it ? It doesn't. Python
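
Since a plain file carries no encoding metadata, one common heuristic for spotting UTF-16 is to look for a byte order mark at the start of the data. The following is only a sketch of that heuristic and is not necessarily what the truncated reply above goes on to suggest. The function name `sniff_bom` is hypothetical; the `codecs.BOM_*` constants are part of the standard library. A file with no BOM returns `None`, and the caller then has to guess or be told the encoding.

```python
import codecs

# Checked in this order because the UTF-32 LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return an encoding name if data starts with a known BOM, else None.

    Hypothetical helper: it only inspects the leading bytes and makes
    no attempt to detect BOM-less UTF-16.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```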

Re: Detecting line endings

2006-02-07 Thread ajsiegel
Arthur wrote: > Arthur wrote: > Is my premise that tokenizer needs universal newline support to be > reliable correct? > > What else could put it out of sync with the compiler? Anybody out there? Is my question, and the real world issue that provoked it, unclear? Is the answer too obvious? Hav

Re: Detecting line endings

2006-02-07 Thread Fuzzyman
Alex Martelli wrote: > Fuzzyman <[EMAIL PROTECTED]> wrote: >... > > > Open the file with 'rU' mode, and check the file object's newline > > > attribute. > > > > Do you know if this works for multi-byte encodings ? Do files have > > You mean when you open them with the codecs module? > No, if

Re: Detecting line endings

2006-02-07 Thread Fuzzyman
Bengt Richter wrote: > On 6 Feb 2006 06:35:14 -0800, "Fuzzyman" <[EMAIL PROTECTED]> wrote: > > >Hello all, > > > >I'm trying to detect line endings used in text files. I *might* be > >decoding the files into unicode first (which may be encoded using > >multi-byte encodings) - which is why I'm not

Re: Detecting line endings

2006-02-07 Thread Alex Martelli
Fuzzyman <[EMAIL PROTECTED]> wrote: ... > > Open the file with 'rU' mode, and check the file object's newline > > attribute. > > Do you know if this works for multi-byte encodings ? Do files have You mean when you open them with the codecs module? > metadata associated with them showing the l

Re: Detecting line endings

2006-02-07 Thread Bengt Richter
On 6 Feb 2006 06:35:14 -0800, "Fuzzyman" <[EMAIL PROTECTED]> wrote: >Hello all, > >I'm trying to detect line endings used in text files. I *might* be >decoding the files into unicode first (which may be encoded using >multi-byte encodings) - which is why I'm not letting Python handle the >line end

Re: Detecting line endings

2006-02-07 Thread Arthur
Arthur wrote: > Alex Martelli wrote: > > I just got flummoxed by this issue, working with a (pre-alpha) package > by very experienced Python programmers who sent file.readline to > tokenizer.py without universal newline support. Went on a long (and > educational) journey trying to figure out w

Re: Detecting line endings

2006-02-07 Thread Arthur
Alex Martelli wrote: > Fuzzyman <[EMAIL PROTECTED]> wrote: > > >>Hello all, >> >>I'm trying to detect line endings used in text files. I *might* be >>decoding the files into unicode first (which may be encoded using > > > Open the file with 'rU' mode, and check the file object's newline > attri

Re: Detecting line endings

2006-02-07 Thread Fuzzyman
Alex Martelli wrote: > Fuzzyman <[EMAIL PROTECTED]> wrote: > > > Hello all, > > > > I'm trying to detect line endings used in text files. I *might* be > > decoding the files into unicode first (which may be encoded using > > Open the file with 'rU' mode, and check the file object's newline > attri

Re: Detecting line endings

2006-02-07 Thread Fuzzyman
Alex Martelli wrote: > Fuzzyman <[EMAIL PROTECTED]> wrote: > > > Hello all, > > > > I'm trying to detect line endings used in text files. I *might* be > > decoding the files into unicode first (which may be encoded using > > Open the file with 'rU' mode, and check the file object's newline > attri

Re: Detecting line endings

2006-02-07 Thread Sybren Stuvel
Fuzzyman enlightened us with: > This is what I came up with. [...] Comments/corrections welcomed. You could use a few more comments in the code, but apart from that it looks nice. Sybren -- The problem with the world is stupidity. Not saying there should be a capital punishment for stupidity,

Re: Detecting line endings

2006-02-06 Thread Alex Martelli
Fuzzyman <[EMAIL PROTECTED]> wrote: > Hello all, > > I'm trying to detect line endings used in text files. I *might* be > decoding the files into unicode first (which may be encoded using Open the file with 'rU' mode, and check the file object's newline attribute. > My worry is that if '\n' *do
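
The suggestion above refers to the 'rU' mode of the Python of the time. As a rough modern equivalent, and assuming Python 3, text mode with `newline=None` enables universal newlines and the file object's `newlines` attribute records which endings were actually seen. The function name `detect_newlines` and its default `encoding` are assumptions made for this sketch. Because the file is decoded before newline handling, passing an explicit encoding such as "utf-16" lets the same check work on multi-byte files.

```python
def detect_newlines(path, encoding="utf-8"):
    """Read a text file and report the line endings found in it.

    Hypothetical helper. Returns None if no line ending was seen,
    a single string such as "\r\n" if only one kind was seen, or a
    tuple of strings if the file mixes several kinds.
    """
    # newline=None turns on universal newlines: "\r", "\n" and "\r\n"
    # are all translated to "\n" on read, and f.newlines remembers
    # which of them appeared in the file.
    with open(path, "r", encoding=encoding, newline=None) as f:
        f.read()  # the attribute is only filled in as data is read
        return f.newlines
```

A tuple result means the file is inconsistent, so the caller still has to decide which ending to treat as canonical.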

Re: Detecting line endings

2006-02-06 Thread Fuzzyman
Sybren Stuvel wrote: > Fuzzyman enlightened us with: > > My worry is that if '\n' *doesn't* signify a line break on the Mac, > > then it may exist in the body of the text - and trigger ``ending = > > '\n'`` prematurely ? > > I'd count the number of occurrences of '\r\n', '\n' without a preceding >

Re: Detecting line endings

2006-02-06 Thread Fuzzyman
Sybren Stuvel wrote: > Fuzzyman enlightened us with: > > My worry is that if '\n' *doesn't* signify a line break on the Mac, > > then it may exist in the body of the text - and trigger ``ending = > > '\n'`` prematurely ? > > I'd count the number of occurrences of '\r\n', '\n' without a preceding >

Re: Detecting line endings

2006-02-06 Thread Sybren Stuvel
Fuzzyman enlightened us with: > My worry is that if '\n' *doesn't* signify a line break on the Mac, > then it may exist in the body of the text - and trigger ``ending = > '\n'`` prematurely ? I'd count the number of occurrences of '\r\n', '\n' without a preceding '\r' and '\r' without following '\n
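
The counting idea in the reply above can be sketched as follows. This is one possible reading of it, not code from the thread: the function names and the `default` parameter are hypothetical, and the input is assumed to be text that has already been decoded, so the encoding of the file does not matter. Picking the most frequent ending addresses the stated worry, since a stray '\n' inside a file that mostly uses '\r' is simply outvoted.

```python
def count_line_endings(text: str) -> dict:
    """Count '\r\n', lone '\n' and lone '\r' in already-decoded text."""
    crlf = text.count("\r\n")
    # Each '\r\n' pair contains exactly one '\r' and one '\n', so
    # subtracting the pair count leaves only the unpaired characters.
    lone_lf = text.count("\n") - crlf
    lone_cr = text.count("\r") - crlf
    return {"\r\n": crlf, "\n": lone_lf, "\r": lone_cr}


def dominant_line_ending(text: str, default: str = "\n") -> str:
    """Return the most frequent line ending, or default if there is none."""
    counts = count_line_endings(text)
    if not any(counts.values()):
        return default
    # On a tie, max() keeps the first key in dict order: '\r\n', '\n', '\r'.
    return max(counts, key=counts.get)
```

To use it on a file, read the bytes in binary mode, decode them with the known or detected encoding, and pass the resulting string in.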

Detecting line endings

2006-02-06 Thread Fuzzyman
Hello all, I'm trying to detect line endings used in text files. I *might* be decoding the files into unicode first (which may be encoded using multi-byte encodings) - which is why I'm not letting Python handle the line endings. Is the following safe and sane : text = open('test.txt', 'rb').read