John Machin wrote:
Here's the scoop: It's a bug in the newline handling (in io.py, class
IncrementalNewlineDecoder, method decode). It reads text files in 128-
byte chunks. Converting CR LF to \n requires special case handling
when '\r' is detected at the end of the decoded chunk n in case
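The chunk-boundary case John describes, a '\r' landing at the very end of one decoded chunk with its '\n' at the start of the next, can be reproduced with the standard-library pieces that text files use for decoding. This is a minimal sketch, not the io.py code itself: the helper `decode_in_chunks` is hypothetical, and it relies on `io.IncrementalNewlineDecoder`, which is importable from `io` but not formally documented, so treat its use here as an assumption. On a current Python 3 the result should be identical for every chunk size; the bug reported in this thread means that did not hold in 3.0.

```python
import codecs
import io


def decode_in_chunks(data: bytes, encoding: str, chunk_size: int) -> str:
    """Decode `data` in fixed-size byte chunks, translating CR LF and lone CR to LF.

    Hypothetical helper for illustration: it mimics how a text file feeds
    raw bytes to its decoder one chunk at a time.
    """
    byte_decoder = codecs.getincrementaldecoder(encoding)()
    # The second positional argument is `translate`: map '\r\n' and '\r' to '\n'.
    newline_decoder = io.IncrementalNewlineDecoder(byte_decoder, True)
    parts = []
    for start in range(0, len(data), chunk_size):
        # A trailing '\r' is held back until the next chunk shows whether
        # a '\n' follows it.
        parts.append(newline_decoder.decode(data[start:start + chunk_size]))
    # final=True flushes any pending '\r' and raises on truncated input.
    parts.append(newline_decoder.decode(b"", True))
    return "".join(parts)


data = "line one\r\nline two\r\n".encode("utf-16")
for size in range(1, len(data) + 1):
    assert decode_in_chunks(data, "utf-16", size) == "line one\nline two\n"
```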
On Dec 7, 8:15 pm, Terry Reedy [EMAIL PROTECTED] wrote:
John Machin wrote:
Here's the scoop: It's a bug in the newline handling (in io.py, class
IncrementalNewlineDecoder, method decode). It reads text files in 128-
byte chunks. Converting CR LF to \n requires special case handling
when
John Machin schrieb:
He did. Ugly stuff using readline() :-) Should still work, though.
Well, well, I'm a C kinda guy used to while (fgets(b, sizeof(b), f))
kinda loops :-)
But, seriously - I find that whole while True: and if line ==
construct ugly as hell, too. How can reading a file line
On Sun, 07 Dec 2008 16:05:53 +0100
Johannes Bauer [EMAIL PROTECTED] wrote:
But, seriously - I find that whole while True: and if line ==
construct ugly as hell, too. How can reading a file line by line be
achieved in a more pythonic kind of way?
for line in open(filename):
    do stuff with
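For comparison, here is a sketch of the two loop styles discussed: the C-like `readline()` loop, which stops when `readline()` returns an empty string, and direct iteration over the file object. The function names and the sample file are hypothetical; both functions should return the same lines. A `with` block is used so the file is closed deterministically, which the one-line `for line in open(filename):` form leaves to the garbage collector.

```python
import os
import tempfile


def lines_with_readline(path: str, encoding: str) -> list:
    """C-style loop: readline() returns '' only at end of file."""
    result = []
    with open(path, "r", encoding=encoding) as f:
        while True:
            line = f.readline()
            if line == "":
                break
            result.append(line.rstrip("\n"))
    return result


def lines_with_iteration(path: str, encoding: str) -> list:
    """Idiomatic loop: a text file object yields one line per iteration."""
    with open(path, "r", encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]


# Hypothetical sample file: UTF-16 with Windows-style CR LF line endings.
fd, sample = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as raw:
    raw.write("alpha\r\nbeta\r\ngamma\r\n".encode("utf-16"))
try:
    # Universal-newlines mode (the default) turns each '\r\n' into '\n'.
    print(lines_with_iteration(sample, "utf-16"))  # ['alpha', 'beta', 'gamma']
finally:
    os.remove(sample)
```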
On Dec 8, 2:05 am, Johannes Bauer [EMAIL PROTECTED] wrote:
John Machin schrieb:
He did. Ugly stuff using readline() :-) Should still work, though.
Well, well, I'm a C kinda guy used to while (fgets(b, sizeof(b), f))
kinda loops :-)
But, seriously - I find that whole while True: and if
[EMAIL PROTECTED] schrieb:
2 problems: endianness and trailing zero byte.
This works for me:
This is very strange - when using utf16, endianness should be detected
automatically. When I simply truncate the trailing zero byte, I receive:
Traceback (most recent call last):
  File "./modify.py",
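To illustrate the two problems named above: the `utf-16` codec does pick the byte order from a leading byte-order mark (BOM), but it cannot decode an odd number of bytes, because every UTF-16 code unit is two bytes long. A minimal sketch using only `bytes.decode` and the BOM constants from `codecs`; the sample text "hi" is arbitrary.

```python
import codecs

# The same two characters, "hi", behind a little-endian and a big-endian BOM.
little = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")  # b'\xff\xfeh\x00i\x00'
big = codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")     # b'\xfe\xff\x00h\x00i'

# With a BOM present, "utf-16" detects the byte order and strips the BOM.
assert little.decode("utf-16") == "hi"
assert big.decode("utf-16") == "hi"

# One stray trailing byte leaves half a code unit, which cannot be decoded.
try:
    (little + b"\x00").decode("utf-16")
except UnicodeDecodeError as exc:
    print("odd byte count:", exc.reason)
```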
John Machin schrieb:
On Dec 6, 5:36 am, Johannes Bauer [EMAIL PROTECTED] wrote:
So UTF-16 has an explicit EOF marker within the text? I cannot find one
in the original file, only some kind of starting sequence, I suppose
(0xfeff). The last characters of the file are 0x00 0x0d 0x00 0x0a,
simple
Johannes Bauer wrote:
[EMAIL PROTECTED] schrieb:
2 problems: endianness and trailing zero byte.
This works for me:
This is very strange - when using utf16, endianness should be detected
automatically. When I simply truncate the trailing zero byte, I receive:
Traceback (most recent call
Johannes Bauer [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
John Machin schrieb:
On Dec 6, 5:36 am, Johannes Bauer [EMAIL PROTECTED] wrote:
So UTF-16 has an explicit EOF marker within the text? I cannot find one
in the original file, only some kind of starting sequence, I suppose
On Dec 7, 6:20 am, Mark Tolonen [EMAIL PROTECTED] wrote:
Johannes Bauer [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
John Machin schrieb:
On Dec 6, 5:36 am, Johannes Bauer [EMAIL PROTECTED] wrote:
So UTF-16 has an explicit EOF marker within the text? I cannot find one
in
Johannes Bauer [EMAIL PROTECTED] writes:
This is very strange - when using utf16, endianness should be detected
automatically. When I simply truncate the trailing zero byte, I receive:
Any chance that whatever you used to simply truncate the trailing
zero byte also removed the BOM at the start
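One way to check whether an edit to the file also removed the BOM is to look at its first bytes before decoding anything. This is a hedged sketch: the function name `describe_utf16_bytes` is hypothetical, and it only distinguishes the two UTF-16 BOMs plus the even/odd length check relevant to the stray trailing byte.

```python
import codecs


def describe_utf16_bytes(data: bytes) -> str:
    """Report the UTF-16 BOM (if any) and whether the length is a whole number of code units."""
    if data.startswith(codecs.BOM_UTF16_LE):
        bom = "little-endian BOM"
    elif data.startswith(codecs.BOM_UTF16_BE):
        bom = "big-endian BOM"
    else:
        bom = "no BOM"
    parity = "even length" if len(data) % 2 == 0 else "odd length (stray byte)"
    return f"{bom}, {parity}"
```

Usage would be along the lines of `describe_utf16_bytes(open(path, "rb").read())`, run on the file both before and after truncating it.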
On Dec 7, 9:01 am, David Bolen [EMAIL PROTECTED] wrote:
Johannes Bauer [EMAIL PROTECTED] writes:
This is very strange - when using utf16, endianness should be detected
automatically. When I simply truncate the trailing zero byte, I receive:
Any chance that whatever you used to simply
On Dec 7, 9:34 am, John Machin [EMAIL PROTECTED] wrote:
On Dec 7, 9:01 am, David Bolen [EMAIL PROTECTED] wrote:
Johannes Bauer [EMAIL PROTECTED] writes:
This is very strange - when using utf16, endianness should be detected
automatically. When I simply truncate the trailing zero byte, I
Hello group,
I'm having trouble reading a utf-16 encoded file with Python3.0. This is
my (complete) code:
#!/usr/bin/python3.0
class AddressBook():
    def __init__(self, filename):
        f = open(filename, "r", encoding="utf16")
        while True:
Johannes Bauer [EMAIL PROTECTED] writes:
Traceback (most recent call last):
  File "./modify.py", line 12, in <module>
    a = AddressBook("2008_11_05_Handy_Backup.txt")
  File "./modify.py", line 7, in __init__
    line = f.readline()
  File "/usr/local/lib/python3.0/io.py", line 1807, in readline
J Kenneth King schrieb:
It probably means what it says: that the input file contains characters
it cannot read using the specified encoding.
No, it doesn't. The file is just fine, just as the example.
Are you generating the file from python using a file object with the
same encoding? If
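As a quick check of that suggestion, writing and reading with the same encoding round-trips cleanly. This sketch forces Windows-style line endings on write with `newline="\r\n"` (an assumption about how such a file might have been produced) and relies on the default universal-newlines mode to turn them back into '\n' on read. The record strings are hypothetical.

```python
import os
import tempfile

records = ["Alice;0123", "Bob;0456"]  # hypothetical address-book lines

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # Write: each '\n' becomes '\r\n' on disk, encoded as UTF-16 with a single BOM.
    with open(path, "w", encoding="utf-16", newline="\r\n") as f:
        for record in records:
            f.write(record + "\n")
    # Read back with the same encoding; newline=None (the default) maps '\r\n' to '\n'.
    with open(path, "r", encoding="utf-16") as f:
        read_back = [line.rstrip("\n") for line in f]
finally:
    os.remove(path)

assert read_back == records
```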
J Kenneth King [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
It probably means what it says: that the input file contains characters
it cannot read using the specified encoding.
That was my first thought. However, it appears that there is an off-by-one
error somewhere in the
Johannes Bauer wrote:
Hello group,
I'm having trouble reading a utf-16 encoded file with Python3.0. This is
my (complete) code:
what OS. This is often critical when you have a problem interacting
with the OS.
#!/usr/bin/python3.0
class AddressBook():
    def __init__(self,
Terry Reedy schrieb:
Johannes Bauer wrote:
Hello group,
I'm having trouble reading a utf-16 encoded file with Python3.0. This is
my (complete) code:
what OS. This is often critical when you have a problem interacting
with the OS.
It's a 64-bit Linux, currently running:
Linux joeserver
On Dec 5, 2008, at 11:36 AM, Johannes Bauer wrote:
I suspect that '?' after \n (\u0a00) indicates not 'question-mark'
but 'uninterpretable as a utf16 character'. The traceback below
confirms that. It should be an end-of-file marker and should not be
passed to Python. I strongly suspect
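A character shown as U+0A00 where '\n' was expected is what a newline turns into when its two UTF-16 bytes are read in the wrong order, either because the wrong endianness was assumed or because decoding started one byte off. This is offered as one possible explanation of the symptom, not a diagnosis of the file in question:

```python
encoded = "a\n".encode("utf-16-le")  # b'a\x00\n\x00'

# Correct alignment and byte order: the newline decodes as U+000A.
assert encoded.decode("utf-16-le") == "a\n"

# Same bytes, wrong byte order: 0x0A 0x00 is read as U+0A00.
assert encoded[2:4].decode("utf-16-be") == "\u0a00"

# Right byte order, but starting one byte late: 0x00 0x0A is read as U+0A00.
assert encoded[1:3].decode("utf-16-le") == "\u0a00"
```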
On Dec 5, 3:25 pm, Johannes Bauer [EMAIL PROTECTED] wrote:
Hello group,
I'm having trouble reading a utf-16 encoded file with Python3.0. This is
my (complete) code:
#!/usr/bin/python3.0
class AddressBook():
    def __init__(self, filename):
        f = open(filename, "r",
Joe Strout wrote:
On Dec 5, 2008, at 11:36 AM, Johannes Bauer wrote:
I suspect that '?' after \n (\u0a00) indicates not 'question-mark'
but 'uninterpretable as a utf16 character'. The traceback below
confirms that. It should be an end-of-file marker and should not be
passed to Python. I
On Dec 6, 5:36 am, Johannes Bauer [EMAIL PROTECTED] wrote:
So UTF-16 has an explicit EOF marker within the text? I cannot find one
in the original file, only some kind of starting sequence, I suppose
(0xfeff). The last characters of the file are 0x00 0x0d 0x00 0x0a,
simple \r\n line ending.
Sorry,
On Fri, 05 Dec 2008 12:00:59 -0700, Joe Strout wrote:
So UTF-16 has an explicit EOF marker within the text?
No, it does not. I don't know what Terry's thinking of there, but text
files do not have any EOF marker. They start at the beginning
(sometimes including a byte-order mark), and go
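Steven's point can be checked against the bytes Johannes quoted: 0x00 0x0d 0x00 0x0a is simply CR LF in big-endian UTF-16, and a file ending in those bytes has nothing after them. A sketch with a hypothetical `tail_hex` helper and a throwaway sample file (`bytes.hex` with a separator argument needs Python 3.8 or later):

```python
import os
import tempfile


def tail_hex(path: str, count: int = 4) -> str:
    """Return the last `count` bytes of a file as space-separated hex pairs."""
    with open(path, "rb") as f:
        data = f.read()
    return data[-count:].hex(" ")


# CR LF in big-endian UTF-16: each code unit is a 0x00 high byte, then the ASCII byte.
assert "\r\n".encode("utf-16-be") == b"\x00\x0d\x00\x0a"

# A file that starts with a BOM and ends with CR LF; no terminator follows.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as raw:
    raw.write("\ufeffname\r\n".encode("utf-16-be"))
try:
    print(tail_hex(path))  # 00 0d 00 0a
finally:
    os.remove(path)
```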
On Dec 6, 10:35 am, Steven D'Aprano [EMAIL PROTECTED]
cybersource.com.au wrote:
On Fri, 05 Dec 2008 12:00:59 -0700, Joe Strout wrote:
So UTF-16 has an explicit EOF marker within the text?
No, it does not. I don't know what Terry's thinking of there, but text
files do not have any EOF
John Machin wrote:
On Dec 6, 10:35 am, Steven D'Aprano [EMAIL PROTECTED]
cybersource.com.au wrote:
On Fri, 05 Dec 2008 12:00:59 -0700, Joe Strout wrote:
So UTF-16 has an explicit EOF marker within the text?
No, it does not. I don't know what Terry's thinking of there, but text
files do not