eShopping wrote:
The file is around 800 Mb but I can't get hold of it until next week
so suggest starting a new topic once I have a cut-down copy.
OK will wait with bated breath.
Well, did you read on? What reactions do you have?
I did (finally) read on and I am still a little confused, though less
than before. I guess the word UNFORMATTED means that the file has no
format
Depends on what you mean by format. When you use % formatting in Python
it is the same thing as a FORMATTED WRITE in FORTRAN - a set of
directives that direct the translation of data to human readable text.
Files per se are a sequence of bytes. As such they have no "format".
When we examine a file we attempt to make sense of the bytes.
Some of the bytes may represent ASCII printable characters - others
not. The body of this email is a sequence of ASCII printable characters
that make sense to you when you read them.
The file written UNFORMATTED has some ASCII printable characters that
you can read (e.g. DISTANCE), some that you can recognize as letters,
numbers, etc but are not English words, and non-printable characters
that show up as "garbage" symbols or not at all. Those that are not
"readable" are the internal representation of numbers.
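To make that concrete, here is a minimal sketch using Python's struct
module. The layout (an 8-character label followed by one 8-byte float,
little-endian) is an assumption chosen for illustration; it is not the
layout of your file, which we have not seen yet.

```python
import struct

# Hypothetical record: an 8-character label followed by one double.
# "<" = little-endian, "8s" = 8 bytes of text, "d" = 8-byte float.
record = struct.pack("<8sd", b"DISTANCE", 1.5)

print(len(record))   # 16 bytes in total
print(record[:8])    # the readable part: b'DISTANCE'
print(record[8:])    # the internal representation of 1.5

# Reading it back requires knowing the same layout.
label, value = struct.unpack("<8sd", record)
```

The first 8 bytes are text you can read in an editor; the last 8 are
the "garbage" you only make sense of once you know they are a float.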
.... though it presumably has some structure? One major hurdle is
that I am not really sure about the difference between a Python binary
file and a FORTRAN UNFORMATTED file so any pointers would be
gratefully received
There is no such thing as a "Python binary file". When you open a file
with mode 'b' you are asking Python not to translate line-ends. If you
do not specify 'b' then the OS line-end is translated into \n when
reading, and \n is translated back to the OS line-end when writing. The
reason for this is that different operating systems use different codes
for line-ends. By translating them to and from \n the Python program
becomes OS independent.
Windows uses ctrl-M ctrl-J (carriage return - line feed; \x0d\x0a).
Linux/Unix uses ctrl-J (line feed; \x0a).
Classic Mac OS (before OS X) uses ctrl-M (carriage return; \x0d).
Python uniformly translates these to \n (\x0a).
When processing files written without line-ends (e.g. UNFORMATTED) the
data may contain bytes that happen to match line-end characters or
sequences, and these must NOT be treated as line-ends. Hence mode 'b'.
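A small sketch of how that can happen, again using struct. The value
2573 and the 2-byte little-endian integer type are picked purely for
illustration.

```python
import struct

# The integer 2573 is 0x0A0D. Stored little-endian as a 2-byte integer
# its bytes are 0x0D 0x0A, the same two bytes as a Windows line-end.
data = struct.pack("<h", 2573)
looks_like_line_end = (data == b"\r\n")

print(data, looks_like_line_end)
```

Read without 'b' on a system that translates line-ends, those two bytes
could be collapsed into one and the number would be corrupted.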
Example (on Windows):
>>> x=open('x','w') # write "normal", allowing \n to be translated to the OS line-end
>>> x.write("Hello\n")
>>> x.close() # make sure the data is on disk before reopening
>>> x=open('x','rb') # read binary, avoiding translation
>>> x.read()
'Hello\r\n'
where \r = \x0d
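If you are not on Windows you will not see the \r in that session. The
sketch below assumes Python 3, where open() accepts a newline argument,
and uses it to force the Windows-style translation on any OS. The
temporary file path is only there to keep the example self-contained.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "x")

# newline="\r\n" forces \n to be written as \r\n on any OS (Python 3).
with open(path, "w", newline="\r\n") as f:
    f.write("Hello\n")

with open(path, "rb") as f:   # binary: no translation
    raw = f.read()

with open(path, "r") as f:    # text: line-ends come back as \n
    text = f.read()

print(raw)           # b'Hello\r\n'
print(repr(text))    # 'Hello\n'
```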
--
Bob Gailer
Chapel Hill NC
919-636-4239
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor