eShopping wrote:

The file is around 800 Mb but I can't get hold of it until next week so suggest starting a new topic once I have a cut-down copy.
OK will wait with bated breath.

Well, did you read on? What reactions do you have?

I did (finally) read on and I am still a little confused, though less than before. I guess the word UNFORMATTED means that the file has no format
Depends on what you mean by format. When you use % formatting in Python it is the same thing as a FORMATTED WRITE in FORTRAN - a set of directives that direct the translation of data to human readable text.
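
A small illustration of that parallel, as a sketch in Python 3. The directive %8.3f plays roughly the role that an F8.3 edit descriptor plays in a FORTRAN FORMAT statement. The variable name and the value are made up for the example.

```python
# Hypothetical value, chosen only for illustration.
distance = 3.14159

# %8.3f: a field 8 characters wide with 3 digits after the decimal point,
# roughly what F8.3 would request in a FORTRAN FORMAT statement.
text = "%8.3f" % distance

print(repr(text))  # the number is now human-readable text, padded with spaces
```

The point is only that formatting turns a number into characters a person can read; an unformatted write skips that step.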

Files per se are a sequence of bytes. As such they have no "format". When we examine a file we attempt to make sense of the bytes.

Some of the bytes may represent ASCII printable characters - others not. The body of this email is a sequence of ASCII printable characters that make sense to you when you read them.

The file written UNFORMATTED has some ASCII printable characters that you can read (e.g. DISTANCE), some that you can recognize as letters, numbers, etc but are not English words, and non-printable characters that show up as "garbage" symbols or not at all. Those that are not "readable" are the internal representation of numbers.
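
To make that concrete, here is a sketch in Python 3 that builds a few bytes resembling such data and shows which of them are printable. The record layout (an 8-character label followed by one 64-bit little-endian float) is an assumption for illustration only, not the actual layout of your file.

```python
import struct

# Hypothetical record: a text label followed by the internal
# (binary) representation of the number 12.5.
record = b"DISTANCE" + struct.pack("<d", 12.5)

# Show printable ASCII bytes as characters and everything else as '.'
view = "".join(chr(b) if 32 <= b < 127 else "." for b in record)
print(view)

# The same bytes make sense again once they are interpreted correctly.
label = record[:8].decode("ascii")
value = struct.unpack("<d", record[8:])[0]
print(label, value)
```

Note that two of the number's bytes happen to be printable characters, which is why such files show stray letters and symbols among the "garbage".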

.... though it presumably has some structure? One major hurdle is that I am not really sure about the difference between a Python binary file and a FORTRAN UNFORMATTED file so any pointers would be gratefully received

There is no such thing as a "Python binary file". When you open a file with mode 'b' you are asking for the bytes to be passed through exactly as they are stored, with no line-end translation. If you do not specify 'b' then line-ends are translated into \n when reading, and \n is translated back to the OS line-end when writing. The reason for this is that different operating systems use different codes for line-ends. By translating them to and from \n the Python program becomes OS independent.

Windows uses ctrl-M ctrl-J (carriage return - line feed; \x0d\x0a).
Linux/Unix uses ctrl-J (line feed; \x0a).
Mac (classic Mac OS, before OS X) uses ctrl-M (carriage return; \x0d).
Python uniformly translates these to \n (\x0a).
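
A quick way to see the translation, sketched under the assumption of Python 3, where reading in text mode recognises all three conventions by default. The file name and sample content are invented for the example.

```python
import os
import tempfile

# Hypothetical sample containing all three line-end conventions.
raw = b"one\r\ntwo\nthree\rfour"

path = os.path.join(tempfile.mkdtemp(), "lines.txt")
with open(path, "wb") as f:   # binary mode: bytes are written untouched
    f.write(raw)

with open(path, "r", encoding="ascii") as f:   # text mode, default newline handling
    text = f.read()

print(repr(text))  # every line-end now appears as \n
```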

When processing files written without line-ends (e.g. UNFORMATTED), the data may contain bytes that look like line-end characters or sequences but must NOT be treated as line-ends. Hence mode 'b'.
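
Here is a sketch of how that can go wrong, again assuming Python 3. The number 2573 is picked deliberately because its 16-bit little-endian representation is the two bytes \x0d\x0a; the file name is made up.

```python
import os
import struct
import tempfile

# 2573 is 0x0A0D, so its little-endian 16-bit form is the bytes 0x0D 0x0A,
# which look exactly like a Windows line-end.
raw = struct.pack("<H", 2573)

path = os.path.join(tempfile.mkdtemp(), "number.bin")
with open(path, "wb") as f:
    f.write(raw)

with open(path, "rb") as f:                     # binary: both bytes survive
    as_bytes = f.read()
with open(path, "r", encoding="latin-1") as f:  # text: the pair collapses to \n
    as_text = f.read()

value = struct.unpack("<H", as_bytes)[0]
print(repr(as_bytes), repr(as_text), value)
```

Read in text mode, the number loses a byte and can no longer be unpacked; read in binary mode, it comes back intact.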

Example:

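>>> x=open('x','w') # write "normal" allowing \n to be translated to the OS line end.
>>> x.write("Hello\n")
>>> x.close() # make sure the data is on disk before reopening
>>> x=open('x','rb') # read binary, avoiding translation.
>>> x.read() # result as seen on Windows
'Hello\r\n'
>>> x.close()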

where \r = \x0d

--
Bob Gailer
Chapel Hill NC
919-636-4239
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
