Chris King quoted Corey Richardson:

On 10/31/2010 12:03 PM, Corey Richardson wrote:
[...]
To read from a file, you open it, and then read() it into a string like this:
for line in file:
    string += string + file.readline()

Aiieeee! Worst way to read from a file *EVAR*!!!

Seriously. Don't do this. This is just *wrong*.

(1) You're mixing file iteration (which already reads line by line) with readline(), which would result in every second line going missing. Fortunately Python 2 doesn't let you do this:

>>> for line in file:
...     print file.readline()
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ValueError: Mixing iteration and read methods would lose data
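For the record, here is roughly what Corey's loop was presumably trying to do, as a minimal Python 3 sketch (the thread's examples are Python 2, and the function names and sample file are made up for illustration). Either read() the whole file in one go, or iterate over the file and nothing else, joining the lines once at the end:

```python
import os
import tempfile

def read_whole_file(path):
    """Return the entire contents of a text file as one string."""
    # read() with no argument reads to end of file: no loop, no readline().
    with open(path) as f:
        return f.read()

def read_by_iteration(path):
    """Same result, using file iteration only and a single join."""
    with open(path) as f:
        # Iterating over the file yields each line exactly once.
        return "".join(f)

# Hypothetical sample file, written just for this demonstration.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("line one\nline two\nline three\n")

whole = read_whole_file(path)
joined = read_by_iteration(path)
os.remove(path)
print(whole == joined)   # True
```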


(2) Even if Python let you do it, it would be slow. Never write any loop with repeated string concatenation:

result = ''
for string in list_of_strings:
    result += string  # No! Never do this! BAD BAD BAD!!!

This can, depending on the version of Python, the operating system, and various other factors, end up being thousands of times slower than the alternative:

result = ''.join(list_of_strings)

I'm not exaggerating. There was a bug reported in the Python HTTP library about six(?) months ago where Python was taking half an hour to read a file that Internet Explorer or wget could read in under a second.

You might be lucky and never notice the poor performance, but one of your users will. This is a Shlemiel the Painter algorithm:

http://www.joelonsoftware.com/articles/fog0000000319.html

Under some circumstances, *some* versions of Python can correct for the poor performance and optimize it to run quickly, but not all versions, and even the ones that do sometimes run into operating system dependent problems that lead to terrible performance. Don't write Shlemiel the Painter code.
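A minimal Python 3 sketch of the list-and-join pattern (build_report and its input are hypothetical): collect the pieces in a list and join them once, instead of growing a string with += inside the loop.

```python
def build_report(rows):
    """Build one string from many small pieces without repeated +=."""
    parts = []                                # collect the pieces in a list...
    for name, count in rows:
        parts.append("%s: %d\n" % (name, count))
    return "".join(parts)                     # ...then join them exactly once

report = build_report([("apples", 3), ("pears", 5)])
print(report, end="")
# apples: 3
# pears: 5
```

Each append is cheap, and in CPython join() works out the total size before copying, so the work grows linearly with the output instead of quadratically.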

The right way to read chunks of data from a file is with the read method:

fp = open("filename", "rb")  # open in binary mode
data = fp.read()  # read the whole file
fp.close()

If the file is large, and you want to read it in small chunks, read() takes an optional argument giving the maximum number of bytes to read:

fp.read(64)  # read up to 64 bytes (fewer, or none, at end of file)
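Reading a large file chunk by chunk usually looks something like this minimal Python 3 sketch; the 64-byte chunk size just mirrors the example above (real code typically uses something much larger), and iter_chunks is a made-up name:

```python
import os
import tempfile

def iter_chunks(path, size=64):
    """Yield successive chunks of at most `size` bytes from a binary file."""
    with open(path, "rb") as fp:
        while True:
            chunk = fp.read(size)
            if not chunk:        # an empty bytes object means end of file
                break
            yield chunk

# Hypothetical 150-byte file: expect chunks of 64, 64 and 22 bytes.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(bytes(range(150)))

sizes = [len(chunk) for chunk in iter_chunks(path)]
os.remove(path)
print(sizes)   # [64, 64, 22]
```

Only one chunk is held in memory at a time, so the same loop works on files much larger than available RAM.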

If you want to read text files in lines, you can use the readline() method, which reads up to and including the next end of line; or readlines(), which returns a list of all the lines; or just iterate over the file to get one line at a time.
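The three line-reading options side by side, as a Python 3 sketch; io.StringIO stands in for a real text file so the example needs nothing on disk:

```python
import io

TEXT = "alpha\nbeta\ngamma\n"

# readline(): one line per call, trailing '\n' included; '' at end of file.
f = io.StringIO(TEXT)
first = f.readline()
second = f.readline()
print(repr(first), repr(second))   # 'alpha\n' 'beta\n'

# readlines(): all remaining lines at once, as a list.
all_lines = io.StringIO(TEXT).readlines()
print(all_lines)                   # ['alpha\n', 'beta\n', 'gamma\n']

# Iteration: one line at a time, without building the whole list first.
for line in io.StringIO(TEXT):
    print(line.rstrip("\n"))       # alpha, then beta, then gamma
```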

Chris King went on to ask:

I don't think readline will work on an image. How do you get raw binary from a zip? Also make sure you do reply to the tutor list too, not just me.

readline() works fine on binary files, including images, but it won't be useful because binary files aren't split into lines.

readline() reads until end-of-line. How a line ends on disk varies according to the operating system, but it is often \n, and for a file opened in binary mode readline() always splits on \n. A binary file may or may not contain any \n bytes. If it does, then readline() will read up to and including the next one perfectly fine:

f.readline()
=> '\x23\x01\0=#%\xff\n'

and if it doesn't, readline() will happily read the entire file all the way to the end:

=> '\x23\x01\0=#%\xff3m.\x02\0\xa0\0\0\0+)\0\x03c!<ft\0\xc2|\x8e~\0...'
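Both cases are easy to check with a Python 3 sketch, using io.BytesIO as a stand-in for a file opened in "rb" mode; the byte strings are made up, loosely modelled on the examples above. Note that in binary mode readline() returns bytes and splits only on b'\n':

```python
import io

# Binary data with one b'\n' byte in the middle.
f = io.BytesIO(b"\x23\x01\x00=#%\xff\nmore bytes")
head = f.readline()    # stops just after the newline byte
tail = f.readline()    # no newline left, so it reads to the end
print(head)            # b'#\x01\x00=#%\xff\n'
print(tail)            # b'more bytes'

# Binary data with no b'\n' at all: one readline() returns everything.
g = io.BytesIO(b"\x23\x01\x003m.\x02\x00\xa0")
everything = g.readline()
print(everything)      # b'#\x01\x003m.\x02\x00\xa0'
```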


To read a zip file as raw data, just open it as a regular binary file:

f = open("data.zip", "rb")
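A minimal Python 3 sketch showing that opening a zip archive in "rb" mode gives you the archive's bytes untouched. The zipfile module is only used here to make a throwaway sample archive and to show that the bytes still form a valid archive; the file and member names are hypothetical:

```python
import io
import os
import tempfile
import zipfile

def read_raw(path):
    """Return the raw, undecoded bytes of any file, zip archives included."""
    with open(path, "rb") as f:
        return f.read()

# Build a small sample archive so the example is self-contained.
tmpdir = tempfile.mkdtemp()
zip_path = os.path.join(tmpdir, "data.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("hello.txt", "hello world\n")

raw = read_raw(zip_path)
os.remove(zip_path)
os.rmdir(tmpdir)
print(raw[:4])   # b'PK\x03\x04', the local file header signature zips begin with

# The bytes are the whole archive: zipfile can still read them from memory.
with zipfile.ZipFile(io.BytesIO(raw)) as zf:
    contents = zf.read("hello.txt")
print(contents)  # b'hello world\n'
```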

But this is the wrong way to solve the problem of transferring files from one computer to another. The right way is to use a transport protocol that already works, something like FTP or HTTP. The only reason for dealing with files as bytes is if you want to create your own file transport protocol.



--
Steven

_______________________________________________
Tutor maillist
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
