Re: [Tutor] File transfer

Steven D'Aprano Sun, 31 Oct 2010 15:45:35 -0700

Chris King quoted Corey Richardson:

 On 10/31/2010 12:03 PM, Corey Richardson wrote:

[...]

To read from a file, you open it, and then read() it into a string like this:
for line in file:
    string += string + file.readline()


Aiieeee! Worst way to read from a file *EVAR*!!!

Seriously. Don't do this. This is just *wrong*.

(1) You're mixing file iteration (which already reads line by line) with readline(), which would end in every second line going missing. Fortunately Python doesn't let you do this:


>>> for line in file:
...     print file.readline()
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ValueError: Mixing iteration and read methods would lose data

(2) Even if Python let you do it, it would be slow. Never write any loop with repeated string concatenation:


result = ''
for string in list_of_strings:
    result += string  # No! Never do this! BAD BAD BAD!!!

This can, depending on the version of Python, the operating system, and various other factors, end up being thousands of times slower than the alternative:


result = ''.join(list_of_strings)
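Applied to the file-reading case, the same idea can be sketched as below. This is only an illustration in Python 3 syntax; the function name read_all_lines and the io.StringIO stand-in for a real file are assumptions made for the example, not anything from the original code.

```python
import io

def read_all_lines(fp):
    """Return the full contents of a text file object as one string."""
    pieces = []
    for line in fp:          # file iteration already yields one line at a time
        pieces.append(line)  # appending to a list is cheap
    return ''.join(pieces)   # one join at the end instead of repeated +=

# io.StringIO stands in for a file opened in text mode.
fake_file = io.StringIO("spam\neggs\n")
contents = read_all_lines(fake_file)
```

For a real file, fp.read() does the same job in one call; the loop is only worth writing when each line needs some processing before it is collected.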

I'm not exaggerating. There was a bug reported in the Python HTTP library about six(?) months ago where Python was taking half an hour to read a file that Internet Explorer or wget could read in under a second.

You might be lucky and never notice the poor performance, but one of your users will. This is a Shlemiel the Painter algorithm:


http://www.joelonsoftware.com/articles/fog0000000319.html

Under some circumstances, *some* versions of Python can correct for the poor performance and optimize it to run quickly, but not all versions, and even the ones that do sometimes run into operating system dependent problems that lead to terrible performance. Don't write Shlemiel the Painter code.


The right way to read chunks of data from a file is with the read method:

fp = open("filename", "rb")  # open in binary mode
data = fp.read()  # read the whole file
fp.close()

If the file is large, and you want to read it in small chunks, read() takes an optional argument giving the maximum number of bytes to read:


fp.read(64)  # read 64 bytes
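Reading a whole file chunk by chunk then becomes a loop that stops when read() returns nothing. The following is a minimal sketch in Python 3 syntax; the helper name read_in_chunks is made up for the example, and io.BytesIO stands in for a file opened with "rb".

```python
import io

def read_in_chunks(fp, chunk_size=64):
    """Yield successive chunks of at most chunk_size bytes until end of file."""
    while True:
        chunk = fp.read(chunk_size)
        if not chunk:   # read() returns an empty bytes object at end of file
            break
        yield chunk

# io.BytesIO stands in for a file opened in binary mode.
fp = io.BytesIO(b"x" * 150)
sizes = [len(chunk) for chunk in read_in_chunks(fp)]
print(sizes)  # [64, 64, 22]
```

Only one chunk is held in memory at a time, which is the point of reading in pieces.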

If you want to read text files in lines, you can use the readline() method, which reads up to and including the next end of line; or readlines() which returns a list of each line; or just iterate over the file to get each line in turn.
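A short sketch of the line-oriented options, in Python 3 syntax, with io.StringIO standing in for a file opened in text mode (the sample text is invented for the example):

```python
import io

text = "first\nsecond\nthird\n"

# readline() returns one line, including its trailing newline.
fp = io.StringIO(text)
one_line = fp.readline()     # 'first\n'

# readlines() returns the remaining lines as a list.
remaining = fp.readlines()   # ['second\n', 'third\n']

# Iterating over the file object yields each line in turn.
fp = io.StringIO(text)
lines = [line for line in fp]
```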


Chris King went on to ask:

I don't think readline will work an image. How do you get raw binary from a zip? Also make sure you do reply to the tutor list too, not just me.

readline() works fine on binary files, including images, but it won't be useful because binary files aren't split into lines.

readline() reads until end-of-line, which varies according to the operating system you are running, but often is \n. A binary file may or may not contain any end-of-line characters. If it does, then readline() will read up to the next EOL perfectly fine:


f.readline()
=> '\x23\x01\0=#%\xff\n'

and if it doesn't, readline() will happily read the entire file all the way to the end:


'\x23\x01\0=#%\xff3m.\x02\0\xa0\0\0\0+)\0\x03c!<ft\0\xc2|\x8e~\0...'
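Both cases can be reproduced without a real image. This is a sketch in Python 3 syntax, where binary data is a bytes object and the line terminator in binary mode is always b'\n'; io.BytesIO stands in for a file opened with "rb", and the byte values are just sample data.

```python
import io

# Binary data that happens to contain a newline byte.
with_eol = io.BytesIO(b'\x23\x01\x00=#%\xff\n' + b'more data')
first = with_eol.readline()          # stops right after the first b'\n'

# Binary data with no newline byte anywhere.
payload = b'\x23\x01\x00=#%\xff3m.\x02\x00\xa0'
without_eol = io.BytesIO(payload)
everything = without_eol.readline()  # nothing to stop at, so it reads to the end
```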


To read a zip file as raw data, just open it as a regular binary file:

f = open("data.zip", "rb")
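As a rough sketch in Python 3 syntax, the difference between the raw bytes of the archive and the decompressed contents of one member looks like this. The archive is built on the fly in a temporary directory; the member name hello.txt and its contents are invented for the example.

```python
import os
import tempfile
import zipfile

# Build a small throwaway archive to experiment with.
tmpdir = tempfile.mkdtemp()
zip_path = os.path.join(tmpdir, "data.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("hello.txt", "hello world\n")

# Raw bytes of the archive itself, exactly as stored on disk.
with open(zip_path, "rb") as f:
    raw = f.read()

# Decompressed bytes of a single member, via the zipfile module.
with zipfile.ZipFile(zip_path) as zf:
    member = zf.read("hello.txt")
```

The raw bytes are what you would send if you wanted to copy the archive unchanged; the zipfile module is for looking inside it.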

But this is the wrong way to solve the problem of transferring files from one computer to another. The right way is to use a transport protocol that already works, something like FTP or HTTP. The only reason for dealing with files as bytes is if you want to create your own file transport protocol.
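To give a feel for how little code HTTP needs, here is a hedged sketch using only the Python 3 standard library (3.7 or later). It serves a temporary directory on localhost and downloads one file from it. The helper names serve_directory and download, the file name picture.bin, and the choice of port are all assumptions made for the example; a real setup would run the server on one machine and the download on another.

```python
import functools
import http.server
import os
import shutil
import tempfile
import threading
import urllib.request

def serve_directory(directory):
    """Serve `directory` over HTTP on a free localhost port, in a background thread."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def download(url, destination):
    """Fetch `url` and write the body to `destination` without loading it all at once."""
    # An empty ProxyHandler keeps the request from going through a system proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)

# Create a sample binary file to transfer.
source_dir = tempfile.mkdtemp()
source_path = os.path.join(source_dir, "picture.bin")
with open(source_path, "wb") as f:
    f.write(b"\x23\x01\x00=#%\xff\n" * 1000)

server = serve_directory(source_dir)
port = server.server_address[1]
target_path = os.path.join(tempfile.mkdtemp(), "copy.bin")
download("http://127.0.0.1:%d/picture.bin" % port, target_path)

server.shutdown()
server.server_close()
```

The protocol takes care of framing, lengths and end-of-file, which is exactly the part that is easy to get wrong when shipping raw bytes over a socket by hand.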




--
Steven

