"Colin Talbert" <talbe...@usgs.gov> wrote

I thought that when you did a "for uline in input_file" loop, each line
would go into memory independently, not the entire file.

That's true, but your code snippet showed you using read(),
which reads the whole file.

I'm pretty sure that this is not your code, because you can't call len()
on a bz2 file. If you try, you get an error:

You are so correct. I'd been trying numerous things to read in this file,
had deleted the code that I meant to put here, and so wrote this from
memory, incorrectly. The code that I wrote should have been:

import bz2
input_file = bz2.BZ2File(r'C:\temp\planet-latest.osm.bz2','rb')
data = input_file.read()
len(data)

This again uses read(), which reads the whole file.

Which is also the number you get when you sum the lengths of all the lines
returned by a "for line in file" loop:

import bz2
input_file = bz2.BZ2File(r'C:\temp\planet-latest.osm.bz2','rb')
lengthz = 0
for uline in input_file:
    lengthz += len(uline)

I'm not sure how

for line in file

will work for binary files. It may read the whole thing since
the concept of lines really only applies to text. So it may
be the same result as using read()

Try looping using read(n) where n is some buffer size
(1024 might be a good value?).
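
Something along these lines is what I have in mind. This is only a rough
sketch: the function name decompressed_size and the 1 MB default chunk size
are my own choices for illustration, not anything from your code, and the
with statement assumes a Python version where BZ2File works as a context
manager (2.7 or 3.x).

```python
import bz2


def decompressed_size(path, chunk_size=1024 * 1024):
    """Return the decompressed size of a bz2 file, reading it in chunks.

    Only one chunk of at most chunk_size bytes is held at a time,
    instead of the whole decompressed file as with a bare read().
    """
    total = 0
    with bz2.BZ2File(path, 'rb') as input_file:
        while True:
            chunk = input_file.read(chunk_size)
            if not chunk:  # an empty result means end of file
                break
            total += len(chunk)
    return total
```

Calling decompressed_size(r'C:\temp\planet-latest.osm.bz2') should then give
the same number as len() on the result of read(), but without ever holding
more than one chunk at a time. Whatever processing you actually need would
go where the length is added up.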

HTH,

--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/

