Dave, I think you are probably right about using BZ2Decompressor. I couldn't find any example of it in use, and I wasn't having any luck getting it to work from the documentation. Maybe I should try harder on this front.
Colin Talbert
GIS Specialist
US Geological Survey - Fort Collins Science Center
2150 Centre Ave. Bldg. C
Fort Collins, CO 80526
(970) 226-9425
talbe...@usgs.gov

From: Dave Angel <da...@ieee.org>
To: Colin Talbert <talbe...@usgs.gov>
Cc: Steven D'Aprano <st...@pearwood.info>, tutor@python.org
Date: 06/03/2010 12:36 PM
Subject: Re: [Tutor] parse text file

Colin Talbert wrote:
> <snip>
> You are so correct. I'd been trying numerous things to read in this file
> and had deleted the code that I meant to put here and so wrote this from
> memory incorrectly. The code that I wrote should have been:
>
> import bz2
> input_file = bz2.BZ2File(r'C:\temp\planet-latest.osm.bz2', 'rb')
> str = input_file.read()
> len(str)
>
> Which indeed does return only 900000.
>
> Which is also the number returned when you sum the length of all the lines
> returned in a for line in file with:
>
> import bz2
> input_file = bz2.BZ2File(r'C:\temp\planet-latest.osm.bz2', 'rb')
> lengthz = 0
> for uline in input_file:
>     lengthz = lengthz + len(uline)
>
> print lengthz
>
> <snip>

Seems to me for such a large file you'd have to use bz2.BZ2Decompressor. I have no experience with it, but its purpose is sequential decompression -- decompression where not all the data is simultaneously available in memory.

DaveA
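[For readers hitting the same wall: 900000 bytes is bzip2's maximum block size, and large OSM planet dumps are typically multi-stream archives. One plausible explanation for the truncated read is that the Python 2 `bz2.BZ2File` of that era stopped after the first stream; Python 3.3+ handles multi-stream files transparently. The sketch below shows the `BZ2Decompressor` approach Dave suggests, under those assumptions, using the Python 3 API (`eof` and `unused_data` attributes). The function name `iter_bz2_chunks` and the chunk size are my own choices, not anything from the thread.]

```python
import bz2

def iter_bz2_chunks(path, chunk_size=64 * 1024):
    """Yield decompressed chunks from a .bz2 file without loading the
    whole archive into memory. Restarts the decompressor at each stream
    boundary so multi-stream files (e.g. pbzip2 output) are read fully."""
    with open(path, 'rb') as f:
        decomp = bz2.BZ2Decompressor()
        while True:
            raw = f.read(chunk_size)
            if not raw:
                break
            data = decomp.decompress(raw)
            if data:
                yield data
            # If a stream ended inside this chunk, the leftover compressed
            # bytes are in .unused_data; feed them to a fresh decompressor.
            # Loop in case several small streams fit in one chunk.
            while decomp.eof:
                leftover = decomp.unused_data
                decomp = bz2.BZ2Decompressor()
                if not leftover:
                    break
                data = decomp.decompress(leftover)
                if data:
                    yield data
```

Usage would be something like `for chunk in iter_bz2_chunks(r'C:\temp\planet-latest.osm.bz2'): ...`, keeping memory use bounded by the chunk size regardless of how large the decompressed planet file is.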
_______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor