Hello Steven,

Thanks for the reply. Also, this is my first post to tu...@python, so I'll reply all in the future.

> However, a file of that size changes things drastically. You can't
> expect to necessarily be able to read the entire 9.2 gigabyte BZ2 file
> into memory at once, let alone the unpacked 131 GB text file, EVEN if
> your computer has more than 9.2 GB of memory. So your tests need to
> take this into account.

I thought that when you did a "for uline in input_file", each single line would go into memory independently, not the entire file.

> I'm pretty sure that this is not your code, because you can't call
> len() on a bz2 file. If you try, you get an error:

You are so correct. I'd been trying numerous things to read in this file and had deleted the code that I meant to put here, and so wrote this from memory incorrectly. The code that I wrote should have been:

    import bz2
    input_file = bz2.BZ2File(r'C:\temp\planet-latest.osm.bz2','rb')
    str=input_file.read()
    len(str)

Which indeed does return only 900000. That is also the number returned when you sum the lengths of all the lines returned in a "for line in file" loop with:

    import bz2
    input_file = bz2.BZ2File(r'C:\temp\planet-latest.osm.bz2','rb')
    lengthz = 0
    for uline in input_file:
        lengthz = lengthz + len(uline)
    print lengthz

Thanks again for your help and sorry for the bad code in the previous submittal.

Colin Talbert
GIS Specialist
US Geological Survey - Fort Collins Science Center
2150 Centre Ave. Bldg. C
Fort Collins, CO 80526
(970) 226-9425
talbe...@usgs.gov

From: Steven D'Aprano <st...@pearwood.info>
To: tutor@python.org
Date: 06/02/2010 03:42 PM
Subject: Re: [Tutor] parse text file
Sent by: tutor-bounces+talbertc=usgs....@python.org

Hi Colin,

I'm taking the liberty of replying to your message back to the list, as others hopefully may be able to make constructive comments. When replying, please ensure that you reply to the tutor mailing list rather than the individual.

On Thu, 3 Jun 2010 12:20:10 am Colin Talbert wrote:
> > Without seeing your text file, and the code you use to read the text
> > file, there's no way of telling what is going on, but I can guess
> > the most likely causes:
>
> Since the file is 9.2 gig it wouldn't make sense to send it to you.

And I am very glad you didn't try *smiles*

However, a file of that size changes things drastically. You can't expect to necessarily be able to read the entire 9.2 gigabyte BZ2 file into memory at once, let alone the unpacked 131 GB text file, EVEN if your computer has more than 9.2 GB of memory. So your tests need to take this into account.

> > (2) There's a bug in your code so that you stop reading after
> > 900,000 bytes.
>
> The code is simple enough that I'm pretty sure there is not a
> bug in it.
>
> import bz2
> input_file = bz2.BZ2File(r'C:\temp\planet-latest.osm.bz2','rb')
> print len(input_file)
>
> returns 900000

I'm pretty sure that this is not your code, because you can't call len() on a bz2 file. If you try, you get an error:

>>> x = bz2.BZ2File('test.bz2', 'w')  # create a temporary file
>>> x.write("some data")
>>> x.close()
>>> input_file = bz2.BZ2File('test.bz2', 'r')  # open it
>>> print len(input_file)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'bz2.BZ2File' has no len()

So whatever your code actually is, I'm fairly sure it isn't what you say here.

--
Steven D'Aprano
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
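
For readers following the thread, here is a minimal sketch of how the byte count discussed above could be taken without holding the whole file in memory. It is written for Python 3 (the thread itself uses Python 2 syntax), and the helper names count_decompressed_bytes, count_line_bytes and write_two_stream_sample are hypothetical, not taken from the thread. The sample file is deliberately built from two concatenated bz2 streams: the Python 2 documentation notes that bz2.BZ2File does not support files containing multiple streams and reads only the first one, while Python 3.3 and later read them transparently. Whether that limitation explains the 900000 figure here is an assumption, since the thread does not say how the planet file was compressed.

```python
import bz2
import os
import tempfile


def count_decompressed_bytes(path, chunk_size=1024 * 1024):
    """Sum the decompressed size of a bz2 file, one chunk at a time."""
    total = 0
    with bz2.open(path, "rb") as stream:
        while True:
            chunk = stream.read(chunk_size)  # at most chunk_size bytes in memory
            if not chunk:  # empty bytes object means end of file
                break
            total += len(chunk)
    return total


def count_line_bytes(path):
    """Sum the length of every line, iterating lazily over the file."""
    total = 0
    with bz2.open(path, "rb") as stream:
        for line in stream:  # only the current line is held in memory
            total += len(line)
    return total


def write_two_stream_sample(path):
    """Write a small illustrative file made of two concatenated bz2 streams."""
    first = b"first line\n" * 1000
    second = b"second line\n" * 1000
    with open(path, "wb") as out:
        out.write(bz2.compress(first))   # stream 1
        out.write(bz2.compress(second))  # stream 2, appended directly
    return len(first) + len(second)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.bz2")
        expected = write_two_stream_sample(sample)
        print(expected, count_decompressed_bytes(sample), count_line_bytes(sample))
```

On Python 3.3 or later, both counting functions should report the full size of the two-stream sample. If a reader only ever sees the size of the first stream, a multi-stream input read by an older bz2 module is one thing worth checking.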