Re: Python vs. Java gzip performance

2006-03-22, Martin v. Löwis
Felipe Almeida Lessa wrote:
> def readlines(self, sizehint=None):
>     if sizehint is None:
>         return self.read().splitlines(True)
>     # ...
>
> Is it okay? Or is there any embedded problem I couldn't see?

It's dangerous if the file is really large: it might exhaust your memory…
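Martin's concern is that `self.read().splitlines(True)` holds the entire decompressed file in memory at once. Below is a minimal sketch of the alternative, assuming Python 3 and a binary file-like object such as `gzip.GzipFile`. It reads fixed-size chunks, carries the incomplete last line over to the next chunk, and stops once at least `sizehint` bytes of whole lines have been collected. The helper name `readlines_chunked` and its `chunk_size` parameter are hypothetical, not part of the gzip module.

```python
import gzip
import io


def readlines_chunked(fileobj, sizehint=-1, chunk_size=64 * 1024):
    """Hypothetical helper: return whole lines (bytes, newline kept).

    Reads fileobj in chunk_size pieces instead of one big read(). When
    sizehint > 0 it stops after at least sizehint bytes of lines, so a
    single call does not have to decompress the whole file.
    """
    lines = []
    total = 0
    pending = b""  # incomplete trailing line carried between chunks
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        data = pending + chunk
        cut = data.rfind(b"\n") + 1  # end of the last complete line, 0 if none
        pending = data[cut:]
        # Split only on b"\n", as binary readlines() does (not on b"\r").
        lines.extend(line + b"\n" for line in data[:cut].split(b"\n")[:-1])
        total += cut
        if sizehint > 0 and total >= sizehint:
            if pending:
                # Finish the partial line so no bytes already read are lost.
                pending += fileobj.readline()
            break
    if pending:
        lines.append(pending)
    return lines


if __name__ == "__main__":
    payload = b"".join(b"%d This is a test\n" % n for n in range(1000))
    with gzip.GzipFile(fileobj=io.BytesIO(gzip.compress(payload))) as f:
        first = readlines_chunked(f, sizehint=100, chunk_size=64)
        rest = readlines_chunked(f)
    print(len(first), len(rest))  # the two calls together cover every line
```

Calling it repeatedly with a positive `sizehint` keeps each batch bounded, roughly `sizehint` plus one chunk for ordinary line lengths; with the default `sizehint` it still reads everything, just in pieces.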

Re: Python vs. Java gzip performance

2006-03-22, Felipe Almeida Lessa
On Wed, 2006-03-22 at 00:47 +0100, "Martin v. Löwis" wrote:
> Caleb Hattingh wrote:
> > What does ".readlines()" do differently that makes it so much slower
> > than ".read().splitlines(True)"? To me, the "one obvious way to do it"
> > is ".readlines()".
[snip]
> Anyway, decompressing the entire…

Re: Python vs. Java gzip performance

2006-03-21, Martin v. Löwis
Caleb Hattingh wrote:
> What does ".readlines()" do differently that makes it so much slower
> than ".read().splitlines(True)"? To me, the "one obvious way to do it"
> is ".readlines()".

readlines reads 100 bytes (at most) at a time. I'm not sure why it does that (probably in order to not read fu…
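Martin's explanation is that each line read pulls at most 100 bytes from the decompressor. A rough sketch of why the chunk size matters, assuming Python 3: a hypothetical `ChunkedLineReader` requests a fixed number of bytes per underlying `read()` call and counts those calls, so the 100-byte case can be compared with a larger chunk. It counts calls rather than timing them. Current Python 3 `gzip` objects buffer reads internally, so this models the pattern described in the thread rather than the present-day module.

```python
import gzip
import io


class ChunkedLineReader:
    """Hypothetical line reader that asks raw.read() for chunk_size bytes at a time."""

    def __init__(self, raw, chunk_size):
        self.raw = raw
        self.chunk_size = chunk_size
        self.buffer = b""    # bytes read from raw but not fully returned yet
        self.pos = 0         # start of the unreturned part of self.buffer
        self.read_calls = 0  # number of raw.read() calls made so far

    def readline(self):
        end = self.buffer.find(b"\n", self.pos)
        # Fetch fixed-size chunks until a newline is buffered or EOF is hit.
        while end < 0:
            chunk = self.raw.read(self.chunk_size)
            self.read_calls += 1
            if not chunk:
                break
            # Drop the bytes already returned, then append the new chunk.
            self.buffer = self.buffer[self.pos:] + chunk
            self.pos = 0
            end = self.buffer.find(b"\n")
        stop = end + 1 if end >= 0 else len(self.buffer)
        line = self.buffer[self.pos:stop]
        self.pos = stop
        return line


def count_reads(compressed, chunk_size):
    """Read every line of a gzip payload; return how many raw reads it took."""
    with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as raw:
        reader = ChunkedLineReader(raw, chunk_size)
        while reader.readline():
            pass
    return reader.read_calls


if __name__ == "__main__":
    payload = b"".join(b"%d This is a test\n" % n for n in range(10000))
    compressed = gzip.compress(payload)
    for size in (100, 64 * 1024):
        print(size, "byte chunks:", count_reads(compressed, size), "read() calls")
```

Every call carries fixed per-call overhead, so the smaller the chunk, the more often that overhead is paid for the same amount of data.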

Re: Python vs. Java gzip performance

2006-03-21, Caleb Hattingh
Hi Peter,

Clearly I misunderstood what Martin was saying :) I was comparing operations on lines via the file iterator against first loading the file's lines into memory and then performing the concatenation. What does ".readlines()" do differently that makes it so much slower than ".read().splitlines(True)"? …

Re: Python vs. Java gzip performance

2006-03-17, Serge Orlov
Bill wrote:
> Is there something that can be improved in the Python version?

It seems GzipFile.readlines is not optimized; file.readline works better:

C:\py>python -c "file('tmp.txt', 'w').writelines('%d This is a test\n' % n for n in range(1))"
C:\py>python -m timeit "open('tmp.txt').read…
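Serge's timings compare reading through `GzipFile` with reading a plain file, but his command lines are cut off. A Python 3 sketch of the same kind of comparison follows, assuming a sample of 10,000 lines in his `"%d This is a test\n"` format (the line count is an assumption). It writes the text once plain and once gzip-compressed into a temporary directory, then times `readlines()` on each. The helper names are hypothetical, and no timings are quoted because they depend on the machine and Python version.

```python
import gzip
import os
import tempfile
import timeit


def write_samples(directory, n=10000):
    """Write identical text as tmp.txt and tmp.txt.gz; return both paths."""
    text = "".join("%d This is a test\n" % i for i in range(n))
    plain = os.path.join(directory, "tmp.txt")
    packed = plain + ".gz"
    with open(plain, "w") as f:
        f.write(text)
    with gzip.open(packed, "wt") as f:
        f.write(text)
    return plain, packed


def time_readlines(opener, path, number=10):
    """Total seconds for `number` rounds of opening path and calling readlines()."""
    def run():
        with opener(path, "rt") as f:
            f.readlines()
    return timeit.timeit(run, number=number)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        plain, packed = write_samples(d)
        print("plain readlines:", time_readlines(open, plain))
        print("gzip readlines: ", time_readlines(gzip.open, packed))
```

Passing the opener (`open` or `gzip.open`) as a parameter keeps the timed code identical for both files, so the difference reflects decompression and the gzip read path.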

Re: Python vs. Java gzip performance

2006-03-17, Andrew MacIntyre
Bill wrote:
> I've written a small program that, in part, reads in a file and parses
> it. Sometimes, the file is gzipped. The code that I use to get the
> file object is like so:
>
> if filename.endswith(".gz"):
>     file = GzipFile(filename)
> else:
>     file = open(filename)
>
> Then I parse…

Re: Python vs. Java gzip performance

2006-03-17, Peter Otten
Caleb Hattingh wrote:
> I tried this:
>
> from timeit import *
>
> #Try readlines
> print Timer('import
> gzip;lines=gzip.GzipFile("gztest.txt.gz").readlines();[i+"1" for i in
> lines]').timeit(200) # This is one line
>
>
> # Try file object - uses buffering?
> print Timer('import gzip;[i+"1"…

Re: Python vs. Java gzip performance

2006-03-17, Caleb Hattingh
I tried this:

    from timeit import *

    # Try readlines
    print Timer('import gzip;lines=gzip.GzipFile("gztest.txt.gz").readlines();[i+"1" for i in lines]').timeit(200)  # This is one line

    # Try file object - uses buffering?
    print Timer('import gzip;[i+"1" for i in gzip.GzipFile("gztest.txt.gz")]').time…
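Caleb's comparison is written in Python 2 (the `print` statement, `from timeit import *`). Here is a Python 3 sketch of the same two cases, each timed 200 times as in his message. Because the contents of `gztest.txt.gz` are not shown, the script generates a hypothetical stand-in file; its line format and length are assumptions. As in the original expressions, `i + "1"` appends after the newline.

```python
import gzip
import os
import tempfile
import timeit


def write_sample(path, n=10000):
    # Hypothetical stand-in for gztest.txt.gz, whose contents are not shown.
    with gzip.open(path, "wt") as f:
        f.writelines("line %d of the test file\n" % i for i in range(n))


def readlines_then_concat(path):
    # First case: load every line with readlines(), then build new strings.
    with gzip.open(path, "rt") as f:
        lines = f.readlines()
    return [i + "1" for i in lines]


def iterate_then_concat(path):
    # Second case: iterate over the file object directly.
    with gzip.open(path, "rt") as f:
        return [i + "1" for i in f]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "gztest.txt.gz")
        write_sample(path, n=5000)
        for fn in (readlines_then_concat, iterate_then_concat):
            # timeit runs immediately, so the lambda sees the current fn.
            print(fn.__name__, timeit.timeit(lambda: fn(path), number=200))
```

Both functions return the same list, so any timing gap comes from how the lines are read, not from the concatenation.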

Re: Python vs. Java gzip performance

2006-03-17, Martin v. Löwis
Bill wrote:
> The Java version of this code is roughly 2x-3x faster than the Python
> version. I can get around this problem by replacing the Python
> GzipFile object with an os.popen call to gzcat, but then I sacrifice
> portability. Is there something that can be improved in the Python
> version?
…
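Bill describes trading portability for speed by shelling out to gzcat. One way to keep both is sketched below, under the assumption that a `gzip` executable accepting `-dc` may or may not be on `PATH`. The function uses the external process when `shutil.which` finds it and falls back to the `gzip` module otherwise. The name `iter_gz_lines` and the UTF-8 default are assumptions, and this sketch does not establish that the external process is actually faster.

```python
import gzip
import io
import os
import shutil
import subprocess
import tempfile


def iter_gz_lines(path, encoding="utf-8"):
    """Hypothetical helper: yield text lines from a gzip file.

    Uses an external `gzip -dc` process when one is on PATH (the kind of
    shortcut Bill describes with gzcat) and falls back to the portable
    gzip module when it is not.
    """
    exe = shutil.which("gzip")
    if exe is None:
        with gzip.open(path, "rt", encoding=encoding) as f:
            yield from f
        return
    proc = subprocess.Popen([exe, "-dc", path], stdout=subprocess.PIPE)
    # Leaving the with-block closes the pipe and waits for the child.
    # This sketch does not check the child's exit status.
    with proc, io.TextIOWrapper(proc.stdout, encoding=encoding) as text:
        yield from text


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        sample = os.path.join(d, "sample.txt.gz")
        with gzip.open(sample, "wt", encoding="utf-8") as f:
            f.write("first line\nsecond line\n")
        for line in iter_gz_lines(sample):
            print(line, end="")
```

Callers see the same `str` lines from either branch, so the parsing code does not need to know which path was taken.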

Python vs. Java gzip performance

2006-03-17, Bill
I've written a small program that, in part, reads in a file and parses it. Sometimes, the file is gzipped. The code that I use to get the file object is like so:

    from gzip import GzipFile

    if filename.endswith(".gz"):
        file = GzipFile(filename)
    else:
        file = open(filename)

Then I parse the contents of the file in t…
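Bill's opener relies on `GzipFile` and `open` returning interchangeable file objects, which held for Python 2 strings. In Python 3, `gzip.GzipFile` yields `bytes` while `open` yields `str` by default. Below is a minimal Python 3 sketch, assuming UTF-8 text (the encoding is an assumption, not from the thread). Opening with `gzip.open` in `"rt"` mode returns a text stream, so the parser sees `str` lines from either branch. The name `open_maybe_gzipped` is hypothetical, and it also avoids reusing the name `file`, which was a built-in in Python 2.

```python
import gzip
import os
import tempfile


def open_maybe_gzipped(filename, encoding="utf-8"):
    """Open filename for reading text, decompressing it if it ends in .gz.

    Both branches return a text stream of str lines; a bare
    gzip.GzipFile(filename) would instead yield bytes in Python 3.
    """
    if filename.endswith(".gz"):
        return gzip.open(filename, "rt", encoding=encoding)
    return open(filename, "r", encoding=encoding)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        plain = os.path.join(d, "data.txt")
        packed = plain + ".gz"
        with open(plain, "w", encoding="utf-8") as f:
            f.write("a 1\nb 2\n")
        with gzip.open(packed, "wt", encoding="utf-8") as f:
            f.write("a 1\nb 2\n")
        for name in (plain, packed):
            with open_maybe_gzipped(name) as f:
                print(os.path.basename(name), [line.split() for line in f])
```

Because both branches return a context manager, `with open_maybe_gzipped(name) as f:` closes either kind of file the same way.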