[issue18003] New lzma crazy slow with line-oriented reading.

2013-05-20 Thread Michael Fox
Michael Fox added the comment: I thought about it some more and the only bug here is mine, failing to explicitly set mode='rt'. Maybe back when someone invented text and binary modes they should have been clear which was to be the default for all things. Maybe when someone made the

[issue18003] New lzma crazy slow with line-oriented reading.

2013-05-20 Thread Michael Fox
Michael Fox added the comment: I thought of an even more hazardous case: if compression == 'gz': import gzip open = gzip.open elif compression == 'xz': import lzma open = lzma.open else: pass On Mon, May 20, 2013 at 9:41 AM, Michael Fox wrote: >

[issue18003] New lzma crazy slow with line-oriented reading.

2013-05-20 Thread Michael Fox
Michael Fox added the comment: You're right. In fact, what doesn't make sense is to be doing line-oriented reads on a binary file. Why was I doing that? I do have another quibble though. The open() function is like this: open(file, mode='r', buffering=-1, encoding=None,

[issue18003] New lzma crazy slow with line-oriented reading.

2013-05-20 Thread Michael Fox
Michael Fox added the comment: I was thinking about this line: end = self._buffer.find(b"\n", self._buffer_offset) + 1 Might be a bug? For example, is there a unicode where one of several bytes is '\n'? In this case it splits the line in the middle of a character, right?

[issue18003] New lzma crazy slow with line-oriented reading.

2013-05-19 Thread Michael Fox
Michael Fox added the comment: io.BufferedReader works well for me. Thanks for the good suggestion. Now python 3.3 and 3.4 have similar performance to each other and they are only 2x slower than pyliblzma. >From my perspective default wrapping with io.BufferedReader is a great idea. I ca

[issue18003] New lzma crazy slow with line-oriented reading.

2013-05-18 Thread Michael Fox
Michael Fox added the comment: I looked into it a little and it looks like pyliblzma is a pure C extension whereas new lzma library wraps liblzma but the rest is python. In particular this happens for every line: if size < 0: end = self._buffer.find(b"\

[issue18003] New lzma crazy slow with line-oriented reading.

2013-05-18 Thread Michael Fox
Michael Fox added the comment: 3.4 is much better but still 4x slower than 2.7 m@air:~/q/topaz/parse_datalog$ time python2.7 lzmaperf.py 102368 real0m0.053s user0m0.052s sys 0m0.000s m@air:~/q/topaz/parse_datalog$ time ~/tmp/cpython-23836f17e4a2/bin/python3.4 lzmaperf.py 102368

[issue18003] New lzma crazy slow with line-oriented reading.

2013-05-17 Thread Michael Fox
New submission from Michael Fox: import lzma count = 0 f = lzma.LZMAFile('bigfile.xz' ,'r') for line in f: count += 1 print(count) Comparing python2 with pyliblzma to python3.3.1 with the new lzma: m@air:~/q/topaz/parse_datalog$ time python lzmaperf.py 102368 r