[issue20781] BZ2File doesn't decompress some .bz2 files correctly

2014-02-28 Thread James Dominy
James Dominy added the comment: Ah, I did some digging. It turns out pbzip2 is installed on the system in question, and more annoyingly, /usr/bin/bzip2 is a symlink to pbzip2. I didn't realise the file was compressed by pbzip2. Thanks for the help.

[issue20781] BZ2File doesn't decompress some .bz2 files correctly

2014-02-27 Thread James Dominy
James Dominy added the comment: How does one create a multi-stream bzip2 file in the first place? And how do I tell that it's multi-stream? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue20781

[issue20781] BZ2File doesn't decompress some .bz2 files correctly

2014-02-27 Thread Nadeem Vawda
Nadeem Vawda added the comment:
> How does one create a multi-stream bzip2 file in the first place?
If you didn't do so deliberately, I would guess that you used a parallel compression tool like pbzip2 or lbzip2 to create your bz2 file. These tools work by splitting the input into chunks,
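
As a rough illustration of both questions (how a multi-stream file comes about and how to recognise one), here is a minimal sketch for Python 3.3 or later. The helper name `count_bz2_streams` is hypothetical and not part of the bz2 module. It relies only on `bz2.BZ2Decompressor`, which stops at the end of one stream and leaves whatever follows in `unused_data`. It decompresses everything in memory, so it suits small files only.

```python
import bz2


def count_bz2_streams(data):
    """Return how many concatenated bzip2 streams `data` (bytes) contains.

    Hypothetical helper for illustration; not part of the bz2 module.
    """
    count = 0
    while data:
        decomp = bz2.BZ2Decompressor()
        decomp.decompress(data)        # output discarded; we only count streams
        if not decomp.eof:
            raise ValueError("truncated bzip2 stream")
        count += 1
        data = decomp.unused_data      # bytes following the finished stream
    return count


# Concatenating two compressed streams yields a multi-stream payload,
# which is also what a chunk-by-chunk compressor ends up writing.
multi = bz2.compress(b"first chunk\n") + bz2.compress(b"second chunk\n")
print(count_bz2_streams(multi))
```

A result greater than 1 means the file is multi-stream and needs a decompressor that continues past the first end-of-stream marker.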

[issue20781] BZ2File doesn't decompress some .bz2 files correctly

2014-02-26 Thread James Dominy
Changes by James Dominy <jgdom...@gmail.com>: title: BZ2File does decompress some .bz2 files correctly -> BZ2File doesn't decompress some .bz2 files correctly

[issue20781] BZ2File doesn't decompress some .bz2 files correctly

2014-02-26 Thread James Dominy
James Dominy added the comment: Whoops, forgot to add the output from the standard binutils:

$ bzcat example-file.csv.bz2 | wc -c
909602
$ bzcat example-file.csv.bz2 | md5sum
48f4b69b2b8bb0b171ebc36313eb6616 -

As you can see, the file sizes and hashes do not match.

[issue20781] BZ2File doesn't decompress some .bz2 files correctly

2014-02-26 Thread Serhiy Storchaka
Changes by Serhiy Storchaka <storch...@gmail.com>: nosy: +nadeem.vawda, serhiy.storchaka

[issue20781] BZ2File doesn't decompress some .bz2 files correctly

2014-02-26 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Everything works on 3.4, but on 3.3 and 2.7 it appears to hang.

[issue20781] BZ2File doesn't decompress some .bz2 files correctly

2014-02-26 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Oh, no, I just hadn't pressed Enter after copying the long test command line. ;) Everything works on 3.3 too, but on 2.7 I got an incomplete result.

$ ./python -c 'import bz2, hashlib; d = bz2.BZ2File("../example-file.csv.bz2").read(); print len(d),

[issue20781] BZ2File doesn't decompress some .bz2 files correctly

2014-02-26 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Actually, this file is composed of two bzip2 streams. Python 2.7 doesn't support decompressing multi-stream inputs; this feature was added in 3.3. So this is not a bug.

resolution: -> invalid
stage: -> committed/rejected
status: open -> closed
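
To make the difference concrete, the following is a small sketch for Python 3.3 or later; the sample data is invented for illustration and is not the reporter's file. A lone `bz2.BZ2Decompressor` stops after the first stream, which resembles the incomplete result seen on 2.7, while `bz2.decompress()` carries on through every stream.

```python
import bz2

# Two independently compressed streams, concatenated into one payload.
# The sample data is made up for illustration.
first = b"alpha,1\n" * 1000
second = b"beta,2\n" * 1000
payload = bz2.compress(first) + bz2.compress(second)

# A single decompressor object handles exactly one stream: it reaches
# end-of-stream after `first` and parks the rest in `unused_data`.
single = bz2.BZ2Decompressor()
partial = single.decompress(payload)
print(len(partial), single.eof, len(single.unused_data) > 0)

# bz2.decompress() (multi-stream aware since 3.3) returns both streams.
full = bz2.decompress(payload)
print(len(full))
```

On a multi-stream file, a short read like `partial` is the symptom to look for: the output is valid but ends where the first stream ends.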

[issue20781] BZ2File doesn't decompress some .bz2 files correctly

2014-02-26 Thread Nadeem Vawda
Nadeem Vawda added the comment: As Serhiy said, multi-stream support was only added to the bz2 module in 3.3, and there is no plan to backport this functionality to 2.7. However, the bz2file package on PyPI [1] does support multi-stream inputs, and you can use its BZ2File class as a drop-in