James Dominy added the comment:
Ah, I did some digging. It turns out pbzip2 is installed on the system in
question, and more annoyingly, /usr/bin/bzip2 is a symlink to pbzip2. I didn't
realise the file was compressed by pbzip2.
Thanks for the help.
--
James Dominy added the comment:
How does one create a multi-stream bzip2 file in the first place? And how do I
tell whether a file is multi-stream?
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20781
Nadeem Vawda added the comment:
> How does one create a multi-stream bzip2 file in the first place?
If you didn't do so deliberately, I would guess that you used a parallel
compression tool like pbzip2 or lbzip2 to create your bz2 file. These tools work
by splitting the input into chunks,
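
For illustration, here is a minimal sketch of how a multi-stream file can arise,
assuming Python 3.3 or later. Each chunk is compressed independently and the
resulting streams are concatenated, which is roughly the approach parallel
compressors take. The chunk contents are made up for the example.

```python
import bz2

# Hypothetical chunks standing in for pieces of a larger input file.
chunks = [b"first chunk of data\n", b"second chunk of data\n"]

# Compress each chunk on its own; every call yields one complete bzip2 stream.
streams = [bz2.compress(chunk) for chunk in chunks]

# Concatenating the streams gives a valid multi-stream .bz2 payload.
multi_stream = b"".join(streams)

# On Python 3.3+, bz2.decompress() reads all streams in sequence.
restored = bz2.decompress(multi_stream)
```

Writing `multi_stream` to disk should produce a file that bzcat reads in full,
since the command-line tool handles concatenated streams.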
Changes by James Dominy jgdom...@gmail.com:
--
title: BZ2File does decompress some .bz2 files correctly -> BZ2File doesn't
decompress some .bz2 files correctly
James Dominy added the comment:
Whoops, forgot to add the output from the standard bzip2 tools:
$ bzcat example-file.csv.bz2 | wc -c
909602
$ bzcat example-file.csv.bz2 | md5sum
48f4b69b2b8bb0b171ebc36313eb6616 -
As you can see, the file sizes and hashes do not match.
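
A rough Python counterpart to the two shell commands above, as a sketch only:
the helper name `size_and_md5` is hypothetical, and the function reads the whole
file into memory, which is fine for a file of this size.

```python
import bz2
import hashlib


def size_and_md5(path):
    """Return (byte count, MD5 hex digest) of the decompressed contents of path."""
    with bz2.BZ2File(path) as f:
        data = f.read()
    return len(data), hashlib.md5(data).hexdigest()
```

Calling `size_and_md5("example-file.csv.bz2")` gives a pair that can be compared
directly with the `wc -c` and `md5sum` output.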
--
Changes by Serhiy Storchaka storch...@gmail.com:
--
nosy: +nadeem.vawda, serhiy.storchaka
Serhiy Storchaka added the comment:
Everything works on 3.4, but on 3.3 and 2.7 it appears to hang.
--
Serhiy Storchaka added the comment:
Oh, no, I just hadn't pressed Enter after copying the long test command line. ;)
Everything works on 3.3 too, but on 2.7 I get an incomplete result.
$ ./python -c 'import bz2, hashlib; d =
bz2.BZ2File("../example-file.csv.bz2").read(); print len(d),
Serhiy Storchaka added the comment:
Actually, this file is composed of two bzip2 streams. Python 2.7 doesn't support
decompressing multi-stream inputs; this feature was added in 3.3. So this is
not a bug.
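
One way to check whether data holds more than one stream, sketched under the
assumption that the whole compressed file fits in memory: feed it to a
`bz2.BZ2Decompressor`, and whatever is left in `unused_data` after the first
stream ends belongs to the next one. The helper name `decompress_all_streams`
is hypothetical.

```python
import bz2


def decompress_all_streams(data):
    """Return (decompressed bytes, number of bzip2 streams found in data)."""
    parts = []
    stream_count = 0
    while data:
        # A decompressor object handles exactly one stream.
        decompressor = bz2.BZ2Decompressor()
        parts.append(decompressor.decompress(data))
        stream_count += 1
        # Bytes after the end of this stream start the next stream, if any.
        data = decompressor.unused_data
    return b"".join(parts), stream_count
```

A stream count greater than one indicates a multi-stream file. The same loop
should also serve as a workaround on versions where `BZ2File` stops after the
first stream, since it relies only on `BZ2Decompressor` and `unused_data`.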
--
resolution:  -> invalid
stage:  -> committed/rejected
status: open -> closed
Nadeem Vawda added the comment:
As Serhiy said, multi-stream support was only added to the bz2 module in 3.3,
and there is no plan to backport this functionality to 2.7.
However, the bz2file package on PyPI [1] does support multi-stream inputs,
and you can use its BZ2File class as a drop-in