04, 2008 2:29 PM
To: core-user@hadoop.apache.org
Subject: RE: Strange behavior with bzip2 input files w/release 0.19.0
Thanks Abdul. Very exciting that Hadoop will soon be able to handle
not only pbzip2 files but also split bzip2 files.
I will apply the patch and report back.
I'm seeing some strange behavior with bzip2 files and release
0.19.0. I'm wondering if anyone can shed some light on what I'm seeing.
Basically it _looks_ like the processing of a particular bzip2 input
file is stopping after the first bzip2 block. Below is a comparison of
tests between
Currently in Hadoop you cannot split bzip2 files:
http://issues.apache.org/jira/browse/HADOOP-4012
However, gzip files can be split:
http://issues.apache.org/jira/browse/HADOOP-437
Hope this helps.
Alex
On Thu, Dec 4, 2008 at 9:11 AM, Andy Sautins [EMAIL PROTECTED] wrote:
Andy,
As you said, you suspect that only one bzip2 block is being decompressed
and used. Is your bzip2 file the concatenation of multiple bzip2 files (i.e.,
are you doing something like cat a.bz2 b.bz2 c.bz2 > yourFile.bz2)? In such
a case, there will be many bzip2 end-of-stream markers in a
(at least it
bunzip2 decompresses correctly).
Thank you
Andy
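A minimal sketch of the failure mode Abdul describes, using Python's standard `bz2` module as a stand-in for Hadoop's codec (this is not Hadoop code). Concatenating compressed files yields several complete bzip2 streams back to back; pbzip2, which compresses chunks independently, produces the same multi-stream layout. A decoder that stops at the first end-of-stream marker silently returns only the first stream's data, whereas bunzip2 keeps going. The helper names `decompress_first_stream` and `decompress_all_streams` are hypothetical.

```python
import bz2


def decompress_first_stream(data: bytes) -> bytes:
    """Decode only the first bzip2 stream (hypothetical helper)."""
    d = bz2.BZ2Decompressor()
    # Once the first end-of-stream marker is reached, d.eof becomes True and
    # every following stream is left untouched in d.unused_data.
    return d.decompress(data)


def decompress_all_streams(data: bytes) -> bytes:
    """Decode every concatenated stream, as bunzip2 does (hypothetical helper)."""
    parts = []
    while data:
        d = bz2.BZ2Decompressor()  # a fresh decoder per stream
        parts.append(d.decompress(data))
        if not d.eof:
            raise ValueError("truncated bzip2 stream")
        data = d.unused_data       # bytes belonging to the next stream, if any
    return b"".join(parts)


# Equivalent of `cat a.bz2 b.bz2 c.bz2 > yourFile.bz2`
concatenated = b"".join(bz2.compress(s) for s in (b"alpha\n", b"beta\n", b"gamma\n"))

print(decompress_first_stream(concatenated))  # b'alpha\n'
print(decompress_all_streams(concatenated))   # b'alpha\nbeta\ngamma\n'
```

Python's own `bz2.decompress` (3.3 and later) also handles multi-stream input, so the explicit loop only matters when a single-stream decompressor object is driven directly.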
-Original Message-
From: Abdul Qadeer [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 04, 2008 12:07 PM
To: core-user@hadoop.apache.org
Subject: Re: Strange behavior with bzip2 input files w/release 0.19.0
, December 04, 2008 1:49 PM
To: core-user@hadoop.apache.org
Subject: Re: Strange behavior with bzip2 input files w/release 0.19.0
Andy,
As was mentioned earlier, splitting support is being added for bzip2
files, and the patch is actually under review now. I think
pbzip2-generated files should
work fine
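The splitting support discussed in this thread (HADOOP-4012) relies on a property of the bzip2 format: every compressed block begins with the 48-bit magic number 0x314159265359, every stream ends with the 48-bit marker 0x177245385090, and neither marker is byte-aligned. A reader handed an arbitrary byte range can therefore scan forward bit by bit to the next block boundary. The Python sketch below shows only that scan; it is not the patch under review, and `find_markers` is a hypothetical helper.

```python
import bz2
import random

BLOCK_MAGIC = 0x314159265359  # starts every compressed block
EOS_MAGIC = 0x177245385090    # end-of-stream marker
MASK48 = (1 << 48) - 1


def find_markers(data: bytes) -> tuple[list[int], list[int]]:
    """Return bit offsets of block-start and end-of-stream markers (hypothetical helper)."""
    blocks, ends = [], []
    window = 0
    for bit_index in range(len(data) * 8):
        # bzip2 packs bits most-significant-bit first within each byte
        bit = (data[bit_index >> 3] >> (7 - (bit_index & 7))) & 1
        window = ((window << 1) | bit) & MASK48
        if bit_index >= 47:
            if window == BLOCK_MAGIC:
                blocks.append(bit_index - 47)
            elif window == EOS_MAGIC:
                ends.append(bit_index - 47)
    return blocks, ends


# Random bytes barely compress, so 210,000 of them at compresslevel=1
# (blocks of roughly 100 KB of input) span several compressed blocks.
payload = random.Random(0).randbytes(210_000)
compressed = bz2.compress(payload, compresslevel=1)
blocks, ends = find_markers(compressed)
print(blocks[0])  # 32: right after the 4-byte "BZh1" stream header
print(len(blocks), len(ends))
```

A split reader starting at byte offset `k` would begin decoding at the first entry of `blocks` that is at least `8 * k`. Because a 48-bit pattern can also occur by chance inside compressed data, a real implementation has to confirm a candidate boundary (for example by checking that the block decodes cleanly) rather than trust the match alone.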