RE: Strange behavior with bzip2 input files w/release 0.19.0

2008-12-06 Thread Andy Sautins
Thanks Abdul. Very exciting that Hadoop will soon be able not only to handle pbzip2 files but also to split bzip2 files. I will apply the patch and report back

Strange behavior with bzip2 input files w/release 0.19.0

2008-12-04 Thread Andy Sautins
I'm seeing some strange behavior with bzip2 files and release 0.19.0. I'm wondering if anyone can shed some light on what I'm seeing. Basically it _looks_ like the processing of a particular bzip2 input file is stopping after the first bzip2 block. Below is a comparison of tests between

Re: Strange behavior with bzip2 input files w/release 0.19.0

2008-12-04 Thread Alex Loddengaard
Currently in Hadoop you cannot split bzip2 files: http://issues.apache.org/jira/browse/HADOOP-4012 However, gzip files can be split: http://issues.apache.org/jira/browse/HADOOP-437 Hope this helps. Alex

Re: Strange behavior with bzip2 input files w/release 0.19.0

2008-12-04 Thread Abdul Qadeer
Andy, You say you suspect that only one bzip2 block is being decompressed and used. Is your bzip2 file the concatenation of multiple bzip2 files (i.e. are you doing something like cat a.bz2 b.bz2 c.bz2 > yourFile.bz2)? In such a case, there will be many bzip2 end of stream markers in a
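To illustrate the situation Abdul describes, here is a minimal sketch in Python. It is an assumption chosen for illustration and is not Hadoop's bzip2 codec. A file built by concatenating .bz2 files holds several complete bzip2 streams, each ending in its own end-of-stream marker. A decoder that stops at the first marker returns only the first file's data, while bunzip2 keeps reading the following streams. The helper names first_stream_only and all_streams are hypothetical.

```python
import bz2

def first_stream_only(data: bytes) -> bytes:
    """Decode only the first bzip2 stream, like a single-stream decoder."""
    d = bz2.BZ2Decompressor()
    # Output stops at the first end-of-stream marker; the remaining
    # bytes are left in d.unused_data and ignored here.
    return d.decompress(data)

def all_streams(data: bytes) -> bytes:
    """Decode every concatenated stream, the way bunzip2 does."""
    out = []
    while data:
        d = bz2.BZ2Decompressor()
        out.append(d.decompress(data))
        # Bytes after this stream's end-of-stream marker (empty at the end).
        data = d.unused_data
    return b"".join(out)

a = bz2.compress(b"first file\n")
b = bz2.compress(b"second file\n")
combined = a + b  # same bytes as: cat a.bz2 b.bz2 > combined.bz2

print(first_stream_only(combined))  # b'first file\n'
print(all_streams(combined))        # b'first file\nsecond file\n'
```

Python's own bz2.decompress has handled multi-stream input since Python 3.3, so it returns the same bytes as all_streams here.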

RE: Strange behavior with bzip2 input files w/release 0.19.0

2008-12-04 Thread Andy Sautins
(at least it decompresses correctly with bunzip2). Thank you Andy

Re: Strange behavior with bzip2 input files w/release 0.19.0

2008-12-04 Thread Abdul Qadeer

RE: Strange behavior with bzip2 input files w/release 0.19.0

2008-12-04 Thread Andy Sautins
Andy, As mentioned earlier, splitting support is being added for bzip2 files, and the patch is actually under review now. I think pbzip2-generated files should work fine