Re: Input split for a streaming job!

2011-11-14 Thread Milind.Bhandarkar
It looks like your hadoop distro does not have https://issues.apache.org/jira/browse/HADOOP-4012. - milind On 11/10/11 2:40 PM, Raj V rajv...@yahoo.com wrote: All I assumed that the input splits for a streaming job will follow the same logic as a map reduce java job but I seem to be wrong. I

Re: Input split for a streaming job!

2011-11-14 Thread Raj V
Subject: Re: Input split for a streaming job! It looks like your hadoop distro does not have https://issues.apache.org/jira/browse/HADOOP-4012. - milind On 11/10/11 2:40 PM, Raj V rajv...@yahoo.com wrote: All I assumed that the input splits for a streaming job will follow the same logic as a map

Re: Input split for a streaming job!

2011-11-11 Thread Anirudh Jhina
Raj, What InputFormat are you using? Gzip is not a splittable compression format, so if you have 73 gzip files, there will be 73 mappers, one per file. Look at the TextInputFormat.isSplitable() description. Thanks, ~Anirudh On Thu, Nov 10, 2011 at 2:40 PM, Raj V
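
To make the mapper count above concrete, here is a rough sketch of how the number of splits differs between a non-splittable and a splittable input. It is a simplified model, not Hadoop's actual code: it assumes the split size equals the block size and that a trailing chunk up to 10% over the split size is folded into the last split. The function names and the file sizes are hypothetical; only the file count (73), the number of files over 128MB (8), and the 128MB block size come from the thread.

```python
# Simplified model of per-file split counting (an assumption-laden sketch,
# not Hadoop's implementation).
SPLIT_SLOP = 1.1  # assumed tolerance: last split may be up to 10% oversized
MB = 1024 * 1024


def count_splits(file_size, split_size, splittable):
    """Return the number of splits for one non-empty file."""
    if not splittable:
        # A non-splittable file (e.g. gzip) is always read by a single mapper.
        return 1
    splits = 0
    remaining = file_size
    # Carve off full-size splits while the remainder is clearly larger
    # than one split.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1  # the tail becomes the final split
    return splits


def total_splits(file_sizes, split_size, splittable):
    """Sum the split counts over all input files."""
    return sum(count_splits(size, split_size, splittable) for size in file_sizes)


# Hypothetical sizes: 65 files of 100MB and 8 files of 200MB.
sizes = [100 * MB] * 65 + [200 * MB] * 8
block = 128 * MB

print(total_splits(sizes, block, splittable=False))  # 73
print(total_splits(sizes, block, splittable=True))   # 81
```

Under these assumptions, a non-splittable input always yields one mapper per file, while a splittable input would add one extra mapper for each file that spans two blocks.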

Re: Input split for a streaming job!

2011-11-11 Thread Raj V
From: Joey Echeverria j...@cloudera.com To: Raj V rajv...@yahoo.com Sent: Friday, November 11, 2011 2:56 AM Subject: Re: Input split for a streaming job! U1 should be able to split the bzip2 files. What input format are you using? -Joey On Thu, Nov 10, 2011 at 9:06 PM

RE: Input split for a streaming job!

2011-11-11 Thread Tim Broberg
To: Joey Echeverria Cc: common-user@hadoop.apache.org Subject: Re: Input split for a streaming job! Joey, Anirudh, Bejoy, I am using the TextInputFormat class (org.apache.hadoop.mapred.TextInputFormat). The input files were created using a 32MB block size, and the files are bzip2. So all things point

Re: Input split for a streaming job!

2011-11-11 Thread Raj V
...@cloudera.com Sent: Friday, November 11, 2011 10:25 AM Subject: RE: Input split for a streaming job! What version of Hadoop are you using? We just stumbled on the Jira item for BZIP2 splitting, and it appears to have been added in 0.21. When I diff 0.20.205 vs trunk, I see public class BZip2Codec

Re: Input split for a streaming job!

2011-11-11 Thread bejoy . hadoop
. Regards Bejoy K S -Original Message- From: Raj V rajv...@yahoo.com Date: Fri, 11 Nov 2011 10:34:18 To: Tim Brobergtim.brob...@exar.com; common-user@hadoop.apache.orgcommon-user@hadoop.apache.org Reply-To: common-user@hadoop.apache.org Subject: Re: Input split for a streaming job! Tim

RE: Input split for a streaming job!

2011-11-11 Thread Tim Broberg
-compression/ - Tim. From: bejoy.had...@gmail.com [bejoy.had...@gmail.com] Sent: Friday, November 11, 2011 10:44 AM To: common-user@hadoop.apache.org; Raj V; Tim Broberg Subject: Re: Input split for a streaming job! Hi Raj AFAIK 0.21 is an unstable

Input split for a streaming job!

2011-11-10 Thread Raj V
All, I assumed that the input splits for a streaming job would follow the same logic as a MapReduce Java job, but I seem to be wrong. I started out with 73 gzipped files that vary between 23MB and 255MB in size. My default block size was 128MB. 8 of the 73 files are larger than 128MB. When I