It looks like your hadoop distro does not have
https://issues.apache.org/jira/browse/HADOOP-4012.
- milind
On 11/10/11 2:40 PM, Raj V rajv...@yahoo.com wrote:
All
I assumed that the input splits for a streaming job would follow the same
logic as a map-reduce Java job, but I seem to be wrong.
Raj,
What InputFormat are you using? Gzip is not a splittable compression format, so
if you have 73 gzip files, there will be 73 corresponding mappers, one per
file. Look at the TextInputFormat.isSplitable() description.
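As a sketch of why that happens (plain Java, not the actual Hadoop source; the class name, the extension-based codec check, and the helper method are illustrative assumptions): the input format is asked whether each file is splittable, and a file that is not splittable becomes a single split, hence a single mapper, regardless of how many blocks it spans.

```java
// Illustrative sketch of the split-count logic, NOT the real Hadoop
// TextInputFormat source. Class and method names are assumptions.
public class SplitCountSketch {

    // gzip streams cannot be resynchronized mid-file, so they are not
    // splittable; bzip2 is block-oriented and splittable once HADOOP-4012
    // is present (0.21+). Plain text is always splittable.
    static boolean isSplitable(String fileName) {
        if (fileName.endsWith(".gz")) return false;   // gzip: never split
        if (fileName.endsWith(".bz2")) return true;   // bzip2: splittable in 0.21+
        return true;
    }

    // One split per file if not splittable, otherwise roughly one per block.
    static long numSplits(String fileName, long fileSize, long blockSize) {
        if (!isSplitable(fileName)) return 1;
        return (fileSize + blockSize - 1) / blockSize; // ceiling division
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 255 MB gzip file with a 128 MB block size still gets one mapper.
        System.out.println(numSplits("part-00000.gz", 255 * mb, 128 * mb));  // 1
        // The same data as bzip2 would yield two splits.
        System.out.println(numSplits("part-00000.bz2", 255 * mb, 128 * mb)); // 2
    }
}
```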
Thanks,
~Anirudh
On Thu, Nov 10, 2011 at 2:40 PM, Raj V
From: Joey Echeverria j...@cloudera.com
To: Raj V rajv...@yahoo.com
Sent: Friday, November 11, 2011 2:56 AM
Subject: Re: Input split for a streaming job!
0.21 should be able to split the bzip2 files. What input format are you using?
-Joey
On Thu, Nov 10, 2011 at 9:06 PM
To: Joey Echeverria
Cc: common-user@hadoop.apache.org
Subject: Re: Input split for a streaming job!
Joey, Anirudh, Bejoy
I am using the TextInputFormat class (org.apache.hadoop.mapred.TextInputFormat).
The input files are bzip2 and were created with a 32 MB block size.
So all things point
...@cloudera.com
Sent: Friday, November 11, 2011 10:25 AM
Subject: RE: Input split for a streaming job!
What version of hadoop are you using?
We just stumbled on the Jira item for BZIP2 splitting, and it appears to have
been added in 0.21.
When I diff 0.20.205 vs trunk, I see
public class BZip2Codec
...
Regards
Bejoy K S
-Original Message-
From: Raj V rajv...@yahoo.com
Date: Fri, 11 Nov 2011 10:34:18
To: Tim Broberg tim.brob...@exar.com;
common-user@hadoop.apache.org
Reply-To: common-user@hadoop.apache.org
Subject: Re: Input split for a streaming job!
Tim
-compression/
- Tim.
From: bejoy.had...@gmail.com [bejoy.had...@gmail.com]
Sent: Friday, November 11, 2011 10:44 AM
To: common-user@hadoop.apache.org; Raj V; Tim Broberg
Subject: Re: Input split for a streaming job!
Hi Raj
AFAIK 0.21 is an unstable
All
I assumed that the input splits for a streaming job would follow the same logic
as a map-reduce Java job, but I seem to be wrong.
I started out with 73 gzipped files that vary between 23 MB and 255 MB in size. My
default block size was 128 MB; 8 of the 73 files are larger than 128 MB.
When I
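For concreteness, a back-of-the-envelope on the setup described above (only the file count of 73, the 23-255 MB range, and the 128 MB block size come from the thread; the class and method names below are illustrative): because gzip is not splittable, the job gets exactly one mapper per file, so the 8 files larger than a block change nothing.

```java
// Back-of-the-envelope mapper counts for the setup in the thread.
// Class and method names are illustrative, not Hadoop APIs.
public class MapperCount {

    // gzip is not splittable: one split (mapper) per file, whatever the size.
    static int gzipMappers(int numFiles) {
        return numFiles;
    }

    // A splittable codec would give roughly one split per block (ceiling).
    static long splitsIfSplittable(long fileMb, long blockMb) {
        return (fileMb + blockMb - 1) / blockMb;
    }

    public static void main(String[] args) {
        // 73 gzip files -> 73 mappers, regardless of the 128 MB block size.
        System.out.println(gzipMappers(73));              // 73
        // The largest (255 MB) file, if splittable, would span 2 blocks.
        System.out.println(splitsIfSplittable(255, 128)); // 2
    }
}
```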