How to get jobconf variables in streaming's mapper/reducer?

2009-05-15 Thread Steve Gao
I am using streaming with Perl, and I want to get jobconf variable values. Many tutorials say they are in the environment, but I cannot get them. For example, in the reducer:
    while (<STDIN>) {
        my $part = $ENV{"mapred.task.partition"};
        print("$part\n");
    }
It turns out that $ENV{"mapred.task.partition"
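A likely cause, offered as a hedged note: streaming exports jobconf entries into the task's environment with every character that is not a letter or digit replaced by an underscore, so the key to look up would be mapred_task_partition rather than mapred.task.partition. Below is a minimal sketch in Python; the helper name `jobconf` is made up for illustration, and whether a particular variable is exported on 0.17.0 is an assumption worth checking on your own cluster.

```python
import os
import re


def jobconf(name, environ=None, default=None):
    """Look up a jobconf value the way a streaming task would see it.

    Assumption: streaming turns every non-alphanumeric character of the
    jobconf name into "_" before exporting it to the environment.
    """
    env = os.environ if environ is None else environ
    env_name = re.sub(r"[^A-Za-z0-9]", "_", name)
    return env.get(env_name, default)
```

In a Perl reducer the equivalent lookup would be $ENV{"mapred_task_partition"}.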

Re: [Interesting] One reducer randomly hangs on getting 0 mapper output

2009-04-10 Thread Steve Gao
Does anybody have a clue? Thanks a lot. --- On Thu, 4/9/09, Steve Gao wrote: From: Steve Gao Subject: [Interesting] One reducer randomly hangs on getting 0 mapper output To: core-user@hadoop.apache.org Date: Thursday, April 9, 2009, 6:04 PM I have hadoop jobs with the last 1 reducer randomly

Re: [Interesting] One reducer randomly hangs on getting 0 mapper output

2009-04-09 Thread Steve Gao
I am using 0.17.0. I think the problem is basically that the reducer falls into an infinite loop trying to get mapper output when the mapper is somehow unavailable or dead. Doesn't Hadoop have a solution? --- On Thu, 4/9/09, Steve Gao wrote: From: Steve Gao Subject: [Interesting] One reducer ran

[Interesting] One reducer randomly hangs on getting 0 mapper output

2009-04-09 Thread Steve Gao
I have Hadoop jobs whose last reducer randomly hangs while getting 0 mapper output. By randomly I mean the job sometimes works correctly, and sometimes its last reducer keeps reading map output but always gets 0 data. It would hang for up to 100 hours getting 0 data until I kill it. After I k

Re: Does HDFS provide a way to append A file to B ?

2009-03-17 Thread Steve Gao
ends right now is that the patch that was committed broke a lot of other things, so it's been disabled. As such, there is no working append in HDFS, and certainly not in hadoop-17.x. -Bryan On Mar 17, 2009, at 4:50 PM, Steve Gao wrote: > Thanks, but I was told there is an append command,

Re: How to apply a patch to my hadoop?

2009-03-17 Thread Steve Gao
..@yahoo.com" Date: Tuesday, March 17, 2009, 7:52 PM
Hello Steve. Assuming you are using *nix:
To apply the patch:
    patch -p0 -E < HADOOP-X.patch
To remove the patch:
    patch -p0 --reverse -E < HADOOP-X.patch
Hope this helps. Regards, Ravi
On 3/17/09 4:48 PM, "Steve Gao"

Re: Does HDFS provide a way to append A file to B ?

2009-03-17 Thread Steve Gao
core-user@hadoop.apache.org Date: Tuesday, March 17, 2009, 7:42 PM what about an identity mapper taking A and B as inputs? this will likely mix rows of A and B together though... On Tue, Mar 17, 2009 at 7:35 PM, Steve Gao wrote: > BTW, I am using hadoop 0.17.0 and jdk 1.6 > > --- On Tue, 3/17/0

How to apply a patch to my hadoop?

2009-03-17 Thread Steve Gao
I want to apply this patch https://issues.apache.org/jira/browse/HADOOP-1700 to my Hadoop 0.17.0. Could anybody tell me how to do it? Thanks!

Re: Does HDFS provide a way to append A file to B ?

2009-03-17 Thread Steve Gao
BTW, I am using hadoop 0.17.0 and jdk 1.6 --- On Tue, 3/17/09, Steve Gao wrote: From: Steve Gao Subject: Does HDFS provide a way to append A file to B ? To: core-user@hadoop.apache.org Date: Tuesday, March 17, 2009, 7:22 PM I need to append file A to file B in HDFS without downloading

Does HDFS provide a way to append A file to B ?

2009-03-17 Thread Steve Gao
I need to append file A to file B in HDFS without downloading/uploading them to local disk. Is there a way?

RE: Is there a way to know the input filename at Hadoop Streaming?

2008-10-23 Thread Steve Gao
08, 12:11 AM Personally, I haven't worked with streaming, but I guess your jobconf's map.input.file param should do it for you. -----Original Message----- From: Steve Gao [mailto:[EMAIL PROTECTED] Sent: Thursday, October 23, 2008 7:26 AM To: core-user@hadoop.apache.org Cc: [EMAIL PROTECTED] Su
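To make the map.input.file suggestion concrete, here is a hedged sketch of a streaming mapper in Python that prefixes each input line with the current input file. It assumes streaming exposes the jobconf entry to the mapper as the environment variable map_input_file (dots replaced by underscores); the function name `tag_lines` and the "unknown" fallback are illustrative choices, not part of Hadoop.

```python
import os
import sys


def tag_lines(lines, environ=None):
    """Yield "<input file>\t<line>" for every input line.

    Assumption: the task environment carries map.input.file under the
    name map_input_file.
    """
    env = os.environ if environ is None else environ
    filename = env.get("map_input_file", "unknown")
    for line in lines:
        yield filename + "\t" + line.rstrip("\n")


def main():
    # Entry point for a real mapper script: read records from stdin.
    for record in tag_lines(sys.stdin):
        print(record)
```

To use it as the -mapper script, call main() at the bottom of the file and ship the script to the tasks with -file.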

[Help needed] Is there a way to know the input filename at Hadoop Streaming?

2008-10-23 Thread Steve Gao
Sorry for the email. Thanks for any help or hint.
I am using Hadoop Streaming. The input is multiple files. Is there a way to get the current filename in the mapper? For example:
    $HADOOP_HOME/bin/hadoop \
        jar $HADOOP_HOME/hadoop-streaming.jar \
        -input file1 \
        -in

Is there a way to know the input filename at Hadoop Streaming?

2008-10-22 Thread Steve Gao
I am using Hadoop Streaming. The input is multiple files. Is there a way to get the current filename in the mapper? For example:
    $HADOOP_HOME/bin/hadoop \
        jar $HADOOP_HOME/hadoop-streaming.jar \
        -input file1 \
        -input file2 \
        -output myOutputDir \
        -mapper mapper \
        -reducer reducer

Help: How to change number of mappers in Hadoop streaming?

2008-10-16 Thread Steve Gao
Would anybody help me? Can I use -jobconf mapred.map.tasks=50 in the streaming command to change the job's number of mappers? I don't have a Hadoop installation at hand and cannot verify it. Thanks for your help. --- On Wed, 10/15/08, Steve Gao <[EMAIL PROTECTED]> wrote: From: Steve Gao

How to change number of mappers in Hadoop streaming?

2008-10-15 Thread Steve Gao
Is there a way to change the number of mappers on the Hadoop streaming command line? I know I can change hadoop-default.xml:
    name: mapred.map.tasks
    value: 10
    description: The default number of map tasks per job. Typically set to a prime several times greater than number of available hosts. Ignored when mapred.job.track
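One way to set this per job rather than in hadoop-default.xml is to pass it on the streaming command line with -jobconf. The sketch below only assembles such a command as an argument list; the function name `streaming_command` and the example paths are hypothetical. Note that mapred.map.tasks is a hint: the actual number of map tasks also depends on how the input is split.

```python
def streaming_command(jar, inputs, output, mapper, reducer, jobconf=None):
    """Build a `hadoop jar <streaming jar> ...` argument list.

    Each jobconf entry becomes a `-jobconf key=value` pair.
    """
    cmd = ["hadoop", "jar", jar]
    for path in inputs:
        cmd += ["-input", path]
    cmd += ["-output", output, "-mapper", mapper, "-reducer", reducer]
    for key, value in (jobconf or {}).items():
        cmd += ["-jobconf", "%s=%s" % (key, value)]
    return cmd
```

For example, streaming_command("hadoop-streaming.jar", ["file1", "file2"], "myOutputDir", "mapper", "reducer", {"mapred.map.tasks": 50}) yields a list that can be handed to subprocess.run on a machine with Hadoop installed.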

Re: Hadoop User Group (Bay Area) Oct 15th

2008-10-15 Thread Steve Gao
I am excited to see the slides. Would you send me a copy? Thanks. --- On Wed, 10/15/08, Nishant Khurana <[EMAIL PROTECTED]> wrote: From: Nishant Khurana <[EMAIL PROTECTED]> Subject: Re: Hadoop User Group (Bay Area) Oct 15th To: core-user@hadoop.apache.org Date: Wednesday, October 15, 2008, 9:45 AM

Are There Books of Hadoop/Pig?

2008-10-14 Thread Steve Gao
Does anybody know if there are books about Hadoop or Pig? The wiki and manual are kind of ad hoc and hard to comprehend. For example, I want to know how to apply patches to my Hadoop but can't find how to do it, that kind of thing. Would anybody help? Thanks.

Re: How to concatenate hadoop files to a single hadoop file

2008-10-02 Thread Steve Gao
Does anybody know? Thanks a lot. --- On Thu, 10/2/08, Steve Gao <[EMAIL PROTECTED]> wrote: From: Steve Gao <[EMAIL PROTECTED]> Subject: How to concatenate hadoop files to a single hadoop file To: core-user@hadoop.apache.org Cc: [EMAIL PROTECTED] Date: Thursday, October 2, 2008, 3:17 P

How to concatenate hadoop files to a single hadoop file

2008-10-02 Thread Steve Gao
Suppose I have 3 files in Hadoop that I want to "cat" into a single file. I know it can be done by "hadoop dfs -cat" to a local file and uploading it to Hadoop. But that is very expensive for large files. Is there an internal way to do this in Hadoop itself? Thanks

Is there a way to pause a running hadoop job?

2008-10-01 Thread Steve Gao
I have 5 running jobs, each with 2 reducers. Because I set the max number of reducers to 10, any incoming job will be held until some of the 5 jobs finish and release reducer quota. Now the problem is that an incoming job has a higher priority, so I want to pause some of the 5 jobs, let the new

Re: [Streaming] How to pass arguments to a map/reduce script

2008-08-21 Thread Steve Gao
Unfortunately this does not work. Hadoop complains: 08/08/21 18:04:46 ERROR streaming.StreamJob: Unexpected arg1 while processing -input|-output|-mapper|-combiner|-reducer|-file|-dfs|-jt|-additionalconfspec|-inputformat|-outputformat|-partitioner|-numReduceTasks|-inputreader|-mapdebug|-reducedebug

Re: [Streaming] How to pass arguments to a map/reduce script

2008-08-21 Thread Steve Gao
That's interesting. Suppose your mapper script is a Perl script; how do you assign the value of "my.mapper.arg1" to a variable $x? $x = $my.mapper.arg1 I just tried that, and my Perl script does not recognize $my.mapper.arg1. --- On Thu, 8/21/08, Rong-en Fan <[EMAIL PROTECTED]> wrote: From: Rong-en

Re: [Streaming]What is the difference between streaming options: -file and -CacheFile ?

2008-07-18 Thread Steve Gao
CTED]> Subject: Re: [Streaming]What is the difference between streaming options: -file and -CacheFile ? To: core-user@hadoop.apache.org, "Steve Gao" <[EMAIL PROTECTED]> Date: Friday, July 18, 2008, 8:27 PM On Jul 18, 2008, at 4:53 PM, Steve Gao wrote: > Hi All, > I am u

[Streaming]What is the difference between streaming options: -file and -CacheFile ?

2008-07-18 Thread Steve Gao
Hi All, I am using Hadoop Streaming. I am confused by the streaming options -file and -CacheFile. It seems that they mean the same thing, right? Another confusing pair of options is -numReduceTasks and -jobconf mapred.reduce.tasks. Both are used to control (or give a hint about) the number of reducer

What is the difference between streaming options: -file and -CacheFile ?

2008-07-18 Thread Steve Gao
It seems that they mean the same thing, right? Another confusing pair of options is -numReduceTasks and -jobconf mapred.reduce.tasks. Both are used to control (or give a hint about) the number of reducers.