No Mapper but Reducer

2011-09-07 Thread Bejoy KS
Hi. I have a query here. Is it possible to have no mappers but reducers alone? AFAIK, if we need to avoid triggering the reducers we can set numReduceTasks to zero, but there is no equivalent setting for mappers. So how can this be achieved, if it is possible at all? Thank You Regards Bejoy.K.S

Re: No Mapper but Reducer

2011-09-07 Thread Sonal Goyal
I don't think that is possible. Can you explain in what scenario you want to have no mappers, only reducers? Best Regards, Sonal Crux: Reporting for HBase https://github.com/sonalgoyal/crux Nube Technologies http://www.nubetech.co http://in.linkedin.com/in/sonalgoyal On Wed, Sep 7, 2011 at

Perl Mapper with Java Reducer

2011-09-07 Thread Bejoy KS
Hi. Is it possible to have my mapper in Perl and my reducer in Java? In my existing legacy system some larger processes are handled by Perl and their business logic is really complex. It would be a herculean task to convert all the Perl to Java. But the reducer business logic which is

RE: No Mapper but Reducer

2011-09-07 Thread Devaraj K
Hi Bejoy, It is possible to execute a job with no mappers, only reducers. You can try this by giving an empty directory as the input for the job. Devaraj K _ From: Bejoy KS [mailto:bejoy.had...@gmail.com] Sent: Wednesday, September 07, 2011 1:30 PM To:

Re: No Mapper but Reducer

2011-09-07 Thread Harsh J
Oh boy are you in for a surprise. Reducers _can_ run with 0 mappers in a job ;-) /me puts his troll-mask on. ➜ ~HADOOP_HOME hadoop fs -mkdir abc ➜ ~HADOOP_HOME hadoop jar hadoop-examples-0.20.2-cdh3u1.jar wordcount abc out 11/09/07 14:24:14 INFO input.FileInputFormat: Total input paths to

Re: Multiple Mappers and One Reducer

2011-09-07 Thread Harsh J
Sahana, Yes. But isn't that how it normally works? What makes you question this capability? On Wed, Sep 7, 2011 at 2:37 PM, Sahana Bhat sana.b...@gmail.com wrote: Hi, Is it possible to have multiple mappers where each mapper is operating on a different input file and whose result

Re: Perl Mapper with Java Reducer

2011-09-07 Thread Amareshwari Sri Ramadasu
You can look at Hadoop Streaming: http://hadoop.apache.org/common/docs/r0.20.0/streaming.html Thanks Amareshwari On 9/7/11 1:38 PM, Bejoy KS bejoy.had...@gmail.com wrote: Hi Is it possible to have my mapper in Perl and reducer in Java. In my existing legacy system some larger process is
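[Editor's note] A hedged sketch of what the streaming invocation could look like, assuming mapper.pl is the Perl script shipped with the job and com.example.MyReducer is a hypothetical old-API Java reducer already on the job classpath; the streaming jar path varies by distribution:

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
        -input /data/in -output /data/out \
        -mapper mapper.pl -file mapper.pl \
        -reducer com.example.MyReducer

Streaming runs the Perl script as the map task (reading records on stdin, writing tab-separated key/value pairs on stdout), while -reducer may name a Java class instead of a script, which is what allows the two languages to be mixed in one job.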

Re: Multiple Mappers and One Reducer

2011-09-07 Thread Sahana Bhat
Hi, I understand that given a file, the file is split across 'n' mapper instances, which is the normal case. The scenario I have is: 1. Two files which are not totally identical in terms of number of columns (but have similar data in a few columns) need to be processed, and after

Re: No Mapper but Reducer

2011-09-07 Thread Sudharsan Sampath
This is true and it took us by surprise in the recent past. It also had quite an impact on our job cycles, where the size of the input is totally random and could even be zero at times. In one of our cycles we run a lot of jobs. Say we configure X as the number of reducers for a job which does not

Re: Multiple Mappers and One Reducer

2011-09-07 Thread Sudharsan Sampath
Hi, It's possible by setting the number of reduce tasks to 1. Based on your example, it looks like you need to group your records based on Date, counter1 and counter2, so that should go into the logic for building the key of your map output. Thanks Sudhan S On Wed, Sep 7, 2011 at 3:02 PM, Sahana Bhat
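[Editor's note] A minimal sketch of that idea using the new mapreduce API; the comma-separated column layout and field positions are assumptions for illustration only:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class GroupingMapper extends Mapper<LongWritable, Text, Text, Text> {
      @Override
      protected void map(LongWritable offset, Text line, Context context)
          throws IOException, InterruptedException {
        String[] cols = line.toString().split(",");
        // assumed layout: col 0 = Date, col 1 = counter1, col 2 = counter2
        String compositeKey = cols[0] + "|" + cols[1] + "|" + cols[2];
        context.write(new Text(compositeKey), line);
      }
    }

In the driver, job.setNumReduceTasks(1) then routes every group to the single reducer.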

Re: Multiple Mappers and One Reducer

2011-09-07 Thread Harsh J
Sahana, Yes, this is possible as well. Please take a look at the MultipleInputs API @ http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapred/lib/MultipleInputs.html It will allow you to add each path with its own mapper implementation, and you can then have a common reducer
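[Editor's note] A minimal driver sketch against the old mapred API that this MultipleInputs class lives in; the two mapper classes, the common reducer, the Text key/value types and the paths are hypothetical:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextInputFormat;
    import org.apache.hadoop.mapred.lib.MultipleInputs;

    public class MultiInputDriver {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MultiInputDriver.class);
        // each file gets its own mapper, which normalizes its columns into a common key/value shape
        MultipleInputs.addInputPath(conf, new Path("/data/fileA"), TextInputFormat.class, FileAMapper.class);
        MultipleInputs.addInputPath(conf, new Path("/data/fileB"), TextInputFormat.class, FileBMapper.class);
        conf.setOutputKeyClass(Text.class);          // assuming Text keys/values from both mappers
        conf.setOutputValueClass(Text.class);
        conf.setReducerClass(CommonReducer.class);   // the single common reducer sees records from both inputs
        FileOutputFormat.setOutputPath(conf, new Path("/data/out"));
        JobClient.runJob(conf);
      }
    }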

Re: Multiple Mappers and One Reducer

2011-09-07 Thread praveenesh kumar
Harsh, Can you please tell how we can use MultipleInputs with the Job object on Hadoop 0.20.2? As you can see, MultipleInputs uses the JobConf object. I want to use the Job object as in the new Hadoop 0.21 API. I remember you talked about pulling things out of the new API and adding them into our

Re: Multiple Mappers and One Reducer

2011-09-07 Thread Harsh J
Praveenesh, The JIRA https://issues.apache.org/jira/browse/MAPREDUCE-369 introduced it and carries a patch that I think would apply without much trouble on your cluster's sources. You can mail me directly if you need help applying a patch. Alternatively, you can do something like downloading

Re: No Mapper but Reducer

2011-09-07 Thread Bejoy KS
Thank You All. I too noticed this strange behavior some time back. But my initial concern still remains: if I provide an empty input directory, yes, the map tasks won't be executed, but my reducer needs input to do its processing/aggregation. In such a scenario, is there an option to

How to pass a parameter across the cluster.

2011-09-07 Thread ilyal levin
Hi What is the right way to pass a parameter so that all mappers and reducers can see it? Thanks

Re: No Mapper but Reducer

2011-09-07 Thread Harsh J
Nope. A reducer's input is from the map outputs alone (fetched in by the shuffling code), which would not exist here. What are you looking to do? Why won't a map task suffice for doing that? On Wed, Sep 7, 2011 at 4:51 PM, Bejoy KS bejoy.had...@gmail.com wrote: Thank You All. Even I have

Re: How to pass a parameter across the cluster.

2011-09-07 Thread Yaron Gonen
There is no single right way; I think the best thing to do is to ask in the forums. Maybe via the Configuration object, but this is by no means a formal solution. On Wed, Sep 7, 2011 at 2:39 PM, ilyal levin nipponil...@gmail.com wrote: Hi What is the right way to pass a parameter for all
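[Editor's note] A minimal sketch of the Configuration route; the property name my.threshold and its value are made up for illustration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    // Driver side: anything set on the Configuration before submission is serialized
    // into the job and becomes visible to every map and reduce task.
    Configuration conf = new Configuration();
    conf.set("my.threshold", "42");
    Job job = new Job(conf, "parameter example");

    // Task side (new API), e.g. inside Mapper.setup() or Reducer.setup():
    int threshold = Integer.parseInt(context.getConfiguration().get("my.threshold"));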

Re: How to pass a parameter across the cluster.

2011-09-07 Thread CHANG Lei
Another method is to store it in a shared store which can be accessed from each node, such as ZooKeeper, HDFS, HBase, a DB, etc. On 2011-09-07 at 20:11, Yaron Gonen yaron.go...@gmail.com wrote: There is no right way. I think the best thing to do is to ask in the forums. I thought maybe via the Configuration
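[Editor's note] A minimal sketch of the HDFS variant, reading a small side file once per task in setup(); the path /shared/params.txt is hypothetical:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // inside a Mapper (or Reducer) subclass, new API:
    @Override
    protected void setup(Context context) throws IOException {
      FileSystem fs = FileSystem.get(context.getConfiguration());
      BufferedReader reader = new BufferedReader(
          new InputStreamReader(fs.open(new Path("/shared/params.txt"))));
      String param = reader.readLine();   // keep the side file small; every task re-reads it
      reader.close();
    }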

Reducer running out of heap before it gets to the reduce() method

2011-09-07 Thread Matrangola, Geoffrey
I think my job is running out of memory before it calls reduce() in the reducer. It's running with large blocks of binary data emitted from the mapper. Each record emitted from the mappers should be small enough to fit in memory. However, if it tried to somehow keep a bunch of records for one
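[Editor's note] If the culprit is the reduce-side shuffle holding fetched map outputs in memory, these are the 0.20-era knobs usually involved; a hedged sketch, with illustrative values rather than recommendations:

    conf.set("mapred.child.java.opts", "-Xmx1024m");                   // heap for each task JVM
    conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.20f);   // fraction of reducer heap used to buffer fetched map outputs
    conf.setInt("mapred.inmem.merge.threshold", 100);                  // merge/spill to disk after this many in-memory segments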

Re: No Mapper but Reducer

2011-09-07 Thread Robert Hafner
You could just have a mapper which emits exactly the values it takes in (i.e., output k1,v1 as k2,v2). I think that's the best you'll be able to do here. On Sep 7, 2011, at 4:21 AM, Bejoy KS bejoy.had...@gmail.com wrote: Thank You All. Even I have noticed this strange behavior some time back.
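[Editor's note] In the new API the base Mapper class already behaves this way (its default map() writes each key/value pair through unchanged), so a minimal sketch of the driver side is simply:

    // records pass straight through the map phase into the shuffle and on to the reducer
    job.setMapperClass(org.apache.hadoop.mapreduce.Mapper.class);
    // with the old mapred API, org.apache.hadoop.mapred.lib.IdentityMapper plays the same role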