Re: max 1 mapper per node

2012-05-03 Thread Radim Kolar
On 27.4.2012 17:30, Robert Evans wrote: Radim, You would want to modify the application master for this, and it is likely to be a bit of a hack because the RM scheduler itself is not really designed for this. What about doing something like this: In the job JAR there will be loadable

Re: Getting filename in case of MultipleInputs

2012-05-03 Thread Harsh J
Subbu, The only way I can think of is to use an overridden InputFormat/RecordReader pair that sets the map.input.file config value during its initialization, using the received FileSplit object. This should be considered a bug, however, and even 2.x is affected. Can you please file a JIRA on

RE: Getting filename in case of MultipleInputs

2012-05-03 Thread Devaraj k
Hi Subbu, I am not sure which input format you are using. If you are using FileInputFormat, you can get the file name this way in the map function: import org.apache.hadoop.mapreduce.lib.input.FileSplit; import org.apache.hadoop.mapreduce.InputSplit; import org.apache.hadoop.mapreduce.Mapper; public
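
A minimal sketch of one possible workaround, not taken from the messages above: with MultipleInputs the split handed to the mapper may be a wrapper rather than a FileSplit, so a direct cast can fail. The helper below assumes the wrapper exposes a public no-argument getInputSplit() and the inner split exposes getPath(); it uses only reflection so it does not depend on any Hadoop class. The names SplitPaths, FakeFileSplit and FakeTaggedSplit are hypothetical and exist only for illustration.

```java
import java.lang.reflect.Method;

public class SplitPaths {

    /** Calls a public no-argument method by name, even if the declaring class is not public. */
    static Object call(Object target, String name) throws ReflectiveOperationException {
        Method m = target.getClass().getMethod(name);
        m.setAccessible(true); // needed when the class itself is package-private
        return m.invoke(target);
    }

    /** Reports whether the object has a public no-argument method with this name. */
    static boolean hasMethod(Object target, String name) {
        try {
            target.getClass().getMethod(name);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    /**
     * Returns the path behind a split as a string. If the split has no
     * getPath() but does have getInputSplit() (assumed wrapper shape),
     * it is unwrapped one level first.
     */
    public static String pathOf(Object split) throws ReflectiveOperationException {
        Object inner = split;
        if (!hasMethod(inner, "getPath") && hasMethod(inner, "getInputSplit")) {
            inner = call(inner, "getInputSplit");
        }
        return String.valueOf(call(inner, "getPath"));
    }

    /** Hypothetical stand-in for a file-based split. */
    public static class FakeFileSplit {
        private final String path;
        public FakeFileSplit(String path) { this.path = path; }
        public String getPath() { return path; }
    }

    /** Hypothetical stand-in for a wrapper split that hides the real one. */
    public static class FakeTaggedSplit {
        private final Object inner;
        public FakeTaggedSplit(Object inner) { this.inner = inner; }
        public Object getInputSplit() { return inner; }
    }

    public static void main(String[] args) throws Exception {
        Object plain = new FakeFileSplit("/data/a.txt");
        Object wrapped = new FakeTaggedSplit(new FakeFileSplit("/data/b.txt"));
        System.out.println(pathOf(plain));   // prints /data/a.txt
        System.out.println(pathOf(wrapped)); // prints /data/b.txt
    }
}
```

Inside a new-API mapper this would presumably be called as SplitPaths.pathOf(context.getInputSplit()); whether the wrapper in a given Hadoop version really has this shape should be checked against that version's source.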

Re: Getting filename in case of MultipleInputs

2012-05-03 Thread Bejoy Ks
Hi Subbu, The file/split processed by a mapper can be obtained from the web UI as soon as the job is executed. However, this detail can't be obtained once the job is moved to JT history. Regards Bejoy On Thu, May 3, 2012 at 6:25 PM, Kasi Subrahmanyam kasisubbu...@gmail.com wrote: Hi,

Re: MapReduce jobs remotely

2012-05-03 Thread Kevin
I believe I have fixed it. I am using pig-0.9.2. My cluster is using CDH4b2, but I am not using the Pig RPM install on the client. I downloaded the tarball from Apache. Each machine in my cluster has $HADOOP_MAPRED_HOME defined. I cleaned up my Pig configuration directory to only have the

kerberos security enabled and hadoop/hdfs/mapred users

2012-05-03 Thread Koert Kuipers
Do I understand correctly that with Kerberos enabled, the mappers and reducers will run as the actual user who started them, as opposed to the user that runs the TaskTracker, which is mapred or hadoop or something like that?