On 27.4.2012 17:30, Robert Evans wrote:
Radim,
You would want to modify the application master for this, and it is
likely to be a bit of a hack because the RM scheduler itself is not
really designed for this.
What about doing something like this:
The job JAR would contain a loadable plugin for the AM.
If a plugin system for the AM is overkill, something simpler could be done, e.g. job-level limits such as:
- maximum number of mappers per node
- maximum number of reducers per node
- maximum percentage of non-data-local tasks
- maximum percentage of rack-local tasks
and set these in job properties.
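A rough sketch of the simpler variant. Note that every property name below is invented for illustration only; stock Hadoop defines none of them, and a custom AM plugin as suggested above would have to read and enforce them:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SchedulingLimitsExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical property names -- NOT part of stock Hadoop; a custom
    // AM plugin would have to read these and enforce the limits when it
    // requests and assigns containers.
    conf.setInt("x.mapreduce.am.max-mappers-per-node", 4);
    conf.setInt("x.mapreduce.am.max-reducers-per-node", 2);
    conf.setFloat("x.mapreduce.am.max-non-data-local-pct", 0.10f);
    conf.setFloat("x.mapreduce.am.max-rack-local-pct", 0.25f);
    Job job = Job.getInstance(conf, "limited-job");
    // ... set mapper/reducer classes, input/output paths, and submit ...
  }
}
```

The enforcement itself would live in the AM (e.g. in how it responds to container allocations), which is exactly the part Robert notes the RM scheduler is not really designed for.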
Subbu,
The only way I can think of is to use an overridden
InputFormat/RecordReader pair that sets the "map.input.file" config
value during its initialization, using the received FileSplit object.
This should be considered a bug, however, and even 2.x is affected.
Can you please file a JIRA on this?
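A minimal sketch of such a wrapper, assuming the new (org.apache.hadoop.mapreduce) API and a plain-text input handled by a delegate LineRecordReader; the class name is invented for illustration:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

// Publishes the current split's path under "map.input.file" so code
// that expects the old-API property keeps working. All record reading
// is delegated to a standard LineRecordReader.
public class PathPublishingRecordReader extends RecordReader<LongWritable, Text> {
  private final LineRecordReader delegate = new LineRecordReader();

  @Override
  public void initialize(InputSplit split, TaskAttemptContext context)
      throws IOException, InterruptedException {
    FileSplit fileSplit = (FileSplit) split;
    context.getConfiguration().set("map.input.file",
        fileSplit.getPath().toString());
    delegate.initialize(split, context);
  }

  @Override
  public boolean nextKeyValue() throws IOException {
    return delegate.nextKeyValue();
  }

  @Override
  public LongWritable getCurrentKey() {
    return delegate.getCurrentKey();
  }

  @Override
  public Text getCurrentValue() {
    return delegate.getCurrentValue();
  }

  @Override
  public float getProgress() throws IOException {
    return delegate.getProgress();
  }

  @Override
  public void close() throws IOException {
    delegate.close();
  }
}
```

You would return this reader from the createRecordReader() method of a custom FileInputFormat subclass and use that input format in your job.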
Hi Subbu,
I am not sure which input format you are using. If you are using
FileInputFormat, you can get the file name this way in the map function
(with the new API, cast the split returned by context.getInputSplit()
to the mapreduce-package FileSplit):

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
  protected void map(LongWritable key, Text value, Context context) {
    String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
  }
}
Hi Subbu,
The file/split processed by a mapper can be obtained from the
web UI while the job is running. However, this detail cannot be
obtained once the job has moved to the JobTracker history.
Regards
Bejoy
On Thu, May 3, 2012 at 6:25 PM, Kasi Subrahmanyam
wrote:
> Hi,
>
> Could anyone suggest how
I believe I have fixed it.
I am using pig-0.9.2. My cluster is using CDH4b2, but I am not using the
Pig RPM install on the client. I downloaded the tarball from Apache.
Each machine in my cluster has $HADOOP_MAPRED_HOME defined.
I cleaned up my Pig configuration directory to only have the following
Do I understand it correctly that with Kerberos enabled the mappers and
reducers will be "run as" the actual user that started them? As opposed to
the user that runs the TaskTracker, which is mapred or hadoop or something
like that?
Yes Koert! That is correct!
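For context, this behavior comes from the secure task controller. A sketch of the relevant Hadoop 1.x configuration is below; the property and class names match the stock Hadoop 1.x security setup, but check your distribution's documentation, since the secure mode also requires Kerberos to be enabled and the setuid task-controller binary to be installed:

```xml
<!-- mapred-site.xml: run each task JVM as the user who submitted the
     job, instead of the user running the TaskTracker daemon -->
<property>
  <name>mapred.task.tracker.task-controller</name>
  <value>org.apache.hadoop.mapred.LinuxTaskController</value>
</property>
```

Without this (i.e. with the default task controller), task JVMs run as the TaskTracker user, as Koert describes.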
On Thu, May 3, 2012 at 6:08 PM, Koert Kuipers wrote:
> do i understand it correctly that with kerberos enabled the mappers and
> reducers will be "run as" the actual user that started them? as opposed to
> the user that runs the tasktracker, which is mapred or hadoop