I assume you know the tradeoff here: if your implementation depends on the
mapper slot number to speed things up, you lose code portability in the
long term.

That said, one way to achieve this is to use the JobConf API:

int partition = jobConf.getInt(JobContext.TASK_PARTITION, -1);

The framework assigns a unique partition number to each mapper; this allows
each one to write to a distinct output file. Note that this is a global
partition number, not one local to each node.
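
As a rough sketch of how that partition number could be turned into a
per-task cache location: the helper below is hypothetical (the class name,
method name, and "cache-" prefix are my own, not part of Hadoop), and it takes
the partition as a plain int so it can be tried without a cluster. In a real
task the value would come from the getInt call above. Keep in mind that,
because the number is global, this yields one directory per map task rather
than one per slot.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the Hadoop API.
final class TaskCacheDirs {
    private TaskCacheDirs() {}

    // 'partition' is assumed to be the value read via
    // jobConf.getInt(JobContext.TASK_PARTITION, -1) inside the task.
    static Path cacheDirFor(Path baseDir, int partition) {
        if (partition < 0) {
            // -1 is the default used above when the property is missing.
            throw new IllegalArgumentException(
                "task partition not set: " + partition);
        }
        return baseDir.resolve("cache-" + partition);
    }

    public static void main(String[] args) {
        // Example: partition 7 under a base directory named "matlab-caches".
        System.out.println(cacheDirFor(Paths.get("matlab-caches"), 7));
    }
}
```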

Also, if your mappers and reducers use the same cache, add a check such as

boolean isMap = jobConf.getBoolean(JobContext.TASK_ISMAP, true);

to determine whether you are executing in a mapper or a reducer context.
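
The reason this matters, as far as I understand it, is that map tasks and
reduce tasks are each numbered from 0, so a mapper and a reducer can report
the same partition number. A minimal sketch of folding the task type into the
cache name follows; the class name, method name, and the "map"/"reduce"
prefixes are hypothetical, and both inputs are assumed to have been read from
the JobConf as described above.

```java
// Hypothetical helper, not part of the Hadoop API.
final class TaskCacheNames {
    private TaskCacheNames() {}

    // isMap: assumed to come from JobContext.TASK_ISMAP.
    // partition: assumed to come from JobContext.TASK_PARTITION.
    static String cacheName(boolean isMap, int partition) {
        String prefix = isMap ? "map" : "reduce";
        return prefix + "-" + partition;
    }

    public static void main(String[] args) {
        // A mapper and a reducer with the same partition get distinct names.
        System.out.println(cacheName(true, 0));
        System.out.println(cacheName(false, 0));
    }
}
```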


On Mon, Oct 14, 2013 at 2:49 PM, Hider, Sandy <sandy.hi...@jhuapl.edu> wrote:

> In Hadoop, in mapred-site.xml, I can set the maximum number of
> mappers. For the sake of this email I will call the number of concurrent
> mappers: mapper slots.
>
> Is it possible to figure out from within the mapper which mapper slot it
> is running in?
>
> On this project this is important because each mapper has to fork off a
> Matlab runtime compiled executable. At runtime the executable is passed
> a cache to work in. Setting up the cache when given a new directory takes
> a long time, but the cache can be reused quickly on future calls if the
> same location is provided. As it turns out, when multiple mappers try to
> use the same cache they crash the executable. So ideally, if I could
> identify which mapper slot a mapper is running in, I could set up caches for
> each slot, avoid the cache creation time, and still guarantee that no two
> mappers write to the same cache.
>
> Thanks for taking the time to read this,
>
> Sandy
