I assume you know the tradeoff here: if you depend on the mapper slot number in your implementation to speed it up, you lose code portability in the long term.
That said, one way to achieve this is to use the JobConf API:

int partition = jobConf.getInt(JobContext.TASK_PARTITION, -1);

The framework assigns a unique partition number to each mapper, which allows each one to write to a distinct output file. Note that this is a global partition number, not one local to each node. Also, if your mappers and reducers share the same cache, add a jobConf.getBoolean(JobContext.TASK_ISMAP)... check to determine whether you are executing in a mapper or a reducer context.

-Rahul

On Mon, Oct 14, 2013 at 2:49 PM, Hider, Sandy <sandy.hi...@jhuapl.edu> wrote:

> In Hadoop, under mapred-site.conf, I can set the maximum number of
> mappers. For the sake of this email I will call the number of concurrent
> mappers "mapper slots".
>
> Is it possible to figure out, from within the mapper, which mapper slot
> it is running in?
>
> On this project this is important because each mapper has to fork off a
> MATLAB runtime-compiled executable. The executable is passed a cache
> directory to work in at runtime. Setting up the cache in a new directory
> takes a long time, but the cache can be reused quickly on future calls if
> the executable is given the same location. As it turns out, when multiple
> mappers try to use the same cache, they crash the executable. So ideally,
> if I could identify which mapper slot a mapper is running in, I could set
> up a cache for each slot, avoid the cache-creation time, and still
> guarantee that no two mappers write to the same cache.
>
> Thanks for taking the time to read this,
>
> Sandy
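For what it's worth, a minimal sketch of the idea Rahul describes. In a real mapper the partition and is-map flag would come from the job configuration in setup(), e.g. context.getConfiguration().getInt("mapreduce.task.partition", -1); here those values are passed in directly so the sketch runs standalone, and the /tmp cache root is a hypothetical path:

```java
public class SlotCache {
    // Build a cache directory unique to this task's partition number,
    // so no two concurrently running tasks share (and corrupt) a cache.
    static String cacheDirFor(int partition, boolean isMap) {
        // partition: from mapreduce.task.partition (unique per task)
        // isMap: from mapreduce.task.ismap, to separate map/reduce caches
        // "/tmp/matlab-cache" is a hypothetical cache root
        return "/tmp/matlab-cache/" + (isMap ? "map" : "reduce") + "-" + partition;
    }

    public static void main(String[] args) {
        // A map task and a reduce task with the same partition number
        // still get distinct cache directories.
        System.out.println(cacheDirFor(3, true));
        System.out.println(cacheDirFor(3, false));
    }
}
```

Note the caveat implied in the reply: the partition number is per task, not per slot, so this guarantees isolation but will create a fresh cache directory for each task rather than reusing one per slot.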