I assume you know the tradeoff here: if you depend on the mapper slot number in your implementation to speed it up, you lose code portability in the long term.
That said, one way to achieve this is to use the JobConf API:

int partition = jobConf.getInt(JobContext.TASK_PARTITION, -1);

The framework assigns a unique partition number to each mapper, which allows each one to write to a distinct output file. Note that this is a global partition number, not one local to each node. Also, if your mappers and reducers share the same cache, add a jobConf.getBoolean(JobContext.TASK_ISMAP)... check to determine whether you are executing in a mapper or a reducer context.

-Rahul

On Mon, Oct 14, 2013 at 2:49 PM, Hider, Sandy <sandy.hi...@jhuapl.edu> wrote:

> In Hadoop, under mapred-site.conf, I can set the maximum number of
> mappers. For the sake of this email I will call the number of concurrent
> mappers "mapper slots".
>
> Is it possible to figure out, from within the mapper, which mapper slot
> it is running in?
>
> On this project this is important because each mapper has to fork off a
> MATLAB runtime-compiled executable. The executable is passed a cache
> directory to work in at runtime. Setting up the cache in a new directory
> takes a long time, but the cache can be reused quickly on future calls if
> the executable is given the same location. As it turns out, when multiple
> mappers try to use the same cache, they crash the executable. So ideally,
> if I could identify which mapper slot a mapper is running in, I could set
> up a cache for each slot, avoid the cache-creation time, and still
> guarantee that no two mappers write to the same cache.
>
> Thanks for taking the time to read this,
>
> Sandy
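For what it's worth, a minimal sketch of the idea Rahul describes. In a real mapper the partition and is-map flag would come from the job configuration in setup(), e.g. context.getConfiguration().getInt("mapreduce.task.partition", -1); here those values are passed in directly so the sketch runs standalone, and the /tmp cache root is a hypothetical path:

```java
public class SlotCache {
    // Build a cache directory unique to this task's partition number,
    // so no two concurrently running tasks share (and corrupt) a cache.
    static String cacheDirFor(int partition, boolean isMap) {
        // partition: from mapreduce.task.partition (unique per task)
        // isMap: from mapreduce.task.ismap, to separate map/reduce caches
        // "/tmp/matlab-cache" is a hypothetical cache root
        return "/tmp/matlab-cache/" + (isMap ? "map" : "reduce") + "-" + partition;
    }

    public static void main(String[] args) {
        // A map task and a reduce task with the same partition number
        // still get distinct cache directories.
        System.out.println(cacheDirFor(3, true));
        System.out.println(cacheDirFor(3, false));
    }
}
```

Note the caveat implied in the reply: the partition number is per task, not per slot, so this guarantees isolation but will create a fresh cache directory for each task rather than reusing one per slot.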