If this cluster is being used exclusively for this goal, you could just set mapred.tasktracker.map.tasks.maximum to 1 on each node.
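A minimal sketch of that suggestion, assuming an MRv1 (Hadoop 1.x) cluster: the property lives in mapred-site.xml on every TaskTracker node, and each TaskTracker has to be restarted before the new slot count takes effect.

```xml
<!-- mapred-site.xml on each TaskTracker node (MRv1). -->
<!-- Limits the node to a single concurrent map task slot, so no two
     mappers of any job can land on the same machine at the same time. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>
```

Note this is a per-node, cluster-wide setting, not a per-job one: it caps map slots for every job on the cluster, which is why it only makes sense if the cluster is dedicated to this workload.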
On Tue, Jan 28, 2014 at 5:00 PM, Keith Wiley <kwi...@keithwiley.com> wrote:
> I'm running a program which in the streaming layer automatically
> multithreads and does so by automatically detecting the number of cores on
> the machine. I realize this model is somewhat in conflict with Hadoop, but
> nonetheless, that's what I'm doing. Thus, for even resource utilization,
> it would be nice to not only assign one mapper per core, but only one
> mapper per machine. I realize that if I saturate the cluster none of this
> really matters, but consider the following example for clarity: 4-core
> nodes, 10-node cluster, thus 40 slots, fully configured across mappers and
> reducers (40 slots of each). Say I run this program with just two mappers.
> It would run much more efficiently (in essentially half the time) if I
> could force the two mappers to go to slots on two separate machines instead
> of running the risk that Hadoop may assign them both to the same machine.
>
> Can this be done?
>
> Thanks.
>
> ________________________________________________________________________________
> Keith Wiley        kwi...@keithwiley.com        keithwiley.com
>                                                 music.keithwiley.com
>
> "Yet mark his perfect self-contentment, and hence learn his lesson, that to be
> self-contented is to be vile and ignorant, and that to aspire is better than to
> be blindly and impotently happy."
>                                   -- Edwin A. Abbott, Flatland
> ________________________________________________________________________________