Just bumping this thread, as I have the same question as Steven. Steven, did you find out whether setting both mapreduce.map.cpu.vcores and yarn.nodemanager.resource.cpu-vcores is required? What happens if they are not set while giraph.numComputeThreads is set? Are there any other parameters that must be set to make sure we are *really* using the cores, not just multi-threading on a single core?
On Wed, Mar 18, 2015 at 11:48 AM, Steven Harenberg <sdhar...@ncsu.edu> wrote:
> Hi all,
>
> Previously with MapReduceV1, the suggestion was to have a 1:1
> correspondence between workers and compute nodes (machines) and set the
> number of threads to be the number of cores per machine. To achieve
> this configuration, we would set "mapred.tasktracker.map.tasks.maximum=1".
> Since workers correspond to mappers, this would ensure there was one worker
> per machine.
>
> Now I am reading that with Yarn this property no longer exists, as there
> aren't tasktrackers. Instead, we have the global property
> "yarn.nodemanager.resource.cpu-vcores", which specifies the cores _per
> node_, and the property "mapreduce.map.cpu.vcores", which specifies the
> cores _per map task_.
>
> If we want to have one mapper per node that is fully utilizing the
> machine, I assume we should just set mapreduce.map.cpu.vcores =
> yarn.nodemanager.resource.cpu-vcores = the # of cores per node. Is this
> correct?
>
> Do I still need to set giraph.numComputeThreads to be the number of cores
> per node?
>
> Thanks,
> Steve
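
For reference, the arithmetic behind Steven's proposal can be sketched as below. This is only an illustration, not an answer to the question: the function name and the 16-core figure are made up, memory limits are ignored, and it assumes the YARN scheduler actually counts vcores when placing containers (as far as I know, that depends on which resource calculator the scheduler is configured with).

```python
# Rough sketch: how many map containers fit on one node if only vcores
# are considered. Names and numbers here are hypothetical.

def map_containers_per_node(node_vcores: int, map_vcores: int) -> int:
    """Number of map containers that fit on a node, counting vcores only."""
    if node_vcores <= 0 or map_vcores <= 0:
        raise ValueError("vcore counts must be positive")
    return node_vcores // map_vcores


cores_per_node = 16  # assumed machine size, for illustration only

# The settings proposed in the quoted message, expressed as plain values.
proposed = {
    "yarn.nodemanager.resource.cpu-vcores": cores_per_node,
    "mapreduce.map.cpu.vcores": cores_per_node,
    "giraph.numComputeThreads": cores_per_node,
}

fits = map_containers_per_node(
    proposed["yarn.nodemanager.resource.cpu-vcores"],
    proposed["mapreduce.map.cpu.vcores"],
)
print(f"map containers per node (vcores only): {fits}")
```

With both vcore properties equal, at most one map container (one Giraph worker) fits per node under these assumptions. Whether the worker's threads then really spread across physical cores is still the open question above.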