I would guess the same, but I don't know for sure. --Steve
On Wed, Apr 29, 2015 at 12:21 PM, Arjun Sharma <as469...@gmail.com> wrote:

> Hi Steven,
>
> Thank you so much for your detailed reply! Actually, my second question
> was about what happens if we do not set mapreduce.map.cpu.vcores (defaults
> to 1) or yarn.nodemanager.resource.cpu-vcores (defaults to 8), while we
> do set giraph.numComputeThreads (say, to 16). I expect every worker will
> run 16 threads on 1 core, but wanted to see if you have the same
> understanding.
>
> Thanks,
> Arjun.
>
> On Wed, Apr 29, 2015 at 8:50 AM, Steven Harenberg <sdhar...@ncsu.edu> wrote:
>
>> Hey Arjun,
>>
>> I am glad someone finally responded to this thread. I am surprised no one
>> else is trying to figure out these configuration settings...
>>
>> Here is my understanding of your questions (though I am not sure it is
>> right):
>>
>> *Is setting both mapreduce.map.cpu.vcores and
>> yarn.nodemanager.resource.cpu-vcores required?*
>>
>> Yes, I believe you need both of these set, or else they will revert to
>> their default values. Importantly, I think you should set them to the
>> same value so that you spawn one mapper/Giraph worker per machine (as
>> this was said to be optimal).
>>
>> Since I have 32 cores per machine, I have set both of these values to 32,
>> and this has worked to spawn only one worker per machine (unless I try to
>> have a worker share a machine with the master).
>>
>> Check this page out:
>> http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/
>>
>> *What happens if they are not set, while giraph.numComputeThreads is set?*
>>
>> The above parameters specify how many cores per machine you are allowing
>> for workers AND how many cores one worker will use. If you don't set
>> *giraph.numComputeThreads*, then the worker will use the default number
>> of threads (I think that is 1) despite possibly being allocated more
>> cores.
>> Hence, I set *giraph.numComputeThreads*, *giraph.numInputThreads*, and
>> *giraph.numOutputThreads* to the same value as the above two parameters:
>> the total number of cores in one machine (for me, 32).
>>
>> Giraph is never going to fully utilize the entire machine, so I don't
>> think it's really possible to tell if these are the correct settings, but
>> all of this seems reasonable based on my experience and on how these
>> parameters are defined.
>>
>> *Are there any other parameters that must be set in order to make sure we
>> are *really* using the cores, not just multi-threading on a single core?*
>>
>> No idea, but the above parameters and some memory configurations are all
>> I set. The memory configurations are worse in my opinion, as I was
>> running into memory issues and ended up having to manually set the
>> following parameters:
>>
>> - yarn.nodemanager.resource.memory-mb
>> - yarn.scheduler.minimum-allocation-mb
>> - yarn.scheduler.maximum-allocation-mb
>> - mapreduce.map.memory.mb
>> - -yh (in the Giraph arguments)
>>
>> All of these had to be set manually to get Giraph to run without memory
>> issues.
>>
>> Best regards,
>> Steve
>>
>> On Thu, Apr 23, 2015 at 8:15 PM, Arjun Sharma <as469...@gmail.com> wrote:
>>
>>> Just bumping this thread, as I have the same question as Steven.
>>>
>>> Steven, did you find out whether setting both mapreduce.map.cpu.vcores
>>> and yarn.nodemanager.resource.cpu-vcores is required? What happens if
>>> they are not set, while giraph.numComputeThreads is set? Are there any
>>> other parameters that must be set in order to make sure we are *really*
>>> using the cores, not just multi-threading on a single core?
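[Editor's note: the YARN/MapReduce settings Steve describes above can be sketched as the following config-file fragments. The vcore values match his 32-core example; the memory values are placeholders, not recommendations, since the thread does not give concrete numbers.]

```xml
<!-- yarn-site.xml: per-NodeManager resources (illustrative) -->
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>32</value> <!-- all cores on the machine -->
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>57344</value> <!-- placeholder: memory offered to containers -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value> <!-- placeholder -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>57344</value> <!-- placeholder -->
</property>

<!-- mapred-site.xml: per-map-task resources (illustrative) -->
<property>
  <name>mapreduce.map.cpu.vcores</name>
  <value>32</value> <!-- equal to the node total, so one mapper fills a node -->
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>57344</value> <!-- placeholder: should fit inside the node total -->
</property>
```

Setting mapreduce.map.cpu.vcores equal to yarn.nodemanager.resource.cpu-vcores is what yields the one-worker-per-machine behavior discussed in the thread.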
>>> On Wed, Mar 18, 2015 at 11:48 AM, Steven Harenberg <sdhar...@ncsu.edu> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Previously, with MapReduce v1, the suggestion was to have a 1:1
>>>> correspondence between workers and compute nodes (machines) and to set
>>>> the number of threads to the number of cores per machine. To achieve
>>>> this configuration, we would set
>>>> "mapred.tasktracker.map.tasks.maximum=1". Since workers correspond to
>>>> mappers, this would ensure there was one worker per machine.
>>>>
>>>> Now I am reading that with YARN this property no longer exists, as
>>>> there are no tasktrackers. Instead, we have the global property
>>>> "yarn.nodemanager.resource.cpu-vcores", which specifies the cores _per
>>>> node_, and the property "mapreduce.map.cpu.vcores", which specifies the
>>>> cores _per map task_.
>>>>
>>>> If we want to have one mapper per node that fully utilizes the machine,
>>>> I assume we should just set mapreduce.map.cpu.vcores =
>>>> yarn.nodemanager.resource.cpu-vcores = the # of cores per node. Is this
>>>> correct?
>>>>
>>>> Do I still need to set giraph.numComputeThreads to be the number of
>>>> cores per node?
>>>>
>>>> Thanks,
>>>> Steve
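[Editor's note: the Giraph-side thread settings discussed in the thread, matched to the same 32-core machine, might look like the fragment below. This is a sketch only: the property names are taken from the thread itself, but whether you supply them via a configuration file or as -ca key=value arguments to GiraphRunner depends on your setup, so check your Giraph version's documentation.]

```xml
<!-- Giraph job configuration (illustrative; the same properties can
     typically be passed as -ca key=value arguments on the command line) -->
<property>
  <name>giraph.numComputeThreads</name>
  <value>32</value> <!-- one compute thread per core -->
</property>
<property>
  <name>giraph.numInputThreads</name>
  <value>32</value>
</property>
<property>
  <name>giraph.numOutputThreads</name>
  <value>32</value>
</property>
```

As Steve notes above, these thread counts only matter if the worker has actually been allocated that many vcores; otherwise the threads time-share whatever cores the container was given.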