Dear Claudio Martella,

I don't quite get what you mean. Our cluster has 15 servers, each with 24 cores, so ideally 15*24 threads/partitions could work in parallel, right? (Perhaps deduct one server for ZooKeeper.)
However, when we set the "-Dgiraph.numComputeThreads" option, we find that we cannot get even 20 threads, and when it is set to 10, CPU usage is only a little more than double that of the default setting, nowhere near 100*numComputeThreads%.

How can we configure it on our servers to utilize all the processors?

Regards,
Da Yan

> It actually depends on the setup of your cluster.
>
> Ideally, with 15 nodes (tasktrackers) you'd want 1 mapper slot per node
> (ideally to run giraph), so that you would have 14 workers, one per
> computing node, plus one for master+zookeeper. Once that is reached, you
> would have a number of compute threads equal to the number of threads
> that you can run on each node (24 in your case).
>
> Does this make sense to you?
>
>
> On Thu, Oct 17, 2013 at 5:04 PM, Yi Lu <luyi0...@gmail.com> wrote:
>
>> Hi,
>>
>> I have a computer cluster consisting of 15 slave machines and 1 master
>> machine.
>>
>> On each slave machine, there are two Xeon E5-2620 CPUs. With the help of
>> HT, there are 24 threads.
>>
>> I am wondering how to specify parameters in order to run a giraph job in
>> parallel on my cluster.
>>
>> I am using the following parameters to run a pagerank algorithm.
>>
>> hadoop jar ~/giraph-examples.jar org.apache.giraph.GiraphRunner
>> SimplePageRank -vif PageRankInputFormat -vip /input -vof
>> PageRankOutputFormat -op /pagerank -w 1 -mc
>> SimplePageRank\$SimplePageRankMasterCompute -wc
>> SimplePageRank\$SimplePageRankWorkerContext
>>
>> In particular,
>>
>> 1) I know I can use “-w” to specify the number of workers. In my opinion,
>> the number of workers equals the number of mappers in hadoop, except
>> zookeeper. Therefore, in my case (15 slave machines), which number should
>> be chosen? Is 15 a good choice? I ask because if I input a large number,
>> e.g. 100, the mappers will hang.
>>
>> 2) I know I can use “-Dgiraph.numComputeThreads=1” to specify the number
>> of vertex computing threads. However, if I set it to 10, the total
>> runtime is much longer than with the default. I think the default is 1,
>> which I found in the source code. If I want to use this parameter, which
>> number should be chosen?
>>
>> 3) When the giraph job is running, I use the “top” command to monitor cpu
>> usage on the slave machines. I find that the java process can use
>> 200%-300% cpu. However, if I change the number of vertex computing
>> threads to 10, the java process can use 800% cpu. I think it is not a
>> linear relation and I want to know why.
>>
>>
>> Thanks for your help.
>>
>> Best,
>>
>> -Yi
>>
>
>
>
> --
> Claudio Martella
> claudio.marte...@gmail.com
>
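
For reference, the setup described in the quoted reply (14 workers, one per node, each with 24 compute threads) can be sketched as a command line. This is only a sketch under stated assumptions: the worker and thread counts come from the reply above, the class names and paths are copied unchanged from the original post, and placing the -D option directly after the runner class is an assumption about how the generic Hadoop options are parsed. The script prints the command instead of running it.

```shell
#!/bin/sh
# Cluster figures taken from the thread.
NODES=15              # slave machines (tasktrackers)
THREADS_PER_NODE=24   # hardware threads per machine (2 x Xeon E5-2620 with HT)

# One node is set aside for master + ZooKeeper, as the quoted reply suggests.
WORKERS=$((NODES - 1))
COMPUTE_THREADS=$THREADS_PER_NODE

# Inner-class names from the original post; single quotes keep the \$ literal.
MASTER_COMPUTE='SimplePageRank\$SimplePageRankMasterCompute'
WORKER_CONTEXT='SimplePageRank\$SimplePageRankWorkerContext'

# Assemble the command as a string so it can be inspected before use.
CMD="hadoop jar ~/giraph-examples.jar org.apache.giraph.GiraphRunner \
-Dgiraph.numComputeThreads=$COMPUTE_THREADS \
SimplePageRank -vif PageRankInputFormat -vip /input \
-vof PageRankOutputFormat -op /pagerank -w $WORKERS \
-mc $MASTER_COMPUTE -wc $WORKER_CONTEXT"

printf '%s\n' "$CMD"
```

This also assumes each tasktracker is configured with a single mapper slot, as the reply recommends; otherwise several workers may land on the same node and compete for its 24 threads.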