Hi,

I have some general scalability questions about Giraph. Based on the Giraph
design, I am assuming that all the mappers in a Giraph job must be running at
the same time.

If so, then

  1.  The max number of mappers for a Giraph job must be <= the total mapper
slots in the whole cluster.
  2.  The max input data size for Giraph must be <= total mapper slots *
mapper memory limit.
  3.  If the cluster has 200 mapper slots in total, only 100 of them are
currently available, and the Giraph job requires 150 mappers:
     *   Without any configuration change, 100 of the job's mappers will be
started, but the Giraph job will NOT run successfully.
     *   Is there any configuration in Giraph to start the job ONLY at the
time when all the required mapper slots are available?
  4.  How well does Giraph scale? I can ONLY run up to 150 mappers for my
Giraph job. Has anyone run a large Giraph job on a large cluster successfully?
     *   I am using Giraph 0.1 in my cluster.
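
To make the arithmetic in points 1 to 3 concrete, here is a minimal Python
sketch. It only encodes my assumptions above and is not confirmed Giraph
behavior. The function names and the 1 GB mapper memory limit are hypothetical
and are not Giraph or Hadoop settings.

```python
def can_start_all_at_once(required_mappers, slots):
    """True if every mapper of the job fits into the given number of slots."""
    return required_mappers <= slots


def max_input_bytes(total_slots, mapper_memory_bytes):
    """Upper bound on input size under assumption 2 (input held in mapper memory)."""
    return total_slots * mapper_memory_bytes


TOTAL_SLOTS = 200             # total mapper slots in the cluster (point 3)
FREE_SLOTS = 100              # slots currently available (point 3)
REQUIRED_MAPPERS = 150        # mappers the job needs (point 3)
MAPPER_MEMORY = 1 * 1024**3   # hypothetical 1 GB per-mapper memory limit

# The job fits the cluster as a whole ...
print(can_start_all_at_once(REQUIRED_MAPPERS, TOTAL_SLOTS))
# ... but not the slots that are free right now.
print(can_start_all_at_once(REQUIRED_MAPPERS, FREE_SLOTS))
# Assumed ceiling on input size, in bytes.
print(max_input_bytes(TOTAL_SLOTS, MAPPER_MEMORY))
```

With the numbers from point 3, the first check passes and the second fails,
which is exactly the case where the job starts partially and then does not
complete.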

Thanks a lot for your time and inputs.

Min
