Hi, I have some general scalability questions about Giraph. Based on the Giraph design, I am assuming that all the mappers in a Giraph job must be running at the same time.
If so, then:

1. The maximum number of mappers for a Giraph job <= total mapper slots in the whole cluster.
2. The maximum input data size for Giraph <= total mapper slots * mapper memory limit.
3. Suppose the cluster has 200 mapper slots in total, only 100 are currently available, and the Giraph job requires 150 mappers.
   * Without any configuration change, 100 of the Giraph mappers will be started, but the job will NOT run successfully.
   * Is there any configuration in Giraph to start the job ONLY when all the required mapper slots are available?
4. How well does Giraph scale? I can ONLY run up to 150 mappers for my Giraph job. Has anyone run a large Giraph job on a large cluster successfully?
   * I am using Giraph 0.1 in my cluster.

Thanks a lot for your time and input.

Min
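
For illustration, the arithmetic behind points 1 to 3 can be sketched in a few lines of Python. This only restates the assumptions in the question and does not describe confirmed Giraph behavior. The function names and the 1 GiB per-mapper memory limit are hypothetical and are not Giraph or Hadoop settings.

```python
# Sketch of the capacity assumptions in points 1 to 3.
# All names and the memory figure are hypothetical, for illustration only.

GIB = 1024 ** 3


def max_input_bytes(total_slots: int, mapper_memory_bytes: int) -> int:
    """Point 2: assumed upper bound on input size if everything lives in mapper memory."""
    return total_slots * mapper_memory_bytes


def fits_in_cluster(required_mappers: int, total_slots: int) -> bool:
    """Point 1: the job can never need more mappers than the cluster has slots."""
    return required_mappers <= total_slots


def can_start_now(required_mappers: int, free_slots: int) -> bool:
    """Point 3: under the all-mappers-at-once assumption, every mapper needs a free slot."""
    return required_mappers <= free_slots


# The scenario from point 3: 200 slots in total, 100 free, job needs 150 mappers.
total_slots, free_slots, required = 200, 100, 150
job_fits = fits_in_cluster(required, total_slots)    # True: 150 <= 200
job_can_start = can_start_now(required, free_slots)  # False: 150 > 100
input_bound = max_input_bytes(total_slots, 1 * GIB)  # assumes a 1 GiB limit per mapper
```

In this scenario the job fits the cluster in principle but cannot start right away, which is the case that question 3 asks about.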