Hi,
I implemented a benchmark that allows me to generate an arbitrarily
large graph (depending on the number of iterations). Now I would like to
configure Giraph so that I can make the best use of my hardware for this
benchmark. Given the number of nodes in my cluster, their amount of
main memory, and their number of cores, how do I determine the optimal
Giraph / Hadoop parameters, specifically:
- the number of used mappers
- the HEAP_SIZE environment variable
- the memory specified in the mapred.map.child.java.opts property
(any other relevant parameters?)
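For context, here is a sketch of the kind of launch command I have in
mind (the benchmark class name, heap size, memory values, and worker
count are placeholders, not a working setup):

```shell
# Placeholder values -- these are exactly the knobs I am asking about.
export HADOOP_HEAPSIZE=1024   # client-side JVM heap, in MB

hadoop jar giraph-with-dependencies.jar org.apache.giraph.GiraphRunner \
  my.benchmark.GraphGenBenchmark \
  -D mapred.map.child.java.opts="-Xmx2g" \
  -w 4   # number of workers, i.e. mappers minus the master task
```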
Also, I was wondering how well Giraph handles computations that start
with a very small graph and mutate it into a very large one. For
example, if I understand correctly, the number of mappers is not
adjusted dynamically during a computation.
Any hints (or links to documentation) are highly appreciated.
Cheers,
Christian