Giraph Partitioning

Arjun Sharma Tue, 24 Feb 2015 18:26:01 -0800

Hi,

I understand that by default the number of partitions equals the number of
workers squared. So, if we have N workers, each worker will process N
partitions. I have a number of questions:
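
To make the arithmetic concrete, here is a minimal sketch in plain Java. It does not use the Giraph API, the class and method names are hypothetical, and both the workers-squared default and the even split across workers are assumptions taken from my understanding above:

```java
public class PartitionMath {
    // Assumption from this post: total partitions = workers squared.
    static long totalPartitions(int workers) {
        return (long) workers * workers;
    }

    // Assumption: partitions are spread evenly, so each worker owns
    // totalPartitions / workers of them (requires workers > 0).
    static long partitionsPerWorker(int workers) {
        return totalPartitions(workers) / workers;
    }

    public static void main(String[] args) {
        int workers = 4; // hypothetical cluster size
        System.out.println(totalPartitions(workers) + " partitions total, "
            + partitionsPerWorker(workers) + " per worker");
    }
}
```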


1- By default, does Giraph process the N partitions within a single worker
sequentially? If yes, when setting the parameter giraph.numComputeThreads,
will partitions within each thread be computed sequentially?

2- By default, does Giraph keep all partitions in memory?

3- If the answers to 1 and 2 are yes and yes, is there any advantage from
using multiple partitions versus a single partition in the case of single
threading per worker?

4- How do out-of-core partitions affect out-of-core messages? Are
they completely independent? For example, if the number of partitions to be
kept in memory is set to a number less than N, and at the same time all
messages are set to be kept in memory, will ALL messages be kept in memory,
even those from out-of-core partitions? If the situation is reversed, where
all partitions are kept in memory and out-of-core messaging is set, will
messages from memory-based partitions be saved on disk?

5- Is there a class like a PartitionContext, where you can access
preSuperstep and postSuperstep *per partition*, along the lines of
WorkerContext?
