Re: Giraph Partitioning

Matthew Saltz Wed, 25 Feb 2015 03:31:42 -0800

Hi,

1) Partitions are processed in parallel, up to the number of compute threads
you specify. The vertices within a partition are processed sequentially.
You may want to use more partitions than threads, so that if one partition
takes a particularly long time to process, the other threads can
continue with the remaining partitions. For example, if you have four machines
with 12 threads each and one worker per machine, the default
number of partitions will be 4^2 = 16, whereas you actually have
48 threads available, so you'd probably want to set the number of
partitions manually to a larger value to take advantage of the parallelism.
2) Yes
3) If you are running a single compute thread per worker, there's no reason
to use multiple partitions per worker.
3, the second one) I'm not familiar with the out-of-core functionality.
4) I'm not sure.
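
To make the arithmetic in point 1 concrete, here is a rough sketch in Python.
It is not Giraph code: default_partition_count and makespan are hypothetical
helper names, the workers^2 default is the one described in this thread, and
the cost figures (and the assumption that a slow partition splits evenly into
smaller ones) are made up purely for illustration.

```python
import heapq


def default_partition_count(num_workers):
    # Default as described in this thread: number of workers squared.
    return num_workers ** 2


def makespan(partition_costs, num_threads):
    """Time for num_threads threads to drain a shared list of partitions.

    Whichever thread becomes free first takes the next partition. A partition
    is never split across threads, since its vertices run sequentially.
    """
    finish_times = [0.0] * num_threads
    heapq.heapify(finish_times)
    for cost in partition_costs:
        earliest_free = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest_free + cost)
    return max(finish_times)


# The example from point 1: four workers, 12 compute threads each.
workers, threads_per_worker = 4, 12
partitions = default_partition_count(workers)            # 16
partitions_per_worker = partitions // workers            # 4
idle_threads = max(0, threads_per_worker - partitions_per_worker)  # 8
print(partitions, workers * threads_per_worker, idle_threads)

# Illustrative load balance on one worker with 4 threads and 16 units of work.
# Coarse: 4 partitions, one of them slow (made-up costs).
coarse = [7.0, 3.0, 3.0, 3.0]
# Fine: assume each partition above splits evenly into 4 smaller ones.
fine = [1.75] * 4 + [0.75] * 12
print(makespan(coarse, 4))  # the slow partition holds up the superstep
print(makespan(fine, 4))    # the same work spread evenly over the threads
```

With the coarse split the worker waits on the one slow partition while three
threads sit idle; with the finer split the same work finishes in roughly the
ideal time. As far as I know the partition count override is the
giraph.userPartitionCount option, but that is worth confirming against
GiraphConstants in your version.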


I'm basing this on the version of Giraph from this summer, not the most
recent release, but I don't think this part has changed. You may want to
verify by looking at the code.

Best,
Matthew

On Wed, Feb 25, 2015 at 3:25 AM, Arjun Sharma <as469...@gmail.com> wrote:

> Hi,
>
> I understand that by default, the number of partitions = number of workers
> ^ 2. So, if we have N workers, each worker will process N partitions. I
> have a number of questions:
>
> 1- By default, does Giraph process the N partitions within a single worker
> sequentially? If yes, when setting the parameter giraph.numComputeThreads,
> will partitions within each thread be computed sequentially?
>
> 2- By default, does Giraph keep all partitions in memory?
>
> 3- If the answers to 1 and 2 are yes and yes, is there any advantage from
> using multiple partitions versus a single partition in the case of single
> threading per worker?
>
> 3- How do out-of-core partitions affect out-of-core messages? Are
> they completely independent? For example, if the number of partitions to be
> kept in memory is set to a number less than N, and at the same time all
> messages are set to be kept in memory, will ALL messages be kept in memory,
> even those from out-of-core partitions? If the situation is reversed, where
> all partitions are kept in memory, and out-of-core messaging is set, will
> messages from memory-based partitions be saved on disk?
>
> 4- Is there a class like a PartitionContext, where you can access
> preSuperstep and postSuperstep *per partition*, along the lines of
> WorkerContext?
>
>
