Re: Graph partitioning and data locality

Claudio Martella Tue, 04 Nov 2014 07:22:16 -0800

Hi,

answers are inline.


On Tue, Nov 4, 2014 at 8:36 AM, Martin Junghanns <martin.jungha...@gmx.net>
wrote:

> Hi group,
>
> I got a question concerning the graph partitioning step. If I understood
> the code correctly, the graph is distributed to n partitions by using
> vertexID.hashCode() & n. I got two questions concerning that step.
>
> 1) Is the whole graph loaded and partitioned only by the Master? This
> would mean, the whole data has to be moved to that Master map job and then
> moved to the physical node the specific worker for the partition runs on.
> As this sounds like a huge overhead, I further inspected the code:
> I saw that there is also a WorkerGraphPartitioner and I assume he calls
> the partitioning method on his local data (lets say his local HDFS blocks)
> and if the resulting partition for a vertex is not himself, the data gets
> moved to that worker, which reduces the overhead. Is this assumption
> correct?
>

That is correct, workers forward vertex data to the correct worker who is
responsible for that vertex via hash-partitioning (by default), meaning
that the master is not involved.


>
> 2) Let's say the graph is already partitioned in the file system, e.g.
> blocks on physical nodes contain logical connected graph nodes. Is it
> possible to just read the data as it is and skip the partitioning step? In
> that case I currently assume, that the vertexID should contain the
> partitionID and the custom partitioning would be an identity function in
> that case (instead of hashing or range).
>

In principle you can. You would need to organize splits so that they
contain all the data for each particular worker, and then assign relevant
splits to the corresponding worker.


>
> Thanks for your time and help!
>
> Cheers,
> Martin
>



-- 
   Claudio Martella

Re: Graph partitioning and data locality

Reply via email to