RE: Graph partitioning and data locality

Pavan Kumar A Tue, 04 Nov 2014 07:28:59 -0800

You can also look at https://issues.apache.org/jira/browse/GIRAPH-908, which 
covers the case where you have a partition map and would like the graph to be 
partitioned that way after loading the input. It does not, however, solve the 
"do not shuffle data" part.

From: claudio.marte...@gmail.com
Date: Tue, 4 Nov 2014 16:20:21 +0100
Subject: Re: Graph partitioning and data locality
To: user@giraph.apache.org

Hi,
answers are inline.
On Tue, Nov 4, 2014 at 8:36 AM, Martin Junghanns <martin.jungha...@gmx.net> 
wrote:
Hi group,

I have a question concerning the graph partitioning step. If I understood the 
code correctly, the graph is distributed to n partitions by using 
vertexID.hashCode() % n. I have two questions concerning that step.

1) Is the whole graph loaded and partitioned only by the Master? This would 
mean, the whole data has to be moved to that Master map job and then moved to 
the physical node the specific worker for the partition runs on. As this sounds 
like a huge overhead, I further inspected the code:

I saw that there is also a WorkerGraphPartitioner, and I assume each worker 
calls the partitioning method on its local data (let's say its local HDFS 
blocks). If the resulting partition for a vertex does not belong to that 
worker, the data gets moved to the worker that owns it, which reduces the 
overhead. Is this assumption correct?

That is correct: each worker forwards vertex data to the worker responsible 
for that vertex, as determined by hash partitioning (by default), so the 
master is not involved.
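
To make the forwarding step concrete, here is a minimal standalone sketch of 
hash partitioning on the worker side. It does not use Giraph's own classes: 
the class name, the method names, and the round-robin mapping from partitions 
to workers are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these are not Giraph's actual classes or methods.
class HashPartitionSketch {

    // Map a vertex ID to one of partitionCount partitions via its hash code.
    // floorMod keeps the result in [0, partitionCount) for negative hash codes.
    static int partitionFor(Object vertexId, int partitionCount) {
        return Math.floorMod(vertexId.hashCode(), partitionCount);
    }

    // Assumed for illustration: partitions are spread over workers round-robin.
    static int workerFor(int partition, int workerCount) {
        return partition % workerCount;
    }

    // Group the vertex IDs a worker has read locally by the worker that owns
    // them. Entry w of the result holds the IDs destined for worker w; every
    // entry other than the reader's own would be sent over the network.
    static List<List<Long>> route(List<Long> localVertexIds,
                                  int partitionCount, int workerCount) {
        List<List<Long>> outboxes = new ArrayList<>();
        for (int w = 0; w < workerCount; w++) {
            outboxes.add(new ArrayList<>());
        }
        for (Long id : localVertexIds) {
            int partition = partitionFor(id, partitionCount);
            outboxes.get(workerFor(partition, workerCount)).add(id);
        }
        return outboxes;
    }
}
```

Each worker would run something like route() on the vertices it read from its 
own input splits, so no step requires a central master to see the data.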

2) Let's say the graph is already partitioned in the file system, e.g. blocks 
on physical nodes contain logically connected graph nodes. Is it possible to 
just read the data as it is and skip the partitioning step? In that case I 
currently assume that the vertexID should contain the partitionID and that the 
custom partitioning would be an identity function (instead of hashing or 
range).

In principle you can. You would need to organize splits so that they contain 
all the data for each particular worker, and then assign relevant splits to the 
corresponding worker. 
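
As a rough illustration of the "vertexID contains the partitionID" idea from 
the question, the sketch below packs a partition ID into the upper 32 bits of 
a long vertex ID, so the partitioning function only has to read it back. The 
bit layout, class name, and method names are assumptions for illustration and 
not part of Giraph. It covers only the partition lookup; assigning splits to 
the matching workers, as described above, would still have to be done 
separately.

```java
// Hypothetical sketch; the ID layout is an assumption, not a Giraph convention.
class PrePartitionedIdSketch {

    // Pack the partition ID into the upper 32 bits and a per-partition local
    // ID into the lower 32 bits of a single long vertex ID.
    static long encode(int partitionId, int localId) {
        return ((long) partitionId << 32) | (localId & 0xFFFFFFFFL);
    }

    // "Identity" partitioning: read the partition ID back out of the vertex
    // ID instead of hashing it.
    static int partitionOf(long vertexId) {
        return (int) (vertexId >>> 32);
    }

    // Recover the per-partition local ID from the lower 32 bits.
    static int localIdOf(long vertexId) {
        return (int) vertexId;
    }
}
```

For example, encode(3, 42) yields an ID for which partitionOf returns 3 and 
localIdOf returns 42.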

Thanks for your time and help!

Cheers,

Martin

-- 
    Claudio Martella
