I am reading in an adjacency list using an input format that extends 
TextVertexInputFormat.  My code doesn’t do anything to handle input splits; 
it leaves that to the underlying Giraph implementation.  However, it appears 
that two identical input splits are created and read as the data is loaded, 
so the edges for each vertex are created twice.

My input format is a simple adjacency list, where each node is represented by a 
single line of text that lists the node id and all of its neighbors.
I read the edges into an edge list and then create the vertex via:

    Vertex<Text, LouvainNodeState, LongWritable> vertex =
        this.getConf().createVertex();
    vertex.initialize(id, state, edgesList);
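
For reference, here is a minimal sketch of the per-line parsing step described above, written in plain Java with no Giraph types so it can be run on its own. The whitespace separator, the class name AdjacencyLineParser, and the ParsedNode holder are assumptions made for illustration; the real format and the real LouvainVertexInputFormat code may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Giraph or of the actual input format.
final class AdjacencyLineParser {

    /** One parsed input line: the node id followed by its neighbor ids. */
    static final class ParsedNode {
        final String id;
        final List<String> neighbors;

        ParsedNode(String id, List<String> neighbors) {
            this.id = id;
            this.neighbors = Collections.unmodifiableList(neighbors);
        }
    }

    /**
     * Parses a line such as "1 2 3 4 5 6" into node id "1" and
     * neighbors [2, 3, 4, 5, 6]. Assumes whitespace-separated tokens.
     */
    public static ParsedNode parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens[0].isEmpty()) {
            throw new IllegalArgumentException("Empty adjacency line");
        }
        List<String> neighbors = new ArrayList<>();
        for (int i = 1; i < tokens.length; i++) {
            neighbors.add(tokens[i]);
        }
        return new ParsedNode(tokens[0], neighbors);
    }

    private AdjacencyLineParser() { }
}
```

With this sketch, a line for node 1 yields five neighbors, so a correctly loaded vertex would be expected to carry five edges rather than ten.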


The logs below show the edges being read in twice (once for each of two input 
splits during the input stage) and then appearing twice per node in the 
computation phase.
This example uses 1 compute thread and 1 worker.

If I am creating the vertex incorrectly or doing something else wrong, please 
let me know.  Thanks.



Log snippet of vertex input process.

14/01/28 11:02:41 INFO worker.BspServiceWorker: loadInputSplits: Using 1 
thread(s), originally 1 threads(s) for 2 total splits.
14/01/28 11:02:41 INFO worker.InputSplitsHandler: reserveInputSplit: Reserved 
input split path 
/_hadoopBsp/giraph_yarn_application_1390861968364_0029/_vertexInputSplitDir/0, 
overall roughly 0.0% input splits reserved
14/01/28 11:02:41 INFO worker.InputSplitsCallable: getInputSplit: Reserved 
/_hadoopBsp/giraph_yarn_application_1390861968364_0029/_vertexInputSplitDir/0 
from ZooKeeper and got input split 
'hdfs://arcus1.silverdale.dev/tmp/louvain-giraph-example/1390935731/input/small:0+172'
14/01/28 11:02:42 INFO giraph.LouvainVertexInputFormat: Node 1 added edge 2:1
14/01/28 11:02:42 INFO giraph.LouvainVertexInputFormat: Node 1 added edge 3:1
14/01/28 11:02:42 INFO giraph.LouvainVertexInputFormat: Node 1 added edge 4:1
14/01/28 11:02:42 INFO giraph.LouvainVertexInputFormat: Node 1 added edge 5:1
14/01/28 11:02:42 INFO giraph.LouvainVertexInputFormat: Node 1 added edge 6:1

… other nodes processed

14/01/28 11:02:42 INFO worker.InputSplitsCallable: loadFromInputSplit: Finished 
loading 
/_hadoopBsp/giraph_yarn_application_1390861968364_0029/_vertexInputSplitDir/0 
(v=9, e=34)
14/01/28 11:02:42 INFO worker.InputSplitsHandler: reserveInputSplit: Reserved 
input split path 
/_hadoopBsp/giraph_yarn_application_1390861968364_0029/_vertexInputSplitDir/1, 
overall roughly 50.0% input splits reserved
14/01/28 11:02:42 INFO worker.InputSplitsCallable: getInputSplit: Reserved 
/_hadoopBsp/giraph_yarn_application_1390861968364_0029/_vertexInputSplitDir/1 
from ZooKeeper and got input split 
'hdfs://arcus1.silverdale.dev/tmp/louvain-giraph-example/1390935731/input/small:0+172'
14/01/28 11:02:42 INFO giraph.LouvainVertexInputFormat: Node 1 added edge 2:1
14/01/28 11:02:42 INFO giraph.LouvainVertexInputFormat: Node 1 added edge 3:1
14/01/28 11:02:42 INFO giraph.LouvainVertexInputFormat: Node 1 added edge 4:1
14/01/28 11:02:42 INFO giraph.LouvainVertexInputFormat: Node 1 added edge 5:1
14/01/28 11:02:42 INFO giraph.LouvainVertexInputFormat: Node 1 added edge 6:1

… other nodes processed again


Logs from the compute phase show that the edges really are added twice (each 
line below has the form EDGE #: target:weight).
Each node should have only one edge to each neighbor, but it has two instead.

14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: NODE:  1
14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: 
EDGE 1: 2:1
14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: 
EDGE 2: 3:1
14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: 
EDGE 3: 4:1
14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: 
EDGE 4: 5:1
14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: 
EDGE 5: 6:1
14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: 
EDGE 6: 2:1
14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: 
EDGE 7: 3:1
14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: 
EDGE 8: 4:1
14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: 
EDGE 9: 5:1
14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: 
EDGE 10: 6:1
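
A small check like the following could confirm the duplication without reading the log by eye. It is only a sketch in plain Java: the class name DuplicateEdgeCheck and the use of string target ids are assumptions, and in a real computation the targets would come from the vertex's edges rather than from a hand-built list.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic helper, not part of Giraph.
final class DuplicateEdgeCheck {

    /**
     * Returns each target id that occurs more than once in the given
     * edge-target list, mapped to its number of occurrences.
     * Insertion order of first appearance is preserved.
     */
    public static Map<String, Integer> duplicatedTargets(List<String> targets) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String target : targets) {
            counts.merge(target, 1, Integer::sum);
        }
        // Keep only targets that appear at least twice.
        counts.values().removeIf(count -> count < 2);
        return counts;
    }

    private DuplicateEdgeCheck() { }
}
```

Applied to the ten targets logged for node 1 above (2, 3, 4, 5, 6, 2, 3, 4, 5, 6), this reports each of the five targets with a count of 2; an empty result would mean the edge list has no repeated targets.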

