Hi Ghufran,

Glad to hear that works. I believe you meant that you converted the original edge list (which was directed) into an adjacency list in undirected graph format, right?
Thanks,
Jae

From: ghufran malik [mailto:ghufran1ma...@gmail.com]
Sent: Thursday, April 17, 2014 8:12 AM
To: user@giraph.apache.org
Subject: Re: Running ConnectedComponents in a cluster.

Hi Jae,

Thanks so much for pointing out that it wasn't directed. I made the changes and made a directed graph and connected components now works :)

Thanks,
Ghufran

On Wed, Apr 16, 2014 at 7:31 PM, Yu, Jaewook <jaewook...@intel.com> wrote:

Ghufran,

The Youtube community dataset (com-youtube.ungraph.txt.gz <https://snap.stanford.edu/data/bigdata/communities/com-youtube.ungraph.txt.gz>) [1] is formatted as a directed graph although the description says it is an undirected graph. With some minor changes in your conversion program, you should be able to generate a proper undirected adjacency list.

Hope this will help.

Thanks,
Jae

[1] https://snap.stanford.edu/data/com-Youtube.html

From: Yu, Jaewook [mailto:jaewook...@intel.com]
Sent: Wednesday, April 16, 2014 11:00 AM
To: user@giraph.apache.org
Subject: RE: Running ConnectedComponents in a cluster.

Hi Ghufran,

Have you verified that the neighbors of each vertex actually exist? From your adjacency list, for example, 278447 278447 532613, is the neighbor's vertex id 532613 valid?

Thanks,
Jae

From: ghufran malik [mailto:ghufran1ma...@gmail.com]
Sent: Wednesday, April 16, 2014 9:22 AM
To: user@giraph.apache.org
Subject: Running ConnectedComponents in a cluster.

Hi,

I have set up Giraph on my university cluster of computers (Giraph 1.1.0-SNAPSHOT-for-hadoop-2.0.0-cdh4.3.1). I've successfully run the connected components algorithm on a very small test dataset using 4 workers and it produced the expected output.

dataset: vertex id, vertex value, neighbours....

0 0 1
1 1 0 2 3
2 2 1 3
3 3 1 2

output:

1 0
0 0
3 0
2 0

However, when I tried to run this algorithm on a larger dataset (a reformatted version of com-youtube.ungraph from Stanford SNAP, converted to match the IntIntNullTextVertexInputFormat), it completes successfully but produces incorrect output. It seems to just output each vertex id with its original value (I set each vertex's original value to its vertex id).

A snippet of the dataset is provided:

vertex id, vertex value, neighbours....

.......
278447 278447 532613
278449 278449 305447 324115 414238
83899 83899 153460 172614 176613 211448
773749 773749 845366
773748 773748 960388
.......

output produced:

.............
73132 73132
831308 831308
199788 199788
763644 763644
300572 300572
.............

There's not one vertex value that isn't the same as its original vertex ID. The computation also stops after superstep 0 is done and goes no further, whereas on my smaller dataset it completes 3 supersteps.

Does anyone have any idea why this is?

Kind regards,
Ghufran
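
For readers following the thread, the conversion Jae describes can be sketched roughly as below. This is a minimal Python sketch under stated assumptions, not the program Ghufran actually used. The function name edges_to_undirected_adjacency is hypothetical. The input is assumed to be whitespace-separated "source target" edge lines, with "#" marking comment lines. The output uses the "vertex id, vertex value, neighbours" layout from the examples above, space-separated, with the vertex id as the initial value. Each edge is recorded in both directions so that every vertex lists all of its neighbours. In the dataset snippet quoted above, every listed neighbour id is larger than its vertex id, which is consistent with each edge having been stored in one direction only.

```python
from collections import defaultdict


def edges_to_undirected_adjacency(edge_lines):
    """Turn 'source target' edge lines into 'id value neighbour...' lines.

    Hypothetical helper: every edge is stored in both directions, and the
    initial vertex value is set to the vertex id.
    """
    neighbours = defaultdict(set)
    for line in edge_lines:
        line = line.strip()
        # Skip blank lines and comment lines (assumed to start with '#').
        if not line or line.startswith("#"):
            continue
        u, v = (int(token) for token in line.split()[:2])
        # Touch both entries so isolated self-loop vertices still appear.
        neighbours[u]
        neighbours[v]
        if u != v:
            neighbours[u].add(v)  # forward direction
            neighbours[v].add(u)  # reverse direction makes the graph undirected
    lines = []
    for vertex_id in sorted(neighbours):
        fields = [vertex_id, vertex_id] + sorted(neighbours[vertex_id])
        lines.append(" ".join(str(field) for field in fields))
    return lines


if __name__ == "__main__":
    sample_edges = ["# source target", "0 1", "1 2", "1 3", "2 3"]
    for adjacency_line in edges_to_undirected_adjacency(sample_edges):
        print(adjacency_line)
```

Given the four edges 0-1, 1-2, 1-3 and 2-3, this reproduces the small test dataset from the original message. The separator expected by the input format should be checked against the Giraph version in use.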