RE: Best way to know the assignment of vertices to workers

2014-11-28 Thread Pavan Kumar A
I looked at the code again, and it does not seem that workerList is sorted, so knowing a worker number gives no consistent way to tell the actual worker details each time. Lukas was working on such a diff some time back. Perhaps he can answer more. From: pava...@outlook.com To: user@giraph.a

RE: Best way to know the assignment of vertices to workers

2014-11-28 Thread Pavan Kumar A
I wrote a diff some time ago that lets you do this easily. You can find implementation details at https://issues.apache.org/jira/browse/GIRAPH-908 and https://reviews.apache.org/r/22234/ Some options you can use are -Dgiraph.mappingStoreClass=org.apache.giraph.mapping.LongByteMappingStore

RE: Graph partitioning and data locality

2014-11-04 Thread Pavan Kumar A
You can also look at https://issues.apache.org/jira/browse/GIRAPH-908, which solves the case where you have a partition map and would like the graph to be partitioned that way after loading the input. It does not, however, solve the "do not shuffle data" part. From: claudio.marte...@gmail.com Date: Tue,

RE: Using a custom graph partitioning stratergy with giraph

2014-10-01 Thread Pavan Kumar A
scenario. Thanks, Charith On Mon, Sep 29, 2014 at 3:34 PM, Pavan Kumar A wrote: We have two inputs: vertices and edges. If we partition edges and vertices based on a map, then when we want to send messages we should be able to know which partition a vertex is on. Typically we send message

RE: Using a custom graph partitioning stratergy with giraph

2014-09-29 Thread Pavan Kumar A
4 at 8:29 AM, Pavan Kumar A wrote: I worked on this feature some time back, but I only worked on reading input from Hive files, not HDFS. You can use logic outside Giraph to select which partition file to use; this is possible because you input the number of workers anyway. For instance in the scr

RE: Graph re-partitioning

2014-09-29 Thread Pavan Kumar A
If you are using hash partitioning, then as long as the number of workers is the same, partitions will remain unchanged, though they might run on a different worker. However, yes, the graph is always partitioned. Date: Mon, 29 Sep 2014 15:01:37 -0400 Subject: Graph re-partitioning From: xuhongne...@gmail.com T
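A minimal sketch of why the vertex-to-partition mapping can stay fixed while the hosting worker changes. It assumes the partition index is the vertex id's hash modulo the partition count, and that partitions are dealt out over whatever worker order a given run happens to have. The class, method names, and host names are hypothetical and are not Giraph's API.

```java
/** Hypothetical illustration; not Giraph's actual partitioner code. */
final class HashPartitionSketch {
    /** Assumption: partition index = |hash(id) mod partitionCount|. */
    static int partitionFor(long vertexId, int partitionCount) {
        return Math.abs(Long.hashCode(vertexId) % partitionCount);
    }

    /** Assumption: partitions are dealt round-robin over this run's worker order. */
    static String workerFor(int partition, String[] workersInThisRun) {
        return workersInThisRun[partition % workersInThisRun.length];
    }

    public static void main(String[] args) {
        int partitionCount = 8;
        int partition = partitionFor(42L, partitionCount);
        String[] runA = {"host-a", "host-b"};
        String[] runB = {"host-b", "host-a"}; // same workers, different order
        System.out.println("vertex 42 -> partition " + partition);
        System.out.println("run A worker: " + workerFor(partition, runA));
        System.out.println("run B worker: " + workerFor(partition, runB));
    }
}
```

The partition index depends only on the id and the partition count, so it is identical across runs; only the lookup from partition to worker differs.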

RE: Using a custom graph partitioning stratergy with giraph

2014-09-28 Thread Pavan Kumar A
I worked on this feature some time back, but I only worked on reading input from Hive files, not HDFS. You can use logic outside Giraph to select which partition file to use; this is possible because you input the number of workers anyway. For instance in the script that you use to launch a giraph job hav

RE: receiving messages that I didn't send

2014-09-23 Thread Pavan Kumar A
Can you give more context? What are the types of messages, patch of your compute method, etc. You will not receive messages that are not sent, but one thing that can happen is this: a message can have multiple parameters. Suppose a message object m has 2 parameters, a and b. Say in m's write(out) you do no
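For a message with two fields, the serialization and deserialization methods need to cover the same fields in the same order, or the receiver sees values that were never sent. A minimal sketch using only java.io; the class is hypothetical and only mirrors the write/readFields shape of Hadoop's Writable without depending on it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical two-field message; mirrors the Writable method shape only. */
final class TwoFieldMessageSketch {
    long a;
    long b;

    /** Writes both fields, in a fixed order. */
    void write(DataOutput out) throws IOException {
        out.writeLong(a);
        out.writeLong(b);
    }

    /** Reads both fields, in the same order that write() emitted them. */
    void readFields(DataInput in) throws IOException {
        a = in.readLong();
        b = in.readLong();
    }

    /** Serializes a message to bytes and reads it back into a fresh object. */
    static TwoFieldMessageSketch roundTrip(long a, long b) {
        TwoFieldMessageSketch sent = new TwoFieldMessageSketch();
        sent.a = a;
        sent.b = b;
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            sent.write(new DataOutputStream(bytes));
            TwoFieldMessageSketch received = new TwoFieldMessageSketch();
            received.readFields(
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return received;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If write() skipped one field while readFields() still read both, the second read would consume bytes belonging to whatever follows in the stream.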

RE: looking for a User guide

2014-09-19 Thread Pavan Kumar A
http://www.manning.com/martella/ I am not sure if there is any example in the documentation; Claudio might know more. From: khaled.am...@gmail.com Date: Fri, 19 Sep 2014 02:11:45 -0400 Subject: looking for a User guide To: user@giraph.apache.org Hi all, I was looking for a user guide for Giraph 1.0.0

RE: NegativeArraySizeException with large dataset

2014-09-09 Thread Pavan Kumar A
help! -- Andrew On Mon, Sep 8, 2014, at 05:31 PM, Pavan Kumar A wrote: ByteArrayEdges and the other edge stores use array-based or map-based storage; all of these will encounter this exception when the size of the array approaches Integer.MAX_VALUE. Some things to consider for the time being: wha

RE: NegativeArraySizeException with large dataset

2014-09-08 Thread Pavan Kumar A
ByteArrayEdges and the other edge stores use array-based or map-based storage; all of these will encounter this exception when the size of the array approaches Integer.MAX_VALUE. Some things to consider for the time being: what do your edges look like? If they are long ids & null values you can use LongNullArr
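A minimal sketch of the generic mechanism behind this exception, assuming a store that grows its backing array by doubling an int capacity. This is not Giraph's actual growth code; the class and method names are hypothetical.

```java
/** Hypothetical illustration of int overflow when growing an array-backed store. */
final class ArrayGrowthOverflowSketch {
    /** Naive doubling; wraps to a negative int once capacity reaches 2^30. */
    static int doubledCapacity(int capacity) {
        return capacity * 2;
    }

    /** Returns true if allocating a byte[] of this length throws NegativeArraySizeException. */
    static boolean allocationFails(int requestedLength) {
        try {
            byte[] ignored = new byte[requestedLength];
            return false;
        } catch (NegativeArraySizeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int capacity = 1 << 30;               // 1 GiB worth of bytes
        int next = doubledCapacity(capacity); // overflows int
        System.out.println(next);             // prints -2147483648
        System.out.println(allocationFails(next));
    }
}
```

The negative length is rejected before any memory is allocated, so the failure is independent of heap size.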

RE: n-ary relationship on Giraph

2014-05-21 Thread Pavan Kumar A
ated'. 'date created' property belongs to A-> C-> B. Can I represent this in Giraph? Also, does Giraph have a querying mechanism, so that I can retrieve triplets which are created before a particular date? Sujan Perera On Wednesday, May 21, 2014 3:51 PM, Pavan Kumar A wrote: C

RE: n-ary relationship on Giraph

2014-05-21 Thread Pavan Kumar A
Can you please provide more context? vertex -> edge (edge value can store any properties required of that edge) -> vertex (vertex value can store any property required for the vertex) Date: Wed, 21 May 2014 13:50:34 -0700 From: sujanu...@yahoo.com Subject: n-ary relationship on Giraph To: user@gi

RE: input superstep of giraph.

2014-04-18 Thread Pavan Kumar A
.org/giraph-core/apidocs/org/apache/giraph/counters/GiraphTimers.html Thanks, Ghufran On Fri, Apr 18, 2014 at 3:25 PM, Pavan Kumar A wrote: I wrote the Initialize counter :) Please tell me if the name seems confusing. So, Initialize = the time spent by the job waiting for resources. In a shared

RE: input superstep of giraph.

2014-04-18 Thread Pavan Kumar A
look up the meanings there. https://giraph.apache.org/giraph-core/apidocs/org/apache/giraph/counters/GiraphTimers.html Thanks, Ghufran On Fri, Apr 18, 2014 at 3:25 PM, Pavan Kumar A wrote: I wrote the Initialize counter :) Please tell me if the name seems confusing. So, Initialize =

RE: input superstep of giraph.

2014-04-18 Thread Pavan Kumar A
itialize (ms)=775 Setup (ms)=105 Shutdown (ms)=12537 Total (ms)=27075 Thanks, Ghufran On Thu, Apr 17, 2014 at 9:10 PM, Pavan Kumar A wrote: Input consists of > reading the input (vertices and/or edges as

RE: input superstep of giraph.

2014-04-17 Thread Pavan Kumar A
Input consists of: > reading the input (vertices and/or edges as provided) into memory on individual workers > assigning vertices to partitions and partitions to workers > moving all partitions (i.e., vertices & their out-edges) to a worker (which owns the partition) > doing some bookkeeping of inte

RE: Optimal number of Workers

2014-04-16 Thread Pavan Kumar A
Giraph uses threads for compute, netty server, netty client on workers, execution pools, input, output, etc. You can see most of these options in org.apache.giraph.conf.GiraphConstants, for instance: /** Netty client threads */ IntConfOption NETTY_CLIENT_THREADS = new IntConfOption("giraph.n

RE: Giraph Buffer Size

2014-04-16 Thread Pavan Kumar A
, Agrta Rawat On Wed, Apr 16, 2014 at 12:44 PM, Pavan Kumar A wrote: What do you mean by buffer size? Just as a note, please ensure that Xmx & Xms values are properly set for the mapper using mapred.child.java.opts or mapred.map.child.java.opts. Also, what does the error message show: please

RE: Changing index of a graph

2014-04-16 Thread Pavan Kumar A
It totally depends on the input distribution. One very simple thing that can be done is: > Define a VertexResolver that, upon every vertex creation, sets its id = domain of the URL & value = "set" of URLs in the domain; it keeps appending as more vertices with the same id (i.e., domain) are read from inpu

RE: Can a vertex belong to more than one partition

2014-04-16 Thread Pavan Kumar A
partitioned and so the query graph should be > available to all partitions. Apart from this, some of the large graph > vertices (such as those which have edges between partitions) also have > to be duplicated. > > On Mon, Apr 7, 2014 at 9:53 PM, Pavan Kumar A wrote: > > If you wa

RE: Giraph Buffer Size

2014-04-16 Thread Pavan Kumar A
What do you mean by buffer size? Just as a note, please ensure that Xmx & Xms values are properly set for the mapper using mapred.child.java.opts or mapred.map.child.java.opts. Also, what does the error message show: please use pastebin & post the link here. Date: Wed, 16 Apr 2014 12:13:29 +0530 Sub

RE: [Solved] Giraph job hangs indefinitely and is eventually killed by JobTracker

2014-04-07 Thread Pavan Kumar A
Hi Vikesh, It seems that you are trying to run benchmarks on Giraph. We had a lot of improvements in 1.1.0-SNAPSHOT (though it is not released publicly in Maven, at Facebook we run all our applications on the snapshot version). So, you can pull the latest trunk from Giraph: git clone https://git-

RE: Can a vertex belong to more than one partition

2014-04-07 Thread Pavan Kumar A
If you want the vertex.value to be available to all vertices, then you can store it in an aggregator. A vertex can belong to exactly one partition. But please answer Lukas's questions so we can answer more appropriately. > Date: Mon, 7 Apr 2014 11:23:58 +0200 > From: lukas.naleze...@firma.seznam.

RE: clustering coefficient (counting triangles) in giraph.

2014-03-17 Thread Pavan Kumar A
If what you need is http://en.wikipedia.org/wiki/Clustering_coefficient#Local_clustering_coefficient then I implemented it in Giraph and will submit a patch soon. Date: Mon, 17 Mar 2014 15:33:07 -0400 Subject: Re: clustering coefficient (counting triangles) in giraph. From: kaushikpatn...@gmail.com T

RE: Running one compute function after another..

2014-01-11 Thread Pavan Kumar A
Jyoti - I recently did a similar thing. In fact, my approach was exactly what Maja suggested. However, there is a caveat. You can switch the computation class for workers in MasterCompute's compute method, but that requires the messages sent by computation class active before switching and messages r

RE: Issues with Giraph v1.0.0

2013-12-13 Thread Pavan Kumar A
Hi Pankaj, Note that in Giraph, the vertex is the first-class citizen, while edges are just data associated with a vertex. So, when you delete a vertex you delete all data associated with it, i.e., its outgoing edges, its value, its id, etc. However, it is not trivial to delete all incoming edges to a
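A minimal sketch of the asymmetry described here, assuming a plain adjacency map in which each vertex owns only its out-edges. The class is hypothetical and stands in for the vertex-centric model; it does not use Giraph's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical out-edge-only graph; each vertex owns its outgoing edges. */
final class VertexRemovalSketch {
    /** Vertex id -> target ids of its out-edges. */
    final Map<Long, List<Long>> outEdges = new HashMap<>();

    /** Adds source -> target, creating both vertices if needed. */
    void addEdge(long source, long target) {
        outEdges.computeIfAbsent(source, k -> new ArrayList<>()).add(target);
        outEdges.computeIfAbsent(target, k -> new ArrayList<>());
    }

    /** Removing a vertex drops its own data, including its out-edges, in one step. */
    void removeVertex(long id) {
        outEdges.remove(id);
    }

    /** Finding edges that still point at a vertex requires scanning every other vertex. */
    int danglingEdgesTo(long id) {
        int count = 0;
        for (List<Long> targets : outEdges.values()) {
            for (long target : targets) {
                if (target == id) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

After adding edges 1 -> 2 and 2 -> 3 and then removing vertex 2, the edge 2 -> 3 is gone with the vertex, but 1 -> 2 is still stored on vertex 1 and has to be found by a scan.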

RE: vertex and data block co-location

2013-12-08 Thread Pavan Kumar A
@David You can have a look at http://researcher.watson.ibm.com/researcher/files/us-ytian/giraph++.pdf This work was done by http://researcher.watson.ibm.com/researcher/view.php?person=us-ytian In this she talks about alternative partitioning schemes she implemented on top of giraph and the showst