Hi Clive, 

On 5 Jun 2012, at 22:21, Clive Cox wrote:
> 
> I recently started playing with Giraph and I have a few questions.
> 
> 1. I'm writing a simple spreading activation algorithm

I am also working on a spreading activation algorithm. 
My original data is in the form of an RDF graph, which has typed edges and 
vertices, and is therefore quite far from the kind of PageRank-style 
algorithm for which Google Pregel, and thus Apache Giraph, is optimised. 

So I can understand your questions very well. 


> which would be
> run many times over the same graph with different initial vertices
> activated. Doing this as separate jobs in which a potentially large
> graph is loaded each time will be slow. Is there a way to run multiple
> BSP runs over the same loaded graph? 

Sadly this is not possible at the moment, AFAIK. The Hadoop paradigm is 
focused on jobs with a transient graph. 

But I think that if enough people speak up to point out how inefficient it 
is to just throw away the graph between jobs, maybe some sort of mechanism 
can be added for running the same algorithm with different "configurations" 
on the same graph. 

I need to run the same algorithm on the same graph for different user 
profiles ("different configurations"), and it was a big challenge to run 
all of those configurations in parallel in just one run. In my case, 
building the graph takes between a quarter and a third of the total 
processing time.
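
In case it helps: one way to run all configurations in a single job (not 
necessarily the only way) is to keep one activation slot per configuration 
in the vertex value, so every superstep advances all configurations at once. 
Here is a minimal sketch of such a vertex value type, based on plain Hadoop 
Writables; the class name is made up for illustration.

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.DoubleWritable;

// One activation value per configuration/user profile, stored in a single
// vertex value, so one Giraph job can advance all configurations together.
public class ActivationVector extends ArrayWritable {

  public ActivationVector() {
    super(DoubleWritable.class);
  }

  public ActivationVector(int numConfigurations) {
    this();
    DoubleWritable[] slots = new DoubleWritable[numConfigurations];
    for (int i = 0; i < slots.length; i++) {
      slots[i] = new DoubleWritable(0.0);
    }
    set(slots);
  }
}

The messages would then carry the same kind of vector, and the compute 
method loops over the slots and applies the spreading-activation update for 
each configuration independently.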

> 2. I might want to normalise the vertex values at the end of a
> superstep. I assume I can use an aggregator to get the sum of the values
> but I'm not sure where can I update all vertex values before the next
> superstep?

The best place right now to add coordinating logic based on knowledge of 
the whole graph is the WorkerContext, specifically its pre-superstep method. 

In the compute method of a vertex you can add a value to a Sum/LongSum 
aggregator. In the pre-superstep method of the WorkerContext you can then 
check the value of that aggregator, and either re-set the same aggregator or 
set another one. In the next superstep the vertices check that aggregator 
and retrieve the new normalised value.
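
To make that concrete, here is a rough sketch of the WorkerContext side of 
that pattern. The class name, the aggregator names ("activation.sum" and 
"activation.norm") and the exact method calls and import path are my own 
assumptions; the aggregator API keeps moving between Giraph versions, so 
check the release you are running.

import org.apache.giraph.worker.WorkerContext;  // package differs per version
import org.apache.hadoop.io.DoubleWritable;

public class NormalizingWorkerContext extends WorkerContext {

  @Override
  public void preApplication() {
    // Aggregator registration is version-dependent, so it is left out here.
  }

  @Override
  public void preSuperstep() {
    // Sum of vertex values aggregated by compute() in the previous superstep.
    DoubleWritable sum = getAggregatedValue("activation.sum");
    if (sum != null && sum.get() > 0) {
      // Publish a normalisation factor through a second aggregator;
      // the vertices read it at the start of the next superstep.
      aggregate("activation.norm", new DoubleWritable(1.0 / sum.get()));
    }
  }

  @Override
  public void postSuperstep() { }

  @Override
  public void postApplication() { }
}

In compute() each vertex would aggregate its value into "activation.sum" and 
scale its value by whatever it reads from "activation.norm"; again, those 
names are just placeholders.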

Somebody started working on a patch for a centralised master that will be 
able to control/coordinate the whole graph, but it has not been finished 
yet. The Jira issue is here: 
https://issues.apache.org/jira/browse/GIRAPH-127

> 3. On a smaller trivial point: Running within a LocalRunner for
> debugging I need to delete the local zookeeper state created in _bsp*
> folders otherwise the next run does nothing as its assumes its the same
> state and just finishes straight away. 

I never had that issue, so I can't comment on that. 
