Re: [GraphX] aggregateMessages with active set

2015-04-13 Thread James
Hello, Great thanks for your reply. From the code I found that the reason why my program will scan all the edges is becasue of the EdgeDirection I passed into is EdgeDirection.Either. However I still met the problem of Time consuming of each iteration will not decrease by time. Thus I have two

Re: [GraphX] aggregateMessages with active set

2015-04-09 Thread James
In aggregateMessagesWithActiveSet, Spark still have to read all edges. It means that a fixed time which scale with graph size is unavoidable on a pregel-like iteration. But what if I have to iterate nearly 100 iterations but at the last 50 iterations there are only 0.1% nodes need to be updated

Re: [GraphX] aggregateMessages with active set

2015-04-09 Thread Ankur Dave
Actually, GraphX doesn't need to scan all the edges, because it maintains a clustered index on the source vertex id (that is, it sorts the edges by source vertex id and stores the offsets in a hash table). If the activeDirection is appropriately set, it can then jump only to the clusters with

[GraphX] aggregateMessages with active set

2015-04-07 Thread James
Hello, The old api of GraphX mapReduceTriplets has an optional parameter activeSetOpt: Option[(VertexRDD[_] that limit the input of sendMessage. However, to the new api aggregateMessages I could not find this option, why it does not offer any more? Alcaid

Re: [GraphX] aggregateMessages with active set

2015-04-07 Thread Ankur Dave
We thought it would be better to simplify the interface, since the active set is a performance optimization but the result is identical to calling subgraph before aggregateMessages. The active set option is still there in the package-private method aggregateMessagesWithActiveSet. You can actually