Hello,
Many thanks for your reply. From the code, I found that my program scans
all the edges because the EdgeDirection I passed in is
EdgeDirection.Either.
However, I still have the problem that the time taken by each iteration
does not decrease over time. Thus I have two
In aggregateMessagesWithActiveSet, Spark still has to read all edges. This
means that a fixed cost, which scales with graph size, is unavoidable in a
Pregel-like iteration.
But what if I have to run nearly 100 iterations, and in the last 50
iterations only 0.1% of the nodes need to be updated?
Actually, GraphX doesn't need to scan all the edges, because it
maintains a clustered index on the source vertex id (that is, it sorts
the edges by source vertex id and stores the offsets in a hash table).
If the activeDirection is appropriately set, it can then jump only to
the clusters with
Hello,
The old GraphX API mapReduceTriplets has an optional parameter
activeSetOpt: Option[(VertexRDD[_], EdgeDirection)] that limits the input
of sendMessage. However, I could not find this option in the new API
aggregateMessages. Why is it no longer offered?
Alcaid
We thought it would be better to simplify the interface, since the
active set is a performance optimization but the result is identical
to calling subgraph before aggregateMessages.
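
For illustration, here is a minimal sketch of the subgraph-then-aggregateMessages
pattern using only the public API. The helper name countMessagesFromActive and
the isActive predicate are hypothetical, made up for this example; the sketch
assumes "active" means the source vertex of an edge is active, which roughly
corresponds to an active set used with EdgeDirection.Out. It should give the
same result as the active-set variant, but it is not expected to reproduce the
performance optimization.

```scala
import org.apache.spark.graphx._
import scala.reflect.ClassTag

// Hypothetical helper (not part of GraphX): counts, for each vertex,
// the messages it receives from active source vertices.
def countMessagesFromActive[VD: ClassTag, ED: ClassTag](
    graph: Graph[VD, ED],
    isActive: (VertexId, VD) => Boolean): VertexRDD[Int] = {
  // Keep only the edges whose source vertex is active.
  // No vertex predicate is given, so all vertices are retained.
  val activeEdgesOnly = graph.subgraph(
    epred = triplet => isActive(triplet.srcId, triplet.srcAttr))

  // Send one message along every remaining edge and sum per destination.
  activeEdgesOnly.aggregateMessages[Int](
    ctx => ctx.sendToDst(1),
    _ + _)
}
```

Only vertices that receive at least one message appear in the returned
VertexRDD.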
The active set option is still there in the package-private method
aggregateMessagesWithActiveSet. You can actually