Yifan LI <[email protected]> writes:
> Maybe you could get the vertex, for instance, which id is 80, by using:
>
> graph.vertices.filter{case(id, _) => id==80}.collect
>
> but I am not sure this is the most efficient way. (Will it scan the whole
> table, if it cannot benefit from the index of the VertexRDD table?)
Until IndexedRDD is merged, a scan and collect is the best officially supported
way. PairRDDFunctions.lookup does this under the hood as well.
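
For reference, here is a minimal sketch of the two officially supported options side by side, using only public APIs. The three-vertex graph, the object name, and the helper names byFilter and byLookup are made up for illustration, and it assumes Spark with GraphX on the classpath. Note that lookup still scans linearly, though its documentation says it searches only the partition the key maps to when the RDD has a known partitioner.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits on Spark 1.x
import org.apache.spark.graphx.{Edge, Graph, VertexId}

object VertexLookupExample {
  // Scan-and-collect: filter the vertex RDD, then bring the match to the driver.
  def byFilter(graph: Graph[String, Int], vid: VertexId): Option[String] =
    graph.vertices.filter { case (id, _) => id == vid }.collect.headOption.map(_._2)

  // PairRDDFunctions.lookup: returns all values stored under the key.
  def byLookup(graph: Graph[String, Int], vid: VertexId): Seq[String] =
    graph.vertices.lookup(vid)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("vertex-lookup").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical toy graph: three vertices in a chain.
      val vertices = sc.parallelize(Seq((80L, "a"), (81L, "b"), (82L, "c")))
      val edges = sc.parallelize(Seq(Edge(80L, 81L, 1), Edge(81L, 82L, 1)))
      val graph = Graph(vertices, edges)

      println(byFilter(graph, 80L))
      println(byLookup(graph, 80L))
    } finally {
      sc.stop()
    }
  }
}
```

Both helpers return a collection-like result instead of calling head, so a missing vertex shows up as an empty result rather than an exception.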
However, it's possible to use the VertexRDD's hash index to do a much more
efficient lookup. Note that these APIs may change, since VertexPartitionBase
and its subclasses are private[graphx].
You can access the partitions of a VertexRDD using VertexRDD#partitionsRDD, and
each partition has VertexPartitionBase#isDefined and VertexPartitionBase#apply.
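Putting it all together:

    val verts: VertexRDD[_] = ...
    val targetVid: VertexId = 80L
    // Probe each partition's hash index; only a partition that holds
    // the vertex contributes a value.
    val result = verts.partitionsRDD.flatMap { part =>
      if (part.isDefined(targetVid)) Some(part(targetVid))
      else None
    }.collect.head  // throws if targetVid is not present
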
Once IndexedRDD [1] is merged, it will provide this functionality using
verts.get(targetVid). Its implementation of get also uses the hash partitioner
to run only one task [2].
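
To illustrate why hashing the key first lets a lookup touch a single partition, here is a plain-Scala model that needs no Spark. It is only a sketch of the idea, not IndexedRDD's actual code: the object name, partitionFor, build, getByProbingAll, and get are all invented for this example, and each "partition" is just an in-memory Map standing in for a partition-local hash index.

```scala
// Plain-Scala model of a hash-partitioned, indexed key-value store.
object HashPartitionedLookup {
  type VertexId = Long

  // Non-negative modulo, so negative hash codes still map to a valid partition.
  def partitionFor(vid: VertexId, numPartitions: Int): Int = {
    val raw = vid.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  // Distribute the pairs across partitions; each partition keeps its own Map index.
  def build[A](pairs: Seq[(VertexId, A)], numPartitions: Int): Vector[Map[VertexId, A]] = {
    val grouped = pairs.groupBy { case (vid, _) => partitionFor(vid, numPartitions) }
    Vector.tabulate(numPartitions) { i =>
      grouped.getOrElse(i, Seq.empty[(VertexId, A)]).toMap
    }
  }

  // Probe every partition's index: analogous to running one task per partition.
  def getByProbingAll[A](parts: Vector[Map[VertexId, A]], vid: VertexId): Option[A] =
    parts.flatMap(_.get(vid)).headOption

  // Hash the key, then probe only the owning partition: analogous to a single task.
  def get[A](parts: Vector[Map[VertexId, A]], vid: VertexId): Option[A] =
    parts(partitionFor(vid, parts.length)).get(vid)

  def main(args: Array[String]): Unit = {
    val parts = build(Seq(80L -> "a", 81L -> "b", 82L -> "c"), numPartitions = 4)
    println(get(parts, 80L))             // Some(a)
    println(getByProbingAll(parts, 80L)) // Some(a)
    println(get(parts, 99L))             // None
  }
}
```

getByProbingAll mirrors the partitionsRDD.flatMap approach, which consults every partition, while get does the same index probe on only the partition that the key hashes to.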
Ankur
[1] https://issues.apache.org/jira/browse/SPARK-2365
[2] https://github.com/ankurdave/spark/blob/IndexedRDD/core/src/main/scala/org/apache/spark/rdd/IndexedRDDLike.scala#L89