At 2014-09-05 12:13:18 +0200, Yifan LI <[email protected]> wrote:
> But how to assign the storage level to a new vertices RDD that mapped from
> an existing vertices RDD,
> e.g.
> val newVertexRDD =
> graph.collectNeighborIds(EdgeDirection.Out).map{case(id:VertexId,
> a:Array[VertexId]) => (id, initialHashMap(a))}
>
> the new one will be combined with that existing edges RDD(MEMORY_AND_DISK)
> to construct a new graph.
> e.g.
> val newGraph = Graph(newVertexRDD, graph.edges)
Sorry for the late reply. If you are constructing a graph from the derived
vertex RDD, you can pass the desired storage levels to the Graph constructor:
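    import org.apache.spark.graphx._
    import org.apache.spark.storage.StorageLevel

    // Derive the new vertex attributes from each vertex's out-neighbor ids.
    val newVertexRDD = graph.collectNeighborIds(EdgeDirection.Out).map {
      case (id: VertexId, a: Array[VertexId]) => (id, initialHashMap(a))
    }

    // Tell GraphX which storage level to use for the RDDs it caches.
    val newGraph = Graph(
      newVertexRDD,
      graph.edges,
      edgeStorageLevel = StorageLevel.MEMORY_AND_DISK,
      vertexStorageLevel = StorageLevel.MEMORY_AND_DISK)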
For others reading: GraphX needs to be told the desired storage level because
it internally constructs temporary vertex and edge RDDs and uses them more
than once, so it has to cache them to avoid recomputation.
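To illustrate the recomputation point with a plain RDD rather than GraphX
internals, here is a minimal sketch. It assumes Spark 2.x or later (for
longAccumulator) and a local master. RecomputeSketch and countEvaluations are
names made up for this example, not part of Spark or GraphX:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical helper object for illustration; not part of GraphX.
object RecomputeSketch {
  // Runs two actions on a derived RDD and returns how many times the
  // map function was evaluated. `level = None` means persist is not called.
  def countEvaluations(sc: SparkContext, level: Option[StorageLevel]): Long = {
    val evaluations = sc.longAccumulator("evaluations")
    val derived = sc.parallelize(1 to 100, 2).map { x =>
      evaluations.add(1) // counts every evaluation of the map function
      x * 2
    }
    level.foreach(l => derived.persist(l))
    derived.count() // first use
    derived.count() // second use: recomputed unless the RDD is persisted
    val n: Long = evaluations.value
    n
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[1]").setAppName("recompute-sketch")
    val sc = new SparkContext(conf)
    try {
      println(countEvaluations(sc, None))
      println(countEvaluations(sc, Some(StorageLevel.MEMORY_AND_DISK)))
    } finally {
      sc.stop()
    }
  }
}
```

Without a persist call the map function should run once per element per
action, so twice per element here. With MEMORY_AND_DISK the second count
should be served from the cached partitions instead.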
> BTW, the return of newVertexRDD.getStorageLevel is StorageLevel(true, true,
> false, true, 1), what does it mean?
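See the StorageLevel object [1]. The five fields are, in order, useDisk,
useMemory, useOffHeap, deserialized, and replication, so
StorageLevel(true, true, false, true, 1) means deserialized data kept in
memory and on disk, not off-heap, with a single replica. This particular
storage level corresponds to StorageLevel.MEMORY_AND_DISK.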
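If it helps, the fields can also be inspected programmatically. This is a
small sketch, assuming a Spark version where StorageLevel has the five-field
form shown above. StorageLevelSketch and describe are hypothetical names
used only for this example:

```scala
import org.apache.spark.storage.StorageLevel

// Hypothetical helper object for illustration; not part of Spark.
object StorageLevelSketch {
  // Labels each field in the order it appears in the five-field form.
  def describe(level: StorageLevel): String =
    s"useDisk=${level.useDisk}, useMemory=${level.useMemory}, " +
      s"useOffHeap=${level.useOffHeap}, deserialized=${level.deserialized}, " +
      s"replication=${level.replication}"

  def main(args: Array[String]): Unit = {
    // The level reported by newVertexRDD.getStorageLevel in the question.
    val reported = StorageLevel(true, true, false, true, 1)
    println(describe(reported))
    // Compare it against the named constant.
    println(reported == StorageLevel.MEMORY_AND_DISK)
  }
}
```

The same describe call works on the result of getStorageLevel for any RDD,
which is a quick way to check which level a derived RDD ended up with.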
Ankur
[1]
https://github.com/apache/spark/blob/092e2f152fb674e7200cc8a2cb99a8fe0a9b2b33/core/src/main/scala/org/apache/spark/storage/StorageLevel.scala#L147