At 2014-09-05 12:13:18 +0200, Yifan LI <iamyifa...@gmail.com> wrote:
> But how to assign the storage level to a new vertices RDD that mapped from
> an existing vertices RDD,
> e.g.
>     val newVertexRDD =
>       graph.collectNeighborIds(EdgeDirection.Out).map { case (id: VertexId,
>         a: Array[VertexId]) => (id, initialHashMap(a)) }
>
> the new one will be combined with that existing edges RDD (MEMORY_AND_DISK)
> to construct a new graph.
> e.g.
>     val newGraph = Graph(newVertexRDD, graph.edges)
Sorry for the late reply. If you are constructing a graph from the derived
VertexRDD, you can pass the desired storage levels to the Graph constructor:

    val newVertexRDD = graph.collectNeighborIds(EdgeDirection.Out).map {
      case (id: VertexId, a: Array[VertexId]) => (id, initialHashMap(a))
    }
    val newGraph = Graph(
      newVertexRDD, graph.edges,
      edgeStorageLevel = StorageLevel.MEMORY_AND_DISK,
      vertexStorageLevel = StorageLevel.MEMORY_AND_DISK)

For others reading: the reason GraphX needs to be told the desired storage
level is that it internally constructs temporary vertex and edge RDDs and
uses them more than once, so it has to cache them to avoid recomputation.

> BTW, the return of newVertexRDD.getStorageLevel is StorageLevel(true, true,
> false, true, 1), what does it mean?

See the StorageLevel object [1]. This particular storage level corresponds
to StorageLevel.MEMORY_AND_DISK.

Ankur

[1] https://github.com/apache/spark/blob/092e2f152fb674e7200cc8a2cb99a8fe0a9b2b33/core/src/main/scala/org/apache/spark/storage/StorageLevel.scala#L147

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
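P.S. On the `StorageLevel(true, true, false, true, 1)` question: the five fields
are (useDisk, useMemory, useOffHeap, deserialized, replication), and that
combination is exactly MEMORY_AND_DISK. A minimal sketch of the mapping, using
a hypothetical `StorageLevelMimic` case class (not the real
org.apache.spark.storage.StorageLevel, just an illustration of its fields):

    ```scala
    // Hypothetical mimic of Spark's StorageLevel, to show what each flag means.
    case class StorageLevelMimic(
      useDisk: Boolean,      // spill partitions to disk when they don't fit in memory
      useMemory: Boolean,    // keep partitions in JVM heap memory
      useOffHeap: Boolean,   // store partitions in off-heap memory
      deserialized: Boolean, // store as deserialized Java objects rather than serialized bytes
      replication: Int = 1   // number of cluster nodes holding each partition
    ) {
      override def toString =
        s"StorageLevel($useDisk, $useMemory, $useOffHeap, $deserialized, $replication)"
    }

    // MEMORY_AND_DISK: in memory, spilling to disk, deserialized, one replica.
    val memoryAndDisk = StorageLevelMimic(
      useDisk = true, useMemory = true, useOffHeap = false, deserialized = true)

    println(memoryAndDisk)  // StorageLevel(true, true, false, true, 1)
    ```

So reading the flags left to right from getStorageLevel's output tells you which
of the predefined levels you are looking at.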