Re: [GraphX] how to set memory configurations to avoid OutOfMemoryError "GC overhead limit exceeded"

Ankur Dave Mon, 08 Sep 2014 23:29:18 -0700

At 2014-09-05 12:13:18 +0200, Yifan LI <iamyifa...@gmail.com> wrote:
> But how do I assign the storage level to a new vertex RDD that is mapped
> from an existing vertex RDD? e.g.
> val newVertexRDD =
>   graph.collectNeighborIds(EdgeDirection.Out).map { case (id: VertexId,
>   a: Array[VertexId]) => (id, initialHashMap(a)) }
>
> The new one will be combined with the existing edges RDD (MEMORY_AND_DISK)
> to construct a new graph, e.g.
> val newGraph = Graph(newVertexRDD, graph.edges)


Sorry for the late reply. If you are constructing a graph from the derived 
vertex RDD, you can pass the desired storage levels to the Graph constructor 
(Graph.apply):

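    import org.apache.spark.graphx._
    import org.apache.spark.storage.StorageLevel

    // initialHashMap is your own helper from the question above
    val newVertexRDD = graph.collectNeighborIds(EdgeDirection.Out).map {
      case (id: VertexId, a: Array[VertexId]) => (id, initialHashMap(a))
    }
    // Both levels are named parameters of Graph.apply; each defaults to MEMORY_ONLY
    val newGraph = Graph(
      newVertexRDD,
      graph.edges,
      edgeStorageLevel = StorageLevel.MEMORY_AND_DISK,
      vertexStorageLevel = StorageLevel.MEMORY_AND_DISK)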

For others reading: GraphX needs to be told the desired storage levels because 
it internally constructs intermediate vertex and edge RDDs and uses each of 
them more than once, so it has to cache them to avoid recomputation.
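
The same reasoning applies to your own RDDs: a derived RDD that feeds more than one action is recomputed every time unless it is persisted. Below is a minimal sketch of that idea in plain Spark core, assuming spark-core is on the classpath and a local SparkContext is acceptable; `PersistReusedRdd` and `run` are hypothetical names used only for illustration. `RDD.persist` is how you would attach MEMORY_AND_DISK directly to something like newVertexRDD, although GraphX still builds its own internal RDDs at the levels passed to the Graph constructor.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object PersistReusedRdd {
  // Builds a derived RDD, marks it MEMORY_AND_DISK, and runs two actions on it.
  // Returns (count, sum, storage level observed right after persist).
  def run(sc: SparkContext): (Long, Long, StorageLevel) = {
    val base: RDD[Long] = sc.parallelize(1L to 1000L, numSlices = 4)
    // Derived RDD (analogous to newVertexRDD): recomputed per action unless persisted
    val derived = base.map(x => x * x)
    derived.persist(StorageLevel.MEMORY_AND_DISK)
    val level = derived.getStorageLevel
    val count = derived.count()       // first action computes and caches the partitions
    val total = derived.reduce(_ + _) // second action reuses the cached partitions
    derived.unpersist()               // release the cached blocks when done
    (count, total, level)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("persist-sketch").setMaster("local[2]"))
    try {
      val (count, total, level) = run(sc)
      println(s"count=$count total=$total level=$level")
    } finally {
      sc.stop()
    }
  }
}
```

Without the `persist` call, `getStorageLevel` on `derived` would report `StorageLevel.NONE` and the `map` would run once per action.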

> BTW, newVertexRDD.getStorageLevel returns StorageLevel(true, true,
> false, true, 1). What does it mean?

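See the StorageLevel object [1]. Its five fields are, in order, useDisk, 
useMemory, useOffHeap, deserialized, and replication, so 
StorageLevel(true, true, false, true, 1) means "disk and memory, no off-heap, 
stored deserialized, one replica". That is exactly StorageLevel.MEMORY_AND_DISK.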
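
To decode such a tuple yourself, here is a small sketch, assuming spark-core is on the classpath. `StorageLevelDecode`, `nameOf`, and `describe` are hypothetical names; the code only uses the public `StorageLevel.apply` factory, the level's flag accessors, and the predefined constants. OFF_HEAP is left out of the lookup because its flags have changed between Spark versions.

```scala
import org.apache.spark.storage.StorageLevel

object StorageLevelDecode {
  // A few predefined levels from the StorageLevel companion object
  private val named: Seq[(String, StorageLevel)] = Seq(
    "NONE" -> StorageLevel.NONE,
    "DISK_ONLY" -> StorageLevel.DISK_ONLY,
    "MEMORY_ONLY" -> StorageLevel.MEMORY_ONLY,
    "MEMORY_ONLY_SER" -> StorageLevel.MEMORY_ONLY_SER,
    "MEMORY_AND_DISK" -> StorageLevel.MEMORY_AND_DISK,
    "MEMORY_AND_DISK_SER" -> StorageLevel.MEMORY_AND_DISK_SER,
    "MEMORY_AND_DISK_2" -> StorageLevel.MEMORY_AND_DISK_2)

  // StorageLevel defines equals over its flags, so == compares them field by field
  def nameOf(level: StorageLevel): Option[String] =
    named.collectFirst { case (name, l) if l == level => name }

  // Spells out each flag of a level
  def describe(level: StorageLevel): String =
    s"useDisk=${level.useDisk} useMemory=${level.useMemory} " +
      s"useOffHeap=${level.useOffHeap} deserialized=${level.deserialized} " +
      s"replication=${level.replication}"

  def main(args: Array[String]): Unit = {
    // Argument order: useDisk, useMemory, useOffHeap, deserialized, replication
    val reported = StorageLevel(true, true, false, true, 1)
    println(describe(reported))
    println(nameOf(reported)) // prints Some(MEMORY_AND_DISK)
  }
}
```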

Ankur

[1] https://github.com/apache/spark/blob/092e2f152fb674e7200cc8a2cb99a8fe0a9b2b33/core/src/main/scala/org/apache/spark/storage/StorageLevel.scala#L147

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org