Hello,

I have the edges of a graph stored as parquet files (about 3GB). I am loading 
the graph and trying to compute the total number of triplets and triangles. 
Here is my code:

import org.apache.spark.graphx.{Edge, Graph, PartitionStrategy}
import org.apache.spark.rdd.RDD

// Load the edge list for the requested year and build the graph;
// the first two parquet columns hold the source and destination vertex ids
val edges_parq = sqlContext.read.parquet(args(0) + "/year=" + year)
val edges: RDD[Edge[Int]] = edges_parq.rdd.map(row =>
  Edge(row.getInt(0).toLong, row.getInt(1).toLong))
val graph = Graph.fromEdges(edges, 1)
  .partitionBy(PartitionStrategy.RandomVertexCut)

// The actual computation
val numberOfTriplets = graph.triplets.count()

// triangleCount() returns, per vertex, the number of triangles it belongs to,
// so every triangle is counted once at each of its three vertices
val tmp = graph.triangleCount().vertices.filter { case (_, count) => count > 0 }
val numberOfTriangles = tmp.map(_._2).sum() / 3

Even though it manages to compute the number of triplets, I can't compute the
number of triangles: every time, some executors fail with an OOM (Java heap space)
exception and the application dies.
I am using 100 executors (1 core and 6 GB per executor). I have also tried setting
'hdfsConf.set("mapreduce.input.fileinputformat.split.maxsize", "33554432")' in
the code, but it made no difference.
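
For context, this is roughly how I apply that property; a minimal sketch, assuming
hdfsConf is the Hadoop configuration taken from the SparkContext (the exact place
where hdfsConf is created is not shown above):

// sketch: set the input split size on the SparkContext's Hadoop configuration
val hdfsConf = sc.hadoopConfiguration
hdfsConf.set("mapreduce.input.fileinputformat.split.maxsize", "33554432") // 32 MB splits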

Here are some of my configurations:
--conf spark.driver.memory=20G 
--conf spark.driver.maxResultSize=20G 
--conf spark.yarn.executor.memoryOverhead=6144 
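
For completeness, the full submission looks roughly like this (a sketch; the class
name, jar name, input path and year argument are placeholders, not my real values):

spark-submit \
  --master yarn \
  --num-executors 100 \
  --executor-cores 1 \
  --executor-memory 6G \
  --conf spark.driver.memory=20G \
  --conf spark.driver.maxResultSize=20G \
  --conf spark.yarn.executor.memoryOverhead=6144 \
  --class GraphTriangles \
  graph-triangles.jar hdfs:///path/to/edges 2016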

- Thodoris