Hello, I have the edges of a graph stored as parquet files (about 3GB). I am loading the graph and trying to compute the total number of triplets and triangles. Here is my code:
    val edges_parq = sqlContext.read.option("header", "true").parquet(args(0) + "/year=" + year)
    val edges: RDD[Edge[Int]] = edges_parq.rdd.map(row => Edge(row(0).asInstanceOf[Int], row(1).asInstanceOf[Int]))
    val graph = Graph.fromEdges(edges, 1).partitionBy(PartitionStrategy.RandomVertexCut)

    // The actual computation
    val numberOfTriplets = graph.triplets.count
    val tmp = graph.triangleCount().vertices.filter { case (vid, count) => count > 0 }
    val numberOfTriangles = tmp.map(_._2).sum()

It manages to compute the number of triplets, but it never gets through the triangle count: every time, some executors fail with an "OOM - Java Heap Space" exception and the application dies. I am using 100 executors (1 core and 6 GB per executor). I have also tried setting hdfsConf.set("mapreduce.input.fileinputformat.split.maxsize", "33554432") in the code, but it made no difference. Here are some of my configurations:

    --conf spark.driver.memory=20G
    --conf spark.driver.maxResultSize=20G
    --conf spark.yarn.executor.memoryOverhead=6144

- Thodoris
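For reference, this is a variant I was considering trying, based on general advice about GraphX triangle counting: repartition the edge RDD into more, smaller partitions before building the graph, use EdgePartition2D instead of RandomVertexCut, and derive the global count by summing the per-vertex counts and dividing by 3 (each triangle is counted once at each of its three vertices). The partition count of 1000 is just a placeholder for my cluster, and it assumes the two parquet columns are stored as integers; I have not verified that this avoids the OOM:

    import org.apache.spark.graphx.{Edge, Graph, PartitionStrategy}
    import org.apache.spark.rdd.RDD

    val edges_parq = sqlContext.read.parquet(args(0) + "/year=" + year)

    // Repartition so each task holds a smaller slice of the edge set;
    // 1000 is a placeholder value to be tuned to the cluster.
    val edges: RDD[Edge[Int]] = edges_parq.rdd
      .map(row => Edge(row.getInt(0), row.getInt(1), 1))
      .repartition(1000)

    val graph = Graph.fromEdges(edges, defaultValue = 1)
      .partitionBy(PartitionStrategy.EdgePartition2D)

    // triangleCount() returns, per vertex, the number of triangles that vertex
    // participates in; each triangle is counted at three vertices, hence the / 3.
    val perVertexCounts = graph.triangleCount().vertices
    val numberOfTriangles = perVertexCounts.map(_._2.toLong).reduce(_ + _) / 3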