Hi,

If you hit heap problems in Spark/GraphX, it is better to split the data into more, smaller partitions so that each partition fits in memory. In your program that means passing a larger value for minEdgePartitions.
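As a rough illustration of what "smaller partitions" could look like, here is a minimal sketch, not a tested fix. The helper `suggestedPartitions`, the 16 MiB target per partition, and the floor of 60 partitions (one per core on 5 machines with 12 cores each) are assumptions made for this example. They are not values recommended by Spark. The partition count is passed to `GraphLoader.edgeListFile` positionally. The sketch also prints counts instead of collecting the whole graph, since `collect` copies every element into the driver's heap (3g in the setup described).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Graph, GraphLoader, PartitionStrategy}
import org.apache.spark.storage.StorageLevel

object SmallerPartitions {

  // Hypothetical helper: number of edge partitions so that each one covers
  // roughly `targetBytes` of the input file, but never fewer than `minParts`.
  def suggestedPartitions(fileBytes: Long, targetBytes: Long, minParts: Int): Int = {
    require(fileBytes >= 0 && targetBytes > 0 && minParts > 0)
    val bySize = (fileBytes + targetBytes - 1) / targetBytes // ceiling division
    math.max(bySize, minParts.toLong).toInt
  }

  def main(args: Array[String]): Unit = {
    val path = args(0)
    val sc = new SparkContext(new SparkConf().setAppName("smaller-partitions"))

    // Assumed numbers: file size from the original post, 16 MiB per partition,
    // and at least one partition per core (5 machines x 12 cores).
    val numParts = suggestedPartitions(4524716369L, 16L * 1024 * 1024, 60)

    // Positional arguments: canonicalOrientation, edge partition count,
    // edge storage level, vertex storage level.
    val graph: Graph[Int, Int] = GraphLoader
      .edgeListFile(sc, path, false, numParts,
        StorageLevel.MEMORY_AND_DISK, StorageLevel.MEMORY_AND_DISK)
      .partitionBy(PartitionStrategy.EdgePartition2D)

    // count() aggregates on the executors; collect() would copy every edge
    // and vertex into the driver's heap.
    println(graph.edges.count())
    println(graph.vertices.count())
    sc.stop()
  }
}
```

With these assumed numbers the helper yields 270 partitions for the 4,524,716,369-byte file. The right target size depends on how much larger the in-memory edge representation is than the text file, so treat the value as a starting point to tune.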

On Sat, Mar 14, 2015 at 12:09 AM, Hlib Mykhailenko <hlib.mykhaile...@inria.fr> wrote:

> Hello,
>
> I cannot process a graph with 230M edges.
> I cloned apache.spark, built it, and then tried it on a cluster.
>
> I used a Spark Standalone Cluster:
> - 5 machines (each has 12 cores/32GB RAM)
> - 'spark.executor.memory' == 25g
> - 'spark.driver.memory' == 3g
>
> The graph has 231359027 edges, and its file weighs 4,524,716,369 bytes.
> The graph is represented in text format:
> <source vertex id> <destination vertex id>
>
> My code:
>
> object Canonical {
>
>   def main(args: Array[String]) {
>
>     val numberOfArguments = 3
>     require(args.length == numberOfArguments,
>       s"""Wrong argument number. Should be $numberOfArguments .
>          |Usage: <path_to_graph> <partitioner_name> <minEdgePartitions>
>        """.stripMargin)
>
>     var graph: Graph[Int, Int] = null
>     val nameOfGraph = args(0).substring(args(0).lastIndexOf("/") + 1)
>     val partitionerName = args(1)
>     val minEdgePartitions = args(2).toInt
>
>     val sc = new SparkContext(new SparkConf()
>       .setSparkHome(System.getenv("SPARK_HOME"))
>       .setAppName(s" partitioning | $nameOfGraph | $partitionerName | $minEdgePartitions parts ")
>       .setJars(SparkContext.jarOfClass(this.getClass).toList))
>
>     graph = GraphLoader.edgeListFile(sc, args(0), false,
>       edgeStorageLevel = StorageLevel.MEMORY_AND_DISK,
>       vertexStorageLevel = StorageLevel.MEMORY_AND_DISK,
>       minEdgePartitions = minEdgePartitions)
>     graph = graph.partitionBy(PartitionStrategy.fromString(partitionerName))
>     println(graph.edges.collect.length)
>     println(graph.vertices.collect.length)
>   }
> }
>
> After I ran it I encountered a number of java.lang.OutOfMemoryError: Java
> heap space errors, and of course I did not get a result.
>
> Do I have a problem in the code? Or in the cluster configuration?
>
> It works fine for relatively small graphs, but for this graph it
> never worked. (And I do not think that 230M edges is too much data.)
>
> Thank you for any advice!
>
> --
> Best regards,
> *Hlib Mykhailenko*
> PhD student at INRIA Sophia-Antipolis Méditerranée
> <http://www.inria.fr/centre/sophia/>
> 2004 Route des Lucioles BP93
> 06902 SOPHIA ANTIPOLIS cedex

--
---
Takeshi Yamamuro