Hi,

If you have heap problems in Spark/GraphX, it is usually better to split the
graph into more, smaller partitions so that each partition fits in memory.
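
For example, with the loading call from your code below you can simply ask for
many more edge partitions. A rough sketch (240, i.e. a few partitions per core
on your 5 x 12-core cluster, is only an illustrative starting point, not a
tuned value):

    val graph = GraphLoader.edgeListFile(sc, args(0), false,
      edgeStorageLevel = StorageLevel.MEMORY_AND_DISK,
      vertexStorageLevel = StorageLevel.MEMORY_AND_DISK,
      minEdgePartitions = 240)  // more, smaller partitions instead of a few huge ones

If your build has the two-argument partitionBy overload, you can also keep an
explicit partition count when repartitioning, e.g.
graph.partitionBy(PartitionStrategy.fromString(partitionerName), 240).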

On Sat, Mar 14, 2015 at 12:09 AM, Hlib Mykhailenko <
hlib.mykhaile...@inria.fr> wrote:

> Hello,
>
> I cannot process a graph with 230M edges.
> I cloned apache/spark, built it, and then tried it on the cluster.
>
> I used a Spark Standalone cluster:
> - 5 machines (each with 12 cores / 32GB RAM)
> - 'spark.executor.memory' == 25g
> - 'spark.driver.memory' == 3g
>
>
> The graph has 231359027 edges, and its file weighs 4,524,716,369 bytes.
> The graph is represented in text format:
> <source vertex id> <destination vertex id>
>
> My code:
>
> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.graphx.{Graph, GraphLoader, PartitionStrategy}
> import org.apache.spark.storage.StorageLevel
>
> object Canonical {
>
>   def main(args: Array[String]) {
>
>     val numberOfArguments = 3
>     require(args.length == numberOfArguments,
>       s"""Wrong argument number. Should be $numberOfArguments.
>          |Usage: <path_to_graph> <partitioner_name> <minEdgePartitions>""".stripMargin)
>
>     val nameOfGraph = args(0).substring(args(0).lastIndexOf("/") + 1)
>     val partitionerName = args(1)
>     val minEdgePartitions = args(2).toInt
>
>     val sc = new SparkContext(new SparkConf()
>       .setSparkHome(System.getenv("SPARK_HOME"))
>       .setAppName(s" partitioning | $nameOfGraph | $partitionerName | $minEdgePartitions parts ")
>       .setJars(SparkContext.jarOfClass(this.getClass).toList))
>
>     // Load the edge list; spill edges/vertices to disk when they do not fit in memory.
>     val graph: Graph[Int, Int] = GraphLoader
>       .edgeListFile(sc, args(0), false,
>         edgeStorageLevel = StorageLevel.MEMORY_AND_DISK,
>         vertexStorageLevel = StorageLevel.MEMORY_AND_DISK,
>         minEdgePartitions = minEdgePartitions)
>       .partitionBy(PartitionStrategy.fromString(partitionerName))
>
>     // collect() materializes all edges/vertices on the driver before counting them.
>     println(graph.edges.collect.length)
>     println(graph.vertices.collect.length)
>   }
> }
>
> After I ran it, I encountered a number of java.lang.OutOfMemoryError: Java
> heap space errors and, of course, I did not get a result.
>
> Do I have a problem in the code? Or in the cluster configuration?
>
> It works fine for relatively small graphs, but for this graph it never
> worked. (And I do not think that 230M edges is too much data.)
>
>
> Thank you for any advice!
>
>
>
> --
> Best regards,
> *Hlib Mykhailenko*
> PhD student at INRIA Sophia-Antipolis Méditerranée
> <http://www.inria.fr/centre/sophia/>
> 2004 Route des Lucioles BP93
> 06902 SOPHIA ANTIPOLIS cedex
>
>


-- 
---
Takeshi Yamamuro
