I am getting an OutOfMemoryError while running the connectedComponents job on
a graph with around 12,000 vertices and 134,600 edges.
I am running Spark in embedded mode (local master) in a standalone Java
application. I have tried to increase the memory, but the change does not
seem to take any effect:
import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

SparkConf sparkConf = new SparkConf()
        .setAppName("SOME APP NAME")
        .setMaster("local[10]")
        .set("spark.executor.memory", "5g")    // local mode: executors run inside this same JVM
        .set("spark.driver.memory", "8g")      // per the Spark docs, must be set before the driver JVM starts
        .set("spark.driver.maxResultSize", "1g")
        .set("spark.sql.warehouse.dir", "file:///d:/spark/tmp")
        .set("hadoop.home.dir", "file:///D:/spark-2.1.0-bin-hadoop2.7/bin");
SparkSession spark = SparkSession.builder().config(sparkConf).getOrCreate();
spark.sparkContext().setLogLevel("ERROR");
spark.sparkContext().setCheckpointDir("D:/spark/tmp");
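To check whether the memory increase actually reached the JVM, the heap limit can be printed before the SparkSession is built. The Spark configuration docs note that spark.driver.memory must not be set through SparkConf in client mode because the driver JVM has already started, and with a local master Spark runs inside the application's own JVM. So the limit probably has to come from the JVM launch options (e.g. -Xmx8g). A minimal sketch using only the JDK's Runtime.maxMemory(); the class and method names are hypothetical:

```java
// Hypothetical helper (not part of Spark or GraphFrames): reports the heap
// limit the running JVM actually received, e.g. from -Xmx.
final class HeapCheck {

    private HeapCheck() {}

    /** Converts a byte count to whole mebibytes, rounding down. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Maximum heap this JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return toMiB(Runtime.getRuntime().maxMemory());
    }

    public static void main(String[] args) {
        // Print this before creating the SparkSession. If it is far below the
        // 8g passed as spark.driver.memory, that setting did not resize the
        // heap of this already-running JVM.
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Launched with java -Xmx8g, the reported value should be near 8192 MiB (often somewhat lower, because some garbage collectors exclude a survivor space from the figure).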
The stack trace:
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3332)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
    at java.lang.StringBuilder.append(StringBuilder.java:136)
    at scala.StringContext.standardInterpolator(StringContext.scala:126)
    at scala.StringContext.s(StringContext.scala:95)
    at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:230)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:54)
    at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2788)
    at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2385)
    at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$collect$1.apply(Dataset.scala:2390)
    at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$collect$1.apply(Dataset.scala:2390)
    at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2801)
    at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2390)
    at org.apache.spark.sql.Dataset.collect(Dataset.scala:2366)
    at org.graphframes.lib.ConnectedComponents$.skewedJoin(ConnectedComponents.scala:239)
    at org.graphframes.lib.ConnectedComponents$.org$graphframes$lib$ConnectedComponents$$run(ConnectedComponents.scala:308)
    at org.graphframes.lib.ConnectedComponents.run(ConnectedComponents.scala:139)
The GraphFrames version is 0.5.0 and the Spark version is 2.1.1.
Regards,
Imran