Hi all,
We deploy Spark SQL in standalone mode, without HDFS, on a single machine with 256 GB of RAM and 64 cores.
Our SparkSession is configured as follows:
> SparkSession ss = SparkSession.builder().appName("MYAPP")
>         .config("spark.sql.crossJoin.enabled", "true")
>         .config("spark.executor.memory", this.memory_limit)
>         .config("spark.executor.cores", 2)
>         .config("spark.driver.memory", "2g")
>         .config("spark.storage.memoryFraction", 0.3)
>         .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>         .config("spark.executor.extraJavaOptions",
>                 "-XX:+UseG1GC -XX:+PrintFlagsFinal -XX:+PrintReferenceGC " +
>                 "-verbose:gc -XX:+PrintGCDetails " +
>                 "-XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy")
>         .master(this.spark_master)
>         .getOrCreate();
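For context on the settings above, a rough sizing sketch. With `spark.executor.cores` set and `spark.cores.max` unset, the standalone master may launch several executors on the one worker, as many as its free cores and memory allow, and each executor runs up to `spark.executor.cores` tasks that share its heap. Also, since Spark 1.6 `spark.storage.memoryFraction` is only read when `spark.memory.useLegacyMode=true`; otherwise Spark uses a unified execution/storage pool sized by `spark.memory.fraction` (0.6 by default in Spark 2.x) after reserving 300 MB of heap. The value of `this.memory_limit` (taken as 8g here) and the worker memory (255 GB, i.e. the standalone default of total RAM minus 1 GB) are assumptions, and `ExecutorSizing` is a hypothetical helper, not a Spark API.

```java
// Hypothetical sizing helper: every input below is an illustrative assumption.
public final class ExecutorSizing {
    // Spark 2.x reserves 300 MB of each executor heap before applying spark.memory.fraction.
    static final long RESERVED_MB = 300;

    // Standalone mode with spark.executor.cores set: the worker keeps launching
    // executors while it still has enough free cores AND enough free memory.
    static long executorsPerWorker(int workerCores, long workerMemMb,
                                   int executorCores, long executorMemMb) {
        return Math.min(workerCores / executorCores, workerMemMb / executorMemMb);
    }

    // Unified (execution + storage) memory per executor under spark.memory.fraction.
    static long unifiedMemoryMb(long executorMemMb, double memoryFraction) {
        return (long) ((executorMemMb - RESERVED_MB) * memoryFraction);
    }

    public static void main(String[] args) {
        int workerCores = 64;
        long workerMemMb = 255L * 1024;  // assumed: 256 GB machine minus 1 GB
        int executorCores = 2;           // spark.executor.cores from the post
        long executorMemMb = 8L * 1024;  // assumed value of this.memory_limit ("8g")

        System.out.println("executors = "
                + executorsPerWorker(workerCores, workerMemMb, executorCores, executorMemMb));
        System.out.println("unified memory per executor (MB) = "
                + unifiedMemoryMb(executorMemMb, 0.6));
        // Rough share of the heap for each concurrently running task.
        System.out.println("heap per concurrent task (MB) ~ " + executorMemMb / executorCores);
    }
}
```

With these assumed inputs it prints 31 executors, 4735 MB of unified memory per executor, and roughly 4096 MB of heap per concurrent task. Spark can spill its own sort, aggregation and shuffle buffers to disk, but rows buffered inside the JDBC driver are ordinary heap objects outside that pool, so they are not spilled.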
Our MySQL JDBC connection properties are:
> Properties connProp = new Properties();
> connProp.put("driver", "com.mysql.jdbc.Driver");
> connProp.put("useSSL", "false");
> connProp.put("user", this.user);
> connProp.put("password", this.password);
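The `readAllResults` and `readSingleRowSet` frames in the stack trace suggest that MySQL Connector/J is materializing the whole query result in one executor's heap, which is its default behavior. One commonly suggested workaround, assuming Connector/J 5.x against MySQL 5.0.2 or newer, is to enable cursor-based fetching (`useCursorFetch=true`) together with Spark's `fetchsize` JDBC option, which Spark passes to `Statement.setFetchSize()`. This sketch only builds the `Properties` object; the fetch size of 10000 is an arbitrary illustrative value and `MysqlReadProps` is a hypothetical helper.

```java
import java.util.Properties;

// Hypothetical helper: the original connection properties plus two assumed additions.
public final class MysqlReadProps {
    static Properties build(String user, String password) {
        Properties connProp = new Properties();
        connProp.put("driver", "com.mysql.jdbc.Driver");
        connProp.put("useSSL", "false");
        connProp.put("user", user);
        connProp.put("password", password);
        // Assumption: Connector/J 5.x and MySQL >= 5.0.2. With useCursorFetch=true and a
        // positive fetch size, rows are fetched in batches through a server-side cursor
        // instead of the whole result set being buffered in the client heap.
        connProp.put("useCursorFetch", "true");
        // Spark JDBC option read by Spark itself; 10000 is only an illustrative value.
        connProp.put("fetchsize", "10000");
        return connProp;
    }

    public static void main(String[] args) {
        Properties p = build("user", "secret");
        System.out.println(p.getProperty("useCursorFetch") + " " + p.getProperty("fetchsize"));
    }
}
```

Passing the returned object as `connProp` to `ss.read().jdbc(...)` leaves the rest of the code unchanged.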
Then we register the MySQL tables as temporary views and run the join:
> Dataset<Row> jdbcDF1 = ss.read().jdbc(this.url, "(select * from bigtable) t1", connProp);
> jdbcDF1.createOrReplaceTempView("t1");
> Dataset<Row> jdbcDF2 = ss.read().jdbc(this.url, "(select * from smalltable) t2", connProp);
> jdbcDF2.createOrReplaceTempView("t2");
> Dataset<Row> result = ss.sql("select * from t1, t2 where xxxx");
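As written, each `read().jdbc(url, table, props)` call produces a single partition, so one task issues `select * from bigtable` and has to hold its entire result. Spark's partitioned JDBC reads (`partitionColumn`, `lowerBound`, `upperBound`, `numPartitions`) instead issue one range query per partition. The following is a simplified, stand-alone illustration of that range splitting, not Spark's exact implementation; `JdbcRangeSketch`, the column name `id`, and the bounds are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration of a range-partitioned JDBC read over a numeric column.
public final class JdbcRangeSketch {
    // Returns one WHERE clause per partition. The bounds only set the stride:
    // the first and last ranges are open-ended, so no rows are dropped.
    static List<String> predicates(String col, long lower, long upper, int numPartitions) {
        if (numPartitions < 1 || upper <= lower) {
            throw new IllegalArgumentException("need numPartitions >= 1 and upper > lower");
        }
        // Never create more partitions than there are values in [lower, upper).
        int n = (int) Math.min(numPartitions, upper - lower);
        List<String> result = new ArrayList<>();
        if (n == 1) {
            result.add("1 = 1"); // a single partition: no filter, read everything
            return result;
        }
        long stride = (upper - lower) / n;
        for (int i = 0; i < n; i++) {
            long lo = lower + i * stride;
            long hi = lo + stride;
            if (i == 0) {
                result.add(col + " < " + hi + " OR " + col + " IS NULL");
            } else if (i == n - 1) {
                result.add(col + " >= " + lo);
            } else {
                result.add(col + " >= " + lo + " AND " + col + " < " + hi);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical column and bounds, for illustration only.
        for (String p : predicates("id", 0, 100, 4)) {
            System.out.println(p);
        }
    }
}
```

In Spark the same idea is exposed through the `DataFrameReader.jdbc(url, table, columnName, lowerBound, upperBound, numPartitions, connectionProperties)` overload, for example `ss.read().jdbc(this.url, "bigtable", "id", 0L, 100000000L, 64, connProp)`, with `id` and the bounds standing in for a real indexed numeric column and its range.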
When we run the job, our Java program fails with an OOM error:
> Lost task 6.0 in stage 1156.0 (TID 16686, 172.16.50.103, executor 5):
> java.lang.OutOfMemoryError: Java heap space
>     at com.mysql.jdbc.MysqlIO.nextRowFast(MysqlIO.java:2213)
>     at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1992)
>     at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:3413)
>     at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:471)
>     at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:3115)
>     at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:2344)
>     at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2739)
>     at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2486)
>     at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
>     at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1966)
>     at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:301)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>     at org.apache.spark.scheduler.Task.run(Task.scala:109)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
Is something wrong with our configuration, and if so, how can we fix it? Also, will Spark SQL spill to disk when there is not enough memory?