We submit the Spark job through the spark-submit command, like the one below.
sudo /var/lib/pf-spark/bin/spark-submit \
  --total-executor-cores 30 \
  --driver-cores 2 \
  --class com.hrishikesh.mishra.Main \
  --master spark://XX.XX.XXX.19:6066 \
  --deploy-mode cluster \
  --supervise \
  http://XX.XX.XXX.19:90/jar/fk-runner-framework-1.0-SNAPSHOT.jar

We have a Python HTTP server where we host all the jars.

The user killed the driver driver-20200508153502-1291, and it is visible in the log as well, but that is not the problem; the OOM is separate from it.

20/05/08 15:36:55 INFO Worker: Asked to kill driver driver-20200508153502-1291
20/05/08 15:36:55 INFO DriverRunner: Killing driver process!
20/05/08 15:36:55 INFO CommandUtils: Redirection to /grid/1/spark/work/driver-20200508153502-1291/stderr closed: Stream closed
20/05/08 15:36:55 INFO CommandUtils: Redirection to /grid/1/spark/work/driver-20200508153502-1291/stdout closed: Stream closed
20/05/08 15:36:55 INFO ExternalShuffleBlockResolver: Application app-20200508153654-11776 removed, cleanupLocalDirs = true
20/05/08 15:36:55 INFO Worker: Driver driver-20200508153502-1291 was killed by user
20/05/08 15:43:06 WARN AbstractChannelHandlerContext: An exception 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
java.lang.OutOfMemoryError: Java heap space
20/05/08 15:43:23 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[dispatcher-event-loop-6,5,main]
java.lang.OutOfMemoryError: Java heap space
20/05/08 15:43:17 WARN AbstractChannelHandlerContext: An exception 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
java.lang.OutOfMemoryError: Java heap space
20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
20/05/08 15:43:33 INFO ShutdownHookManager: Shutdown hook called
20/05/08 15:43:33 INFO ShutdownHookManager: Deleting directory /grid/1/spark/local/spark-e045e069-e126-4cff-9512-d36ad30ee922

On Fri, May 8, 2020 at 9:27 PM Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
>
> It's been a while since I worked with Spark Standalone, but I'd check the
> logs of the workers. How do you spark-submit the app?
>
> Did you check the /grid/1/spark/work/driver-20200508153502-1291 directory?
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://about.me/JacekLaskowski
> "The Internals Of" Online Books <https://books.japila.pl/>
> Follow me on https://twitter.com/jaceklaskowski
>
> On Fri, May 8, 2020 at 2:32 PM Hrishikesh Mishra <sd.hri...@gmail.com> wrote:
>
>> Thanks Jacek for the quick response.
>> Due to our system constraints, we can't move to Structured Streaming now,
>> but YARN can definitely be tried out.
>>
>> But my problem is that I'm not able to figure out where the issue is:
>> the driver, an executor, or the worker. Even the exceptions are clueless.
>> Please see the exception below; I'm unable to spot the cause of the OOM.
>>
>> 20/05/08 15:36:55 INFO Worker: Asked to kill driver driver-20200508153502-1291
>> 20/05/08 15:36:55 INFO DriverRunner: Killing driver process!
>> 20/05/08 15:36:55 INFO CommandUtils: Redirection to /grid/1/spark/work/driver-20200508153502-1291/stderr closed: Stream closed
>> 20/05/08 15:36:55 INFO CommandUtils: Redirection to /grid/1/spark/work/driver-20200508153502-1291/stdout closed: Stream closed
>> 20/05/08 15:36:55 INFO ExternalShuffleBlockResolver: Application app-20200508153654-11776 removed, cleanupLocalDirs = true
>> 20/05/08 15:36:55 INFO Worker: Driver driver-20200508153502-1291 was killed by user
>> 20/05/08 15:43:06 WARN AbstractChannelHandlerContext: An exception 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
>> java.lang.OutOfMemoryError: Java heap space
>> 20/05/08 15:43:23 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[dispatcher-event-loop-6,5,main]
>> java.lang.OutOfMemoryError: Java heap space
>> 20/05/08 15:43:17 WARN AbstractChannelHandlerContext: An exception 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
>> java.lang.OutOfMemoryError: Java heap space
>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>> 20/05/08 15:43:33 INFO ShutdownHookManager: Shutdown hook called
>> 20/05/08 15:43:33 INFO ShutdownHookManager: Deleting directory /grid/1/spark/local/spark-e045e069-e126-4cff-9512-d36ad30ee922
>>
>> On Fri, May 8, 2020 at 5:14 PM Jacek Laskowski <ja...@japila.pl> wrote:
>>
>>> Hi,
>>>
>>> Sorry for being perhaps too harsh, but when you asked "Am I missing
>>> something?" and I noticed "Kafka Direct Stream" and "Spark Standalone
>>> Cluster", I immediately thought "Yeah... please upgrade your Spark env to
>>> use Spark Structured Streaming at the very least and/or use YARN as the
>>> cluster manager".
>>>
>>> Another thought was that the user code (your code) could be leaking
>>> resources, so Spark eventually reports heap-related errors that may not
>>> necessarily be Spark's.
>>>
>>> Pozdrawiam,
>>> Jacek Laskowski
>>> ----
>>> https://about.me/JacekLaskowski
>>> "The Internals Of" Online Books <https://books.japila.pl/>
>>> Follow me on https://twitter.com/jaceklaskowski
>>>
>>> On Thu, May 7, 2020 at 1:12 PM Hrishikesh Mishra <sd.hri...@gmail.com> wrote:
>>>
>>>> Hi
>>>>
>>>> I am getting an out-of-memory error in the worker log of streaming jobs
>>>> every couple of hours, after which the worker dies. There is no shuffle,
>>>> no aggregation, and no caching in the job; it's just a transformation.
>>>> I'm not able to identify where the problem is, the driver or the
>>>> executor, and why the worker dies after the OOM when it is the streaming
>>>> job that should die. Am I missing something?
>>>>
>>>> Driver Memory: 2g
>>>> Executor memory: 4g
>>>>
>>>> Spark Version: 2.4
>>>> Kafka Direct Stream
>>>> Spark Standalone Cluster
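
For context, a minimal sketch of the kind of job described above — a Kafka direct stream with a record-level transformation only, no shuffle, no aggregation, no caching. The topic, broker, group id, batch interval, and output are placeholders; the thread does not show the real com.hrishikesh.mishra.Main.

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object Main {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("fk-runner-framework"), Seconds(30))

    // Placeholder Kafka settings; the real ones are not shown in the thread.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker-1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "fk-runner-group",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean))

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams))

    // A plain record-level transformation plus an output action:
    // no shuffle, no aggregation, no caching.
    stream.map(_.value.toUpperCase).foreachRDD(rdd => rdd.foreach(println))

    ssc.start()
    ssc.awaitTermination()
  }
}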
>>>>
>>>> 20/05/06 12:52:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
>>>>
>>>> 20/05/06 12:53:03 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[ExecutorRunner for app-20200506124717-10226/0,5,main]
>>>> java.lang.OutOfMemoryError: Java heap space
>>>>   at org.apache.xerces.util.XMLStringBuffer.append(Unknown Source)
>>>>   at org.apache.xerces.impl.XMLEntityScanner.scanData(Unknown Source)
>>>>   at org.apache.xerces.impl.XMLScanner.scanComment(Unknown Source)
>>>>   at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanComment(Unknown Source)
>>>>   at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
>>>>   at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
>>>>   at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>>>   at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>>>   at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>>>>   at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>>>>   at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>>>>   at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
>>>>   at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2480)
>>>>   at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2468)
>>>>   at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2539)
>>>>   at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
>>>>   at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
>>>>   at org.apache.hadoop.conf.Configuration.set(Configuration.java:1143)
>>>>   at org.apache.hadoop.conf.Configuration.set(Configuration.java:1115)
>>>>   at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:464)
>>>>   at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)
>>>>   at org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:114)
>>>>   at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:114)
>>>>   at org.apache.spark.deploy.worker.ExecutorRunner.org$apache$spark$deploy$worker$ExecutorRunner$$fetchAndRunExecutor(ExecutorRunner.scala:149)
>>>>   at org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:73)
>>>>
>>>> 20/05/06 12:53:38 INFO DriverRunner: Worker shutting down, killing driver driver-20200505181719-1187
>>>> 20/05/06 12:53:38 INFO DriverRunner: Killing driver process!
>>>>
>>>> Regards
>>>> Hrishi
>>>
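
One generic illustration of Jacek's point about user code leaking resources: in a transformation-only streaming job, the usual suspect is output code that opens a client per batch or per partition and never closes it. Below is a sketch of the close-per-partition pattern, with a made-up SinkClient standing in for whatever writer the real job uses; it is not the code from this thread.

import org.apache.spark.streaming.dstream.DStream

// Hypothetical sink; stands in for the real client (HTTP, DB, Kafka producer, ...).
class SinkClient extends AutoCloseable {
  def send(value: String): Unit = println(value)
  override def close(): Unit = ()
}

object SafeOutput {
  def writeOut(stream: DStream[String]): Unit =
    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // One client per partition, created on the executor and always closed,
        // even if send() throws; otherwise every micro-batch leaks a client.
        val client = new SinkClient
        try records.foreach(client.send)
        finally client.close()
      }
    }
}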