We submit the Spark job with the spark-submit command, like this:

sudo /var/lib/pf-spark/bin/spark-submit \
  --total-executor-cores 30 \
  --driver-cores 2 \
  --class com.hrishikesh.mishra.Main \
  --master spark://XX.XX.XXX.19:6066 \
  --deploy-mode cluster \
  --supervise \
  http://XX.XX.XXX.19:90/jar/fk-runner-framework-1.0-SNAPSHOT.jar

We have a Python HTTP server where we host all the jars.

The user killed the driver driver-20200508153502-1291, and that is visible in
the log too, but it is not the problem: the OOM is separate from the kill.

20/05/08 15:36:55 INFO Worker: Asked to kill driver
20/05/08 15:36:55 INFO DriverRunner: Killing driver process!
20/05/08 15:36:55 INFO CommandUtils: Redirection to /grid/1/spark/work/driver-20200508153502-1291/stderr closed: Stream closed
20/05/08 15:36:55 INFO CommandUtils: Redirection to /grid/1/spark/work/driver-20200508153502-1291/stdout closed: Stream closed
20/05/08 15:36:55 INFO ExternalShuffleBlockResolver: Application app-20200508153654-11776 removed, cleanupLocalDirs = true
20/05/08 15:36:55 INFO Worker: Driver driver-20200508153502-1291 was killed by user
20/05/08 15:43:06 WARN AbstractChannelHandlerContext: An exception 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
java.lang.OutOfMemoryError: Java heap space
20/05/08 15:43:23 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[dispatcher-event-loop-6,5,main]
java.lang.OutOfMemoryError: Java heap space
20/05/08 15:43:17 WARN AbstractChannelHandlerContext: An exception 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
java.lang.OutOfMemoryError: Java heap space
20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
20/05/08 15:43:33 INFO ShutdownHookManager: Shutdown hook called
20/05/08 15:43:33 INFO ShutdownHookManager: Deleting directory
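
To check that the kill and the OOM really are separate events, the kill and OOM lines can be pulled out of the worker log and compared by timestamp; in the excerpt above the driver is killed at 15:36:55 and the first OOM appears at 15:43:06, about six minutes later. The sketch also lists a driver's work directory and greps its stderr, since the log shows the driver's output being redirected to stderr and stdout files there. It is a minimal sketch, assuming bash and grep; the function names kill_oom_timeline and driver_dir_report are hypothetical, and the worker log path is whatever you pass in.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reading Spark standalone worker/driver logs.

# Print only the driver-kill and OutOfMemoryError lines of a worker log,
# with line numbers, so their timestamps can be compared side by side.
kill_oom_timeline() {
  local log="$1"
  grep -n -E 'Asked to kill driver|was killed by user|OutOfMemoryError' "$log"
}

# List a driver's work directory and show any OOM lines in its stderr.
# The stderr/stdout file names match the "Redirection to .../stderr" log lines.
driver_dir_report() {
  local dir="$1"
  ls -l "$dir"
  if [ -f "$dir/stderr" ]; then
    grep -n 'OutOfMemoryError' "$dir/stderr" || echo "no OutOfMemoryError in $dir/stderr"
  fi
}
```

For example, run driver_dir_report /grid/1/spark/work/driver-20200508153502-1291 on the worker host that ran that driver. Where the worker log itself lives depends on how SPARK_LOG_DIR is set on your hosts.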

On Fri, May 8, 2020 at 9:27 PM Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
> It's been a while since I worked with Spark Standalone, but I'd check the
> logs of the workers. How do you spark-submit the app?
> Did you check the /grid/1/spark/work/driver-20200508153502-1291 directory?
> On Fri, May 8, 2020 at 2:32 PM Hrishikesh Mishra <sd.hri...@gmail.com>
> wrote:
>> Thanks, Jacek, for the quick response.
>> Due to our system constraints, we can't move to Structured Streaming now,
>> but we can definitely try YARN.
>> My problem, though, is that I'm unable to figure out where the issue is:
>> driver, executor, or worker. Even the exceptions give no clue. Please see
>> the exception below; I'm unable to find the cause of the OOM.
>> On Fri, May 8, 2020 at 5:14 PM Jacek Laskowski <ja...@japila.pl> wrote:
>>> Hi,
>>> Sorry for being perhaps too harsh, but when you asked "Am I missing
>>> something?" and I noticed "Kafka Direct Stream" and "Spark Standalone
>>> Cluster", I immediately thought "Yeah... please upgrade your Spark env to
>>> use Spark Structured Streaming at the very least and/or use YARN as the
>>> cluster manager".
>>> Another thought was that the user code (your code) could be leaking
>>> resources, so Spark eventually reports heap-related errors that may not
>>> necessarily be Spark's.
>>> On Thu, May 7, 2020 at 1:12 PM Hrishikesh Mishra <sd.hri...@gmail.com>
>>> wrote:
>>>> Hi
>>>> I am getting an out-of-memory error in the worker log for streaming jobs
>>>> every couple of hours, after which the worker dies. There is no shuffle,
>>>> no aggregation, and no caching in the job; it's just a transformation.
>>>> I'm not able to identify where the problem is, driver or executor, nor
>>>> why the worker dies after the OOM when I'd expect only the streaming job
>>>> to die. Am I missing something?
>>>> Driver memory: 2g
>>>> Executor memory: 4g
>>>> Spark version: 2.4
>>>> Kafka Direct Stream
>>>> Spark Standalone Cluster
>>>> 20/05/06 12:52:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
>>>> 20/05/06 12:53:03 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[ExecutorRunner for app-20200506124717-10226/0,5,main]
>>>> java.lang.OutOfMemoryError: Java heap space
>>>>     at org.apache.xerces.util.XMLStringBuffer.append(Unknown Source)
>>>>     at org.apache.xerces.impl.XMLEntityScanner.scanData(Unknown Source)
>>>>     at org.apache.xerces.impl.XMLScanner.scanComment(Unknown Source)
>>>>     at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanComment(Unknown Source)
>>>>     at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
>>>>     at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
>>>>     at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>>>     at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>>>     at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>>>>     at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>>>>     at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>>>>     at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
>>>>     at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2480)
>>>>     at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2468)
>>>>     at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2539)
>>>>     at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
>>>>     at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
>>>>     at org.apache.hadoop.conf.Configuration.set(Configuration.java:1143)
>>>>     at org.apache.hadoop.conf.Configuration.set(Configuration.java:1115)
>>>>     at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:464)
>>>>     at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)
>>>>     at org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:114)
>>>>     at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:114)
>>>>     at org.apache.spark.deploy.worker.ExecutorRunner.org$apache$spark$deploy$worker$ExecutorRunner$$fetchAndRunExecutor(ExecutorRunner.scala:149)
>>>>     at org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:73)
>>>> 20/05/06 12:53:38 INFO DriverRunner: Worker shutting down, killing driver driver-20200505181719-1187
>>>> 20/05/06 12:53:38 INFO DriverRunner: Killing driver process!
>>>> Regards
>>>> Hrishi
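
One detail in the trace above is worth checking: the error was raised in the thread ExecutorRunner for app-20200506124717-10226/0, and ExecutorRunner (package org.apache.spark.deploy.worker) runs inside the worker daemon to launch executor processes. That suggests, though nothing in this exchange confirms it, that the exhausted heap belonged to the worker JVM itself rather than to the 2g driver or the 4g executors. Below is a minimal sketch for seeing which maximum heap each running worker JVM was started with, assuming a Linux host with procps ps and bash; the function names xmx_of and worker_heaps are hypothetical.

```shell
#!/usr/bin/env bash
# Print the last -Xmx flag in a JVM command line (HotSpot uses the last
# occurrence when the flag is repeated), or "<none>" if it is absent.
xmx_of() {
  local flag
  flag=$(printf '%s\n' "$1" | tr ' ' '\n' | grep -E '^-Xmx' | tail -n 1)
  printf '%s\n' "${flag:-<none>}"
}

# For every running Spark standalone Worker JVM on this host, print its PID
# and -Xmx setting. The [o] trick keeps grep from matching its own process.
worker_heaps() {
  ps -eo pid=,args= | grep '[o]rg\.apache\.spark\.deploy\.worker\.Worker' |
    while read -r pid args; do
      printf '%s %s\n' "$pid" "$(xmx_of "$args")"
    done
}
```

Run worker_heaps on each worker host. In Spark Standalone the worker daemon's heap comes from SPARK_DAEMON_MEMORY in conf/spark-env.sh (default 1g), which is independent of the driver and executor memory settings.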

