We submit the Spark job through the spark-submit command, like the one below.

sudo /var/lib/pf-spark/bin/spark-submit \
--total-executor-cores 30 \
--driver-cores 2 \
--class com.hrishikesh.mishra.Main \
--master spark://XX.XX.XXX.19:6066 \
--deploy-mode cluster \
--supervise http://XX.XX.XXX.19:90/jar/fk-runner-framework-1.0-SNAPSHOT.jar
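The command above sets no memory sizes explicitly, so the cluster defaults apply. As a sketch only (the sizes and extra JVM options below are assumptions for illustration, not values taken from this thread), the same submission with explicit memory flags and heap dumps on OOM would look like:

```shell
# Same submission with explicit memory sizing and heap-dump-on-OOM
# (2g/4g are illustrative assumptions, not values from this thread).
sudo /var/lib/pf-spark/bin/spark-submit \
  --total-executor-cores 30 \
  --driver-cores 2 \
  --driver-memory 2g \
  --executor-memory 4g \
  --conf spark.driver.extraJavaOptions=-XX:+HeapDumpOnOutOfMemoryError \
  --conf spark.executor.extraJavaOptions=-XX:+HeapDumpOnOutOfMemoryError \
  --class com.hrishikesh.mishra.Main \
  --master spark://XX.XX.XXX.19:6066 \
  --deploy-mode cluster \
  --supervise http://XX.XX.XXX.19:90/jar/fk-runner-framework-1.0-SNAPSHOT.jar
```

The heap dump lands in the process working directory, which makes it possible to tell afterwards whether the driver or an executor JVM actually ran out of heap.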




We have a Python HTTP server where all the jars are hosted.
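The thread only says "python http server" without naming one; any static file server that makes the jar URL fetchable by the cluster works. A minimal self-contained sketch (the function name and port handling are my own, not from the thread):

```python
# Minimal sketch of a jar-hosting static HTTP server (an assumption:
# the thread does not say which server is used; any static file server
# that can serve the jar URL to the cluster works).
import http.server
import socketserver
import threading
from functools import partial

def serve_directory(directory, port=0):
    """Serve `directory` over HTTP in a background thread.

    Returns (server, bound_port); port=0 picks a free port.
    """
    handler = partial(http.server.SimpleHTTPRequestHandler,
                      directory=directory)
    server = socketserver.TCPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]
```

In production this would listen on port 90 and serve the directory containing fk-runner-framework-1.0-SNAPSHOT.jar.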

The user killed the driver driver-20200508153502-1291, and that is visible in
the log as well, but this is not the problem. The OOM is separate from it.

20/05/08 15:36:55 INFO Worker: Asked to kill driver
driver-20200508153502-1291

20/05/08 15:36:55 INFO DriverRunner: Killing driver process!

20/05/08 15:36:55 INFO CommandUtils: Redirection to
/grid/1/spark/work/driver-20200508153502-1291/stderr closed: Stream closed

20/05/08 15:36:55 INFO CommandUtils: Redirection to
/grid/1/spark/work/driver-20200508153502-1291/stdout closed: Stream closed

20/05/08 15:36:55 INFO ExternalShuffleBlockResolver: Application
app-20200508153654-11776 removed, cleanupLocalDirs = true

20/05/08 15:36:55 INFO Worker: Driver driver-20200508153502-1291 was
killed by user

20/05/08 15:43:06 WARN AbstractChannelHandlerContext: An exception
'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
stacktrace] was thrown by a user handler's exceptionCaught() method while
handling the following exception:

java.lang.OutOfMemoryError: Java heap space

20/05/08 15:43:23 ERROR SparkUncaughtExceptionHandler: Uncaught exception
in thread Thread[dispatcher-event-loop-6,5,main]

java.lang.OutOfMemoryError: Java heap space

20/05/08 15:43:17 WARN AbstractChannelHandlerContext: An exception
'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
stacktrace] was thrown by a user handler's exceptionCaught() method while
handling the following exception:

java.lang.OutOfMemoryError: Java heap space

20/05/08 15:43:33 INFO ExecutorRunner: Killing process!

20/05/08 15:43:33 INFO ExecutorRunner: Killing process!

20/05/08 15:43:33 INFO ExecutorRunner: Killing process!

20/05/08 15:43:33 INFO ShutdownHookManager: Shutdown hook called

20/05/08 15:43:33 INFO ShutdownHookManager: Deleting directory
/grid/1/spark/local/spark-e045e069-e126-4cff-9512-d36ad30ee922
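Note that the SparkUncaughtExceptionHandler above fires inside the worker daemon's own JVM (a dispatcher-event-loop thread), not in the driver or an executor, so --driver-memory and --executor-memory do not size it. One possible mitigation, sketched only (2g is an illustrative value, not one from this thread), is to raise the daemon heap in conf/spark-env.sh on each worker host and restart the workers:

```shell
# conf/spark-env.sh on each worker host (restart workers afterwards).
# SPARK_DAEMON_MEMORY sizes the master/worker daemon JVMs themselves.
SPARK_DAEMON_MEMORY=2g
# Optionally capture a heap dump if the daemon still OOMs:
SPARK_DAEMON_JAVA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/grid/1/spark/work"
```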


On Fri, May 8, 2020 at 9:27 PM Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
>
> It's been a while since I worked with Spark Standalone, but I'd check the
> logs of the workers. How do you spark-submit the app?
>
> Did you check the /grid/1/spark/work/driver-20200508153502-1291 directory?
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://about.me/JacekLaskowski
> "The Internals Of" Online Books <https://books.japila.pl/>
> Follow me on https://twitter.com/jaceklaskowski
>
>
>
> On Fri, May 8, 2020 at 2:32 PM Hrishikesh Mishra <sd.hri...@gmail.com>
> wrote:
>
>> Thanks Jacek for the quick response.
>> Due to our system constraints we can't move to Structured Streaming now,
>> but YARN can definitely be tried out.
>>
>> My problem is that I'm not able to figure out where the issue is: the
>> Driver, an Executor, or the Worker. Even the exceptions are clueless.
>> Please see the exception below; I'm unable to spot the cause of the OOM.
>>
>> 20/05/08 15:36:55 INFO Worker: Asked to kill driver
>> driver-20200508153502-1291
>>
>> 20/05/08 15:36:55 INFO DriverRunner: Killing driver process!
>>
>> 20/05/08 15:36:55 INFO CommandUtils: Redirection to
>> /grid/1/spark/work/driver-20200508153502-1291/stderr closed: Stream closed
>>
>> 20/05/08 15:36:55 INFO CommandUtils: Redirection to
>> /grid/1/spark/work/driver-20200508153502-1291/stdout closed: Stream closed
>>
>> 20/05/08 15:36:55 INFO ExternalShuffleBlockResolver: Application
>> app-20200508153654-11776 removed, cleanupLocalDirs = true
>>
>> 20/05/08 15:36:55 INFO Worker: Driver driver-20200508153502-1291 was
>> killed by user
>>
>> 20/05/08 15:43:06 WARN AbstractChannelHandlerContext: An exception
>> 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
>> stacktrace] was thrown by a user handler's exceptionCaught() method while
>> handling the following exception:
>>
>> java.lang.OutOfMemoryError: Java heap space
>>
>> 20/05/08 15:43:23 ERROR SparkUncaughtExceptionHandler: Uncaught
>> exception in thread Thread[dispatcher-event-loop-6,5,main]
>>
>> java.lang.OutOfMemoryError: Java heap space
>>
>> 20/05/08 15:43:17 WARN AbstractChannelHandlerContext: An exception
>> 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
>> stacktrace] was thrown by a user handler's exceptionCaught() method while
>> handling the following exception:
>>
>> java.lang.OutOfMemoryError: Java heap space
>>
>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>>
>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>>
>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>>
>> 20/05/08 15:43:33 INFO ShutdownHookManager: Shutdown hook called
>>
>> 20/05/08 15:43:33 INFO ShutdownHookManager: Deleting directory
>> /grid/1/spark/local/spark-e045e069-e126-4cff-9512-d36ad30ee922
>>
>>
>>
>>
>> On Fri, May 8, 2020 at 5:14 PM Jacek Laskowski <ja...@japila.pl> wrote:
>>
>>> Hi,
>>>
>>> Sorry for being perhaps too harsh, but when you asked "Am I missing
>>> something. " and I noticed this "Kafka Direct Stream" and "Spark Standalone
>>> Cluster. " I immediately thought "Yeah...please upgrade your Spark env to
>>> use Spark Structured Streaming at the very least and/or use YARN as the
>>> cluster manager".
>>>
>>> Another thought was that the user code (your code) could be leaking
>>> resources so Spark eventually reports heap-related errors that may not
>>> necessarily be Spark's.
>>>
>>> Pozdrawiam,
>>> Jacek Laskowski
>>> ----
>>> https://about.me/JacekLaskowski
>>> "The Internals Of" Online Books <https://books.japila.pl/>
>>> Follow me on https://twitter.com/jaceklaskowski
>>>
>>>
>>>
>>> On Thu, May 7, 2020 at 1:12 PM Hrishikesh Mishra <sd.hri...@gmail.com>
>>> wrote:
>>>
>>>> Hi
>>>>
>>>> I am getting an out-of-memory error in the worker log of my streaming
>>>> jobs every couple of hours, after which the worker dies. There is no
>>>> shuffle, no aggregation, no caching in the job; it's just a
>>>> transformation. I'm not able to identify where the problem is, the
>>>> driver or an executor. And why does the worker die after the OOM, when
>>>> only the streaming job should die? Am I missing something?
>>>>
>>>> Driver Memory:  2g
>>>> Executor memory: 4g
>>>>
>>>> Spark Version:  2.4
>>>> Kafka Direct Stream
>>>> Spark Standalone Cluster.
>>>>
>>>>
>>>> 20/05/06 12:52:20 INFO SecurityManager: SecurityManager: authentication
>>>> disabled; ui acls disabled; users  with view permissions: Set(root); groups
>>>> with view permissions: Set(); users  with modify permissions: Set(root);
>>>> groups with modify permissions: Set()
>>>>
>>>> 20/05/06 12:53:03 ERROR SparkUncaughtExceptionHandler: Uncaught
>>>> exception in thread Thread[ExecutorRunner for
>>>> app-20200506124717-10226/0,5,main]
>>>>
>>>> java.lang.OutOfMemoryError: Java heap space
>>>>
>>>> at org.apache.xerces.util.XMLStringBuffer.append(Unknown Source)
>>>>
>>>> at org.apache.xerces.impl.XMLEntityScanner.scanData(Unknown Source)
>>>>
>>>> at org.apache.xerces.impl.XMLScanner.scanComment(Unknown Source)
>>>>
>>>> at
>>>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanComment(Unknown
>>>> Source)
>>>>
>>>> at
>>>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
>>>> Source)
>>>>
>>>> at
>>>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
>>>> Source)
>>>>
>>>> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>>>
>>>> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>>>
>>>> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>>>>
>>>> at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>>>>
>>>> at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>>>>
>>>> at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
>>>>
>>>> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2480)
>>>>
>>>> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2468)
>>>>
>>>> at
>>>> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2539)
>>>>
>>>> at
>>>> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
>>>>
>>>> at
>>>> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
>>>>
>>>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1143)
>>>>
>>>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1115)
>>>>
>>>> at
>>>> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:464)
>>>>
>>>> at
>>>> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)
>>>>
>>>> at
>>>> org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:114)
>>>>
>>>> at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:114)
>>>>
>>>> at org.apache.spark.deploy.worker.ExecutorRunner.org
>>>> $apache$spark$deploy$worker$ExecutorRunner$$fetchAndRunExecutor(ExecutorRunner.scala:149)
>>>>
>>>> at
>>>> org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:73)
>>>>
>>>> 20/05/06 12:53:38 INFO DriverRunner: Worker shutting down, killing
>>>> driver driver-20200505181719-1187
>>>>
>>>> 20/05/06 12:53:38 INFO DriverRunner: Killing driver process!
>>>>
>>>>
>>>>
>>>>
>>>> Regards
>>>> Hrishi
>>>>
>>>
