Thanks, Jacek, for the quick response.
Due to our system constraints, we can't move to Structured Streaming right
now, but we can definitely try out YARN.

My real problem is that I'm unable to figure out where the issue lies: the
driver, an executor, or the worker. Even the exceptions give no clue. Please
see the exception below; I can't pinpoint what is causing the OOM. After the
log excerpt I've sketched what I'm planning to try next to narrow it down.

20/05/08 15:36:55 INFO Worker: Asked to kill driver
driver-20200508153502-1291

20/05/08 15:36:55 INFO DriverRunner: Killing driver process!

20/05/08 15:36:55 INFO CommandUtils: Redirection to
/grid/1/spark/work/driver-20200508153502-1291/stderr closed: Stream closed

20/05/08 15:36:55 INFO CommandUtils: Redirection to
/grid/1/spark/work/driver-20200508153502-1291/stdout closed: Stream closed

20/05/08 15:36:55 INFO ExternalShuffleBlockResolver: Application
app-20200508153654-11776 removed, cleanupLocalDirs = true

20/05/08 15:36:55 INFO Worker: Driver driver-20200508153502-1291 was killed
by user

20/05/08 15:43:06 WARN AbstractChannelHandlerContext: An exception
'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
stacktrace] was thrown by a user handler's exceptionCaught() method while
handling the following exception:

java.lang.OutOfMemoryError: Java heap space

20/05/08 15:43:23 ERROR SparkUncaughtExceptionHandler: Uncaught exception
in thread Thread[dispatcher-event-loop-6,5,main]

java.lang.OutOfMemoryError: Java heap space

20/05/08 15:43:17 WARN AbstractChannelHandlerContext: An exception
'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
stacktrace] was thrown by a user handler's exceptionCaught() method while
handling the following exception:

java.lang.OutOfMemoryError: Java heap space

20/05/08 15:43:33 INFO ExecutorRunner: Killing process!

20/05/08 15:43:33 INFO ExecutorRunner: Killing process!

20/05/08 15:43:33 INFO ExecutorRunner: Killing process!

20/05/08 15:43:33 INFO ShutdownHookManager: Shutdown hook called

20/05/08 15:43:33 INFO ShutdownHookManager: Deleting directory
/grid/1/spark/local/spark-e045e069-e126-4cff-9512-d36ad30ee922
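
To try to narrow this down myself, I'm thinking of enabling heap dumps on OOM
for both the driver and the executors, so the dump file tells me which JVM
actually ran out of memory. A rough, untested sketch of the settings I have in
mind is below; the dump paths, app name, and batch interval are just
placeholders for our setup, and I assume the driver options would have to be
passed via spark-submit --conf or spark-defaults.conf so the driver JVM
actually picks them up at launch:

// Rough sketch, not tested yet: write a heap dump on OutOfMemoryError so the
// dump file shows which JVM (driver or executor) is actually running out of memory.
// Dump paths and app name below are placeholders for our environment.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("kafka-direct-stream-job")  // placeholder app name
  // Executor JVMs: dump the heap when they hit OutOfMemoryError.
  .set("spark.executor.extraJavaOptions",
    "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/grid/1/spark/work/executor-heap.hprof")
  // Driver JVM: same flags; in cluster deploy mode these would need to come from
  // spark-submit --conf (or spark-defaults.conf) so the driver gets them at launch.
  .set("spark.driver.extraJavaOptions",
    "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/grid/1/spark/work/driver-heap.hprof")

val ssc = new StreamingContext(conf, Seconds(30))  // batch interval is a placeholder

Does this sound like a reasonable way to tell driver and executor OOMs apart,
or is there a better approach on a standalone cluster?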




On Fri, May 8, 2020 at 5:14 PM Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
>
> Sorry for being perhaps too harsh, but when you asked "Am I missing
> something. " and I noticed this "Kafka Direct Stream" and "Spark Standalone
> Cluster. " I immediately thought "Yeah...please upgrade your Spark env to
> use Spark Structured Streaming at the very least and/or use YARN as the
> cluster manager".
>
> Another thought was that the user code (your code) could be leaking
> resources so Spark eventually reports heap-related errors that may not
> necessarily be Spark's.
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://about.me/JacekLaskowski
> "The Internals Of" Online Books <https://books.japila.pl/>
> Follow me on https://twitter.com/jaceklaskowski
>
>
>
> On Thu, May 7, 2020 at 1:12 PM Hrishikesh Mishra <sd.hri...@gmail.com>
> wrote:
>
>> Hi
>>
>> I am getting an out-of-memory error in the worker log for streaming jobs
>> every couple of hours, after which the worker dies. There is no shuffle, no
>> aggregation, no caching in the job; it's just a transformation.
>> I'm not able to identify where the problem is, the driver or an executor,
>> and why the worker dies after the OOM when only the streaming job should
>> die. Am I missing something.
>>
>> Driver Memory:  2g
>> Executor memory: 4g
>>
>> Spark Version:  2.4
>> Kafka Direct Stream
>> Spark Standalone Cluster.
>>
>>
>> 20/05/06 12:52:20 INFO SecurityManager: SecurityManager: authentication
>> disabled; ui acls disabled; users  with view permissions: Set(root); groups
>> with view permissions: Set(); users  with modify permissions: Set(root);
>> groups with modify permissions: Set()
>>
>> 20/05/06 12:53:03 ERROR SparkUncaughtExceptionHandler: Uncaught exception
>> in thread Thread[ExecutorRunner for app-20200506124717-10226/0,5,main]
>>
>> java.lang.OutOfMemoryError: Java heap space
>>
>> at org.apache.xerces.util.XMLStringBuffer.append(Unknown Source)
>>
>> at org.apache.xerces.impl.XMLEntityScanner.scanData(Unknown Source)
>>
>> at org.apache.xerces.impl.XMLScanner.scanComment(Unknown Source)
>>
>> at
>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanComment(Unknown
>> Source)
>>
>> at
>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
>> Source)
>>
>> at
>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
>> Source)
>>
>> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>
>> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>
>> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>>
>> at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>>
>> at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>>
>> at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
>>
>> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2480)
>>
>> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2468)
>>
>> at
>> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2539)
>>
>> at
>> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
>>
>> at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
>>
>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1143)
>>
>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1115)
>>
>> at
>> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:464)
>>
>> at
>> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)
>>
>> at
>> org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:114)
>>
>> at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:114)
>>
>> at org.apache.spark.deploy.worker.ExecutorRunner.org
>> $apache$spark$deploy$worker$ExecutorRunner$$fetchAndRunExecutor(ExecutorRunner.scala:149)
>>
>> at
>> org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:73)
>>
>> 20/05/06 12:53:38 INFO DriverRunner: Worker shutting down, killing driver
>> driver-20200505181719-1187
>>
>> 20/05/06 12:53:38 INFO DriverRunner: Killing driver process!
>>
>>
>>
>>
>> Regards
>> Hrishi
>>
>
