The behavior is a deliberate design decision by the Spark team.

If Spark were to "fail fast", it would prevent the system from recovering
from many classes of errors that are in principle recoverable (for example
if two otherwise unrelated jobs cause a garbage collection spike on the
same node). Checkpointing and similar features have been added to support
high availability and platform resilience.

As regards the more general topic of exception handling and recovery, I
side with Bruce Eckel <https://www.artima.com/intv/handcuffs.html> and (for
Java) Josh Bloch (see Effective Java, Exception Handling). The
Scala+functional community is similarly opinionated against using
exceptions for explicit control flow. (scala.util.Try is useful for
supporting libraries that don't share this opinion.)
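
For instance, a minimal sketch of wrapping an exception-throwing call at
the boundary with scala.util.Try (parsePort here is just a made-up
stand-in for a library call that throws):

    import scala.util.{Failure, Success, Try}

    // Hypothetical library call that signals failure by throwing.
    def parsePort(raw: String): Int = raw.trim.toInt

    // Wrap it once at the boundary; callers get a value back instead of
    // having to reason about control flow via exceptions.
    val port: Try[Int] = Try(parsePort("8o80"))  // NumberFormatException

    port match {
      case Success(p)  => println(s"listening on $p")
      case Failure(ex) => println(s"bad config (${ex.getMessage}), using default")
    }

    // Or stay in the value world and recover:
    val effectivePort: Int = port.getOrElse(8080)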

Higher-level design thoughts:

I recommend reading Chapter 15 of Chambers & Zaharia's "Spark: The
Definitive Guide" (at least). The Spark engine makes some assumptions
about execution boundaries being managed by Spark itself (Spark Jobs get
broken into Tasks on the Executors and are managed by the resource manager).
If multiple Threads are executing within a given Task, I would expect
things like data exchange/shuffle to become unpredictable.

Said a different way: Spark is a micro-batch architecture, even when using
the streaming APIs. The Spark Application is assumed to be relatively
lightweight (the goal is to parallelize execution across big data, after
all).

You might also look at the way the Apache Livy
<https://livy.incubator.apache.org/> team is implementing their solution.

HTH
Jason


On Tue, May 21, 2019 at 6:04 AM bsikander <behro...@gmail.com> wrote:

> Ok, I found the reason.
>
> In my QueueStream example, I have a while(true) loop which keeps adding
> RDDs, and my awaitTermination call is after the while loop. Since the while
> loop never exits, awaitTermination is never reached and the exceptions
> never get reported.
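>
> Roughly, the example had this shape (a simplified sketch with made-up
> names, not the exact code I ran):
>
>     import org.apache.spark.SparkConf
>     import org.apache.spark.rdd.RDD
>     import org.apache.spark.streaming.{Seconds, StreamingContext}
>     import scala.collection.mutable.Queue
>
>     val conf = new SparkConf().setAppName("queue-stream-demo").setMaster("local[2]")
>     val ssc = new StreamingContext(conf, Seconds(1))
>
>     val queue = new Queue[RDD[Int]]()
>     ssc.queueStream(queue).foreachRDD(rdd => println(rdd.count()))
>     ssc.start()
>
>     while (true) {                 // never exits...
>       queue += ssc.sparkContext.makeRDD(1 to 100)
>       Thread.sleep(1000)
>     }
>
>     ssc.awaitTermination()         // ...so this is never reached and the
>                                    // exceptions are never rethrown here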
>
>
> The above was just a problem with the example code that I used to
> demonstrate my issue.
>
> My real problem was due to the shutdown behavior of Spark. Spark Streaming
> does the following:
>
> - context.start() triggers the pipeline and context.awaitTermination()
> blocks the current thread; whenever an exception is reported,
> awaitTermination throws it. Since we generally never have any code after
> awaitTermination, the shutdown hooks get called, which stop the Spark
> context.
>
> - I am using spark-jobserver. When an exception is reported from
> awaitTermination, jobserver catches the exception and updates the status of
> the job in the database, but the driver process keeps running because the
> main thread in the driver is waiting for a jobserver-owned Akka actor to
> shut down. Since that actor never shuts down, the driver keeps running and
> nothing executes context.stop(). Since context.stop() is not executed, the
> job scheduler and generator keep running, and the job also keeps going.
>
> This implicit behavior of Spark, where it relies on shutdown hooks to close
> the context, is a bit strange. I believe that as soon as an exception is
> reported, Spark should just execute context.stop(). The current behavior
> can have serious consequences, e.g. data loss. I will fix it, though.
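>
> Something along these lines is what I have in mind (a rough, untested
> sketch):
>
>     ssc.start()
>     try {
>       ssc.awaitTermination()  // rethrows whatever error killed the job
>     } catch {
>       case e: Exception =>
>         // stop the streaming context and the SparkContext right away,
>         // instead of relying on the JVM shutdown hook
>         ssc.stop(stopSparkContext = true, stopGracefully = false)
>         throw e
>     }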
>
> What is your opinion on stopping the context as soon as an exception is
> raised?
>

-- 
Thanks,
Jason
