From: "Tathagata Das" <t...@databricks.com>
Date: Oct 14, 2015 1:28 PM
Subject: Re: Spark 1.5 java.net.ConnectException: Connection refused
To: "Spark Newbie" <sparknewbie1...@gmail.com>
Cc: "user" <user@spark.apache.org>, "Shixiong (Ryan) Zhu
What is the best way to fail the application when job gets aborted?
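One possible pattern (a sketch, not something confirmed in this thread; the `ssc` variable and stop flags are assumptions) is to rely on `StreamingContext.awaitTermination()` rethrowing errors from failed streaming jobs, and let the driver exit non-zero so the cluster manager notices:

```scala
// Sketch: fail the whole application when a streaming job aborts.
// Assumes a configured StreamingContext named `ssc`.
try {
  ssc.start()
  ssc.awaitTermination()   // rethrows exceptions from failed batch jobs
} catch {
  case e: Exception =>
    // Stop everything, then rethrow so the driver exits with a
    // non-zero status and the cluster manager can restart it.
    ssc.stop(stopSparkContext = true, stopGracefully = false)
    throw e
}
```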
On Wed, Oct 14, 2015 at 1:27 PM, Tathagata Das wrote:
> When a job gets aborted, it means that the internal tasks were retried a
> number of times before the system gave up. You can control the number
>
Is it slowing things down or blocking progress?
>> I didn't see processing slow down, but I do see jobs aborted
>> consecutively for a period of 18 batches (5-minute batch intervals), so I
>> am worried about what happened to the records those jobs were
>> processing.
Also, one more thing to mention:
I ran 2 different Spark 1.5 clusters that have been running for more than a
day now. I do see jobs getting aborted due to task retries maxing out
(default 4) due to ConnectException. It seems like the executors die and
get restarted and I was unable to find the root cause (same app code and
When a job gets aborted, it means that the internal tasks were retried a
number of times before the system gave up. You can control the number of
retries (see Spark's configuration page). The job by default does not get
resubmitted.
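For reference, the relevant setting is `spark.task.maxFailures` (default 4): the number of attempts a single task is allowed before its stage, and hence the job, is aborted. A minimal configuration sketch (the app name is hypothetical):

```scala
// Config sketch: raise the per-task retry limit before a job aborts.
val conf = new SparkConf()
  .setAppName("my-streaming-app")      // hypothetical app name
  .set("spark.task.maxFailures", "8")  // default is 4
```

The same can be passed on the command line with `spark-submit --conf spark.task.maxFailures=8`.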
You could try getting the logs of the failed executor, to see what
Hi Spark users,
I'm seeing the below exception in my Spark Streaming application. It
happens in the first stage, where the Kinesis receivers receive records and
perform a flatMap operation on the unioned DStream. A coalesce step also
happens as part of that stage to optimize performance.
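For context, the stage described above (union of Kinesis receiver streams, a flatMap, then a coalesce) would look roughly like this sketch; the stream names, shard count, parsing function, and partition count are all assumptions, not details from this thread:

```scala
// Sketch of the described stage: one receiver per Kinesis shard,
// union them, flatMap the raw records, coalesce to fewer partitions.
val kinesisStreams = (1 to numShards).map { _ =>
  KinesisUtils.createStream(
    ssc, appName, streamName, endpointUrl, regionName,
    InitialPositionInStream.LATEST,
    Seconds(300),                        // checkpoint interval = batch interval
    StorageLevel.MEMORY_AND_DISK_2)
}
val unioned = ssc.union(kinesisStreams)
val records = unioned.flatMap(bytes => parse(bytes))  // parse() is hypothetical
// DStream has no coalesce, so apply it per-RDD via transform.
val narrowed = records.transform(rdd => rdd.coalesce(numPartitions))
```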
Is this happening too often? Is it slowing things down or blocking
progress? Occasional failures are part of the norm, and the system
should take care of itself.
On Tue, Oct 13, 2015 at 2:47 PM, Spark Newbie wrote:
> Hi Spark users,
>
> I'm seeing the below