[ https://issues.apache.org/jira/browse/SPARK-15408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cody Koeninger closed SPARK-15408.
----------------------------------
    Resolution: Cannot Reproduce

> Spark streaming app crashes with NotLeaderForPartitionException 
> ----------------------------------------------------------------
>
>                 Key: SPARK-15408
>                 URL: https://issues.apache.org/jira/browse/SPARK-15408
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.6.0
>         Environment: Ubuntu 64 bit
>            Reporter: Johny Mathew
>            Priority: Critical
>
> We have a Spark Streaming application reading from Kafka (with the Kafka Direct
> API), and it crashed with the exception shown in the application report below.
> We have a 5-node Kafka cluster with 19 partitions (replication factor 3). Even
> though the Spark application crashed, the other Kafka consumer apps kept running
> fine. Only one of the 5 Kafka nodes was not working correctly (it did not go
> down).
>
> /opt/hadoop/bin/yarn application -status application_1463151451543_0007
> 16/05/13 20:09:56 INFO client.RMProxy: Connecting to ResourceManager at 
> /172.16.130.189:8050
> Application Report :
>       Application-Id : application_1463151451543_0007
>       Application-Name : com.ibm.alchemy.eventgen.EventGenMetrics
>       Application-Type : SPARK
>       User : stack
>       Queue : default
>       Start-Time : 1463155034571
>       Finish-Time : 1463155310520
>       Progress : 100%
>       State : FINISHED
>       Final-State : FAILED
>       Tracking-URL : N/A
>       RPC Port : 0
>       AM Host : 172.16.130.188
>       Aggregate Resource Allocation : 9562329 MB-seconds, 2393 vcore-seconds
>       Diagnostics : User class threw exception: 
> org.apache.spark.SparkException: 
> ArrayBuffer(kafka.common.NotLeaderForPartitionException, 
> kafka.common.NotLeaderForPartitionException, 
> kafka.common.NotLeaderForPartitionException, 
> kafka.common.NotLeaderForPartitionException, 
> kafka.common.NotLeaderForPartitionException, 
> kafka.common.NotLeaderForPartitionException, 
> kafka.common.NotLeaderForPartitionException, 
> kafka.common.NotLeaderForPartitionException, org.apache.spark.SparkException: 
> Couldn't find leader offsets for Set([alchemy-metrics,17], 
> [alchemy-metrics,10], [alchemy-metrics,3], [alchemy-metrics,4], 
> [alchemy-metrics,9], [alchemy-metrics,15], [alchemy-metrics,18], 
> [alchemy-metrics,5]))
>
> We cleared the checkpoint and restarted the application, but it crashed again.
> Eventually we identified the misbehaving Kafka node and restarted it, which
> fixed the problem.
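
As a rough aid for tracking down a misbehaving broker in a case like this, the partitions named in the diagnostics can be pulled out and grouped by topic. The sketch below is only an illustration, not part of the original report: the helper name partitions_without_leader is hypothetical, and the input string is the "Couldn't find leader offsets" portion of the diagnostics quoted above.

```python
import re

# Text copied from the "Couldn't find leader offsets" part of the
# diagnostics in the YARN application report quoted above.
diagnostics = (
    "Couldn't find leader offsets for Set([alchemy-metrics,17], "
    "[alchemy-metrics,10], [alchemy-metrics,3], [alchemy-metrics,4], "
    "[alchemy-metrics,9], [alchemy-metrics,15], [alchemy-metrics,18], "
    "[alchemy-metrics,5]))"
)


def partitions_without_leader(text):
    """Hypothetical helper: map each topic to its sorted partition ids.

    Matches every "[topic,partition]" pair as printed in the exception text.
    """
    found = {}
    for topic, partition in re.findall(r"\[([^\[\],\s]+),(\d+)\]", text):
        found.setdefault(topic, set()).add(int(partition))
    return {topic: sorted(ids) for topic, ids in found.items()}


affected = partitions_without_leader(diagnostics)
print(affected)
```

The resulting partition ids could then be compared with the leader assignments in the cluster's topic metadata to check whether they all point at the same broker. That comparison is not shown here because the report does not include the leader assignments.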



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

