Hi,

I have a Spark Kafka streaming job running in YARN cluster mode with:
spark.task.maxFailures=4 (default)
spark.yarn.max.executor.failures=8
number of executors=1
spark.streaming.stopGracefullyOnShutdown=false
checkpointing enabled


- When a RuntimeException occurs in a batch on the executor, the same batch
is retried 4 times and the job then moves on to the next batch. This repeats
across many batches until the executor fails. The executor receives a
shutdown a few seconds later, and both the driver and the executor are killed.
- The driver and executor are then relaunched with a much higher offset than
the last offset the failed executor was processing.

I expected the executor to fail after a batch fails 4 times, and a new
executor to be relaunched that reprocesses the same failed batch.

Instead, the driver creates stages with a new batch range after the previous
batch fails 4 times. How can I stop new tasks from being created on the
executor? How can I avoid this data loss?
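For what it's worth, the data loss described above can be avoided at the
application level by committing the Kafka offset only after a batch has been
fully processed, so a restart resumes at the failed batch rather than skipping
ahead. Below is a minimal sketch of that commit-after-success pattern in plain
Python (not the Spark API; `run_job`, the single-element `committed` list, and
`MAX_FAILURES` are all illustrative assumptions, not real Spark names):

```python
# Sketch of at-least-once offset handling: commit the offset only after
# a batch is fully processed, so a restarted job re-reads the failed
# batch instead of jumping past it. All names are illustrative.

MAX_FAILURES = 4  # analogous to spark.task.maxFailures


def run_job(batches, process, committed, max_failures=MAX_FAILURES):
    """Process batches in order, starting from the committed offset.

    `committed` is a single-element list standing in for durable offset
    storage (a checkpoint or external store). If a batch fails
    `max_failures` times, raise without advancing the offset, so the
    next run retries the same batch.
    """
    offset = committed[0]
    while offset < len(batches):
        failures = 0
        while True:
            try:
                process(batches[offset])
                break
            except RuntimeError:
                failures += 1
                if failures >= max_failures:
                    # Do NOT advance the offset: a relaunched job
                    # resumes here, avoiding data loss.
                    raise
        offset += 1
        committed[0] = offset  # commit only after success
```

In Spark 1.6.1 terms, the equivalent would be using the direct Kafka stream
and saving offset ranges to your own store only after each batch's output
action succeeds, instead of relying solely on checkpoint recovery.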

Spark version: 1.6.1


-- 
Regards
Vasanth kumar RJ
