Re: Spark data source resiliency

2018-07-03 Thread assaf.mendelson
You are correct, this solved it. Thanks

Re: Spark data source resiliency

2018-07-03 Thread Wenchen Fan
I believe you are using something like `local[8]` as your Spark master, which can't retry tasks. Please try `local[8, 3]`, which can retry failed tasks 3 times.
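A minimal sketch of the suggested fix (app name is illustrative): the second value in the `local[N, M]` master string is the per-task attempt budget, so a transient reader failure gets retried instead of failing the job.

```scala
// A minimal sketch: plain local[8] gives each task a single attempt,
// while local[8, 3] allows up to 3 attempts per task.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[8, 3]")      // 8 worker threads, up to 3 attempts per task
  .appName("resiliency-test") // illustrative name
  .getOrCreate()
```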

Re: Spark data source resiliency

2018-07-02 Thread assaf.mendelson
That is what I expected; however, I did a very simple test (using println just to see when the exception is triggered in the iterator) with a local master, and I saw it fail once and cause the entire operation to fail. Is this something that may be unique to the local master (or some default configuration)?

Re: Spark data source resiliency

2018-07-02 Thread Wenchen Fan
A failure in the data reader results in a task failure, and Spark will retry the task for you (IIRC, it retries 3 times before failing the job). Can you check your Spark log and see whether the task fails consistently?
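For reference, the retry count recalled here comes from `spark.task.maxFailures` (default 4, i.e. the first attempt plus three retries). A sketch with illustrative values; note that plain `local[N]` masters ignore this setting and run each task exactly once, which is what the local-master test above ran into.

```scala
// A sketch with illustrative values: spark.task.maxFailures governs how
// many attempts a task gets before the job is failed (default 4 on a
// cluster). Plain local[N] masters run each task exactly once.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("data-source-v2-test") // illustrative name
  .config("spark.task.maxFailures", "4")
  .getOrCreate()
```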

Spark data source resiliency

2018-07-02 Thread assaf.mendelson
Hi All, I have implemented a data source V2 which integrates with an internal system, and I need to make it resilient to errors in that internal source. The issue is that currently, if there is an exception in the data reader, the exception seems to fail the entire task. I would prefer instead to …
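The replies above lean on Spark's task-level retry; if the internal system's errors are transient and a read can safely be re-issued, another option is to retry inside the reader itself so the task never sees the exception. A hypothetical sketch against the Spark 2.3-era DataSourceV2 API (`ResilientDataReader` and `fetchFromInternalSystem` are made-up names, not anything from this thread):

```scala
// Hypothetical sketch (Spark 2.3-era DataSourceV2 API), not the poster's
// actual code: transient exceptions from the internal system are retried
// inside the reader instead of propagating and failing the task.
import org.apache.spark.sql.Row
import org.apache.spark.sql.sources.v2.reader.DataReader

class ResilientDataReader(maxAttempts: Int) extends DataReader[Row] {

  private var current: Row = _

  // Run a by-name block, retrying on any exception until attempts run out.
  private def withRetry[T](attemptsLeft: Int)(body: => T): T =
    try body
    catch {
      case _: Exception if attemptsLeft > 1 =>
        withRetry(attemptsLeft - 1)(body)
    }

  override def next(): Boolean = withRetry(maxAttempts) {
    fetchFromInternalSystem() match {
      case Some(row) => current = row; true
      case None      => false
    }
  }

  override def get(): Row = current

  override def close(): Unit = () // release the client connection here

  // Placeholder for the real client call: next row, or None at end of data.
  private def fetchFromInternalSystem(): Option[Row] = ???
}
```

Whether blindly retrying `next()` is safe depends on the internal system: if its cursor is not idempotent, a retry may skip or duplicate rows, and Spark's task-level retry (which restarts the whole partition) is the cleaner recovery path.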