Hi there,

I’m trying to improve the performance of a job that has GC trouble and takes 
longer than it should because it has to recompute failed tasks. After 
deferring object creation as much as possible, I’m now trying to reduce memory 
usage with StorageLevel.MEMORY_AND_DISK_SER and a custom KryoRegistrator that 
registers all of the classes we use (rough sketch below). This works fine both 
in unit tests and on a local cluster (i.e. master and worker on my dev 
machine). On the production cluster it fails without any error message except:

Job aborted due to stage failure: Task 10 in stage 2.0 failed 4 times, most 
recent failure: Lost task 10.3 in stage 2.0 (TID 20, xxx.compute.internal): 
ExecutorLostFailure (executor lost)
Driver stacktrace:

Is there any way to understand what’s going on? The logs don’t show anything. 
I’m using Spark 1.1.1.
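
For reference, the setup is essentially the following; MyRecord and 
com.example.MyRegistrator are placeholders for the real classes:

  import com.esotericsoftware.kryo.Kryo
  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.serializer.KryoRegistrator
  import org.apache.spark.storage.StorageLevel

  // Placeholder for the actual domain classes.
  case class MyRecord(id: Long, value: String)

  // Registers every class that ends up inside persisted RDDs.
  class MyRegistrator extends KryoRegistrator {
    override def registerClasses(kryo: Kryo) {
      kryo.register(classOf[MyRecord])
      kryo.register(classOf[Array[MyRecord]])
    }
  }

  val conf = new SparkConf()
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.kryo.registrator", "com.example.MyRegistrator")
  val sc = new SparkContext(conf)

  // Serialized storage, spilling to disk when memory is tight.
  val rdd = sc.parallelize(1 to 1000).map(i => MyRecord(i, i.toString))
  rdd.persist(StorageLevel.MEMORY_AND_DISK_SER)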


Thanks
- Marius
