Hi there, I’m trying to improve the performance of a job that has GC trouble and takes longer to complete because it has to recompute failed tasks. After deferring object creation as much as possible, I’m now trying to reduce memory usage with StorageLevel.MEMORY_AND_DISK_SER and a custom KryoRegistrator that registers all the classes in use. This works fine both in unit tests and on a local cluster (i.e. master and worker on my dev machine). On the production cluster it fails with no error message except:
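For reference, this is roughly the setup I mean — a minimal sketch, not my actual job; the Record class and registrator name are placeholders standing in for my real data types:

    import com.esotericsoftware.kryo.Kryo
    import org.apache.spark.serializer.KryoRegistrator
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.{SparkConf, SparkContext}

    // Placeholder for the job's actual data types.
    case class Record(id: Long, payload: Array[Byte])

    class MyRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo) {
        kryo.register(classOf[Record])
        kryo.register(classOf[Array[Record]])
      }
    }

    val conf = new SparkConf()
      .setAppName("kryo-ser-job")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", "MyRegistrator")
      // Fail fast if a class was missed instead of silently
      // serializing full class names alongside each object.
      .set("spark.kryo.registrationRequired", "true")

    val sc = new SparkContext(conf)
    val rdd = sc.parallelize(1 to 1000).map(i => Record(i, new Array[Byte](128)))
    rdd.persist(StorageLevel.MEMORY_AND_DISK_SER)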
    Job aborted due to stage failure: Task 10 in stage 2.0 failed 4 times,
    most recent failure: Lost task 10.3 in stage 2.0 (TID 20,
    xxx.compute.internal): ExecutorLostFailure (executor lost)
    Driver stacktrace:

Is there any way to understand what’s going on? The logs don’t show anything. I’m using Spark 1.1.1.

Thanks
- Marius
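In case it helps, this is where I’ve been looking so far — a sketch of the commands I ran on the worker node named in the failure; the work-directory path is a placeholder for wherever your standalone worker keeps executor output:

    # Executor stdout/stderr live under the worker's work directory
    # in a standalone deployment (adjust the path for your layout).
    grep -ri "OutOfMemory\|Killed" /path/to/spark/work/*/*/stderr

    # If the executor JVM was killed by the kernel OOM killer, the
    # evidence is in the kernel log, not in Spark's own logs.
    dmesg | grep -i "killed process"

Neither turned up anything obvious, which is why I’m asking.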