I've built a Spark job that calls an external program through pipe(). The job runs correctly on the cluster when the input is a small sample dataset, but with the real, large dataset it stays in the RUNNING state forever.
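For context, the pipe() call looks roughly like the sketch below; the program path and the input/output locations are placeholders, not my actual setup:

    import org.apache.spark.sql.SparkSession

    object PipeJob {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("pipe-job").getOrCreate()
        val sc = spark.sparkContext

        // Placeholder input path and external program; the real ones differ.
        val input = sc.textFile("hdfs:///data/input")

        // pipe() launches one copy of the external program per partition,
        // writes each record to its stdin and reads its stdout lines back
        // as the resulting RDD.
        val piped = input.pipe("/opt/tools/external_program")

        piped.saveAsTextFile("hdfs:///data/output")
        spark.stop()
      }
    }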
I've tried different ways of tuning executor memory, executor cores, and memory overhead, but haven't found a solution so far. I've also tried forcing the external program to use only one thread, in case the problem came from it being a multithreaded application, but that didn't help either. Any suggestion would be welcome.
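The settings I've been adjusting are along these lines (the values below are just examples, not the actual ones I used):

    val spark = SparkSession.builder()
      .appName("pipe-job")
      // Illustrative values only; I've tried several combinations of these.
      .config("spark.executor.memory", "8g")
      .config("spark.executor.cores", "4")
      .config("spark.executor.memoryOverhead", "2g")
      .getOrCreate()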